item functioning results: Topics by Science.gov

Sample records for item functioning results

Evaluation of measurement equivalence of the Family Satisfaction with the End-of-Life Care in an ethnically diverse cohort: Tests of differential item functioning

PubMed Central

Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert

2016-01-01

Background: The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims: The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design: A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results: Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion: Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education.
PMID:25160692
Item Information and Discrimination Functions for Trinary PCM Items.

ERIC Educational Resources Information Center

Akkermans, Wies; Muraki, Eiji

1997-01-01

For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal and bimodal are discussed, and the locations and values of maxima are derived. Practical relevance of the results is discussed. (SLD)
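For orientation, the quantities this record discusses can be written out under the usual partial credit model with unit slope. This is an assumed parameterization, since the abstract does not state one; the step parameters \delta_1 and \delta_2 are notation introduced here, and an empty sum is taken to be zero.

```latex
% Category probabilities for a trinary (three-category) partial credit item
P_k(\theta) = \frac{\exp\left(\sum_{j=1}^{k} (\theta - \delta_j)\right)}
                   {\sum_{h=0}^{2} \exp\left(\sum_{j=1}^{h} (\theta - \delta_j)\right)},
\qquad k = 0, 1, 2

% Item information equals the conditional variance of the item score
I(\theta) = \sum_{k=0}^{2} k^{2} P_k(\theta) - \left(\sum_{k=0}^{2} k\, P_k(\theta)\right)^{2}
          = P_1(\theta) + 4 P_2(\theta) - \bigl(P_1(\theta) + 2 P_2(\theta)\bigr)^{2}
```

Under this sketch, whether I(\theta) has one peak or two depends on how far apart \delta_1 and \delta_2 are, which is the kind of condition the abstract refers to.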
The Dutch-Flemish PROMIS Physical Function item bank exhibited strong psychometric properties in patients with chronic pain.

PubMed

Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D

2017-07-01

The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.

ERIC Educational Resources Information Center

Kaskowitz, Gary S.; De Ayala, R. J.

2001-01-01

Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
Rasch analysis of the Italian Lower Extremity Functional Scale: insights on dimensionality and suggestions for an improved 15-item version.

PubMed

Bravini, Elisabetta; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano

2017-04-01

To investigate dimensionality and the measurement properties of the Italian Lower Extremity Functional Scale using both classical test theory and Rasch analysis methods, and to provide insights for an improved version of the questionnaire. Rasch analysis of individual patient data. Rehabilitation centre. A total of 135 patients with musculoskeletal diseases of the lower limb. Patients were assessed with the Lower Extremity Functional Scale before and after the rehabilitation. Rasch analysis showed some problems related to rating scale category functioning, items fit, and items redundancy. After an iterative process, which resulted in the reduction of rating scale categories from 5 to 4, and in the deletion of 5 items, the psychometric properties of the Italian Lower Extremity Functional Scale improved. The retained 15 items with a 4-level response format fitted the Rasch model (internal construct validity), and demonstrated unidimensionality and good reliability indices (person-separation reliability 0.92; Cronbach's alpha 0.94). Then, the analysis showed differential item functioning for six of the retained items. The sensitivity to change of the Italian 15-item Lower Extremity Functional Scale was nearly equal to the one of the original version (effect size: 0.93 and 0.98; standardized response mean: 1.20 and 1.28, respectively for the 15-item and 20-item versions). The Italian Lower Extremity Functional Scale had unsatisfactory measurement properties. However, removing five items and simplifying the scoring from 5 to 4 levels resulted in a more valid measure with good reliability and sensitivity to change.
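The two sensitivity-to-change indices reported in this record can be sketched as follows. This is a minimal illustration assuming the common definitions (effect size = mean change divided by the baseline standard deviation; standardized response mean = mean change divided by the standard deviation of the change scores); the abstract does not say which variants were used, and the function names and sample scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(before, after):
    """Mean change divided by the SD of baseline scores (assumed definition)."""
    change = [a - b for b, a in zip(before, after)]
    return mean(change) / stdev(before)


def standardized_response_mean(before, after):
    """Mean change divided by the SD of the change scores (assumed definition)."""
    change = [a - b for b, a in zip(before, after)]
    return mean(change) / stdev(change)


# Hypothetical scores for four patients before and after rehabilitation.
before = [10, 20, 30, 40]
after = [20, 35, 40, 55]
es = effect_size(before, after)
srm = standardized_response_mean(before, after)
```

Both indices share a numerator, so they differ only in which spread is used to scale the change.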
Differential item functioning in the Cambridge Mental Disorders in the Elderly (CAMDEX) Depression Scale across middle age and late life.

PubMed

Estabrook, Ryne; Sadler, Michael E; McGue, Matt

2015-12-01

A long-standing and critical problem in the study of aging and depression is the comparability of measurement across age groups. While psychological measures of depression typically show increased incidence of symptoms with increasing age, rates of depression diagnosis do not show the same age trend. This analysis presents tests of differential item functioning on the depression section of the CAMDEX interview schedule, using factor analysis-derived affective and somatic subscales (McGue & Christensen, 1997). Results for the affective subscale show significant differences in item functioning in the majority of the affective items as a function of age (items "Happy Life," "Lonely," "Nervous," "Worthless," and "Future": χ²(6) = [30.193, 255.971] across items, all p < .0001). Analyses for the somatic subscale show differential item functioning is limited to a single item relating to coping (χ²(6) = 180.754, p < .0001). These results indicate that differences in depression symptoms across age groups are not entirely consistent with a unidimensional depression trait, and that the measurement structure of depression varies over the life span. (c) 2015 APA, all rights reserved.
Better assessment of physical function: item improvement is neglected but essential

PubMed Central

2009-01-01

Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Individuals with knee impairments identify items in need of clarification in the Patient Reported Outcomes Measurement Information System (PROMIS®) pain interference and physical function item banks - a qualitative study.

PubMed

Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J

2016-05-11

The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.
Improving measurement of injection drug risk behavior using item response theory.

PubMed

Janulis, Patrick

2014-03-01

Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
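The two-parameter model named in this record can be sketched with the standard 2PL item response function. The parameter values below are hypothetical and are not taken from the study; shifting the difficulty parameter between two groups is shown only to illustrate what uniform differential item functioning looks like.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model.

    theta: latent trait level, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: same discrimination in both groups, but a higher
# difficulty in group 2, so the same trait level yields a lower probability.
theta = 0.0
p_group1 = p_2pl(theta, a=1.2, b=0.0)
p_group2 = p_2pl(theta, a=1.2, b=0.5)
```

When an item behaves like this, two respondents with the same underlying risk level contribute differently to a composite score, which is why the abstract cautions against direct comparisons of composite scores between groups.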
Development and assessment of floor and ceiling items for the PROMIS physical function item bank

PubMed Central

2013-01-01

Introduction Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. 
They will help improve instrument performance in populations with broad functional ranges and those concentrated at one or the other extreme ends of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring item administration to appropriate individuals. PMID:24286166
Development of autonomous grasping and navigating robot

NASA Astrophysics Data System (ADS)

Kudoh, Hiroyuki; Fujimoto, Keisuke; Nakayama, Yasuichi

2015-01-01

The ability to find and grasp target items in an unknown environment is important for working robots. We developed an autonomous navigating and grasping robot. The operations are locating a requested item, moving to where the item is placed, finding the item on a shelf or table, and picking the item up from the shelf or the table. To achieve these operations, we designed the robot with three functions: an autonomous navigating function that generates a map and a route in an unknown environment, an item position recognizing function, and a grasping function. We tested this robot in an unknown environment. It achieved a series of operations: moving to a destination, recognizing the positions of items on a shelf, picking up an item, placing it on a cart with its hand, and returning to the starting location. The results of this experiment show the applicability of reducing the workforce with robots.
Female Sexual Function Index Short Version: A MsFLASH Item Response Analysis.

PubMed

Carpenter, Janet S; Jones, Salene M W; Studts, Christina R; Heiman, Julia R; Reed, Susan D; Newton, Katherine M; Guthrie, Katherine A; Larson, Joseph C; Cohen, Lee S; Freeman, Ellen W; Jane Lau, R; Learman, Lee A; Shifren, Jan L

2016-11-01

The Female Sexual Function Index (FSFI) is a psychometrically sound and popular 19-item self-report measure, but its length may preclude its use in studies with multiple outcome measures, especially when sexual function is not a primary endpoint. Only one attempt has been made to create a shorter scale, resulting in the Italian FSFI-6, later translated into Spanish and Korean without further psychometric analysis. Our study evaluated whether a subset of items on the 19-item English-language FSFI would perform as well as the full-length FSFI in peri- and postmenopausal women. We used baseline data from 898 peri- and postmenopausal women recruited from multiple communities, ages 42-62 years, and enrolled in randomized controlled trials for vasomotor symptom management. Goals were to (1) create a psychometrically sound, shorter version of the FSFI for use in peri- and postmenopausal women as a continuous measure and (2) compare it to the Italian FSFI-6. Results indicated that a 9-item scale provided more information than the FSFI-6 across a spectrum of sexual functioning, was able to capture sample variability, and showed sufficient range without floor or ceiling effects. All but one of the items from the Italian 6-item version were included in the 9-item version. Most omitted FSFI items focused on frequency of events or experiences. When assessment of sexual function is a secondary endpoint and subject burden related to questionnaire length is a priority, the 9-item FSFI may provide important information about sexual function in English-speaking peri- and postmenopausal women.
Assessing Unidimensionality and Differential Item Functioning in Qualifying Examination for Senior Secondary School Students, Osun State, Nigeria

ERIC Educational Resources Information Center

Ajeigbe, Taiwo Oluwafemi; Afolabi, Eyitayo Rufus Ifedayo

2017-01-01

This study assessed unidimensionality and occurrence of Differential Item Functioning (DIF) in Mathematics and English Language items of Osun State Qualifying Examination. The study made use of secondary data. The results showed that OSQ Mathematics (-0.094 ≤ r ≤ 0.236) and English Language items (-0.095 ≤ r ≤ 0.228) were unidimensional. Also,…
Improving Measures of Work-Related Physical Functioning

PubMed Central

McDonough, Christine M.; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E.; Marino, Molly E.; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E.; Jette, Alan M; Chan, Leighton

2016-01-01

Purpose: To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration’s (SSA) disability determination process. Methods: Newly developed questions were administered to 3,532 recent SSA applicants for work disability benefits and 2,025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Results: Factor and IRT analyses supported integration of 44 new items into 3 existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain, consisting of Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items), demonstrated acceptable psychometric properties. Conclusions: The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability. PMID:28005243
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

PubMed

Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

2018-02-02

In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

PubMed Central

Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

2011-01-01

Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure

PubMed Central

McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.

2013-01-01

Background Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342
Development and initial evaluation of the SCI-FI/AT

PubMed Central

Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-01-01

Objectives: To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design: Cross-sectional survey followed by computerized adaptive test (CAT) simulations. Setting: Inpatient and community settings. Participants: A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion: With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.

PubMed

Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-05-01

To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Cross-sectional survey followed by computerized adaptive test (CAT) simulations. Inpatient and community settings. A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI.
An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis.

PubMed

Tarrant, Marie; Ware, James; Mohammed, Ahmed M

2009-07-07

Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
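The classification rule in this abstract (an option is non-functioning if fewer than 5% of examinees choose it or if its discrimination statistic is positive) can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the abstract does not say which discrimination statistic was used, so the point-biserial correlation between choosing an option and the total test score is assumed here, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def point_biserial(chose, totals):
    """Correlation between a 0/1 choice indicator and total scores.

    Assumed stand-in for the 'option discrimination statistic';
    returns 0.0 when the correlation is undefined (no variance).
    """
    sd_x, sd_y = pstdev(chose), pstdev(totals)
    if sd_x == 0 or sd_y == 0:
        return 0.0
    mx, my = mean(chose), mean(totals)
    cov = mean([(x - mx) * (y - my) for x, y in zip(chose, totals)])
    return cov / (sd_x * sd_y)

def functioning_distractors(responses, totals, key, options="ABCD", min_freq=0.05):
    """Return the distractors that are chosen by at least min_freq of
    examinees and do not have a positive discrimination statistic."""
    n = len(responses)
    functioning = []
    for opt in options:
        if opt == key:
            continue  # the correct response is not a distractor
        chose = [1 if r == opt else 0 for r in responses]
        freq = sum(chose) / n
        disc = point_biserial(chose, totals)
        if freq >= min_freq and disc <= 0:
            functioning.append(opt)
    return functioning
```

With one item's responses and the examinees' total scores as input, the length of the returned list is the item's count of functioning distractors, which is the quantity the abstract tabulates.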

Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions

ERIC Educational Resources Information Center

Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.

2003-01-01

Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Iterative Purification and Effect Size Use with Logistic Regression for Differential Item Functioning Detection

ERIC Educational Resources Information Center

French, Brian F.; Maller, Susan J.

2007-01-01

Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS(®)) in acute coronary syndrome patients: differential functioning of items and test.

PubMed

Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E

2015-08-01

The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
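The two summary statistics named in this abstract can be illustrated with a short sketch. It rests on simplifying assumptions: a dichotomous two-parameter logistic (2PL) item stands in for the scale's actual item model, the non-compensatory DIF index is taken as the mean squared difference in expected item scores averaged over one group's ability estimates (conventionally the focal group), and all function names are hypothetical.

```python
import math

def expected_score_2pl(theta, a, b):
    """Expected score of a dichotomous 2PL item (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ncdif(thetas_focal, ref_params, focal_params):
    """Mean squared gap between the expected item scores implied by
    the reference and focal parameter sets, over focal-group thetas."""
    gaps = [
        (expected_score_2pl(t, *focal_params) - expected_score_2pl(t, *ref_params)) ** 2
        for t in thetas_focal
    ]
    return sum(gaps) / len(gaps)

def rmsd(x, y):
    """Root-mean-square difference between two sets of theta scores."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)) / len(x))
```

An item would be flagged when `ncdif(...)` exceeds the chosen cut-off; the abstract reports a cut-off of 0.096 for its items.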
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.

PubMed

Gibbons, C J; Skevington, S M

2018-04-01

Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF resulted in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify a lack of measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
Vegetable parenting practices scale. Item response modeling analyses

PubMed Central

Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

2015-01-01

Objective To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
[Item function analysis on the Quality of Life-Alzheimer's Disease (QOL-AD) Chinese version, based on the Item Response Theory (IRT)].

PubMed

Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei

2013-07-01

To introduce Item Function Analysis (IFA) of the Chinese version of the Quality of Life-Alzheimer's Disease (QOL-AD) scale and to explore the feasibility of its application to Chinese patients with AD. Two hundred AD patients, selected through stratified cluster sampling, were interviewed and assessed with the QOL-AD. Multilog 7.03 was used for the Item Function Analysis. The discrimination parameter (a), difficulty parameter (b), and Item Characteristic Curve (ICC) of each QOL-AD item were obtained. The discrimination parameters of items 1 and 7 were below 0.6, while all the others were above 0.6. As for the ICCs, except for items 1 and 7, the first and last curves of each item were monotonic, while the two in between had an inverted-V shape with very steep slopes. Results from the IFA showed that the QOL-AD is applicable to Chinese patients with AD.
Better assessment of physical function: item improvement is neglected but essential.

PubMed

Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

2009-01-01

Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important than complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
A Rasch-Validated Version of the Upper Extremity Functional Index for Interval-Level Measurement of Upper Extremity Function

PubMed Central

Chesworth, Bert M.

2013-01-01

Background The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. Objective The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. Design This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Methods Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. Results A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0–100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Limitations Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Conclusion Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity. PMID:23813086
More relevant, precise, and efficient items for assessment of physical function and disability: moving beyond the classic instruments

PubMed Central

Fries, J F; Bruce, B; Bjorner, J; Rose, M

2006-01-01

Objectives Patient reported outcomes (PROs) have become standard study endpoints. However, little attention has been given to using item improvement to advance PRO performance which could improve precision, clarity, patient relevance, and information content of “physical function/disability” items and thus the performance of resulting instruments. Methods The present study included 1860 physical function/disability items from 165 instruments. Item formulations were assessed by frequency of use, modified Delphi consensus, respondent judgement of clarity and importance, and item response theory (IRT). Data from 1100 rheumatoid arthritis, osteoarthritis, and normal ageing subjects, using qualitative item review, focus groups, cognitive interviews, and patient survey were used to achieve a unique item pool that was clear, reliable, sensitive to change, readily translatable, devoid of floor and ceiling limitations, contained unidimensional subdomains, and had maximal information content. Results A “present tense” time frame was used most frequently, better understood, more readily translated, and more directly estimated the latent trait of disability. Items in the “past tense” had 80–90% false negatives (p<0.001). The best items were brief, clear, and contained a single construct. Responses with four to five options were preferred by both experts and respondents. The term physical function may be preferable to the term disability because of fewer floor effects. IRT analyses of “disability” suggest four independent subdomains (mobility, dexterity, axial, and compound) with factor loadings of 0.81–0.99. Conclusions Major improvement in performance of items and instruments is possible, and may have the effect of substantially reducing sample size requirements for clinical trials. PMID:17038464
Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients.

PubMed

Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa

2017-11-01

The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim to enhance measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, and Effect Size

ERIC Educational Resources Information Center

Garrett, Phyllis

2009-01-01

The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur…
Equal Area Logistic Estimation for Item Response Theory

NASA Astrophysics Data System (ADS)

Lo, Shih-Ching; Wang, Kuo-Chang; Chang, Hsin-Li

2009-08-01

Item response theory (IRT) models use logistic functions exclusively as item response functions (IRFs). Applications of IRT models require obtaining the set of values for logistic function parameters that best fit an empirical data set. However, success in obtaining such set of values does not guarantee that the constructs they represent actually exist, for the adequacy of a model is not sustained by the possibility of estimating parameters. In this study, an equal area based two-parameter logistic model estimation algorithm is proposed. Two theorems are given to prove that the results of the algorithm are equivalent to the results of fitting data by logistic model. Numerical results are presented to show the stability and accuracy of the algorithm.
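As background for this abstract, the two-parameter logistic item response function and the log-likelihood that parameter estimation tries to maximize can be written down directly. The sketch below shows only this standard model; it is not the equal-area algorithm the authors propose, whose details the abstract does not give. Person abilities are treated as known purely for illustration, and the function names are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """Two-parameter logistic IRF: probability of a correct response
    for ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(a, b, thetas, responses):
    """Bernoulli log-likelihood of one item's 0/1 responses,
    assuming the abilities in `thetas` are known."""
    total = 0.0
    for theta, u in zip(thetas, responses):
        p = irf_2pl(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total
```

Comparing `log_likelihood` across candidate `(a, b)` pairs shows which parameter values fit a response pattern better; a full calibration would also have to estimate the abilities.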
Readability and Comprehension of the Geriatric Depression Scale and PROMIS® Physical Function Items in Older African Americans and Latinos

PubMed Central

Paz, Sylvia H.; Jones, Loretta; Calderón, José L.; Hays, Ron D.

2016-01-01

Background Depression and physical function are especially important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) Physical Function Item Bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. Objective To estimate the readability of the GDS and PROMIS® Physical Function items and to assess their comprehensibility by a sample of African American and Latino elderly. Methods Readability was estimated using the Flesch-Kincaid (F-K) and Flesch-Reading-Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS items by minority elderly was evaluated with 30 cognitive interviews. Results Readability estimates of a number of items in English and Spanish of the GDS and PROMIS physical functioning items exceed the recommended 5th grade level, or were rated as fairly difficult, difficult, or very difficult to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS items was considered confusing and responses potentially uninterpretable because they were based on physical aids. Conclusions Problems with item wording and response options of the GDS and PROMIS Physical Function items may negatively affect reliability and validity of measurement when used with minority elderly. PMID:27599978
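The two English readability formulae named in the Methods are simple functions of word, sentence, and syllable counts, sketched below. The coefficients are the standard published ones; the syllable counter is a crude vowel-run heuristic added only for illustration (an assumption, not the procedure used in the study), and the sample item wording is hypothetical.

```python
import re

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level (US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels (assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Return (reading ease, grade level) for a short text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (flesch_reading_ease(len(words), sentences, syllables),
            flesch_kincaid_grade(len(words), sentences, syllables))

# Hypothetical item wording, not taken from either instrument.
ease, grade = readability("Are you able to dress yourself, including tying shoelaces?")
```

An item would be flagged when `grade` exceeds the recommended level, which the abstract gives as 5th grade.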
Item-focussed Trees for the Identification of Items in Differential Item Functioning.

PubMed

Tutz, Gerhard; Berger, Moritz

2016-09-01

A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
Dutch translation and cross-cultural adaptation of the PROMIS® physical function item bank and cognitive pre-test in Dutch arthritis patients.

PubMed

Oude Voshaar, Martijn Ah; Ten Klooster, Peter M; Taal, Erik; Krishnan, Eswar; van de Laar, Mart Afj

2012-03-05

Patient-reported physical function is an established outcome domain in clinical studies in rheumatology. To overcome the limitations of the current generation of questionnaires, the Patient-Reported Outcomes Measurement Information System (PROMIS®) project in the USA has developed calibrated item banks for measuring several domains of health status in people with a wide range of chronic diseases. The aim of this study was to translate and cross-culturally adapt the PROMIS physical function item bank to the Dutch language and to pretest it in a sample of patients with arthritis. The items of the PROMIS physical function item bank were translated using rigorous forward-backward protocols and the translated version was subsequently cognitively pretested in a sample of Dutch patients with rheumatoid arthritis. Few issues were encountered in the forward-backward translation. Only 5 of the 124 items to be translated had to be rewritten because of culturally inappropriate content. Subsequent pretesting showed that overall, questions of the Dutch version were understood as they were intended, while only one item required rewriting. Results suggest that the translated version of the PROMIS physical function item bank is semantically and conceptually equivalent to the original. Future work will be directed at creating a Dutch-Flemish final version of the item bank to be used in research with Dutch speaking populations.
Expansion of a physical function item bank and development of an abbreviated form for clinical research.

PubMed

Bode, Rita K; Lai, Jin-shei; Dineen, Kelly; Heinemann, Allen W; Shevrin, Daniel; Von Roenn, Jamie; Cella, David

2006-01-01

We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT.
Scale Refinement and Initial Evaluation of a Behavioral Health Function Measurement Tool for Work Disability Evaluation

PubMed Central

Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.

2014-01-01

Objectives To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design Cross-sectional survey followed by item response theory (IRT) calibration data simulations Setting Community Participants A sample of individuals applying for SSA disability benefits, claimants (N=1015), and a normative comparative sample of US adults (N=1000) Interventions None. Main Outcome Measure Social Security Administration Behavioral Health Function (SSA-BH) measurement instrument Results Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10- item CATs with the full item bank indicated robust ability of the CAT approach to comprehensively characterize behavioral health function along four distinct dimensions. Conclusions Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared to age and sex matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. Utilizing the CAT based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the SSA’s work disability programs. PMID:23542404
Evaluating linguistic equivalence of patient-reported outcomes in a cancer clinical trial.

PubMed

Hahn, Elizabeth A; Bode, Rita K; Du, Hongyan; Cella, David

2006-01-01

In order to make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL) or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural/linguistic groups, it is important to determine whether this reflects real group differences, or is the result of systematic measurement variability. To investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial. Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis was conducted to compare HRQL between treatment arms using items without evidence of differential functioning. Among 27 items, nine (33%) did not exhibit any evidence of differential functioning in both linguistic comparisons (English versus French, English versus German). Although 18 items functioned differently, there was no evidence of systematic bias. In a sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation of clinical trial treatment arm differences. Sufficient sample sizes were available for only three of the eight language groups. Identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive. Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. Systematic variability in HRQL across different groups can be evaluated for its effect upon clinical trial results; a practice recommended when data are pooled across cultural or linguistic groups to make conclusions about treatment effects.
Tracking functional status across the spinal cord injury lifespan: linking pediatric and adult patient-reported outcome scores.

PubMed

Tian, Feng; Ni, Pengsheng; Mulcahey, M J; Hambleton, Ronald K; Tulsky, David; Haley, Stephen M; Jette, Alan M

2014-11-01

To use item response theory (IRT) methods to link scores from 2 recently developed contemporary functional outcome measures, the adult Spinal Cord Injury-Functional Index (SCI-FI) and the Pedi SCI (both the parent version and the child version). Secondary data analysis of the physical functioning items of the adult SCI-FI and the Pedi SCI instruments. We used a nonequivalent group design with items common to both instruments and the Stocking-Lord method for the linking. Linking was conducted so that the adult SCI-FI and Pedi SCI scaled scores could be compared. Community. This study included a total sample of 1558 participants. Pedi SCI items were administered to a sample of children (n=381) with SCI aged 8 to 21 years, and of parents/caregivers (n=322) of children with SCI aged 4 to 21 years. Adult SCI-FI items were administered to a sample of adults (n=855) with SCI aged 18 to 92 years. Not applicable. Five scales common to both instruments were included in the analysis: Wheelchair, Daily Routine/Self-care, Daily Routine/Fine Motor, Ambulation, and General Mobility functioning. Confirmatory factor analysis and exploratory factor analysis results indicated that the 5 scales are unidimensional. A graded response model was used to calibrate the items. Misfitting items were identified and removed from the item banks. Items that function differently between the adult and child samples (ie, exhibit differential item functioning) were identified and removed from the common items used for linking. Domain scores from the Pedi SCI instruments were transformed onto the adult SCI-FI metric. This IRT linking allowed estimation of adult SCI-FI scale scores based on Pedi SCI scale scores and vice versa; therefore, it provides clinicians with a means of tracking long-term functional data for children with an SCI across their entire lifespan. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
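The Stocking-Lord linking step mentioned in this abstract searches for a slope A and intercept B such that the common items, once moved onto the target scale, reproduce the target form's test characteristic curve. The sketch below is a simplified illustration under stated assumptions: dichotomous 2PL items replace the graded response model used in the study, a coarse grid search replaces a numerical optimizer, and all names and parameter values are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a positive response (assumed item model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected summed score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def stocking_lord_grid(ref_items, new_items):
    """Grid search for (A, B) minimizing the squared gap between TCCs.

    Common items calibrated on the new scale are moved to the
    reference scale with a* = a / A and b* = A * b + B.
    """
    thetas = [-4 + 0.5 * k for k in range(17)]                  # -4.0 .. 4.0
    ref_curve = [tcc(t, ref_items) for t in thetas]
    slopes = [round(0.5 + 0.05 * i, 2) for i in range(31)]      # 0.50 .. 2.00
    intercepts = [round(-2 + 0.05 * j, 2) for j in range(81)]   # -2.00 .. 2.00
    best = None
    for A in slopes:
        for B in intercepts:
            moved = [(a / A, A * b + B) for a, b in new_items]
            loss = sum((r - tcc(t, moved)) ** 2 for r, t in zip(ref_curve, thetas))
            if best is None or loss < best[0]:
                best = (loss, A, B)
    return best[1], best[2]

# Hypothetical common items; the reference-scale values are generated
# from a known transformation so that the search has a known answer.
new_items = [(1.0, -1.0), (1.5, 0.0), (0.8, 0.5), (1.2, 1.0)]
ref_items = [(a / 1.2, 1.2 * b + 0.5) for a, b in new_items]
A_hat, B_hat = stocking_lord_grid(ref_items, new_items)
```

Once A and B are estimated, a score on the new scale maps to the reference metric as `A_hat * theta + B_hat`, which is the kind of conversion the abstract describes for moving Pedi SCI domain scores onto the adult SCI-FI metric.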
Aggregating Polytomous DIF Results over Multiple Test Administrations

ERIC Educational Resources Information Center

Zwick, Rebecca; Ye, Lei; Isham, Steven

2018-01-01

In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…

Development of a Self-Report Physical Function Instrument for Disability Assessment: Item Pool Construction and Factor Analysis

PubMed Central

McDonough, Christine M.; Jette, Alan M.; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M.; Rasch, Elizabeth K.

2014-01-01

Objectives To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Design Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. Setting In-person and semi-structured interviews; internet and telephone surveys. Participants A sample of 1,017 SSA claimants, and a normative sample of 999 adults from the US general population. Interventions Not Applicable. Main Outcome Measure Model fit statistics Results The final item pool consisted of 139 items. Within the claimant sample 58.7% were white; 31.8% were black; 46.6% were female; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution which included more items and allowed separate characterization of: 1) Changing and Maintaining Body Position, 2) Whole Body Mobility, 3) Upper Body Function and 4) Upper Extremity Fine Motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples respectively were: Comparative Fit Index = 0.93 and 0.98; Tucker-Lewis Index = 0.92 and 0.98; Root Mean Square Error Approximation = 0.05 and 0.04. Conclusions The factor structure of the Physical Function item pool closely resembled the hypothesized content model. The four scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. PMID:23542402
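
For readers unfamiliar with the fit statistics reported above, the following sketch shows the textbook chi-square-based definitions of the Comparative Fit Index, Tucker-Lewis Index, and Root Mean Square Error of Approximation. The abstract does not state which estimator or software the authors used, so these formulas (and the use of N - 1 in the RMSEA denominator) are assumptions, and the input values are invented for illustration.

```python
import math

def fit_indices(chi2_model, df_model, chi2_baseline, df_baseline, n):
    """Textbook CFI, TLI and RMSEA from model and baseline chi-square values."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_baseline - df_baseline, 0.0)

    # RMSEA: misfit per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(excess_model / (df_model * (n - 1)))

    # CFI: proportional reduction in misfit relative to the baseline model.
    denom = max(excess_base, excess_model)
    cfi = 1.0 - excess_model / denom if denom > 0 else 1.0

    # TLI: like CFI but based on chi-square/df ratios, so it penalises complexity.
    ratio_base = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)

    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}

# Invented inputs, not values from the study.
example = fit_indices(chi2_model=150.0, df_model=100,
                      chi2_baseline=1100.0, df_baseline=120, n=501)
```

Higher CFI and TLI and lower RMSEA indicate better fit, which is the direction of the comparison between the claimant and normative samples in the abstract.
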
Functional and Neuroanatomical Specificity of Episodic Memory Dysfunction in Schizophrenia: An fMRI study of the Relational and Item-Specific Encoding Task

PubMed Central

Ragland, J. Daniel; Ranganath, Charan; Harms, Michael P.; Barch, Deanna M.; Gold, James M.; Layher, Evan; Lesh, Tyler A.; MacDonald, Angus W.; Niendam, Tara A.; Phillips, Joshua; Silverstein, Steven M.; Yonelinas, Andrew P.; Carter, Cameron S.

2015-01-01

Importance Individuals with schizophrenia (SZ) can encode item-specific information to support familiarity-based recognition, but are disproportionately impaired encoding inter-item relationships (relational encoding) and recollecting information. The Relational and Item-Specific Encoding (RiSE) paradigm has been used to disentangle these encoding and retrieval processes, which may be dependent on specific medial temporal lobe (MTL) and prefrontal cortex (PFC) subregions. Functional imaging during RiSE task performance could help to specify dysfunctional neural circuits in SZ that can be targeted for interventions to improve memory and functioning in the illness. Objectives To use functional magnetic resonance imaging (fMRI) to test the hypothesis that SZ disproportionately affects MTL and PFC subregions during relational encoding and retrieval, relative to item-specific memory processes. Imaging results from healthy comparison subjects (HC) will also be used to establish neural construct validity for RiSE. Design, Setting, and Participants This multi-site, case-control, cross-sectional fMRI study was conducted at five CNTRACS sites. The final sample included 52 clinically stable outpatients with SZ, and 57 demographically matched HC. Main Outcomes and Measures Behavioral performance speed and accuracy (d’) on item recognition and associative recognition tasks. Voxelwise statistical parametric maps for a priori MTL and PFC regions of interest (ROI), testing activation differences between relational and item-specific memory during encoding and retrieval. Results Item recognition was disproportionately impaired in SZ patients relative to controls following relational encoding. The differential deficit was accompanied by reduced dorsolateral prefrontal cortex (DLPFC) activation during relational encoding in SZ, relative to HC. 
Retrieval success (hits > misses) was associated with hippocampal (HI) activation in HC during relational item recognition and associative recognition conditions, and HI activation was specifically reduced in SZ for recognition of relational but not item-specific information. Conclusions In this unique, multi-site fMRI study, HC results supported RiSE construct validity by revealing expected memory effects in PFC and MTL subregions during encoding and retrieval. Comparison of SZ and HC revealed disproportionate memory deficits in SZ for relational versus item-specific information, accompanied by regionally and functionally specific deficits in DLPFC and HI activation. PMID:26200928
Gender-, age-, and race/ethnicity-based differential item functioning analysis of the movement disorder society-sponsored revision of the Unified Parkinson's disease rating scale.

PubMed

Goetz, Christopher G; Liu, Yuanyuan; Stebbins, Glenn T; Wang, Lu; Tilley, Barbara C; Teresi, Jeanne A; Merkitch, Douglas; Luo, Sheng

2016-12-01

To assess MDS-UPDRS items for gender-, age-, and race/ethnicity-based differential item functioning. Assessing differential item functioning is a core rating scale validation step. For the MDS-UPDRS, differential item functioning occurs if item-score probability among people with similar levels of parkinsonism differs according to selected covariates (gender, age, race/ethnicity). If the magnitude of differential item functioning is clinically relevant, item-score interpretation must consider influences by these covariates. Differential item functioning can be nonuniform (covariate variably influences an item-score across different levels of parkinsonism) or uniform (covariate influences an item-score consistently over all levels of parkinsonism). Using the MDS-UPDRS translation database of more than 5,000 PD patients from 14 languages, we tested gender-, age-, and race/ethnicity-based differential item functioning. To designate an item as having clinically relevant differential item functioning, we required statistical confirmation by 2 independent methods, along with a McFadden pseudo-R² magnitude statistic greater than "negligible." Most items showed no gender-, age- or race/ethnicity-based differential item functioning. When differential item functioning was identified, the magnitude statistic was always in the "negligible" range, and the scale-level impact was minimal. The absence of clinically relevant differential item functioning across all items and all parts of the MDS-UPDRS is strong evidence that the scale can be used confidently. As studies of Parkinson's disease increasingly involve multinational efforts and the MDS-UPDRS has several validated non-English translations, the findings support the scale's broad applicability in populations with varying gender, age, and race/ethnicity distributions. © 2016 International Parkinson and Movement Disorder Society.
Optimal Linking Design for Response Model Parameters

ERIC Educational Resources Information Center

Barrett, Michelle D.; van der Linden, Wim J.

2017-01-01

Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…
EXTENDING THE FLOOR AND THE CEILING FOR ASSESSMENT OF PHYSICAL FUNCTION

PubMed Central

Fries, James F.; Lingala, Bharathi; Siemons, Liseth; Glas, Cees A. W.; Cella, David; Hussain, Yusra N; Bruce, Bonnie; Krishnan, Eswar

2014-01-01

Objective The objective of the current study was to improve the assessment of physical function by improving the precision of assessment at the floor (extremely poor function) and at the ceiling (extremely good health) of the health continuum. Methods Under the NIH PROMIS program, we developed new physical function floor and ceiling items to supplement the existing item bank. Using item response theory (IRT) and the standard PROMIS methodology, we developed 30 floor items and 26 ceiling items and administered them during a 12-month prospective observational study of 737 individuals at the extremes of health status. Change over time was compared across anchor instruments and across items by means of effect sizes. Using the observed changes in scores, we back-calculated sample size requirements for the new and comparison measures. Results We studied 444 subjects with chronic illness and/or extreme age, and 293 generally fit subjects including athletes in training. IRT analyses confirmed that the new floor and ceiling items outperformed reference items (p<0.001). The estimated post-hoc sample size requirements were reduced by a factor of two to four at the floor and a factor of two at the ceiling. Conclusion Extending the range of physical function measurement can substantially improve measurement quality, can reduce sample size requirements and improve research efficiency. The paradigm shift from Disability to Physical Function includes the entire spectrum of physical function, signals improvement in the conceptual base of outcome assessment, and may be transformative as medical goals more closely approach societal goals for health. PMID:24782194
Negative Symptom Dimensions of the Positive and Negative Syndrome Scale Across Geographical Regions

PubMed Central

Liharska, Lora; Harvey, Philip D.; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S.E.

2017-01-01

Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms. PMID:29410935
Application of a Method of Estimating DIF for Polytomous Test Items.

ERIC Educational Resources Information Center

Camilli, Gregory; Congdon, Peter

1999-01-01

Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)
Functional recovery in patients with schizophrenia: recommendations from a panel of experts.

PubMed

Lahera, Guillermo; Gálvez, José L; Sánchez, Pedro; Martínez-Roig, Miguel; Pérez-Fuster, J V; García-Portilla, Paz; Herrera, Berta; Roca, Miquel

2018-06-05

The management of schizophrenia is evolving towards a more comprehensive model based on functional recovery. The concept of functional recovery goes beyond clinical remission and encompasses multiple aspects of the patient's life, making it difficult to settle on a definition and to develop reliable assessment criteria. In this consensus process based on a panel of experts in schizophrenia, we aimed to provide useful insights on functional recovery and its involvement in clinical practice and clinical research. After a literature review of functional recovery in schizophrenia, a scientific committee of 8 members prepared a 75-item questionnaire, including 6 sections: (I) the concept of functional recovery (9 items), (II) assessment of functional recovery (23 items), (III) factors influencing functional recovery (16 items), (IV) psychosocial interventions and functional recovery (8 items), (V) pharmacological treatment and functional recovery (14 items), and (VI) the perspective of patients and their relatives on functional recovery (5 items). The questionnaire was sent to a panel of 53 experts, who rated each item on a 9-point Likert scale. Consensus was achieved in a 2-round Delphi dynamics, using the median (interquartile range) scores to consider consensus in either agreement (scores 7-9) or disagreement (scores 1-3). Items not achieving consensus in the first round were sent back to the experts for a second consideration. After the two recursive rounds, consensus was achieved in 64 items (85.3%): 61 items (81.3%) in agreement and 3 (4.0%) in disagreement, all of them from section II (assessment of functional recovery). Items not reaching consensus were related to the concepts of functional recovery (1 item, 1.3%), functional assessment (5 items, 6.7%), factors influencing functional recovery (3 items, 4.0%), and psychosocial interventions (2 items, 5.6%). 
Despite the lack of a well-defined concept of functional recovery, we identified a trend towards a common archetype of the definition and factors associated with functional recovery, as well as its applicability in clinical practice and clinical research.
Factor structure and gender stability in the multidimensional condom attitudes scale.

PubMed

Starosta, Amy J; Berghoff, Christopher R; Earleywine, Mitch

2015-06-01

Sexually transmitted infections continue to trouble the United States and can be attenuated through increased condom use. Attitudes about condoms are an important multidimensional factor that can affect sexual health choices and have been successfully measured using the Multidimensional Condom Attitudes Scale (MCAS). Such attitudes have the potential to vary between men and women, yet little work has been undertaken to identify if the MCAS accurately captures attitudes without being influenced by underlying gender biases. We examined the factor structure and gender invariance on the MCAS using confirmatory factor analysis and item response theory, within-subscale differential item functioning analyses. More than 770 participants provided data via the Internet. Results of differential item functioning analyses identified three items as differentially functioning between the genders, and removal of these items is recommended. Findings confirmed the previously hypothesized multidimensional nature of condom attitudes and the five-factor structure of the MCAS even after the removal of the three problematic items. In general, comparisons across genders using the MCAS seem reasonable from a methodological standpoint. Results are discussed in terms of improving sexual health research and interventions. © The Author(s) 2014.
Age-related Differential Item Functioning for the Patient-Reported Outcomes Information System (PROMIS®) Physical Functioning Items.

PubMed

Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D

2013-03-29

To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on an R-squared of 0.02 or above criterion. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.

PubMed

Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri

2017-03-01

Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
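
The Cronbach's α coefficient reported above follows a standard formula, α = k/(k-1) · (1 - Σ item variances / variance of total scores). A minimal sketch, using an invented response matrix rather than the study's data, and a hypothetical function name:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented 0/1 responses (1 = correct choice) for six respondents on three items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
alpha = cronbach_alpha(toy_scores)
```

Alpha rises as items covary more strongly; if every item ranked respondents identically, the coefficient would reach 1.
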
Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

PubMed

Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

2015-08-19

Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. 
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
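
The item and test information functions mentioned above can be sketched for a graded response model as follows. This is an illustrative implementation under the usual logistic GRM parameterisation (one discrimination `a` and ordered thresholds per item); the item parameters are invented, not estimates from the CPQ11-14 data, and the function names are hypothetical.

```python
import math

def cumulative_probs(theta, a, thresholds):
    """P(X >= k) for each category boundary, padded with 1 and 0 at the ends."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]

def item_information(theta, a, thresholds):
    """GRM item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    p_star = cumulative_probs(theta, a, thresholds)
    info = 0.0
    for k in range(len(p_star) - 1):
        p_cat = p_star[k] - p_star[k + 1]  # probability of category k
        slope = a * (p_star[k] * (1 - p_star[k])
                     - p_star[k + 1] * (1 - p_star[k + 1]))
        if p_cat > 0:
            info += slope ** 2 / p_cat
    return info

def scale_information(theta, items):
    """Test information is the sum of the item information functions."""
    return sum(item_information(theta, a, bs) for a, bs in items)

# Invented parameters: one sharply and one weakly discriminating item.
strong_item = (2.0, [-1.0, 0.0, 1.0])
weak_item = (0.5, [-1.0, 0.0, 1.0])
info_strong = item_information(0.0, *strong_item)
info_weak = item_information(0.0, *weak_item)
info_total = scale_information(0.0, [strong_item, weak_item])
```

An item with a flat, low curve like `weak_item` adds little to the total, which is the pattern the abstract describes for the oral symptoms items.
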
Impact of Missing Data on the Detection of Differential Item Functioning: The Case of Mantel-Haenszel and Logistic Regression Analysis

ERIC Educational Resources Information Center

Robitzsch, Alexander; Rupp, Andre A.

2009-01-01

This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
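
As background for the Mantel-Haenszel method named in the title above, here is a minimal sketch of the common odds ratio and its conversion to the ETS delta scale (MH D-DIF = -2.35 ln α). It assumes complete data already stratified by matching score, so it does not reproduce any of the missing-data treatments the article compares; the counts and function name are invented.

```python
import math

def mantel_haenszel(strata):
    """Common odds ratio across score strata.

    Each stratum is (ref_right, ref_wrong, focal_right, focal_wrong):
    counts of correct/incorrect answers in the reference and focal groups.
    """
    num = 0.0
    den = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # empty score level contributes nothing
        num += ref_right * foc_wrong / n
        den += ref_wrong * foc_right / n
    alpha = num / den
    # Negative delta values indicate the item favours the reference group.
    return alpha, -2.35 * math.log(alpha)

# Invented counts for two matched score levels with identical odds in both groups.
no_dif = [(20, 10, 10, 5), (30, 30, 12, 12)]
alpha_none, delta_none = mantel_haenszel(no_dif)

# Invented single stratum where the reference group has twice the odds of success.
alpha_dif, delta_dif = mantel_haenszel([(20, 10, 10, 10)])
```

Because the statistic depends on cell counts within each matched stratum, any missing-data treatment that changes either the item responses or the matching score can shift the result.
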
The Spinal Cord Injury- Functional Index: Item Banks to Measure Physical Functioning of Individuals with Spinal Cord Injury

PubMed Central

Tulsky, David S.; Jette, Alan; Kisala, Pamela A.; Kalpakjian, Claire; Dijkers, Marcel P.; Whiteneck, Gale; Ni, Pengsheng; Kirshblum, Steven; Charlifue, Susan; Heinemann, Allen W.; Forchheimer, Martin; Slavin, Mary; Houlihan, Bethlyn; Tate, Denise; Dyson-Hudson, Trevor; Fyffe, Denise; Williams, Steve; Zanca, Jeanne

2012-01-01

Objective To develop a comprehensive set of patient reported items to assess multiple aspects of physical functioning relevant to the lives of people with spinal cord injury (SCI) and to evaluate the underlying structure of physical functioning. Design Cross-sectional Setting Inpatient and community Participants Item pools of physical functioning were developed, refined and field tested in a large sample of 855 individuals with traumatic spinal cord injury stratified by diagnosis, severity, and time since injury Interventions None Main Outcome Measure SCI-FI measurement system Results Confirmatory factor analysis (CFA) indicated that a 5-factor model, including basic mobility, ambulation, wheelchair mobility, self care, and fine motor, had the best model fit and was most closely aligned conceptually with feedback received from individuals with SCI and SCI clinicians. When just the items making up basic mobility were tested in CFA, the fit statistics indicate strong support for a unidimensional model. Similar results were demonstrated for each of the other four factors indicating unidimensional models. Conclusions Though unidimensional or 2-factor (mobility and upper extremity) models of physical functioning make up outcomes measures in the general population, the underlying structure of physical function in SCI is more complex. A 5-factor solution allows for comprehensive assessment of key domain areas of physical functioning. These results informed the structure and development of the SCI-FI measurement system of physical functioning. PMID:22609299
Developing an Initial Physical Function Item Bank from Existing Sources.

ERIC Educational Resources Information Center

Bode, Rita K.; Cella, David; Lai, Jin-shei; Heinemann, Allen W.

2003-01-01

Illustrates incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment (n=1,755 and n=1,544). Results support findings from previous studies that have equated separate instruments by co-calibrating their items. (SLD)
The Caregiver Contribution to Heart Failure Self-Care (CACHS): Further Psychometric Testing of a Novel Instrument.

PubMed

Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael

2017-04-01

Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize it, account for its impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014, and results were evaluated using classical test theory and item response theory. Items would be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance eight items; monitoring seven items; and management five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05), and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.
Methodology for the development and calibration of the SCI-QOL item banks

PubMed Central

Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David

2015-01-01

Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center℠. PMID:26010963
Physical function metric over measure: An illustration with the Patient-Reported Outcomes Measurement Information System (PROMIS) and the Functional Assessment of Cancer Therapy (FACT).

PubMed

Kaat, Aaron J; Schalet, Benjamin D; Rutsohn, Joshua; Jensen, Roxanne E; Cella, David

2018-01-01

Measuring patient-reported outcomes (PROs) is becoming an integral component of quality improvement initiatives, clinical care, and research studies in cancer, including comparative effectiveness research. However, the number of PROs limits comparability across studies. Herein, the authors attempted to link the Functional Assessment of Cancer Therapy-General Physical Well-Being (FACT-G PWB) subscale with the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) calibrated item bank. They also sought to augment a subset of the conceptually most similar FACT-G PWB items with PROMIS PF items to improve the linking. Baseline data from 5506 participants in the Measuring Your Health (MY-Health) study were used to identify the optimal items for linking FACT-G PWB with PROMIS PF. A mixed methods approach identified the optimal items for creating the 5-item FACT/PROMIS-PF5 scale. Both the linked and augmented relationships were cross-validated using the follow-up MY-Health data. A 5-item FACT-G PWB item subset was found to be optimal for linking with PROMIS PF. In addition, a 2-item subset, including only items that were conceptually very similar to the PROMIS item bank content, was augmented with 3 PROMIS PF items. This new FACT/PROMIS-PF5 provided superior score recovery. The PROMIS PF metric allows for the evaluation of the extent to which similar questionnaires can be linked and therefore expressed on the same metric. These results allow for the aggregation of existing data and provide an optimal measure for future studies wishing to use the FACT yet also report on the PROMIS PF metric. Cancer 2018;124:153-60. © 2017 American Cancer Society.
Development of the Computer-Adaptive Version of the Late-Life Function and Disability Instrument

PubMed Central

Tian, Feng; Kopits, Ilona M.; Moed, Richard; Pardasaney, Poonam K.; Jette, Alan M.

2012-01-01

Background. Having psychometrically strong disability measures that minimize response burden is important in assessing older adults. Methods. Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive tests (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects. Results. Confirmatory factor analysis revealed a bifactor structure, and multi-dimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.05). Correlations of the 10-item CATs with the full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction: r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer. Conclusions. The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision. 
The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults. PMID:22546960
Development of a new Rasch-based scoring algorithm for the National Eye Institute Visual Functioning Questionnaire to improve its interpretability.

PubMed

Petrillo, Jennifer; Bressler, Neil M; Lamoureux, Ecosse; Ferreira, Alberto; Cano, Stefan

2017-08-14

The NEI VFQ-25 has undergone psychometric evaluation in patients with varying ocular conditions and the general population. However, important limitations which may affect the interpretation of clinical trial results have been previously identified, such as concerns with reliability and validity. The purpose of this study was to evaluate the National Eye Institute Visual Functioning Questionnaire (NEI VFQ-25) and make recommendations for a revised scoring structure, with a view to improving its psychometric performance and interpretability. Rasch Measurement Theory analyses were conducted in two stages using pooled baseline NEI VFQ-25 data for 2487 participants with retinal diseases enrolled in six clinical trials. In stage 1, we examined: scale-to-sample targeting; thresholds for item response options; item fit statistics; stability; local dependence; and reliability. In stage 2, a post-hoc revision of the scoring structure (NEI VFQ-28-R) was created and psychometrically re-evaluated. In stage 1, we found that the NEI VFQ-25 was mis-targeted to the sample, and had disordered response thresholds (15/25 items) and mis-fitting items (8/25 items). However, items appeared to be stable (differential item functioning for three items), have minimal item dependency (one pair of items) and good reliability (person-separation index, 0.93). In stage 2, the modified Rasch-scored NEI VFQ-28-R was assessed. It comprised two broad domains: Activity Limitation (19 items) and Socio-Emotional Functioning (nine items). The NEI VFQ-28-R demonstrated improved performance with fewer disordered response thresholds (no items), less item misfit (three items) and improved population targeting (reduced ceiling effect) compared with the NEI VFQ-25. 
Compared with the original version, the proposed NEI VFQ-28-R, with Rasch-based scoring and a two-domain structure, appears to offer improved psychometric performance and interpretability of the vision-related quality of life scale for the population analysed.

A Differential Item Functional Analysis by Age of Perceived Interpersonal Discrimination in a Multi-racial/ethnic Sample of Adults.

PubMed

Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R

2015-11-05

We investigated whether individual items on the nine item William's Perceived Everyday Discrimination Scale (EDS) functioned differently by age (<45 vs ≥ 45) within five racial groups in the United States: Asians (n=2,017); Hispanics (n=2,688); Black Caribbeans (n=1,377); African Americans (n=3,434); and Whites (n=854). We used data from the 2001-2003 National Survey of American Lives and the 2001-2003 National Latino and Asian Studies. Multiple-indicator, multiple-cause models (MIMIC) were used to examine differential item functioning (DIF) on the EDS by age within each racial/ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged <45 years) reported less discrimination than the older respondents (aged ≥ 45 years). In terms of age by race/ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS scale should be used with caution in multi-age multi-racial/ethnic samples.
Acculturation and the Center For Epidemiological Studies-Depression Scale for Hispanic women.

PubMed

McCabe, Brian E; Vermeesch, Amber L; Hall, Rosemary F; Peragallo, Nilda P; Mitrani, Victoria B

2011-01-01

Culturally valid measures of depression for Spanish-speaking Hispanic women are important for developing and implementing effective interventions to reduce health disparities. The Center for Epidemiological Studies-Depression Scale (CES-D) is a widely used measure of depression. Differential item functioning has been studied using language preference as a proxy for acculturation, but it is unknown if the results were due to acculturation or the language of administration. The aim of this study was to evaluate the relationship of acculturation, defined with a dimensional measure, to Spanish CES-D item responses. Spanish-speaking Hispanic women (n = 504) were recruited for a randomized controlled trial of Salud, Educación, Prevención y Autocuidado (Health, Education, Prevention, and Self-Care). Acculturation, an important dimension of variation within the diverse U.S. Hispanic community, was defined by high or low scores on the Americanism subscale of the Bidimensional Acculturation Scale. Differential item functioning for each of the 20 CES-D items between more acculturated and less acculturated women was tested using ordinal logistic regression. No items on the Depressed Affect, Somatic Activity, or Positive Affect subscales showed meaningful differential item functioning, but 1 item ("People were unfriendly") on the Interpersonal subscale had small results (R = 1.1%). The majority of CES-D items performed similarly for Spanish-speaking Hispanic women with high and low acculturation. Less acculturated women responded more positively to "People were unfriendly," despite having an equivalent level of depression, than did more acculturated women. Possibilities for improving this item are proposed.
Assessing psychological well-being: self-report instruments for the NIH Toolbox.

PubMed

Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David

2014-02-01

Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item functioning (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
Mapping the Mayo-Portland adaptability inventory to the international classification of functioning, disability and health.

PubMed

Lexell, Jan; Malec, James F; Jacobsson, Lars J

2012-01-01

To examine the contents of the Mayo-Portland Adaptability Inventory (MPAI-4) by mapping it to the International Classification of Functioning, Disability and Health (ICF). Each of the 30 scoreable items in the MPAI-4 was mapped to the most precise ICF categories. All 30 items could be mapped to components and categories in the ICF. A total of 88 meaningful concepts were identified. There were, on average, 2.9 meaningful concepts per item, and 65% of all concepts could be mapped. Items in the Ability and Adjustment subscales mapped to categories in both the Body Functions and Activity/Participation components of the ICF, whereas all except 1 in the Participation subscale were to categories in the Activity/Participation component. The items could also be mapped to 34 (13%) of the 258 Environmental Factors in the ICF. This mapping provides better definition through more concrete examples (as listed in the ICF) of the types of body functions, activities, and participation indicators that are represented by the 30 scoreable MPAI-4 items. This may assist users throughout the world in understanding the intent of each item, and support further development and the possibility to report results in the form of an ICF categorical profile, making it universally interpretable.
A Rasch-validated version of the upper extremity functional index for interval-level measurement of upper extremity function.

PubMed

Hamilton, Clayon B; Chesworth, Bert M

2013-11-01

The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0-100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity.
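The abstract above reports a minimal detectable change (MDC) at the 90% confidence level without stating how it was computed. The following is a minimal sketch of the conventional formula, MDC = z × √2 × SEM with SEM = SD × √(1 − reliability); the function names are hypothetical, and the use of this particular formula by the authors is an assumption, not something the abstract confirms.

```python
import math

def sem_from_reliability(sd, reliability):
    """Standard error of measurement from a score SD and a reliability
    coefficient (for example, an ICC). Conventional formula."""
    return sd * math.sqrt(1.0 - reliability)

def minimal_detectable_change(sem, z=1.645):
    """MDC = z * sqrt(2) * SEM; z = 1.645 corresponds to 90% confidence."""
    return z * math.sqrt(2.0) * sem

def sem_from_mdc(mdc_value, z=1.645):
    """Invert the MDC formula to recover the SEM it implies."""
    return mdc_value / (z * math.sqrt(2.0))

# Under the assumed formula, the reported MDC90 of 8.1 implies this SEM
# (on the 0-100 UEFI-15 scale).
implied_sem = sem_from_mdc(8.1)
print(round(implied_sem, 2))
```

Under these assumptions the reported value corresponds to a standard error of measurement of roughly 3.5 points; the abstract does not report the SEM or the sample standard deviation directly.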
Mapping the Content of the Patient Reported Outcomes Measurement Information System (PROMIS®) Using the International Classification of Functioning, Health and Disability

PubMed Central

Tucker, Carole A; Escorpizo, Reuben; Cieza, Alarcos; Lai, Jin Shei; Stucki, Gerold; Ustun, T. Bedirhan; Kostanjsek, Nenad; Cella, David; Forrest, Christopher B.

2014-01-01

Background The Patient Reported Outcomes Measurement Information System (PROMIS®) is a U.S. National Institutes of Health initiative that has produced self-reported item banks for physical, mental, and social health. Objective To describe the content of PROMIS at the item level using the World Health Organization’s International Classification of Functioning, Disability and Health (ICF). Methods All PROMIS adult items (publicly available as of 2012) were assigned to relevant ICF concepts. The content of the PROMIS adult item banks was then described using the mapped ICF code descriptors. Results The 1006 items in the PROMIS instruments could all be mapped to ICF concepts at the second level of classification, with the exception of 3 items of global or general health that mapped across the first-level classification of ICF activity and participation component (d categories). Individual PROMIS item banks mapped from 1 to 5 separate ICF codes indicating one-to-one, one-to-many and many-to-one mappings between PROMIS item banks and ICF second level classification codes. PROMIS supports measurement of the majority of major concepts in the ICF Body Functions (b) and Activity & Participation (d) components using PROMIS item banks or subsets of PROMIS items that could, with care, be used to develop customized instruments. Given the focus of PROMIS is on measurement of person health outcomes, concepts in body structures (s) and some body functions (b), as well as many ICF environmental factors, have minimal coverage in PROMIS. Discussion The PROMIS-ICF mapped items provide a basis for users to evaluate the ICF related content of specific PROMIS instruments, and to select PROMIS instruments in ICF based measurement applications. PMID:24760532
Measurement characteristics for two health-related quality of life measures in older adults: The SF-36 and the CDC Healthy Days items

PubMed Central

Barile, John P.; Horner-Johnson, Willi; Krahn, Gloria; Zack, Matthew; Miranda, David; DeMichele, Kimberly; Ford, Derek; Thompson, William W.

2017-01-01

Background The Short Form Health Survey (SF-36) and the Centers for Disease Control and Prevention (CDC) Healthy Days items are well known measures of health-related quality of life. The validity of the SF-36 for older adults and those with disabilities has been questioned. Objective Assess the extent to which the SF-36 and the Centers for Disease Control and Prevention (CDC) Healthy Days items measure the same aspects of health; whether the SF-36 and the CDC unhealthy days items are invariant across gender, functional status, or the presence of chronic health conditions of older adults; and whether each of the SF-36’s eight subscales is independently associated with the CDC Healthy Days items. Methods We analyzed data from 66,269 adult Medicare advantage members age 65 and older. We used confirmatory factor analyses and regression modeling to test associations between the CDC Healthy Days items and subscales of the SF-36. Results The CDC Healthy Days items were associated with the SF-36 global measures of physical and mental health. The CDC physically unhealthy days item was associated with the SF-36 subscales for bodily pain, physical role limitations, and general health, while the CDC mentally unhealthy days item was associated with the SF-36 subscales for mental health, emotional role limitations, vitality and social functioning. The SF-36 physical functioning subscale was not independently associated with either of the CDC Healthy Days items. Conclusions The CDC Healthy Days items measure similar domains as the SF-36 but appear to assess HRQOL without regard to limitations in functioning. PMID:27259343
Negative Symptom Dimensions of the Positive and Negative Syndrome Scale Across Geographical Regions: Implications for Social, Linguistic, and Cultural Consistency.

PubMed

Khan, Anzalee; Liharska, Lora; Harvey, Philip D; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S E

2017-12-01

Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms.
An Effect Size Measure for Raju's Differential Functioning for Items and Tests

ERIC Educational Resources Information Center

Wright, Keith D.; Oshima, T. C.

2015-01-01

This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
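Because the abstract above is truncated, it does not show how the Mantel-Haenszel parameter was used as a benchmark. As background only, here is a minimal sketch of the standard Mantel-Haenszel common odds ratio for a dichotomous item, computed across matched score strata, together with its transformation to the ETS delta scale. The function names and the example counts are hypothetical.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    Each stratum is a tuple (ref_correct, ref_incorrect,
    focal_correct, focal_incorrect) of counts at one score level.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

def ets_delta(alpha):
    """ETS delta metric: -2.35 * ln(alpha). Negative values indicate
    that the item favors the reference group."""
    return -2.35 * math.log(alpha)

# Hypothetical counts for three total-score strata.
example = [(30, 20, 20, 30), (40, 10, 32, 18), (45, 5, 40, 10)]
alpha = mantel_haenszel_odds_ratio(example)
delta = ets_delta(alpha)
```

An odds ratio of 1 (delta of 0) means the item behaves the same for both groups once respondents are matched on total score.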
A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

ERIC Educational Resources Information Center

Thurman, Carol

2009-01-01

The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
Serial position functions in general knowledge.

PubMed

Kelley, Matthew R; Neath, Ian; Surprenant, Aimée M

2015-11-01

Serial position functions with marked primacy and recency effects are ubiquitous in episodic memory tasks. The demonstrations reported here explored whether bow-shaped serial position functions would be observed when people ordered exemplars from various categories along a specified dimension. The categories and dimensions were: actors and age; animals and weight; basketball players and height; countries and area; and planets and diameter. In all cases, a serial position function was observed: People were more accurate to order the youngest and oldest actors, the lightest and heaviest animals, the shortest and tallest basketball players, the smallest and largest countries, and the smallest and largest planets, relative to intermediate items. The results support an explanation of serial position functions based on relative distinctiveness, which predicts that serial position functions will be observed whenever a set of items can be sensibly ordered along a particular dimension. The serial position function arises because the first and last items enjoy a benefit of having no competitors on 1 side and therefore have enhanced distinctiveness relative to mid-dimension items, which suffer by having many competitors on both sides. (c) 2015 APA, all rights reserved.
Further evaluation of leisure items in the attention condition of functional analyses.

PubMed

Roscoe, Eileen M; Carreau, Abbey; MacDonald, Jackie; Pence, Sacha T

2008-01-01

Research suggests that including leisure items in the attention condition of a functional analysis may produce engagement that masks sensitivity to attention. In this study, 4 individuals' initial functional analyses indicated that behavior was maintained by nonsocial variables (n = 3) or by attention (n = 1). A preference assessment was used to identify items for subsequent functional analyses. Four conditions were compared: attention with and without leisure items, and control with and without leisure items. Following this, either high- or low-preference items were included in the attention condition. Problem behavior was more probable during the attention condition when no leisure items or low-preference items were included, and lower levels of problem behavior were observed during the attention condition when high-preference leisure items were included. These findings suggest how preferred items may hinder detection of behavioral function.
Accuracy of 30-Day Recall for Components of Sexual Function and the Moderating Effects of Gender and Mood

PubMed Central

Lin, Li; Dombeck, Carrie B.; Broderick, Joan E.; Snyder, Denise C.; Williams, Megan S.; Fawzy, Maria R.; Flynn, Kathryn E.

2013-01-01

Introduction Despite the ubiquity of 1-month recall periods for measures of sexual function, there is limited evidence for how well recalled responses correspond to individuals’ actual daily experiences. Aim To characterize the correspondence between daily sexual experiences and 1-month recall of those experiences. Methods Following a baseline assessment of sexual functioning, health, and demographic characteristics, 202 adults from the general population (101 women, 101 men) were recruited to complete daily assessments of their sexual function online for 30 days and a single recall measure of sexual function at day 30. Main Outcome Measures At the baseline and 30-day follow-ups, participants answered items asking about sexual satisfaction, sexual activities, interest, interfering factors, orgasm, sexual functioning, and use of therapeutic aids during the previous 30 days. Participants also completed a measure of positive and negative affect at follow-up. The main outcome measures were agreement between the daily and 1-month recall versions of the sexual function items. Results Accuracy of recall varied depending on the item and on the gender and mood of the respondent. Recall was better (low bias and higher correlations) for sexual activities, vaginal discomfort, erectile function, and more frequently used therapeutic aids. Recall was poorer for interest, affectionate behaviors (eg, kissing), and orgasm-related items. Men more than women overestimated frequency of interest and masturbation. Concurrent mood was related to over- or underreporting for 6 items addressing the frequency of masturbation and vaginal intercourse, erectile function, and orgasm. Conclusions A 1-month recall period seems acceptable for many aspects of sexual function in this population, but recall for some items was poor. Researchers should be aware that concurrent mood can have a powerful biasing effect on reports of sexual function. PMID:23802907
Development of the NIH PROMIS® Sexual Function and Satisfaction Measures in Patients with Cancer

PubMed Central

Flynn, Kathryn E.; Lin, Li; Cyranowski, Jill M.; Reeve, Bryce B.; Reese, Jennifer Barsky; Jeffery, Diana D.; Smith, Ashley Wilder; Porter, Laura S.; Dombeck, Carrie B.; Bruner, Deborah Watkins; Keefe, Francis J.; Weinfurt, Kevin P.

2013-01-01

Introduction We describe the development and validation of the PROMIS Sexual Function and Satisfaction (PROMIS SexFS) measures version 1.0 for cancer populations. Aim To develop a customizable self-report measure of sexual function and satisfaction as part of the U.S. National Institutes of Health PROMIS® Network. Methods Our multidisciplinary working group followed a comprehensive protocol for developing psychometrically robust patient reported outcome (PRO) measures including qualitative (scale development) and quantitative (psychometric evaluation) development. We performed an extensive literature review, conducted 16 focus groups with cancer patients and multiple discussions with clinicians, and evaluated candidate items in cognitive testing with patients. We administered items to 819 cancer patients. Items were calibrated using item response theory and evaluated for reliability and validity. Main Outcome Measures The PROMIS Sexual Function and Satisfaction (PROMIS SexFS) measures version 1.0 include 79 items in 11 domains: interest in sexual activity, lubrication, vaginal discomfort, erectile function, global satisfaction with sex life, orgasm, anal discomfort, therapeutic aids, sexual activities, interfering factors, and screener questions. Results In addition to content validity (patients indicate that items cover important aspects of their experiences) and face validity (patients indicate that items measure sexual function and satisfaction), the measure shows evidence for discriminant validity (domains discriminate between groups expected to be different), convergent validity (strong correlations between scores on PROMIS and scores on conceptually-similar older measures of sexual function), as well as favorable test-retest reliability among people not expected to change (inter-class correlations from 2 administrations of the instrument, 1 month apart). 
Conclusions The PROMIS SexFS offers researchers a reliable and valid set of tools to measure self-reported sexual function and satisfaction among diverse men and women. The measures are customizable; researchers can select the relevant domains and items comprising those domains for their study. PMID:23387911
Efficient Algorithms for Segmentation of Item-Set Time Series

NASA Astrophysics Data System (ADS)

Chundi, Parvathi; Rosenkrantz, Daniel J.

We propose a special type of time series, which we call an item-set time series, to facilitate the temporal analysis of software version histories, email logs, stock market data, etc. In an item-set time series, each observed data value is a set of discrete items. We formalize the concept of an item-set time series and present efficient algorithms for segmenting a given item-set time series. Segmentation of a time series partitions the time series into a sequence of segments where each segment is constructed by combining consecutive time points of the time series. Each segment is associated with an item set that is computed from the item sets of the time points in that segment, using a function which we call a measure function. We then define a concept called the segment difference, which measures the difference between the item set of a segment and the item sets of the time points in that segment. The segment difference values are required to construct an optimal segmentation of the time series. We describe novel and efficient algorithms to compute segment difference values for each of the measure functions described in the paper. We outline a dynamic programming based scheme to construct an optimal segmentation of the given item-set time series. We use the item-set time series segmentation techniques to analyze the temporal content of three different data sets—Enron email, stock market data, and a synthetic data set. The experimental results show that an optimal segmentation of item-set time series data captures much more temporal content than a segmentation constructed based on the number of time points in each segment, without examining the item set data at the time points, and can be used to analyze different types of temporal data.
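The dynamic programming scheme outlined in the abstract above can be sketched as follows. This is a naive illustration under stated assumptions, not the authors' algorithm: the measure function is assumed to be set union, the segment difference is assumed to be the summed size of the symmetric difference between the segment's item set and each time point's item set, and the number of segments k is assumed to be given. All function names are hypothetical, and the efficient segment-difference computations the paper describes are not reproduced.

```python
def union_measure(item_sets):
    """Assumed measure function: the union of a segment's item sets."""
    return set().union(*item_sets)

def segment_difference(item_sets, measure=union_measure):
    """Assumed difference: total symmetric-difference size between the
    segment's item set and the item set at each of its time points."""
    segment_set = measure(item_sets)
    return sum(len(segment_set ^ s) for s in item_sets)

def optimal_segmentation(series, k, measure=union_measure):
    """Split `series` (a list of sets) into k contiguous segments with
    minimum total segment difference. Returns (cost, [(start, end), ...])
    with half-open index ranges."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    # cost[i][j] is the difference of the segment series[i:j] (naive).
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = segment_difference(series[i:j], measure)
    inf = float("inf")
    # best[p][j]: minimum cost of covering series[:j] with p segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                candidate = best[p - 1][i] + cost[i][j]
                if candidate < best[p][j]:
                    best[p][j] = candidate
                    back[p][j] = i
    # Walk the back-pointers to recover the segment boundaries.
    bounds = []
    j = n
    for p in range(k, 0, -1):
        i = back[p][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds
```

For example, `optimal_segmentation([{"a"}, {"a"}, {"b"}, {"b"}], 2)` places the single boundary between the second and third time points, where the item sets change.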
Gender Invariance of the Gambling Behavior Scale for Adolescents (GBS-A): An Analysis of Differential Item Functioning Using Item Response Theory.

PubMed

Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina

2017-01-01

As there is a lack of evidence attesting to equivalent item functioning across genders for the instruments most often used to measure pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested to the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.
Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets.

PubMed

Waller, Niels G; Feuerstahler, Leah

2017-01-01

In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated [Formula: see text] code that shows how to estimate 4PM item and person parameters in [Formula: see text] (Chalmers, 2012).
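The abstract above refers to item response functions under the four-parameter model without writing the model out. A minimal sketch of the usual logistic parameterization follows, with discrimination a, difficulty b, lower asymptote c, and upper asymptote d. The function name is hypothetical, and the absence of a scaling constant (such as 1.7) is an assumption; the authors' own estimation code is not reproduced here.

```python
import math

def irf_4pm(theta, a, b, c, d):
    """Probability of a keyed response under the four-parameter
    logistic model: c + (d - c) / (1 + exp(-a * (theta - b)))."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, asymptotes at 0.1 and 0.9.
p_at_difficulty = irf_4pm(theta=0.5, a=1.2, b=0.5, c=0.1, d=0.9)
```

Setting c = 0 and d = 1 reduces the function to the two-parameter logistic model, and at theta = b the probability is midway between the two asymptotes.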
Item response theory detects differential item functioning between healthy and ill children in QoL measures

PubMed Central

Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.

2008-01-01

Objective To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
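The cancellation of DIF at the total score level, as reported above, can be illustrated with a toy example. The sketch assumes a simple Rasch model and two items whose difficulty shifts for the focal group are equal in size and opposite in sign; the study itself used IRT model-based likelihood ratio tests on PedsQL data, which this sketch does not reproduce. All names and numbers are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Probability of endorsing an item of difficulty b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, difficulties):
    """Expected total score: sum of item endorsement probabilities."""
    return sum(rasch_p(theta, b) for b in difficulties)

reference = [-0.5, 0.5]               # item difficulties in the reference group
shift = 0.4                           # invented size of the uniform DIF
focal = [-0.5 + shift, 0.5 - shift]   # item 1 harder, item 2 easier for the focal group

for theta in (-1.0, 0.0, 1.0):
    item_gap = abs(rasch_p(theta, reference[0]) - rasch_p(theta, focal[0]))
    total_gap = abs(expected_score(theta, reference) - expected_score(theta, focal))
    print(theta, round(item_gap, 3), round(total_gap, 3))
```

At every trait level shown, the gap in the expected total score is smaller than the gap on the single item, which is the sense in which opposing DIF cancels.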
Examination of the PROMIS upper extremity item bank.

PubMed

Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R

Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
The second version of the L. V. Prasad-functional vision questionnaire.

PubMed

Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K

2012-11-01

The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of LVP-FVQ (LVP-FVQ II). Development of LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and focus group discussions of children with VI and their parents that resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire underwent pilot testing in 25 such children, following which a 27-item LVP-FVQ II emerged, and this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. Participants used the rating scale as intended. Four mobility-related items required deletion, as these did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it is effectively able to discriminate between levels of visual disability of school children in India, and is valid across age, gender, duration of VI, and location of residence. Given the superior measurement properties and the interval-level scores, the LVP-FVQ II appears to offer advantages over LVP-FVQ in assessment of difficulties in performing daily tasks in this population. It can be adapted for use in other developing countries.

Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

PubMed

Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

2015-06-01

This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
A Quasi-Parametric Method for Fitting Flexible Item Response Functions

ERIC Educational Resources Information Center

Liang, Longjuan; Browne, Michael W.

2015-01-01

If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
Health and role functioning: the use of focus groups in the development of an item bank.

PubMed

Anatchkova, Milena D; Bjorner, Jakob B

2010-02-01

Role functioning is an important part of health-related quality of life. However, assessment of role functioning is complicated by the wide definition of roles and by fluctuations in role participation across the life-span. The aim of this study is to explore variations in role functioning across the lifespan using qualitative approaches, to inform the development of a role functioning item bank and to pilot test sample items from the bank. Eight focus groups were conducted with a convenience sample of 38 English-speaking adults recruited in Rhode Island. Participants were stratified by gender and four age groups. Focus groups were taped, transcribed, and analyzed for thematic content. Participants of all ages identified family roles as the most important. There was age variation in the importance of social life roles, with younger and older adults rating them as more important. Occupational roles were identified as important by younger and middle-aged participants. The potential of health problems to affect role participation was recognized. Participants found the sample items easy to understand, response options identical in meaning and preferred five response choices. Participants identified key aspects of role functioning and provided insights on their perception of the impact of health on their role participation. These results will inform item bank generation.
Dual representation of item positions in verbal short-term memory: Evidence for two access modes.

PubMed

Lange, Elke B; Verhaeghen, Paul; Cerella, John

Memory sets of N = 1 to 5 digits were exposed sequentially from left to right across the screen, followed by N recognition probes. Probes had to be compared to memory list items on identity only (Sternberg task) or conditional on list position. Positions were probed randomly or in left-to-right order. Search functions related probe response times to set size. Random probing led to ramped, "Sternbergian" functions whose intercepts were elevated by the location requirement. Sequential probing led to flat search functions, that is, fast responses unaffected by set size. These results suggested that items in STM could be accessed either by a slow search-on-identity followed by recovery of an associated location tag, or in a single step by following item-to-item links in study order. It is argued that this dual coding of location information occurs spontaneously at study, and that either code can be utilised at retrieval depending on test demands.
Application of Item Response Theory to Tests of Substance-related Associative Memory

PubMed Central

Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

2015-01-01

A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

NASA Astrophysics Data System (ADS)

Greenberg, Ariela Caren

Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
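The Mantel-Haenszel log-odds ratio mentioned above can be sketched as follows. Examinees are stratified by a matching score, a 2×2 table (group by correct/incorrect) is formed in each stratum, and the common odds ratio is pooled across strata before taking its natural log. The function name, the choice of which group is the reference, and the counts are assumptions made for illustration; they are not data from the study.

```python
import math

def mh_log_odds_ratio(strata):
    """Mantel-Haenszel log-odds ratio for one dichotomous item.

    strata: iterable of tuples
        (ref_correct, ref_incorrect, focal_correct, focal_incorrect),
    one tuple per matching-score level.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    # Raises if the denominator is zero; real analyses need a guard
    # or a continuity correction for sparse tables.
    return math.log(numerator / denominator)

# Hypothetical counts for three score strata.
example = [(20, 30, 15, 35), (30, 20, 25, 25), (40, 10, 35, 15)]
print(round(mh_log_odds_ratio(example), 3))
```

A value near zero suggests no DIF after matching; with the table layout used here, a positive value means the item favors the reference group.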
Rasch validation of the Arabic version of the lower extremity functional scale.

PubMed

Alnahdi, Ali H

2018-02-01

The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing item scoring structure, removing misfitting items, and addressing bias caused by response dependency between items and differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected, and the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirements of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirements of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.
Improving Assessment of Work Related Mental Health Function Using the Work Disability Functional Assessment Battery (WD-FAB).

PubMed

Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M

2018-03-01

Purpose To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and expand the content coverage to include aspects of cognition & communication function. Methods Data were collected from a random, stratified sample of 1695 claimants applying for the SSA work disability benefits, and a general population sample of 2025 working age adults. 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
Cross-cultural validation of a behavioral screener for executive functions: Guidelines for clinical use among Colombian children with and without ADHD.

PubMed

Garcia-Barrera, Mauricio A; Karr, Justin E; Duran, Victor; Direnfeld, Esther; Pineda, David A

2015-12-01

Garcia-Barrera, Kamphaus, and Bandalos (2011) derived a 25-item executive functioning screener from the Behavior Assessment System for Children (BASC), measuring 4 latent executive constructs: problem solving, attentional control, behavioral control, and emotional control. The current study included a cross-cultural examination of this screener in Colombian children with and without attention-deficit/hyperactivity disorder (ADHD). BASC teacher ratings were collected for Colombian children ages 6-11 years (848 healthy children [53% boys] and 155 children with ADHD [76% boys]). To examine the psychometric properties of the screener, a multistep procedure was implemented, including (a) confirmatory factor analysis (CFA) and factorial invariance testing across gender, age group (6-8 years, 9-11 years), and ADHD status to replicate and extend the original derivation; (b) item response theory (IRT) analysis to evaluate the information provided by individual items; and (c) given IRT results, a repeated CFA and invariance testing after the exclusion of 1 item from the problem-solving factor. The 24-item 4-factor model fit was adequate for controls and for ADHD participants. Results support the use of the 24-item executive functioning screener in a cross-cultural context. In turn, in supplemental material, normative data for the Colombian sample are reported along with bilingual guidelines (i.e., Spanish/English) for implementing the screener in clinical practice. Even though the screener is useful when examining executive functions, it was not designed as a diagnostic measure for developmental disorders such as ADHD; as such, it should only inform about status of executive functioning. (c) 2015 APA, all rights reserved.
Psychometrics of Multiple Choice Questions with Non-Functioning Distracters: Implications to Medical Education.

PubMed

Deepak, Kishore K; Al-Umran, Khalid Umran; AI-Sheikh, Mona H; Dkoli, B V; Al-Rubaish, Abdullah

2015-01-01

The functionality of distracters in a multiple choice question plays a very important role. We examined the frequency and impact of functioning and non-functioning distracters on psychometric properties of 5-option items in clinical disciplines. We analyzed item statistics of 1115 multiple choice questions from 15 summative assessments of undergraduate medical students and classified the items into five groups by their number of non-functioning distracters. We analyzed the effect of varying degrees of non-functionality, ranging from 0 to 4, on test reliability, difficulty index, discrimination index and point biserial correlation. The non-functionality of distracters inversely affected the test reliability and quality of items in a predictable manner. The non-functioning distracters made the items easier and lowered the discrimination index significantly. Three non-functional distracters in a 5-option MCQ significantly affected all psychometric properties (p < 0.5). The corrected point biserial correlation revealed that the items with 3 functional options were psychometrically as effective as 5-option items. Our study reveals that a multiple choice question with 3 functional options represents the lowermost limit of item format that retains adequate psychometric properties. Tests containing items with fewer functioning options have significantly lower reliability. The distracter function analysis and revision of nonfunctioning distracters can serve as important methods to improve the psychometrics and reliability of assessment.
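A distracter function analysis of the kind described above can be sketched for a single item as follows. The abstract does not state how a non-functioning distracter was defined; the sketch assumes a common convention (a distracter chosen by fewer than 5% of examinees), and the function name, threshold parameter, and response counts are all hypothetical.

```python
from collections import Counter

def item_stats(responses, key, options="ABCDE", nf_threshold=0.05):
    """Difficulty index and non-functioning distracters for one MCQ.

    responses: sequence of chosen option labels, one per examinee.
    key: label of the correct option.
    nf_threshold: ASSUMED cut-off; a distracter selected by a smaller
        share of examinees is treated as non-functioning.
    """
    n = len(responses)
    counts = Counter(responses)
    difficulty = counts[key] / n  # proportion answering correctly
    non_functioning = [
        option for option in options
        if option != key and counts[option] / n < nf_threshold
    ]
    return difficulty, non_functioning

# Hypothetical 5-option item answered by 100 examinees (key = "A").
answers = list("A" * 60 + "B" * 25 + "C" * 12 + "D" * 3)
difficulty, weak = item_stats(answers, key="A")
print(difficulty, weak)
```

In this invented example two distracters attract almost nobody, so the item behaves like a 3-option question, which is the situation the abstract discusses.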
Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

NASA Astrophysics Data System (ADS)

Ilich, Maria O.

Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets (groups of items referring to a scientific scenario to measure knowledge of different science content or skills). Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance was composed of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
An Examination of Differential Item Functioning on the Vanderbilt Assessment of Leadership in Education

ERIC Educational Resources Information Center

Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph

2009-01-01

The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.

PubMed

Chen, Senlin; Zhu, Xihe; Kang, Minsoo

2017-05-01

A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot results and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winsteps 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The items had large variability ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
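To make the logit values reported above more concrete, the sketch below applies the dichotomous Rasch model, P(correct) = 1 / (1 + exp(−(θ − b))), to the three figures given in the abstract: the easiest item (−3.58), the hardest item (1.70), and the average person ability (0.28). The function name is illustrative, and treating the items as dichotomous is an assumption; the abstract does not describe the scoring in detail.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are expressed in logits on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

mean_ability = 0.28    # average person ability reported in the abstract
easiest_item = -3.58   # item #10
hardest_item = 1.70    # item #3

print("easiest:", round(rasch_probability(mean_ability, easiest_item), 2))
print("hardest:", round(rasch_probability(mean_ability, hardest_item), 2))
```

Under these assumptions an average student is almost certain to answer the easiest item correctly and answers the hardest item correctly roughly one time in five, which illustrates the wide spread of item difficulty the authors describe.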
Differential Item Functioning Analysis Using Rasch Item Information Functions

ERIC Educational Resources Information Center

Wyse, Adam E.; Mapuranga, Raymond

2009-01-01

Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…
Psychological distress in cancer survivors: the further development of an item bank.

PubMed

Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P

2013-02-01

Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.
Solving the measurement invariance anchor item problem in item response theory.

PubMed

Meade, Adam W; Wright, Natalie A

2012-09-01

The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
Rasch measurement: the Arm Activity measure (ArmA) passive function sub-scale.

PubMed

Ashford, Stephen; Siegert, Richard J; Alexandrescu, Roxana

2016-01-01

To evaluate the conformity of the Arm Activity measure (ArmA) passive function sub-scale to the Rasch model. A consecutive cohort of patients (n = 92) undergoing rehabilitation, including upper limb rehabilitation and spasticity management, at two specialist rehabilitation units were included. Rasch analysis was used to examine scaling and conformity to the model. Responses were analysed using Rasch unidimensional measurement models (RUMM 2030). The following aspects were considered: overall model and individual item fit statistics and fit residuals, internal reliability, item response threshold ordering, item bias, local dependency and unidimensionality. ArmA contains both active and passive function sub-scales, but in this analysis only the passive function sub-scale was considered. Four of the seven items in the ArmA passive function sub-scale initially had disordered thresholds. These items were rescored to four response options, which resulted in ordered thresholds for all items. Once the items with disordered thresholds had been rescored, item bias was not identified for age, global disability level or diagnosis, but with a small difference in difficulty between males and females for one item of the scale. Local dependency was not observed and the unidimensionality of the sub-scale was supported and good fit to the Rasch model was identified. The person separation index (PSI) was 0.95 indicating that the scale is able to reliably differentiate at least two groups of patients. The ArmA passive function sub-scale was shown in this evaluation to conform to the Rasch model once disordered thresholds had been addressed. Using the logit scores produced by the Rasch model it was possible to convert this back to the original scale range. 
Implications for Rehabilitation The ArmA passive function sub-scale was shown, in this evaluation, to conform to the Rasch model once disordered thresholds had been addressed and therefore to be a clinically applicable and potentially useful hierarchical measure. Using Rasch logit scores it has been possible to convert back to the original ordinal scale range and provide an indication of real change to enable evaluation of clinical outcome of importance to patients and clinicians.
Detection of Differential Item Functioning Using the Lasso Approach

ERIC Educational Resources Information Center

Magis, David; Tuerlinckx, Francis; De Boeck, Paul

2015-01-01

This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Item response theory analysis of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised in the Pooled Resource Open-Access ALS Clinical Trials Database.

PubMed

Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M

2016-01-01

Our objective was to examine dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and Item Response Theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) Pooled Resources Open-Access ALS Clinical Trials (PRO-ACT) database with complete ALSFRS-R data (n = 888) at three time-points (Time 0, Time 1 (6-months), Time 2 (1-year)). Results demonstrated that in this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a 4* individual-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time-points, particularly in the gross motor and respiratory domain items. Results indicate that many of the items of the ALSFRS-R may sub-optimally distinguish among varying levels of disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved across time as patient disability severity increased. In conclusion, modifications to select ALSFRS-R items may improve the instrument's specificity to disability level and sensitivity to treatment effects.
Development of an Instrument to Measure Behavioral Health Function for Work Disability: Item Pool Construction and Factor Analysis

PubMed Central

Marfeo, Elizabeth E.; Ni, Pengsheng; Haley, Stephen M.; Jette, Alan M.; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Brandt, Diane E.; Rasch, Elizabeth K.

2014-01-01

Objectives To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Design Cross-sectional. Setting Community. Participants Item pools of behavioral health functioning were developed, refined, and field-tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working due to mental or both mental and physical conditions. Interventions None. Main Outcome Measure Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Results Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, and social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the four scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these four distinct scales of function. Conclusion This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. PMID:23548542

Comparison of Alternate and Original Items on the Montreal Cognitive Assessment

PubMed Central

Lebedeva, Elena; Huang, Mei; Koski, Lisa

2016-01-01

Background The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
Development and validation of brief scales to measure emotional and behavioural problems among Chinese adolescents

PubMed Central

Shen, Minxue; Hu, Ming; Sun, Zhenqiu

2017-01-01

Objectives To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting Middle schools in Hunan province. Participants 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Screening Test Items for Differential Item Functioning

ERIC Educational Resources Information Center

Longford, Nicholas T.

2014-01-01

A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
Should the SCOPA-COG be modified? A Rasch analysis perspective.

PubMed

Forjaz, M J; Frades-Payo, B; Rodriguez-Blazquez, C; Ayala, A; Martinez-Martin, P

2010-02-01

The SCales for Outcomes in PArkinson's disease-Cognition (SCOPA-COG) is a specific measure of cognitive function for Parkinson's disease (PD) patients. Previous studies, under the frame of the classic test theory, indicate satisfactory psychometric properties. The Rasch model, an item response theory approach, provides new information about the scale, as well as results in a linear scale. This study aims at analysing the SCOPA-COG according to the Rasch model and, on the basis of results, suggesting modification to the SCOPA-COG. Fit to the Rasch model was analysed using a sample of 384 PD patients. A good fit was obtained after rescoring for disordered thresholds. The person separation index, a reliability measure, was 0.83. Differential item functioning was observed by age for three items and by gender for one item. The SCOPA-COG is a unidimensional measure of global cognitive function in PD patients, with good scale targeting and no empirical evidence for use of the subscale scores. Its adequate reliability and internal construct validity were supported. The SCOPA-COG, with the proposed scoring scheme, generates true linear interval scores.
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

ERIC Educational Resources Information Center

Perkins, Kyle; And Others

This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…
Validity and measurement precision of the PROMIS physical function item bank and a content validity-driven 20-item short form in rheumatoid arthritis compared with traditional measures.

PubMed

Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Glas, Cees A W; Vonkeman, Harald E; Taal, Erik; Krishnan, Eswar; Bernelot Moens, Hein J; Boers, Maarten; Terwee, Caroline B; van Riel, Piet L C M; van de Laar, Mart A F J

2015-12-01

To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domains are featured in the full PROMIS physical function item bank compared with 13 and 8 for the HAQ-DI and PF-10, respectively. As hypothesized, all three physical function instruments were highly intercorrelated (r 0.74-0.84), moderately correlated with disease activity measures (r 0.44-0.63) and weakly correlated with age (rs 0.07-0.14). Item response theory-based analysis revealed that a 20-item PROMIS physical function short form covered a wider range of physical function levels than the HAQ-DI or PF-10. The PROMIS physical function item bank demonstrated excellent measurement properties in RA. A content-driven 20-item short form may be a useful tool for assessing physical function in RA.
On the relationship between executive functions of working memory and components derived from fluid intelligence measures.

PubMed

Ren, Xuezhu; Schweizer, Karl; Wang, Tengfei; Chu, Pei; Gong, Qin

2017-10-01

The aim of the current study is to provide new insights into the relationship between executive functions and intelligence measures in considering the item-position effect observed in intelligence items. Raven's Advanced Progressive Matrices (APM) and Horn's LPS reasoning test were used to assess fluid intelligence which served as criterion in investigating the relationship between intelligence and executive functions. A battery of six experimental tasks measured the updating, shifting, and inhibition processes of executive functions. Data were collected from 205 university students. Fluid intelligence showed substantial correlations with the updating and inhibition processes and no correlation with the shifting process without considering the item-position effect. Next, the fixed-link model was applied to APM and LPS data separately to decompose them into an ability component and an item-position component. The results of relating the components to executive functions showed that the updating and shifting processes mainly contributed to the item-position component whereas the inhibition process was mainly associated with the ability component of each fluid intelligence test. These findings suggest that improvements in the efficiency of updating and shifting processes are likely to occur during the course of completing intelligence measures and inhibition is important for intelligence in general.
Reliability of the Melbourne assessment of unilateral upper limb function.

PubMed

Randall, M; Carlin, J B; Chondros, P; Reddihough, D

2001-11-01

This study examines the reliability of the Melbourne Assessment of Unilateral Upper Limb Function: a quantitative test of quality of movement in children with neurological impairment. The assessment was administered to 20 children aged from 5 to 16 years (mean age 9 years 10 months, SD 2 years 10 months) who had various types and degrees of cerebral palsy (CP). The performances of the 20 children during assessment were videotaped for subsequent scoring by 15 occupational therapists. Scores were analyzed for internal consistency of test items, inter- and intrarater reliability of scorings of the same videotapes, and test-retest reliability using repeat videotaping. Results revealed very high internal consistency of test items (alpha=0.96), moderate to high agreement both within and between raters for all test items (intraclass correlations of at least 0.7) apart from item 16 (hand to mouth and down), and high interrater reliability (0.95) and intrarater reliability (0.97) for total test scores. Test-retest results revealed moderate to high intrarater reliability for item totals (mean of 0.83 and 0.79) for each rater and high reliability for test totals (0.98 and 0.97). These findings indicate that the Melbourne Assessment of Unilateral Upper Limb Function is a reliable tool for measuring the quality of unilateral upper-limb movement in children with CP.
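The internal consistency figure reported above (alpha=0.96) is the kind of statistic usually computed as Cronbach's alpha. The following is a minimal sketch of that calculation, assuming sample variances and complete data; the function name and the ratings are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Hypothetical helper: assumes complete data and at least two items
    and two respondents.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: five respondents, three items.
ratings = [[3, 3, 2], [1, 2, 1], [4, 4, 4], [2, 2, 3], [0, 1, 1]]
alpha = cronbach_alpha(ratings)
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why a value such as 0.96 is read as very high internal consistency.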
Assessment of Differential Item Functioning in the Experiences of Discrimination Index

PubMed Central

Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro

2011-01-01

The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104
Psychometric properties of the PROMIS Physical Function item bank in patients receiving physical therapy.

PubMed

Crins, Martine H P; van der Wees, Philip J; Klausch, Thomas; van Dulmen, Simone A; Roorda, Leo D; Terwee, Caroline B

2018-01-01

The Patient-Reported Outcomes Measurement Information System (PROMIS) is a universally applicable set of instruments, including item banks, short forms and computer adaptive tests (CATs), measuring patient-reported health across different patient populations. PROMIS CATs are highly efficient and the use in practice is considered feasible with little administration time, offering standardized and routine patient monitoring. Before an item bank can be used as CAT, the psychometric properties of the item bank have to be examined. Therefore, the objective was to assess the psychometric properties of the Dutch-Flemish PROMIS Physical Function item bank (DF-PROMIS-PF) in Dutch patients receiving physical therapy. Cross-sectional study. 805 patients >18 years, who received any kind of physical therapy in primary care in the past year, completed the full DF-PROMIS-PF (121 items). Unidimensionality was examined by Confirmatory Factor Analysis and local dependence and monotonicity were evaluated. A Graded Response Model was fitted. Construct validity was examined with correlations between DF-PROMIS-PF T-scores and scores on two legacy instruments (SF-36 Health Survey Physical Functioning scale [SF36-PF10] and the Health Assessment Questionnaire Disability-Index [HAQ-DI]). Reliability (standard errors of theta) was assessed. The results for unidimensionality were mixed (scaled CFI = 0.924, TLI = 0.923, RMSEA = 0.045, first factor explained 61.5% of variance). Some local dependence was found (8.2% of item pairs). The item bank showed a broad coverage of the physical function construct (threshold-parameters range: -4.28 to 2.33) and good construct validity (correlation with SF36-PF10 = 0.84 and HAQ-DI = -0.85). Furthermore, the DF-PROMIS-PF showed greater reliability over a broader score-range than the SF36-PF10 and HAQ-DI. The psychometric properties of the DF-PROMIS-PF item bank are sufficient. The DF-PROMIS-PF can now be used as short forms or CAT to measure the level of physical function of physiotherapy patients.
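For readers unfamiliar with the Graded Response Model mentioned in this abstract, the sketch below shows how the model turns one discrimination parameter and a set of ordered thresholds into response-category probabilities. It is an illustration only: the function name and the parameter values are hypothetical and are not estimates from the DF-PROMIS-PF calibration.

```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait level; thresholds: ascending threshold parameters.
    Returns probabilities for categories 0..len(thresholds).
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Cumulative probabilities P(X >= k), bracketed by P(X >= 0) = 1
    # and P(X >= m + 1) = 0.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item with four response categories.
probs = grm_category_probabilities(theta=0.0, discrimination=1.5,
                                   thresholds=[-1.0, 0.0, 1.2])
```

Thresholds spread over a wide range of theta, as reported for this item bank, mean that at least some items remain informative for respondents at very low and fairly high levels of physical function.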
Differential Item Functioning Assessment in Cognitive Diagnostic Modeling: Application of the Wald Test to Investigate DIF in the DINA Model

ERIC Educational Resources Information Center

Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna

2014-01-01

Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
Evaluating construct validity of the second version of the Copenhagen Psychosocial Questionnaire through analysis of differential item functioning and differential item effect.

PubMed

Bjorner, Jakob Bue; Pejtersen, Jan Hyld

2010-02-01

To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register based follow up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have negative impact on construct validity. These results can be used to develop better shortform measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
Executive Functions Are Employed to Process Episodic and Relational Memories in Children With Autism Spectrum Disorders

PubMed Central

2013-01-01

Objective: Long-term memory functioning in autism spectrum disorders (ASDs) is marked by a characteristic pattern of impairments and strengths. Individuals with ASD show impairment in memory tasks that require the processing of relational and contextual information, but spared performance on tasks requiring more item-based, acontextual processing. Two experiments investigated the cognitive mechanisms underlying this memory profile. Method: A sample of 14 children with a diagnosis of high-functioning ASD (age: M = 12.2 years), and a matched control group of 14 typically developing (TD) children (age: M = 12.1 years), participated in a range of behavioral memory tasks in which we measured both relational and item-based memory abilities. They also completed a battery of executive function measures. Results: The ASD group showed specific deficits in relational memory, but spared or superior performance in item-based memory, across all tasks. Importantly, for ASD children, executive ability was significantly correlated with relational memory but not with item-based memory. No such relationship was present in the control group. This suggests that children with ASD atypically employed effortful, executive strategies to retrieve relational (but not item-specific) information, whereas TD children appeared to use more automatic processes. Conclusions: The relational memory impairment in ASD may result from a specific impairment in automatic associative retrieval processes with an increased reliance on effortful and strategic retrieval processes. Our findings allow specific neural predictions to be made regarding the interactive functioning of the hippocampus, prefrontal cortex, and posterior parietal cortex in ASD as a neural network supporting relational memory processing. PMID:24245930
Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

ERIC Educational Resources Information Center

Wang, Wen-Chung

2004-01-01

Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
An analysis of the DuPage County Regional Office of Education physics exam

NASA Astrophysics Data System (ADS)

Muehsler, Hans

In 2009, the DuPage County Regional Office of Education (ROE) tasked volunteer physics teachers with creating a basic skills physics exam reflecting what the participants valued and shared in common across curricula. Mechanics, electricity & magnetism (E&M), and wave phenomena emerged as the primary constructs. The resulting exam was intended for first-exposure physics students. The most recently completed version was psychometrically assessed for unidimensionality within the constructs using a robust WLS structural equation model and for reliability. An item analysis using a 3-PL IRT model was performed on the mechanics items and a 2-PL IRT model was performed on the E&M and waves items; a distractor analysis was also performed on all items. Lastly, differential item functioning (DIF) and differential test functioning (DTF) analyses, using the Mantel-Haenszel procedure, were performed using gender, ethnicity, year in school, ELL, physics level, and math level as groupings.
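The Mantel-Haenszel procedure named in this abstract compares the odds of a correct response between a reference and a focal group within matched ability strata. The sketch below computes the common odds ratio and the ETS delta-scale transformation that is commonly reported with it. The function names and counts are hypothetical, and nothing here is claimed to reproduce the analysis in the dissertation.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    strata: iterable of (ref_correct, ref_wrong, focal_correct, focal_wrong)
    counts, one tuple per matched ability level (e.g. a total-score band).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_correct, ref_wrong, focal_correct, focal_wrong in strata:
        n = ref_correct + ref_wrong + focal_correct + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_correct * focal_wrong / n
        denominator += ref_wrong * focal_correct / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """ETS delta-scale index; negative values indicate the item favors the reference group."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for two score bands with equal odds in both groups.
no_dif = mantel_haenszel_odds_ratio([(20, 20, 10, 10), (30, 10, 15, 5)])
```

An odds ratio near 1 (a delta value near 0) suggests no DIF for the item after matching on ability; values far from 1 flag the item for review.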
Are cross-cultural comparisons of personality profiles meaningful? Differential item and facet functioning in the Revised NEO Personality Inventory.

PubMed

Church, A Timothy; Alvarez, Juan M; Mai, Nhu T Q; French, Brian F; Katigbak, Marcia S; Ortiz, Fernando A

2011-11-01

Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%-50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.
Differential item functional analysis on pedagogic and content knowledge (PCK) questionnaire for Indonesian teachers using RASCH model

NASA Astrophysics Data System (ADS)

Rahmani, B. D.

2018-01-01

The purpose of this paper is to evaluate Indonesian senior high school teachers' pedagogical content knowledge and their perception of curriculum change in West Java, Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. Differential item functioning (DIF) analysis was used to examine whether item responses differed by gender, educational background, and school location. The results showed no significant difference in item response by gender or school location, but a significant difference by educational background. In conclusion, teachers' educational background influenced their responses to the questionnaire. It is therefore suggested that future questionnaire items be constructed to account for differences among participants, particularly in educational background.
Measuring everyday functional competence using the Rasch assessment of everyday activity limitations (REAL) item bank.

PubMed

Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J

2017-11-01

Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.

Crafting the TALE: construction of a measure to assess the functions of autobiographical remembering.

PubMed

Bluck, Susan; Alea, Nicole

2011-07-01

Theory suggests that autobiographical remembering serves several functions. This research builds on previous empirical efforts (Bluck, Alea, Habermas, & Rubin, 2005) with the aim of constructing a brief, valid measure of three functions of autobiographical memory. Participants (N=306) completed 28 theoretically derived items concerning the frequency with which they use autobiographical memory to serve a variety of functions. To examine convergent and discriminant validity, participants rated their tendency to think about and talk about the past, and measures of future time orientation, self-concept clarity, and trait personality. Confirmatory factor analysis of the function items resulted in a respecified model with 15 items in three factors. The newly developed Thinking about Life Experiences scale (TALE) shows good internal consistency as well as convergent validity for three subscales: Self-Continuity, Social-Bonding, and Directing-Behaviour. Analyses demonstrate factorial equivalence across age and gender groups. Potential use and limitations of the TALE are discussed.
Gender Differential Item Functioning on a National Field-Specific Test: The Case of PhD Entrance Exam of TEFL in Iran

ERIC Educational Resources Information Center

Ahmadi, Alireza; Bazvand, Ali Darabi

2016-01-01

Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance in a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response…
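The definition of DIF given in this abstract can be made concrete with the one-parameter (Rasch) model. In the sketch below, two examinees of equal ability face the same item, but the item's difficulty is assumed to differ by group; all parameter values are hypothetical and chosen only to illustrate uniform DIF.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the one-parameter (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


theta = 0.5        # same ability for both examinees
b_reference = 0.0  # hypothetical item difficulty in the reference group
b_focal = 0.6      # hypothetical item difficulty in the focal group

p_reference = rasch_probability(theta, b_reference)
p_focal = rasch_probability(theta, b_focal)
dif_gap = p_reference - p_focal  # nonzero gap at equal ability signals DIF
```

Because the gap arises at equal ability, it reflects how the item functions in each group rather than a true difference in ability between the groups.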
Medial Temporal Lobe Contributions to Cued Retrieval of Items and Contexts

PubMed Central

Hannula, Deborah E.; Libby, Laura A.; Yonelinas, Andrew P.; Ranganath, Charan

2013-01-01

Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model – namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. PMID:23466350
Development and validation of a measure of pediatric oral health-related quality of life: the POQL

PubMed Central

Huntington, Noelle L; Spetter, Dante; Jones, Judith A.; Rich, Sharon E.; Garcia, Raul I.; Spiro, Avron

2011-01-01

Objective To develop a brief measure of oral health-related quality of life in children and demonstrate its reliability and validity in a diverse population. Methods We administered the initial 20-item POQL to children (Child Self-Report) and parents (Parent Report on Child) from diverse populations in both school-based and clinic-based settings. Clinical oral health status was measured on a subset of children. We used factor analysis to determine the underlying scales and then reduced the measure to 10 items based on several considerations. Multitrait analysis on the resulting 10-item POQL was used to reaffirm the discrimination of scales and assess the measure’s internal consistency and interscale correlations. We established discriminant and convergent validity with clinical status, perceived oral health and responses on the PedsQL and determined sensitivity to change with children undergoing ECC surgical repair. Results Factor analysis returned a four-scale solution for the initial items – Physical Functioning, Role Functioning, Social Functioning and Emotional Functioning. The reduced items represented the same four scales – two each on Physical and Role and three each on Social and Emotional. Good reliability and validity were shown for the POQL as a whole and for each of the scales. Conclusions The POQL is a valid and reliable measure of oral health-related quality of life for use in pre-school and school-aged children, with high utility for both clinical assessments and large-scale population studies. PMID:21972458
DIF Trees: Using Classification Trees to Detect Differential Item Functioning

ERIC Educational Resources Information Center

Vaughn, Brandon K.; Wang, Qiu

2010-01-01

A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Development of the functional vision questionnaire for children and young people with visual impairment: the FVQ_CYP.

PubMed

Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S

2013-12-01

Purpose To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Design Questionnaire development. Participants A representative patient sample of VI children and young people aged 10 to 15 years, visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. Methods A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by presence of missing data and individual item response pattern. Exploratory factor analysis (FA) and parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Main Outcome Measures Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). Results A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for a high percentage of missing data, 4 for skewness, 1 for inadequate item infit and outfit values in RA, 3 for differential item functioning across age groups in RA, and 1 for differential item functioning across gender in RA. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. Conclusions We have developed a novel, psychometrically robust self-report questionnaire for children and young people, the FVQ_CYP, that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Development of a Computer-Adaptive Physical Function Instrument for Social Security Administration Disability Determination

PubMed Central

Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton

2014-01-01

Objectives To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and internet survey. Participants Two samples: 1,017 SSA claimants, and 999 adults from the US general population. Interventions None. Main Outcome Measure Model fit statistics, correlation and reliability coefficients. Results IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. Comparing the simulated CATs to the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared to those of a sample of US adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594
Assessing patient report of function: content validity of the Functional Performance Inventory-Short Form (FPI-SF) in patients with chronic obstructive pulmonary disease (COPD)

PubMed Central

Leidy, Nancy Kline; Hamilton, Alan; Becker, Karin

2012-01-01

Purpose The performance of daily activities is a major challenge for people with chronic obstructive pulmonary disease (COPD). The Functional Performance Inventory (FPI) was developed based on an analytical framework of functional status and qualitative interviews with COPD patients describing these difficulties. The 65-item FPI was reduced to a 32-item short form (SF) through a systematic process of qualitative and quantitative item reduction and formatted for greater clarity and ease of use. This study examined the content validity of the reduced, reformatted form of the instrument, the FPI-SF. Patients and methods Qualitative cognitive interviews were conducted with COPD patients recruited from three geographically diverse pulmonary clinics in the United States. Interviews were designed to assess respondent interpretation of the instrument, evaluate clarity and ease of completion, and identify any new activities participants found important and difficult to perform that were not represented by the existing items. Results Twenty subjects comprised the sample; 12 (60%) were male, 14 (70%) were Caucasian, the mean age was 63.0 ± 11.3 years, 12 (60%) were retired, the mean forced expiratory volume in 1 second (FEV1) was 1.5 ± 0.5 L, and the mean percent predicted FEV1 was 48.4% ± 13.1%. Participants understood the FPI-SF as intended, including instructions, items, and response options. Two minor formatting changes were suggested to improve clarity of presentation. Participants found the content of the FPI-SF to be comprehensive, with items covering activities they felt were important and often difficult to perform. Conclusion These results, together with its development history and previously tested quantitative properties, suggest that the FPI-SF is content valid for use in clinical studies of COPD. PMID:22969295
Analysis of Nonequivalent Assessments across Different Linguistic Groups Using a Mixed Methods Approach: Understanding the Causes of Differential Item Functioning by Cognitive Interviewing

ERIC Educational Resources Information Center

Benítez, Isabel; Padilla, José-Luis

2014-01-01

Differential item functioning (DIF) can undermine the validity of cross-lingual comparisons. Although many efficient statistics for detecting DIF are available, few general findings explain DIF results. The objective of the article was to study DIF sources by using a mixed method design. The design involves a quantitative phase…
Development of a Brief Questionnaire to Assess Contraceptive Intent

PubMed Central

Raine-Bennett, Tina R; Rocca, Corinne H

2015-01-01

Objective We sought to develop and validate an instrument that can enable providers to identify young women who may be at risk of contraceptive non-adherence. Methods Item response theory based methods were used to evaluate the psychometric properties of the Contraceptive Intent Questionnaire, a 15-item self-administered questionnaire, based on theory and prior qualitative and quantitative research. The questionnaire was administered to 200 women aged 15–24 years who were initiating contraceptives. We assessed item fit to the item response model, internal consistency, internal structure validity, and differential item functioning. Results All items fit a one-dimensional model. The separation reliability coefficient was 0.73. Participants’ overall scores covered the full range of the scale (0–15), and items appropriately matched the range of participants’ contraceptive intent. Items met the criteria for internal structure validity and most items functioned similarly between groups of women. Conclusion The Contraceptive Intent Questionnaire appears to be a reliable and valid tool. Future testing is needed to assess predictive ability and clinical utility. Practice Implications The Contraceptive Intent Questionnaire may serve as a valid tool to help providers identify women who may have problems with contraceptive adherence, as well as to pinpoint areas in which counseling may be directed. PMID:26104994
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

PubMed

Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

2016-04-01

We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
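The category response curves mentioned in the preceding abstract can be sketched as follows. This is a minimal illustration of Samejima's graded response model for a single item, assuming a logistic link; the function name, discrimination, and threshold values are hypothetical and are not AIADH parameter estimates.

```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in increasing order; K-1 thresholds give K categories.
    """
    # Cumulative probabilities of responding in category k or higher.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Differences of adjacent cumulative curves give each category's probability.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 4-category item evaluated at an average trait level (theta = 0).
probs = grm_category_probabilities(theta=0.0, discrimination=1.5,
                                   thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Evaluating the function over a grid of theta values and plotting each category's probability would trace out the category response curves for that item.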
A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

ERIC Educational Resources Information Center

Fukuhara, Hirotaka; Kamata, Akihito

2011-01-01

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd; Gerritz, Kalle

1990-01-01

Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

ERIC Educational Resources Information Center

Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

2013-01-01

We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
Process-specific analysis in episodic memory retrieval using fast optical signals and hemodynamic signals in the right prefrontal cortex

NASA Astrophysics Data System (ADS)

Dong, Sunghee; Jeong, Jichai

2018-02-01

Objective. Memory is formed by the interaction of various brain functions at the item and task level. Revealing individual and combined effects of item- and task-related processes on retrieving episodic memory is an unsolved problem because of limitations in existing neuroimaging techniques. To investigate these issues, we analyze fast and slow optical signals measured from a custom-built continuous wave functional near-infrared spectroscopy (CW-fNIRS) system. Approach. In our work, we visually present words to the subjects for encoding and let them recall the words after a short rest. The hemodynamic responses evoked by the episodic memory are compared with those evoked by the semantic memory in retrieval blocks. In the fast optical signal, we compare the effects of old and new items (previously seen and not seen) to investigate the item-related process in episodic memory. The Kalman filter is simultaneously applied to slow and fast optical signals in different time windows. Main results. A significant task-related HbR decrease was observed in the episodic memory retrieval blocks. Mean amplitude and peak latency of a fast optical signal are dependent upon item types and reaction time, respectively. Moreover, task-related hemodynamic and item-related fast optical responses are correlated in the right prefrontal cortex. Significance. We demonstrate that episodic memory is retrieved from the right frontal area by a functional connectivity between the maintained mental state through retrieval and item-related transient activity. To the best of our knowledge, this demonstration of functional NIRS research is the first to examine the relationship between item- and task-related memory processes in the prefrontal area using a single modality.
Electronic Quality of Life Assessment Using Computer-Adaptive Testing

PubMed Central

2016-01-01

Background Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.

ERIC Educational Resources Information Center

Wang, Ning; Lane, Suzanne

This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…
Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

PubMed

LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

2015-04-01

Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
Upper-extremity and mobility subdomains from the Patient-Reported Outcomes Measurement Information System (PROMIS) adult physical functioning item bank.

PubMed

Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar

2013-11-01

To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. Not applicable. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, compare separate to combined calibrations, and produce subdomain scores. After eliminating items because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). Upper-extremity and mobility subdomains shared about 35% of the variance in common, and produced comparable scores whether calibrated separately or together. The identification of the subset of items tapping these 2 aspects of physical functioning and scored using the existing PROMIS parameters provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Improving measures of work-related physical functioning.

PubMed

McDonough, Christine M; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E; Marino, Molly E; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M; Chan, Leighton

2017-03-01

To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Newly developed questions were administered to 3532 recent SSA applicants for work disability benefits and 2025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Factor and IRT analyses supported integration of 44 new items into three existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain, consisting of Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items), demonstrated acceptable psychometric properties. The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability.

Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models

ERIC Educational Resources Information Center

Liu, Qian

2011-01-01

For this dissertation, four item purification procedures were implemented in the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…
Effects of Average Signed Area Between Two Item Characteristic Curves and Test Purification Procedures on the DIF Detection via the Mantel-Haenszel Method

ERIC Educational Resources Information Center

Wang, Wen-Chung; Su, Ya-Hui

2004-01-01

In this study we investigated the effects of the average signed area (ASA) between the item characteristic curves of the reference and focal groups and three test purification procedures on the uniform differential item functioning (DIF) detection via the Mantel-Haenszel (M-H) method through Monte Carlo simulations. The results showed that ASA,…
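For orientation, the Mantel-Haenszel statistic named in the preceding abstract can be sketched in a few lines of Python. The sketch computes the common odds ratio across matched total-score strata and its transformation to the ETS delta scale; the function name and all counts are hypothetical and are not data from the study.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    Each stratum is (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_correct, ref_incorrect, focal_correct, focal_incorrect in strata:
        total = ref_correct + ref_incorrect + focal_correct + focal_incorrect
        if total == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_correct * focal_incorrect / total
        denominator += ref_incorrect * focal_correct / total
    return numerator / denominator


# Hypothetical counts for three total-score strata (low, middle, high).
strata = [(20, 30, 10, 40), (35, 15, 25, 25), (45, 5, 40, 10)]
alpha_mh = mantel_haenszel_odds_ratio(strata)
mh_d_dif = -2.35 * math.log(alpha_mh)  # ETS delta-scale transformation

print(f"alpha_MH={alpha_mh:.3f}, MH D-DIF={mh_d_dif:.3f}")
```

A common odds ratio above 1 (negative MH D-DIF) indicates that, at matched total scores, the item favors the reference group; a value of 1 indicates no uniform DIF.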
Complex versus Simple Modeling for DIF Detection: When the Intraclass Correlation Coefficient (ρ) of the Studied Item Is Less Than the ρ of the Total Score

ERIC Educational Resources Information Center

Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon

2014-01-01

Previous research has demonstrated that differential item functioning (DIF) methods that do not account for multilevel data structure could result in too frequent rejection of the null hypothesis (i.e., no DIF) when the intraclass correlation coefficient (ρ) of the studied item was the same as the ρ of the total score. The current study extended…
Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

PubMed Central

Sharafi, Zahra

2017-01-01

Background The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed. PMID:29312463
Analysis of Bilingual Children’s Performance on the English and Spanish Versions of the Woodcock-Muñoz Language Survey-R (WMLS-R)

PubMed Central

Sandilos, Lia E.; Lewis, Kandia; Komaroff, Eugene; Hammer, Carol Scheffner; Scarpino, Shelley E.; Lopez, Lisa; Rodriguez, Barbara; Goldstein, Brian

2015-01-01

The purpose of this study was to investigate the way in which items on the Woodcock-Muñoz Language Survey Revised (WMLS-R) Spanish and English versions function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. Using data from a sample of 324 bilingual Hispanic families and their children living on the United States mainland, a differential item functioning (DIF) analysis was conducted to determine if test items in English and Spanish functioned differently for Mexican, Cuban, and Puerto Rican bilingual children. Data on child and parent language characteristics and children’s scores on Picture Vocabulary and Story Recall subtests in English and Spanish were collected. DIF was not detected for items on the Spanish subtests. Results revealed that some items on English subtests displayed statistically and practically significant DIF. The findings indicate that there are differences in the difficulty level of WMLS-R English-form test items depending on the examinees’ ethnic subgroup membership. This outcome suggests that test developers need to be mindful of potential differences in performance based on ethnic subgroup and dialect when developing standardized language assessments that may be administered to bilingual students. PMID:26705400
Memory for faces and voices varies as a function of sex and expressed emotion.

PubMed

Cortes, Diana S; Laukka, Petri; Lindahl, Christina; Fischer, Håkan

2017-01-01

We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection ("remember" hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and own-sex bias where female participants displayed memory advantage for female faces and face-voice combinations. Results further suggest that own-sex bias can be explained by recollection, rather than familiarity, rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy.
Memory for faces and voices varies as a function of sex and expressed emotion

PubMed Central

Laukka, Petri; Lindahl, Christina; Fischer, Håkan

2017-01-01

We investigated how memory for faces and voices (presented separately and in combination) varies as a function of sex and emotional expression (anger, disgust, fear, happiness, sadness, and neutral). At encoding, participants judged the expressed emotion of items in forced-choice tasks, followed by incidental Remember/Know recognition tasks. Results from 600 participants showed that accuracy (hits minus false alarms) was consistently higher for neutral compared to emotional items, whereas accuracy for specific emotions varied across the presentation modalities (i.e., faces, voices, and face-voice combinations). For the subjective sense of recollection (“remember” hits), neutral items received the highest hit rates only for faces, whereas for voices and face-voice combinations anger and fear expressions instead received the highest recollection rates. We also observed better accuracy for items by female expressers, and own-sex bias where female participants displayed memory advantage for female faces and face-voice combinations. Results further suggest that own-sex bias can be explained by recollection, rather than familiarity, rates. Overall, results show that memory for faces and voices may be influenced by the expressions that they carry, as well as by the sex of both items and participants. Emotion expressions may also enhance the subjective sense of recollection without enhancing memory accuracy. PMID:28570691
Development of an item bank and computer adaptive test for role functioning.

PubMed

Anatchkova, Milena D; Rose, Matthias; Ware, John E; Bjorner, Jakob B

2012-11-01

Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning. A set of different instruments including 75 newly developed items asking about the impact of health on role functioning was completed by 2,500 participants. Established item response theory methods were used to develop an item bank based on the generalized partial credit model. Comparison of group mean bank scores of participants with different self-reported general health status and chronic conditions was used to test the external validity of the bank. After excluding items that did not meet established requirements, the final item bank consisted of a total of 64 items covering three areas of role functioning (family, social, and occupational). Slopes in the bank ranged between .93 and 4.37; the mean threshold range was -1.09 to -2.25. Item bank-based scores were significantly different for participants with and without chronic conditions and with different levels of self-reported general health. An item bank assessing health impact on RF across three content areas has been successfully developed. The bank can be used for development of short forms or computerized adaptive tests to be applied in the assessment of role functioning as one of the common denominators across applications of generic health assessment.
Medial temporal lobe contributions to cued retrieval of items and contexts.

PubMed

Hannula, Deborah E; Libby, Laura A; Yonelinas, Andrew P; Ranganath, Charan

2013-10-01

Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model: namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. Copyright © 2013 Elsevier Ltd. All rights reserved.
Real and Artificial Differential Item Functioning in Polytomous Items

ERIC Educational Resources Information Center

Andrich, David; Hagquist, Curt

2015-01-01

Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
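As an illustration of the Mantel-Haenszel (MH) method named in this abstract, the following is a minimal sketch for a single dichotomous item, not the authors' procedure. It assumes persons have already been grouped into strata by matching score, and that each stratum is summarized as counts of correct and incorrect responses in a reference and a focal group. The function names, the input layout, and the example counts are hypothetical; the rescaling by -2.35 is the conventional ETS delta metric and is not taken from the abstract.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata for one dichotomous item.

    Each stratum is a tuple (hypothetical layout):
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    A value near 1.0 suggests no uniform DIF on the item.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """Rescale the common odds ratio to the ETS delta metric."""
    return -2.35 * math.log(odds_ratio)


# Illustrative counts only: both groups have identical odds in every stratum.
no_dif = [(20, 10, 20, 10), (30, 5, 30, 5)]
# Illustrative counts only: the reference group has three times the odds.
with_dif = [(30, 10, 20, 20)]

alpha_no_dif = mantel_haenszel_odds_ratio(no_dif)
alpha_with_dif = mantel_haenszel_odds_ratio(with_dif)
```

Because the statistic pools 2 x 2 tables, it applies directly only to dichotomously scored items, which is the restriction the abstract points to.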
Psychometric Properties and Performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Depression Short Forms in Ethnically Diverse Groups

PubMed Central

Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon

2017-01-01

Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups. Methods DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates. Results Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item showed slightly higher magnitude of DIF for age: nothing to look forward to; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. 
This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items, worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education. While the magnitude and aggregate impact of DIF was small, in a few instances, individual impact was observed. Information provided was relatively high, particularly in the middle upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method. Conclusions This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms. PMID:28553573
Item Analyses of Memory Differences

PubMed Central

Salthouse, Timothy A.

2017-01-01

Objective Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
Developing the Communicative Participation Item Bank: Rasch Analysis Results From a Spasmodic Dysphonia Sample

PubMed Central

Baylor, Carolyn R.; Yorkston, Kathryn M.; Eadie, Tanya L.; Miller, Robert M.; Amtmann, Dagmar

2011-01-01

Purpose The purpose of this study was to conduct the initial psychometric analyses of the Communicative Participation Item Bank—a new self-report instrument designed to measure the extent to which communication disorders interfere with communicative participation. This item bank is intended for community-dwelling adults across a range of communication disorders. Method A set of 141 candidate items was administered to 208 adults with spasmodic dysphonia. Participants rated the extent to which their condition interfered with participation in various speaking communication situations. Questionnaires were administered online or in a paper version per participant preference. Participants also completed the Voice Handicap Index (B. H. Jacobson et al., 1997) and a demographic questionnaire. Rasch analyses were conducted using Winsteps software (J. M. Linacre, 1991). Results The results show that items functioned better when the 5-category response format was recoded to a 4-category format. After removing 8 items that did not fit the Rasch model, the remaining 133 items demonstrated strong evidence of sufficient unidimensionality, with the model accounting for 89.3% of variance. Item location values ranged from −2.73 to 2.20 logits. Conclusions Preliminary Rasch analyses of the Communicative Participation Item Bank show strong psychometric properties. Further testing in populations with other communication disorders is needed. PMID:19717652
Reliability of the Adult Myopathy Assessment Tool in Individuals with Myositis

PubMed Central

Harris-Love, Michael O.; Joe, Galen; Davenport, Todd E.; Koziol, Deloris; Rose, Kristen Abbett; Shrader, Joseph A.; Vasconcelos, Olavo M.; McElroy, Beverly; Dalakas, Marinos C.

2015-01-01

Objective The Adult Myopathy Assessment Tool (AMAT) is a 13-item performance-based battery developed to assess functional status and muscle endurance. The purpose of this study was to determine the intrarater and interrater reliability of the AMAT in adults with myositis. Methods Nineteen raters (13 physical therapists and 6 physicians) scored videotaped recordings of patients with myositis performing the AMAT for a total of 114 tests and 1,482 item observations per session. Raters rescored the AMAT test and item observations during a follow up session (19 ±6 days between scoring sessions). All raters completed a single, self-directed, electronic training module prior to the initial scoring session. Results Intrarater and interrater reliability correlation coefficients were .94 or greater for the AMAT Functional Subscale, Endurance Subscale, and Total score (all p < 0.02 for Ho:ρ ≤ 0.75). All AMAT items had satisfactory intrarater agreement (Kappa statistics with Fleiss-Cohen weights, Kw = .57-1.00). Interrater agreement was acceptable for each AMAT item (K = .56-.89) except the sit up (K = .16). The standard error of measurement and 95% confidence interval range for the AMAT Total scores did not exceed 2 points across all observations (AMAT Total score range = 0-45). Conclusions The AMAT is a reliable, domain-specific assessment of functional status and muscle endurance for adult subjects with myositis. Results of this study suggest that physicians and physical therapists may reliably score the AMAT following a single training session. The AMAT Functional Subscale, Endurance Subscale, and Total score exhibit interrater and intrarater reliability suitable for clinical and research use. PMID:25201624
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

NASA Astrophysics Data System (ADS)

Liou, Pey-Yan; Bulut, Okan

2017-12-01

The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
Development of Rasch-based item banks for the assessment of work performance in patients with musculoskeletal diseases.

PubMed

Mueller, Evelyn A; Bengel, Juergen; Wirtz, Markus A

2013-12-01

This study aimed to develop a self-description assessment instrument to measure work performance in patients with musculoskeletal diseases. In terms of the International Classification of Functioning, Disability and Health (ICF), work performance is defined as the degree of meeting the work demands (activities) at the actual workplace (environment). To account for the fact that work performance depends on the work demands of the job, we strived to develop item banks that allow a flexible use of item subgroups depending on the specific work demands of the patients' jobs. Item development included the collection of work tasks from literature and content validation through expert surveys and patient interviews. The resulting 122 items were answered by 621 patients with musculoskeletal diseases. Exploratory factor analysis to ascertain dimensionality and Rasch analysis (partial credit model) for each of the resulting dimensions were performed. Exploratory factor analysis resulted in four dimensions, and subsequent Rasch analysis led to the following item banks: 'impaired productivity' (15 items), 'impaired cognitive performance' (18), 'impaired coping with stress' (13) and 'impaired physical performance' (low physical workload 20 items, high physical workload 10 items). The item banks exhibited person separation indices (reliability) between 0.89 and 0.96. The assessment of work performance adds the activities component to the more commonly employed participation component of the ICF-model. The four item banks can be adapted to specific jobs where necessary without losing comparability of person measures, as the item banks are based on Rasch analysis.
Initial constructs for patient-centered outcome measures to evaluate brain-computer interfaces

PubMed Central

Andresen, Elena M.; Fried-Oken, Melanie; Peters, Betts; Patrick, Donald L.

2016-01-01

Purpose The authors describe preliminary work toward the creation of patient-centered outcome (PCO) measures to evaluate brain-computer interface (BCI) as an assistive technology for individuals with severe speech and physical impairments (SSPI). Method In Phase 1, 591 items from 15 existing measures were mapped to the International Classification of Functioning, Disability and Health (ICF). In Phase 2, qualitative interviews were conducted with eight people with SSPI and seven caregivers. Resulting text data were coded in an iterative analysis. Results Most items (79%) mapped to the ICF environmental domain; over half (53%) mapped to more than one domain. The ICF framework was well suited for mapping items related to body functions and structures, but less so for items in other areas, including personal factors. Two constructs emerged from qualitative data: Quality of Life (QOL) and Assistive Technology. Component domains and themes were identified for each. Conclusions Preliminary constructs, domains, and themes were generated for future PCO measures relevant to BCI. Existing instruments are sufficient for initial items but do not adequately match the values of people with SSPI and their caregivers. Field methods for interviewing people with SSPI were successful, and support the inclusion of these individuals in PCO research. PMID:25806719
Anorexia/cachexia-related quality of life for children with cancer.

PubMed

Lai, Jin-Shei; Cella, David; Peterman, Amy; Barocas, Joshua; Goldman, Stewart

2005-10-01

Anorexia is a common symptom in patients with cancer, which can lead to poor tolerance of treatment and can contribute to cachexia in extreme cases. Children with advanced-stage cancer are especially vulnerable to malnutrition resulting from anorexia and cachexia. Currently, there are no instruments that measure common concerns specifically associated with anorexia and cachexia in children with cancer. The purpose of the current article was to test the psychometric properties of a newly developed pediatric Functional Assessment of Anorexia and Cachexia Therapy (peds-FAACT) for children with cancer. Ninety-six patients (ages 7-17 yrs) receiving cancer treatment and their parents were asked to complete the 12-item peds-FAACT. The authors implemented both classical test theory and item response theory to evaluate the agreement between parents and patients, internal consistency and unidimensionality of the scale, and stability of items across subgroups. As a result, a patient-reported six-item scale was recommended as the core measure for all pediatric patients with cancer and four additional peripheral items were recommended for adolescent patients. The peds-FAACT demonstrated good psychometric properties, differentiated patients with different functional performance status, and was determined to be a useful tool for future clinical trials.
Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration

ERIC Educational Resources Information Center

Penfield, Randall D.; Alvarez, Karina; Lee, Okhee

2009-01-01

The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…
Understanding Rasch Measurement: Rasch Techniques for Detecting Bias in Performance Assessments: An Example Comparing the Performance of Native and Non-native Speakers on a Test of Academic English.

ERIC Educational Resources Information Center

Elder, Catherine; McNamara, Tim; Congdon, Peter

2003-01-01

Used Rasch analytic procedures to study item bias or differential item functioning in both dichotomous and scalar items on a test of English for academic purposes. Results for 139 college students on a pilot English language test model the approach and illustrate the measurement challenges posed by a diagnostic instrument to measure English…

Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS).

PubMed

Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E

2008-01-01

The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
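The standard error and reliability figures in this abstract can be related with a short calculation. The sketch below assumes the classical relation reliability = 1 - SE^2 / SD^2 and takes the population SD to be 10, the norming value given in the abstract. The abstract does not state which formula the authors used, so this is only a plausibility check, and the function name is hypothetical.

```python
def reliability_from_se(standard_error, population_sd):
    """Classical reliability implied by a standard error of measurement.

    Assumes reliability = 1 - SE^2 / SD^2; the abstract does not say
    which relation the authors applied.
    """
    return 1.0 - (standard_error / population_sd) ** 2


# SE of 2.2 on a metric normed to SD = 10, as reported in the abstract.
implied_reliability = reliability_from_se(2.2, 10.0)
```

Under these assumptions an SE of 2.2 implies a reliability of about 0.952, consistent with the reported reliability above 0.95.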
Further differentiating item and order information in semantic memory: students' recall of words from the "CU Fight Song", Harry Potter book titles, and Scooby Doo theme song.

PubMed

Overstreet, Michael F; Healy, Alice F; Neath, Ian

2017-01-01

University of Colorado (CU) students were tested for both order and item information in their semantic memory for the "CU Fight Song". Following an earlier study by Overstreet and Healy [(2011). Item and order information in semantic memory: Students' retention of the "CU fight song" lyrics. Memory & Cognition, 39, 251-259. doi: 10.3758/s13421-010-0018-3 ], a symmetrical bow-shaped serial position function (with both primacy and recency advantages) was found for reconstructing the order of the nine lines in the song, whereas a function with no primacy advantage was found for recalling a missing word from each line. This difference between order and item information was found even though students filled in missing words without any alternatives provided and missing words came from the beginning, middle, or end of each line. Similar results were found for CU students' recall of the sequence of Harry Potter book titles and the lyrics of the Scooby Doo theme song. These findings strengthen the claim that the pronounced serial position function in semantic memory occurs largely because of the retention of order, rather than item, information.
Developing an item bank and short forms that assess the impact of asthma on quality of life.

PubMed

Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena

2014-02-01

The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) that avoids confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.
Development of the Primary Care Quality-Homeless (PCQ-H) Instrument: A Practical Survey of Patients' Experiences in Primary Care

PubMed Central

Kertesz, Stefan G.; Pollio, David E.; Jones, Richard N.; Steward, Jocelyn; Stringfellow, Erin J.; Gordon, Adam J.; Johnson, Nancy K.; Kim, Theresa A.; Granstaff, Unita; Austin, Erika L.; Young, Alexander S.; Golden, Joya; Davis, Lori L.; Roth, David L.; Holt, Cheryl L.

2015-01-01

Background Homeless patients face unique challenges in obtaining primary care responsive to their needs and context. Patient experience questionnaires could permit assessment of patient-centered medical homes for this population, but standard instruments may not reflect homeless patients' priorities and concerns. Objectives This report describes (a) the content and psychometric properties of a new primary care questionnaire for homeless patients and (b) the methods utilized in its development. Methods Starting with quality-related constructs from the Institute of Medicine, we identified relevant themes by interviewing homeless patients and experts in their care. A multidisciplinary team drafted a preliminary set of 78 items. This was administered to homeless-experienced clients (n=563) across 3 VA facilities and 1 non-VA Health Care for the Homeless Program. Using Item Response Theory, we examined Test Information Function curves to eliminate less informative items and devise plausibly distinct subscales. Results The resulting 33-item instrument (Primary Care Quality-Homeless, PCQ-H) has four subscales: Patient-Clinician Relationship (15 items), Cooperation among Clinicians (3 items), Access/Coordination (11 items) and Homeless-Specific Needs (4 items). Evidence for divergent and convergent validity is provided. Test Information Function (TIF) graphs showed adequate informational value to permit inferences about groups for 3 subscales (Relationship, Cooperation and Access/Coordination). The 3-item Cooperation subscale had lower informational value (TIF<5) but had good internal consistency (alpha=0.75) and patients frequently reported problems in this aspect of care. Conclusions Systematic application of qualitative and quantitative methods supported the development of a brief patient-reported questionnaire focused on the primary care of homeless patients and offers guidance for future population-specific instrument development. PMID:25023918
Measuring impairments of functioning and health in patients with axial spondyloarthritis by using the ASAS Health Index and the Environmental Item Set: translation and cross-cultural adaptation into 15 languages

PubMed Central

Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J

2016-01-01

Introduction The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Methods Translation and cross-cultural adaptation has been carried out following the forward–backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. Results The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option ‘not applicable’ was added to two items of the ASAS HI to improve appropriateness. Discussion This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures. PMID:27752358
Factorial and Item-Level Invariance of a Principal Perspectives Survey: German and U.S. Principals.

PubMed

Wang, Chuang; Hancock, Dawson R; Muller, Ulrich

This study examined the factorial and item-level invariance of a survey of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal with a sample of US principals and another sample of German principals. Confirmatory factor analysis (CFA) and differential item functioning (DIF) analysis were employed at the test and item level, respectively. A single group CFA was conducted first, and the model was found to fit the data collected. The factorial invariance between the German and the US principals was tested through three steps: (a) configural invariance; (b) measurement invariance; and (c) structural invariance. The results suggest that the survey is a viable measure of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal because principals from two different cultures shared a similar pattern on all three constructs. The DIF analysis further revealed that 22 out of the 28 items functioned similarly between German and US principals.
Item Response Theory Applied to Factors Affecting the Patient Journey Towards Hearing Rehabilitation

PubMed Central

Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien

2016-01-01

To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to three hearing aid rehabilitation questionnaire scales were evaluated with item response theory: aid stigma, pressure, and aid unwanted, addressing, respectively, hearing aid stigma, experienced pressure from others, and perceived hearing aid benefit. The sample comprised 212 persons aged 55 years or more: 63 hearing aid users, 64 persons with hearing impairment, and 85 persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired. PMID:28028428
Examining the Measurement Precision and Invariance of the Revised Get Ready to Read!

PubMed Central

Farrington, Amber L.; Lonigan, Christopher J.

2016-01-01

Children's emergent literacy skills are highly predictive of later reading abilities. To determine which children have weaker emergent literacy skills and are in need of intervention, it is necessary to assess emergent literacy skills accurately and reliably. In this study, 1,351 children were administered the Revised Get Ready to Read! (GRTR-R), and an item response theory analysis was used to evaluate the item-level reliability of the measure. Differential item functioning (DIF) analyses were conducted to examine whether items function similarly between subpopulations of children. The GRTR-R had acceptable reliability for children whose ability level was just below the mean. DIF for a small number of items was present for only two comparisons—children who were older versus younger and children who were White versus African American. These results demonstrate that the GRTR-R has acceptable reliability and limited DIF, enabling the screener to identify those at risk for developing reading problems. PMID:23851136
Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?

ERIC Educational Resources Information Center

Royal, Kenneth D.; Stockdale, Myrah R.

2015-01-01

The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…
IRTs of the ABCs: Children's Letter Name Acquisition

ERIC Educational Resources Information Center

Phillips, Beth M.; Piasta, Shayne B.; Anthony, Jason L.; Lonigan, Christopher J.; Francis, David J.

2012-01-01

We examined the developmental sequence of letter name knowledge acquisition by children from 2 to 5 years of age. Data from 2 samples representing diverse regions, ethnicity, and socioeconomic backgrounds (ns=1074 and 500) were analyzed using item response theory (IRT) and differential item functioning techniques. Results from factor analyses…
Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.

PubMed

Shou, Yiyun; Sellbom, Martin; Xu, Jing

2018-05-01

There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. samples. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Depression symptoms across cultures: an IRT analysis of standard depression symptoms using data from eight countries.

PubMed

Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J

2016-07-01

Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting illustrating a possible source of measurement invariance in prevalence estimates.
A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

PubMed

Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

2018-04-10

To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

ERIC Educational Resources Information Center

Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

2016-01-01

In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Measuring the ICF components of impairment, activity limitation and participation restriction: an item analysis using classical test theory and item response theory

PubMed Central

Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie

2009-01-01

Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
The mirror therapy program enhances upper-limb motor recovery and motor function in acute stroke patients.

PubMed

Lee, Myung Mo; Cho, Hwi-Young; Song, Chang Ho

2012-08-01

The purpose of this study was to evaluate the effects of the mirror therapy program on upper-limb motor recovery and motor function in patients with acute stroke. Twenty-six patients who had an acute stroke within 6 mos of study commencement were assigned to the experimental group (n = 13) or the control group (n = 13). Both experimental and control group members participated in a standard rehabilitation program, but only the experimental group members additionally participated in mirror therapy program, for 25 mins twice a day, five times a week, for 4 wks. The Fugl-Meyer Assessment, Brunnstrom motor recovery stage, and Manual Function Test were used to assess changes in upper-limb motor recovery and motor function after intervention. In upper-limb motor recovery, the scores of Fugl-Meyer Assessment (by shoulder/elbow/forearm items, 9.54 vs. 4.61; wrist items, 2.76 vs. 1.07; hand items, 4.43 vs. 1.46, respectively) and Brunnstrom stages for upper limb and hand (by 1.77 vs. 0.69 and 1.92 vs. 0.50, respectively) were improved more in the experimental group than in the control group (P < 0.05). In upper-limb motor function, the Manual Function Test score (by shoulder item, 5.00 vs. 2.23; hand item, 5.07 vs. 0.46, respectively) was significantly increased in the experimental group compared with the control group (P < 0.01). No significant differences were found between the groups for the coordination items in Fugl-Meyer Assessment. This study confirms that mirror therapy program is an effective intervention for upper-limb motor recovery and motor function improvement in acute stroke patients. Additional research on mirror therapy program components, intensity, application time, and duration could result in it being used as a standardized form of hand rehabilitation in clinics and homes.
Development of a computer-adaptive physical function instrument for Social Security Administration disability determination.

PubMed

Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton

2013-09-01

Objectives To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and Internet survey. Participants Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). Interventions None. Main outcome measures Model fit statistics, correlation, and reliability coefficients. Results IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Can you ask? We just did! Assessing sexual function and concerns in patients presenting for initial gynecologic oncology consultation

PubMed Central

Kennedy, Vanessa; Abramsohn, Emily; Makelarski, Jennifer; Barber, Rachel; Wroblewski, Kristen; Tenney, Meaghan; Lee, Nita Karnik; Yamada, S. Diane; Lindau, Stacy Tessler

2015-01-01

Objectives To describe patterns of response to, and assess sexual function and activity elicited by, a self-administered assessment incorporated into a new patient intake form for gynecologic oncology consultation. Methods A cross-sectional study of patients presenting to a single urban academic medical center between January 2010 and September 2012. New patients completed a self-administered intake form, including six brief sexual activity and function items. These items, along with abstracted medical record data, were descriptively analyzed. Logistic regression was used to assess the association between sexual activity and function and disease status, adjusting for age. Results Median age was 50 years (range 18–91, N = 499); more than half had a final diagnosis of cancer. Most patients completed all sex-related items on the intake form; 98% answered at least one. Among patients who were sexually active in the prior 12 months (57% with cancer, 64% with benign disease), 52% indicated on the intake form having, during that period, a sexual problem lasting several months or more. Of these, 15% had physician documentation of the sexual problem. Eighteen women were referred for care. Providers reported no patient complaints about the inclusion of sexual items on the intake form. Conclusions Nearly all new patients presenting for gynecologic oncology consultation answered self-administered items to assess sexual activity and function. Further study is needed to determine the role of pretreatment identification of sexual function concerns in improving sexual outcomes associated with cancer diagnosis and treatment. PMID:25582823
Dissociable Temporo-Parietal Memory Networks Revealed by Functional Connectivity during Episodic Retrieval

PubMed Central

Hirose, Satoshi; Kimura, Hiroko M.; Jimura, Koji; Kunimatsu, Akira; Abe, Osamu; Ohtomo, Kuni; Miyashita, Yasushi; Konishi, Seiki

2013-01-01

Episodic memory retrieval most often recruits multiple separate processes that are thought to involve different temporal regions. Previous studies suggest dissociable regions in the left lateral parietal cortex that are associated with the retrieval processes. Moreover, studies using resting-state functional connectivity (RSFC) have provided evidence for the temporo-parietal memory networks that may support the retrieval processes. In this functional MRI study, we tested functional significance of the memory networks by examining functional connectivity of brain activity during episodic retrieval in the temporal and parietal regions of the memory networks. Recency judgments, judgments of the temporal order of past events, can be achieved by at least two retrieval processes, relational and item-based. Neuroimaging results revealed several temporal and parietal activations associated with relational/item-based recency judgments. Significant RSFC was observed between one parahippocampal region and one left lateral parietal region associated with relational recency judgments, and between four lateral temporal regions and another left lateral parietal region associated with item-based recency judgments. Functional connectivity during task was found to be significant between the parahippocampal region and the parietal region in the RSFC network associated with relational recency judgments. However, out of the four temporo-parietal RSFC networks associated with item-based recency judgments, only one of them (between the left posterior lateral temporal region and the left lateral parietal region) showed significant functional connectivity during task. These results highlight the contrasting roles of the parahippocampal and the lateral temporal regions in recency judgments, and suggest that only a part of the temporo-parietal RSFC networks are recruited to support particular retrieval processes. PMID:24009657
Measuring self-esteem after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Self-esteem item bank and short form

PubMed Central

Kalpakjian, Claire Z.; Tate, Denise G.; Kisala, Pamela A.; Tulsky, David S.

2015-01-01

Objective To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Design Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory- (IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI completed the self-esteem items. Results A unidimensional model was observed (CFI = 0.946; RMSEA = 0.087) and measurement precision was good (theta range between −2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. Conclusion This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010972

Measuring resilience after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Resilience item bank and short form

PubMed Central

Victorson, David; Tulsky, David S.; Kisala, Pamela A.; Kalpakjian, Claire Z.; Weiland, Brian; Choi, Seung W.

2015-01-01

Objective To describe the development and psychometric properties of the Spinal Cord Injury - Quality of Life (SCI-QOL) Resilience item bank and short form. Design Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). Setting We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. Participants A total of 717 individuals with SCI completed the Resilience items. Results A unidimensional model was observed (CFI = 0.968; RMSEA = 0.074) and measurement precision was good (theta range between −3.1 and 0.9). Ten items were flagged for DIF; however, after examination of effect sizes we found this to be negligible, with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. Conclusion This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010971
A confirmative clinimetric analysis of the 36-item Family Assessment Device.

PubMed

Timmerby, Nina; Cosci, Fiammetta; Watson, Maggie; Csillag, Claudio; Schmitt, Florence; Steck, Barbara; Bech, Per; Thastum, Mikael

2018-02-07

The Family Assessment Device (FAD) is a 60-item questionnaire widely used to evaluate self-reported family functioning. However, the factor structure as well as the number of items has been questioned. A shorter and more user-friendly version of the original FAD-scale, the 36-item FAD, has therefore previously been proposed, based on findings in a nonclinical population of adults. We aimed in this study to evaluate the brief 36-item version of the FAD in a clinical population. Data from a European multinational study, examining factors associated with levels of family functioning in adult cancer patients' families, were used. Both healthy and ill parents completed the 60-item version FAD. The psychometric analyses conducted were Principal Component Analysis and Mokken-analysis. A total of 564 participants were included. Based on the psychometric analysis we confirmed that the 36-item version of the FAD has robust psychometric properties and can be used in clinical populations. The present analysis confirmed that the 36-item version of the FAD (18 items assessing 'well-being' and 18 items assessing 'dysfunctional' family function) is a brief scale where the summed total score is a valid measure of the dimensions of family functioning. This shorter version of the FAD is, in accordance with the concept of 'measurement-based care', an easy to use scale that could be considered when the aim is to evaluate self-reported family functioning.
Gender and Ethnicity Differences on the Abridged Big Five Circumplex (AB5C) of Personality Traits: A Differential Item Functioning Analysis

ERIC Educational Resources Information Center

Mitchelson, Jacqueline K.; Wicher, Eliza W.; LeBreton, James M.; Craig, S. Bartholomew

2009-01-01

The current study evaluates the measurement precision of the Abridged Big Five Circumplex (AB5C) of personality traits by identifying those items that demonstrate differential item functioning by gender and ethnicity. Differential item functioning is found in 33 of 45 (73%) of the AB5C scales, across gender and ethnic groups (Caucasian vs. African…
Applying a Mixed Methods Framework to Differential Item Function Analyses

ERIC Educational Resources Information Center

Hitchcock, John H.; Johanson, George A.

2015-01-01

Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…
Effect of Differential Item Functioning on Test Equating

ERIC Educational Resources Information Center

Kabasakal, Kübra Atalay; Kelecioglu, Hülya

2015-01-01

This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Ramsay-Curve Differential Item Functioning

ERIC Educational Resources Information Center

Woods, Carol M.

2011-01-01

Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
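The definition of DIF in the record above can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) item response function and made-up item parameters; the names `irf_2pl`, `b_reference`, and `b_focal` are hypothetical, and none of the numbers come from any study listed here. It shows uniform DIF only (a shift in item difficulty between groups) and is not the IRT-LR procedure the article discusses.

```python
import math


def irf_2pl(theta, a, b):
    """2PL item response function: probability of endorsing the item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameters, chosen only for illustration.
a = 1.2            # discrimination, assumed equal in both groups
b_reference = 0.0  # item difficulty in the reference group
b_focal = 0.5      # item difficulty in the focal group (item is "harder" here)

# Compare the two groups at the SAME trait level. Any gap is attributable to the
# item's measurement properties, not to a difference in group means on the trait.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
gaps = [irf_2pl(t, a, b_reference) - irf_2pl(t, a, b_focal) for t in thetas]

for t, g in zip(thetas, gaps):
    print(f"theta={t:+.1f}  P_ref - P_focal = {g:.3f}")
```

Because only the difficulty differs, the gap has the same sign at every trait level, which is what "uniform" DIF refers to. Letting the discrimination `a` differ between groups instead would make the two curves cross, the pattern usually called non-uniform DIF.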
Monkey Visual Short-Term Memory Directly Compared to Humans

PubMed Central

Elmore, L. Caitlin; Wright, Anthony A.

2015-01-01

Two adult rhesus monkeys were trained to detect which item in an array of memory items had changed using the same stimuli, viewing times, and delays as used with humans. Although the monkeys were extensively trained, they were less accurate than humans with the same array sizes (2, 4, & 6 items), with both stimulus types (colored squares, clip art), and showed calculated memory capacities of about one item (or less). Nevertheless, the memory results from both monkeys and humans for both stimulus types were well characterized by the inverse power-law of display size. This characterization provides a simple and straightforward summary of a fundamental process of visual short-term memory (how VSTM declines with memory load) that emphasizes species similarities based upon similar functional relationships. By more closely matching monkey testing parameters to those of humans, the similar functional relationships strengthen the evidence suggesting similar processes underlying monkey and human VSTM. PMID:25706544
Health- and vision-related quality of life in intellectually disabled children.

PubMed

Cui, Yu; Stapleton, Fiona; Suttle, Catherine; Bundy, Anita

2010-01-01

To investigate the psychometric properties of instruments for the assessment of self-reported functional vision performance and health-related quality of life in children with intellectual disabilities (IDs). Two instruments [Autoquestionnaire Enfant Image (AUQUEI), LV Prasad-Functional Vision Questionnaire (LVP-FVQ)] designed for the assessment of functional vision and health-related quality of life were adapted and administered to 168 school children with ID, aged 8 to 18 years. Rasch analysis was used to determine the appropriateness of the rating scales of these instruments and to identify any redundant items. Redundant items were excluded based on descriptive statistics and Rasch analysis, leaving 17 of 23 items in the revised AUQUEI and 16 of 22 in the LVP-FVQ. The AUQUEI items showed disordered thresholds on the rating scale. A modified step calibration (collapsed from four categories to three categories) resulted in ordered response thresholds for all items. The adjusted instrument produced an overall fit to the model (mean item infit = 1.06, SD = 0.32; mean item outfit = 1.11, SD = 0.35), indicating good construct validity. After Rasch analysis, the AUQUEI showed good content validity (person separation = 2.18; item reliability = 0.99; Cronbach alpha = 0.89). Increased similarity of person and item means and SDs on the logit scale after modification would indicate that the instrument was more applicable to the target population in its modified form. In contrast, the LVP-FVQ had a low person separation (1.35), suggesting that a more appropriate instrument is needed for assessment of vision-related quality of life in children with ID. The psychometric properties of two instruments were explored using Rasch analysis. By rescaling and reduction of items, the instruments were modified for use in a population of children with at least mild to moderate ID. However, an alternative instrument is needed for the assessment of vision-related quality of life in intellectually disabled children with normal vision or mild visual abnormalities.
Functions of the Renal Nerves.

ERIC Educational Resources Information Center

Koepke, John P.; DiBona, Gerald F.

1985-01-01

Discusses renal neuroanatomy, renal vasculature, renal tubules, renin secretion, renorenal reflexes, and hypertension as related to renal nerve functions. Indicates that high intensities of renal nerve stimulation have produced alterations in several renal functions. (A chart with various stimulations and resultant renal functions and 10-item,…
Maintenance of item and order information in verbal working memory.

PubMed

Camos, Valérie; Lagner, Prune; Loaiza, Vanessa M

2017-09-01

Although verbal recall of item and order information is well-researched in short-term memory paradigms, there is relatively little research concerning item and order recall from working memory. The following study examined whether manipulating the opportunity for attentional refreshing and articulatory rehearsal in a complex span task differently affected the recall of item- and order-specific information of the memoranda. Five experiments varied the opportunity for articulatory rehearsal and attentional refreshing in a complex span task, but the type of recall was manipulated between experiments (item and order, order only, and item only recall). The results showed that impairing attentional refreshing and articulatory rehearsal similarly affected recall regardless of whether the scoring procedure (Experiments 1 and 4) or recall requirements (Experiments 2, 3, and 5) reflected item- or order-specific recall. This implies that both mechanisms sustain the maintenance of item and order information, and suggests that the common cumulative functioning of these two mechanisms to maintain items could be at the root of order maintenance.
Computer adaptive test approach to the assessment of children and youth with brachial plexus birth palsy.

PubMed

Mulcahey, M J; Merenda, Lisa; Tian, Feng; Kozin, Scott; James, Michelle; Gogola, Gloria; Ni, Pengsheng

2013-01-01

This study examined the psychometric properties of item pools relevant to upper-extremity function and activity performance and evaluated simulated 5-, 10-, and 15-item computer adaptive tests (CATs). In a multicenter, cross-sectional study of 200 children and youth with brachial plexus birth palsy (BPBP), parents responded to upper-extremity (n = 52) and activity (n = 34) items using a 5-point response scale. We used confirmatory and exploratory factor analysis, ordinal logistic regression, item maps, and standard errors to evaluate the psychometric properties of the item banks. Validity was evaluated using analysis of variance and Pearson correlation coefficients. Results show that the two item pools have acceptable model fit, scaled well for children and youth with BPBP, and had good validity, content range, and precision. Simulated CATs performed comparably to the full item banks, suggesting that a reduced number of items provide similar information to the entire set of items. Copyright © 2013 by the American Occupational Therapy Association, Inc.
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.

PubMed

Polak, Marike; de Rooij, Mark; Heiser, Willem J

2012-09-01

In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.
Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework.

PubMed

Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A

2006-11-01

To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish sample using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
Investigating diagnostic bias in autism spectrum conditions: An item response theory analysis of sex bias in the AQ-10.

PubMed

Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie

2017-05-01

Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, they suggest that caution should be exercised when interpreting responses to individual items. The nature of the item-level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Development of an instrument to measure behavioral health function for work disability: item pool construction and factor analysis.

PubMed

Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Jette, Alan M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Brandt, Diane E; Rasch, Elizabeth K

2013-09-01

To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Cross-sectional. Community. Item pools of behavioral health functioning were developed, refined, and field tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working because of mental or both mental and physical conditions. None. Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the 4 scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these 4 distinct scales of function. This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The MIMIC Model as a Tool for Differential Bundle Functioning Detection

ERIC Educational Resources Information Center

Finch, W. Holmes

2012-01-01

Increasingly, researchers interested in identifying potentially biased test items are encouraged to use a confirmatory, rather than exploratory, approach. One such method for confirmatory testing is rooted in differential bundle functioning (DBF), where hypotheses regarding potential differential item functioning (DIF) for sets of items (bundles)…
Autobiographical memory functions in young Japanese men and women.

PubMed

Maki, Yoichi; Kawasaki, Yayoi; Demiray, Burcu; Janssen, Steve M J

2015-01-01

The present study examined whether the three major functions of autobiographical memory observed in Western societies (i.e., directing-behaviour, social-bonding and self-continuity) also exist in an East Asian society. Two self-report measures were used to assess the autobiographical memory functions of Japanese men and women. Japanese young adults (N = 451, ages 17-28 years) first completed the original Thinking About Life Experiences (TALE) Questionnaire. They subsequently received three TALE items that represented memory functions and attempted to recall a specific instance of memory recall for each item. Confirmatory factor analyses on the TALE showed that the three functions were replicated in the current sample. However, Japanese participants reported lower levels of all three functions than American participants in a previous study. We also explored whether there was an effect of gender in this Japanese sample. Women reported higher levels of the self-continuity and social-bonding functions than men. Finally, participants recalled more specific instances of memory recall for the TALE items that had received higher ratings on the TALE, suggesting that the findings on the first measure were supported by the second measure. Results are discussed in relation to the functional approach to autobiographical memory in a cross-cultural context.
Ventromedial prefrontal cortex, adding value to autobiographical memories

PubMed Central

Lin, Wen-Jing; Horner, Aidan J.; Burgess, Neil

2016-01-01

The medial prefrontal cortex (mPFC) has been consistently implicated in autobiographical memory recall and decision making. Its function in decision making tasks is believed to relate to value representation, but its function in autobiographical memory recall is not yet clear. We hypothesised that the mPFC represents the subjective value of elements during autobiographical memory retrieval. Using functional magnetic resonance imaging during an autobiographical memory recall task, we found that the blood oxygen level dependent (BOLD) signal in ventromedial prefrontal cortex (vmPFC) was parametrically modulated by the affective values of items in participants’ memories when they were recalling and evaluating these items. An unrelated modulation by the participant’s familiarity with the items was also observed. During retrieval of the event, the BOLD signal in the same region was modulated by the personal significance and emotional intensity of the memory, which was correlated with the values of the items within them. These results support the idea that vmPFC processes self-relevant information, and suggest that it is involved in representing the personal emotional values of the elements comprising autobiographical memories. PMID:27338616
Ventromedial prefrontal cortex, adding value to autobiographical memories.

PubMed

Lin, Wen-Jing; Horner, Aidan J; Burgess, Neil

2016-06-24

The medial prefrontal cortex (mPFC) has been consistently implicated in autobiographical memory recall and decision making. Its function in decision making tasks is believed to relate to value representation, but its function in autobiographical memory recall is not yet clear. We hypothesised that the mPFC represents the subjective value of elements during autobiographical memory retrieval. Using functional magnetic resonance imaging during an autobiographical memory recall task, we found that the blood oxygen level dependent (BOLD) signal in ventromedial prefrontal cortex (vmPFC) was parametrically modulated by the affective values of items in participants' memories when they were recalling and evaluating these items. An unrelated modulation by the participant's familiarity with the items was also observed. During retrieval of the event, the BOLD signal in the same region was modulated by the personal significance and emotional intensity of the memory, which was correlated with the values of the items within them. These results support the idea that vmPFC processes self-relevant information, and suggest that it is involved in representing the personal emotional values of the elements comprising autobiographical memories.
The item level psychometrics of the behaviour rating inventory of executive function-adult (BRIEF-A) in a TBI sample.

PubMed

Waid-Ebbs, J Kay; Wen, Pey-Shan; Heaton, Shelley C; Donovan, Neila J; Velozo, Craig

2012-01-01

To determine whether the psychometrics of the BRIEF-A are adequate for individuals diagnosed with TBI. A prospective observational study in which the BRIEF-A was collected as part of a larger study. Informant ratings of the 75-item BRIEF-A on 89 individuals diagnosed with TBI were examined to determine item-level psychometrics for each of the two BRIEF-A indexes: Behaviour Rating Index (BRI) and Metacognitive Index (MI). Patients were either outpatients or at least 1 year post-injury. Each index measured a latent trait, separated individuals into five-to-six ability levels and demonstrated good reliability (0.94 and 0.96). Four items were identified that did not meet the infit criteria. The results provide support for the use of the BRIEF-A as a supplemental assessment of executive function in TBI populations. However, further validation is needed with other measures of executive function. Recommendations include use of the index scores over the Global Executive Composite score and use of the difficulty hierarchy for setting therapy goals.

Length of stay of stroke rehabilitation inpatients: prediction through the functional independence measure.

PubMed

Franchignoni, F; Tesio, L; Martino, M T; Benevolo, E; Castagna, M

1998-01-01

A model for prediction of length of stay (LOS, in days) of stroke rehabilitation inpatients was developed, based on patients' age (years) and function at admission (scored on the Functional Independence Measure, FIM℠). One hundred and twenty-nine cases, consecutively admitted to three free-standing rehabilitation centres in Italy, were analyzed. A multiple linear regression using forward stepwise selection procedure was adopted. Median admission and discharge scores were: 57 and 75 for the total FIM score, 29 and 48 for the 13-item motor FIM subscore, 29 and 30 for the 5-item cognitive FIM subscore (potential range: 18-126, 13-91, 5-35, respectively). Median LOS was 44 days (interquartile range 30-62). The logLOS predictive model included three FIM items ("toilet transfer", TTr; "social interaction"; "expression") and patient's age (R² = 0.48). TTr alone explained 31.3% of the variance of logLOS. These results are consistent with previous American studies, showing that FIM scores at admission are strong predictors of patients' LOS, with the transfer items having the greatest predictive power.
Adolescent Depression: Differential Symptom Presentations in Deaf and Hard-of-Hearing Youth Using the Patient Health Questionnaire-9.

PubMed

Bozzay, Melanie L; O'Leary, Kimberly N; De Nadai, Alessandro S; Gryglewicz, Kim; Romero, Gabriela; Karver, Marc S

2017-04-01

The present study examined differences in symptom presentation in screening for pediatric depression via evaluation of the Patient Health Questionnaire-9 (PHQ-9). In particular, we examined whether PHQ-9 items function differentially among deaf and hard-of-hearing (DHH; n = 75) and hearing (n = 75) youth based on participants recruited from crisis assessment services. Multiple indicators multiple causes models were used to examine whether items of the PHQ-9 functioned differently between groups as well as whether there were group differences in the mean severity of depressive symptoms. Results indicate that DHH youth were more likely to endorse psychosomatic items, and less likely to endorse an affective item. These findings indicate that the PHQ-9 functions differently when used with DHH youth. Implications of these findings are discussed, including both for future work with the PHQ-9 and with regard to the conceptualization of depression across hearing groups. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Similarity-based distortion of visual short-term memory is due to perceptual averaging.

PubMed

Dubé, Chad; Zhou, Feng; Kahana, Michael J; Sekuler, Robert

2014-03-01

A task-irrelevant stimulus can distort recall from visual short-term memory (VSTM). Specifically, reproduction of a task-relevant memory item is biased in the direction of the irrelevant memory item (Huang & Sekuler, 2010a). The present study addresses the hypothesis that such effects reflect the influence of neural averaging under conditions of uncertainty about the contents of VSTM (Alvarez, 2011; Ball & Sekuler, 1980). We manipulated subjects' attention to relevant and irrelevant study items whose similarity relationships were held constant, while varying how similar the study items were to a subsequent recognition probe. On each trial, subjects were shown one or two Gabor patches, followed by the probe; their task was to indicate whether the probe matched one of the study items. A brief cue told subjects which Gabor, first or second, would serve as that trial's target item. Critically, this cue appeared either before, between, or after the study items. A distributional analysis of the resulting mnemometric functions showed an inflation in probability density in the region spanning the spatial frequency of the average of the two memory items. This effect, due to an elevation in false alarms to probes matching the perceptual average, was diminished when cues were presented before both study items. These results suggest that (a) perceptual averages are computed obligatorily and (b) perceptual averages are relied upon to a greater extent when item representations are weakened. Implications of these results for theories of VSTM are discussed. Copyright © 2014 Elsevier Ltd. All rights reserved.
RhinAsthma patient perspective: A Rasch validation study.

PubMed

Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara

2018-02-01

In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective-RAPP. The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items and test whether or not patients use the items response scale correctly. 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was used for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed a DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.
Evaluating the validity of the Work Role Functioning Questionnaire (Canadian French version) using classical test theory and item response theory.

PubMed

Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal

2017-01-01

The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Few studies on its validity can yet be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. A four-factor and a three-factor model were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were common with CTT. This study tested different models, with fewer problematic items found in the three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.
Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

PubMed

Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

2018-02-01

The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks

PubMed Central

Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando

2014-01-01

Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843
[Study of functional rating scale for amyotrophic lateral sclerosis: revised ALSFRS(ALSFRS-R) Japanese version].

PubMed

Ohashi, Y; Tashiro, K; Itoyama, Y; Nakano, I; Sobue, G; Nakamura, S; Sumino, S; Yanagisawa, N

2001-04-01

Amyotrophic lateral sclerosis (ALS) is a progressive, degenerative, fatal disease of the motor neuron. No efficacious therapy is available to slow the progressive loss of function, but several new approaches, including neurotrophic factors, antioxidants and glutamate antagonists, are currently being evaluated as potential therapies. Mortality and/or time to tracheostomy, muscle strength and pulmonary function are used as primary endpoints in clinical trials for treatment of ALS. The effect of new therapies on the quality of patients' lives is also important, so we sought to develop a rating scale to measure it. The revised ALS Functional Rating Scale (ALSFRS-R), which adds items to the ALSFRS to enhance the ability to assess respiratory symptoms, is an assessment of the degree of impairment in ALS patients' abilities to function independently in activities of daily living. It consists of 12 items evaluating bulbar function, motor function and respiratory function, and each item is scored from 0 (unable) to 4 (normal). We translated the English scale into a Japanese one with minor modifications to account for intercultural differences, and we examined the reliability of the translated scale. As a measure of reliability, the intraclass correlation coefficient (ICC) was evaluated for the total score and the Kappa coefficient proposed by Cohen and Kraemer was calculated for each item. Moreover, we examined sensitivity to clinical change over time and carried out a factor analysis to analyze the factorial structure. The subjects were 27 ALS patients, and each was scored twice for reliability or three times for sensitivity by 2 to 5 neurologists and, if possible, nurses. The ICC for the total score was 0.97 (95% C.I.: 0.94-0.98). The extended Kappa coefficients were 0.48 to 1.00 for inter-rater reliability, and the averaged Kappa coefficients were 0.63 to 1.00 for intra-rater reliability.
Concerning the factorial structure, the contribution of the first factor (the first principal component) was 53.5% in the principal factor solution. The factor loadings of the items were 0.52-0.91 except for "salivation", and this factor, almost equal to the simple sum of all items, was interpreted as the general degree of deterioration. The promax rotation revealed the originally supposed factor structure with 3 factors (groups of items): neuromuscular function, respiratory function and bulbar function. The rating scale correlated with the Global Clinical Impression of Change (GCIC) scored by neurologists and declined with time, indicating its sensitivity to change. On the basis of these results, the ALSFRS-R (Japanese version) is considered sufficiently reliable for clinical use.
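As an illustration of the per-item agreement statistic mentioned in this record, the following is a minimal sketch of the plain two-rater, unweighted Cohen's kappa. It is not the Cohen and Kraemer extension the authors report, and the function name `cohen_kappa` and the toy ratings are hypothetical, chosen only to show the arithmetic on a 0-4 item scale.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)

# Hypothetical scores (0 = unable ... 4 = normal) for one item on 10 patients.
neurologist = [4, 4, 3, 3, 2, 2, 1, 1, 0, 0]
nurse = [4, 4, 3, 2, 2, 2, 1, 1, 0, 1]
kappa = cohen_kappa(neurologist, nurse)
```

In this toy case the raters agree on 8 of 10 patients (observed agreement 0.8) while chance agreement is 0.2, so kappa is about 0.75.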
For Which Boys and Which Girls Are Reading Assessment Items Biased Against? Detection of Differential Item Functioning in Heterogeneous Gender Populations

ERIC Educational Resources Information Center

Grover, Raman K.; Ercikan, Kadriye

2017-01-01

In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However DIF items do not necessarily disadvantage every member of a gender group to the same degree,…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

ERIC Educational Resources Information Center

Lee, Yi-Hsuan; Zhang, Jinming

2010-01-01

This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Real and Artificial Differential Item Functioning

ERIC Educational Resources Information Center

Andrich, David; Hagquist, Curt

2012-01-01

The literature in modern test theory on procedures for identifying items with differential item functioning (DIF) among two groups of persons includes the Mantel-Haenszel (MH) procedure. Generally, it is not recognized explicitly that if there is real DIF in some items which favor one group, then as an artifact of this procedure, artificial DIF…
A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure.

PubMed

Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C

2014-12-01

It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
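The definition of DIF given in this record (groups endorsing an item to different extents after matching on the scale attribute) can be made concrete with a small contingency-table sketch. The code below computes the Mantel-Haenszel common odds ratio for a dichotomous item, which is a simpler relative of the ordinal Mantel χ² procedure the authors applied and is not their analysis. The function name, the group labels "ref" and "focal", and the counts are all hypothetical.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one dichotomous item across matched score strata.

    records: iterable of (group, matching_score, endorsed), where group is
    "ref" or "focal" and endorsed is 1 or 0. A value near 1.0 suggests no
    uniform DIF; values far from 1.0 suggest one group endorses the item more
    often than the other at the same matching score.
    """
    # Per stratum: [ref endorsed, ref not, focal endorsed, focal not]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, endorsed in records:
        offset = 0 if group == "ref" else 2
        strata[score][offset + (0 if endorsed else 1)] += 1
    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator

def expand(group, score, yes, no):
    """Build individual records from hypothetical cell counts."""
    return [(group, score, 1)] * yes + [(group, score, 0)] * no

# Two matching-score strata with made-up counts.
records = (
    expand("ref", 1, 2, 2) + expand("focal", 1, 1, 3)
    + expand("ref", 2, 3, 1) + expand("focal", 2, 2, 2)
)
odds_ratio = mantel_haenszel_odds_ratio(records)
```

With these counts the common odds ratio is 3.0, meaning the reference group has three times the odds of endorsing the item at the same matching score. As the abstract notes, such a magnitude would still need to be weighed alongside statistical significance and investigator judgement.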
Validation of a mobility item bank for older patients in primary care.

PubMed

Cabrero-García, Julio; Ramos-Pichardo, Juan Diego; Muñoz-Mendoza, Carmen Luz; Cabañero-Martínez, María José; González-Llopis, Lorena; Reig-Ferrer, Abilio

2012-12-05

To develop and validate an item bank to measure mobility in older people in primary care and to analyse differential item functioning (DIF) and differential bundle functioning (DBF) by sex. A pool of 48 mobility items was administered by interview to 593 older people attending primary health care practices. The pool contained four domains based on the International Classification of Functioning: changing and maintaining body position, carrying, lifting and pushing, walking and going up and down stairs. The Late Life Mobility item bank consisted of 35 items, and measured with a reliability of 0.90 or more across the full spectrum of mobility, except at the higher end of better functioning. No evidence was found of non-uniform DIF but uniform DIF was observed, mainly for items in the changing and maintaining body position and carrying, lifting and pushing domains. The walking domain did not display DBF, but the other three domains did, principally the carrying, lifting and pushing items. During the design and validation of an item bank to measure mobility in older people, we found that strength (carrying, lifting and pushing) items formed a secondary dimension that produced DBF. More research is needed to determine how best to include strength items in a mobility measure, or whether it would be more appropriate to design separate measures for each construct.
Use of item response curves of the Force and Motion Conceptual Evaluation to compare Japanese and American students' views on force and motion

NASA Astrophysics Data System (ADS)

Ishimoto, Michi; Davenport, Glen; Wittmann, Michael C.

2017-12-01

Student views of force and motion reflect the personal experiences and physics education of the student. With a different language, culture, and educational system, we expect that Japanese students' views on force and motion might be different from those of American students. The Force and Motion Conceptual Evaluation (FMCE) is an instrument used to probe student views on force and motion. It was designed using research on American students, and, as such, the items might function differently for Japanese students. Preliminary results from a translated version indicated that Japanese students had similar misconceptions as those of American students. In this study, we used item response curves (IRCs) to make more detailed item-by-item comparisons. IRCs show the functioning of individual items across all levels of performance by plotting the proportion of each response as a function of the total score. Most of the IRCs showed very similar patterns on both correct and incorrect responses; however, a few of the plots indicate differences between the populations. The similar patterns indicate that students tend to interact with FMCE items similarly, despite differences in culture, language, and education. We speculate about the possible causes for the differences in some of the IRCs. This report is intended to show how IRCs can be used as a part of the validation process when making comparisons across languages and nationalities. Differences in IRCs can help to pinpoint artifacts of translation, contextual effects because of differences in culture, and perhaps intrinsic differences in student understanding of Newtonian motion.
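The IRC construction described above (the proportion of each response as a function of the total score) can be sketched in a few lines. This is an illustrative sketch only, not the authors' code; the function name and the data layout (one chosen option and one total score per student) are assumptions.

```python
from collections import Counter, defaultdict


def item_response_curve(choices, total_scores):
    """Return {total_score: {choice: proportion}} for a single item.

    choices:      the option each student selected on this item
    total_scores: each student's total test score (same order as choices)
    """
    counts = defaultdict(Counter)
    for choice, score in zip(choices, total_scores):
        counts[score][choice] += 1

    curve = {}
    for score in sorted(counts):
        n = sum(counts[score].values())
        # Proportion of students at this score level picking each option
        curve[score] = {c: k / n for c, k in counts[score].items()}
    return curve
```

Computing one such curve per population and plotting them on the same axes gives the item-by-item comparison the abstract describes.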
[Effects of planning and executive functions on young children's script change strategy: A developmental perspective].

PubMed

Yanaoka, Kaichi

2016-02-01

This research examined the effects of planning and executive functions on the strategies young children (ages 3 to 5 years) use when changing scripts. Young children (N = 77) performed a script task (doll task), three executive function tasks (DCCS, red/blue task, and nine box task), a planning task, and a receptive vocabulary task. In the doll task, young children first enacted a "changing clothes" script, and then faced a situation in which some elements of the script were inappropriate. They needed to enact a script either by substituting items from the other script for the inappropriate items or by changing to the other script in advance. The results showed that shifting, a factor of executive function, had a positive influence on whether young children could compensate for inappropriate items. In addition, planning was an important factor that helped children change to the other script in advance. These findings suggest that shifting and planning play different roles in using the two strategies appropriately when young children enact scripts in unexpected situations.
Combining agreement and frequency rating scales to optimize psychometrics in measuring behavioral health functioning.

PubMed

Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M

2014-07-01

The goal of this article was to investigate optimal functioning of using frequency vs. agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. A psychometric study comparing rating scale performance embedded in a cross-sectional survey used for developing a new instrument to measure behavioral health functioning among adults applying for disability benefits in the United States was performed. Within the sample of 1,017 respondents, the range of response category endorsement was similar for both frequency and agreement item types for both scales. There were fewer missing values in the frequency items than the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items demonstrated optimal effectiveness around the mean ± 1-2 standard deviation score range; the agreement items performed better at the extreme score ranges. Findings suggest an optimal response format requires a mix of both agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for those whose scores are more extreme and capture subjective content related to general attitudes, behaviors, or feelings of work-related behavioral health functioning. Copyright © 2014 Elsevier Inc. All rights reserved.
Which method of posttraumatic stress disorder classification best predicts psychosocial function in children with traumatic brain injury?

PubMed

Iselin, Greg; Le Brocque, Robyne; Kenardy, Justin; Anderson, Vicki; McKinlay, Lynne

2010-10-01

Controversy surrounds the classification of posttraumatic stress disorder (PTSD), particularly in children and adolescents with traumatic brain injury (TBI). In these populations, it is difficult to differentiate TBI-related organic memory loss from dissociative amnesia. Several alternative PTSD classification algorithms have been proposed for use with children. This paper investigates DSM-IV-TR and alternative PTSD classification algorithms, including and excluding the dissociative amnesia item, in terms of their ability to predict psychosocial function following pediatric TBI. A sample of 184 children aged 6-14 years were recruited following emergency department presentation and/or hospital admission for TBI. PTSD was assessed via semi-structured clinical interview (CAPS-CA) with the child at 3 months post-injury. Psychosocial function was assessed using the parent report CHQ-PF50. Two alternative classification algorithms, the PTSD-AA and 2 of 3 algorithms, reached statistical significance. While the inclusion of the dissociative amnesia item increased prevalence rates across algorithms, it generally resulted in weaker associations with psychosocial function. The PTSD-AA algorithm appears to have the strongest association with psychosocial function following TBI in children and adolescents. Removing the dissociative amnesia item from the diagnostic algorithm generally results in improved validity. Copyright 2010 Elsevier Ltd. All rights reserved.
Impaired work functioning due to common mental disorders in nurses and allied health professionals: the Nurses Work Functioning Questionnaire.

PubMed

Gärtner, F R; Nieuwenhuijsen, K; van Dijk, F J H; Sluiter, J K

2012-02-01

Common mental disorders (CMDs) negatively affect work functioning. In the health service sector, not only is the prevalence of CMDs high, but work functioning problems also carry a risk of serious consequences for patients and healthcare providers. If work functioning problems due to CMDs are detected early, timely help can be provided. Therefore, the aim of this study is to develop a detection questionnaire for impaired work functioning due to CMDs in nurses and allied health professionals working in hospitals. First, an item pool was developed by a systematic literature study and five focus group interviews with employees and experts. To evaluate the content validity, additional interviews were held. Second, a cross-sectional assessment of the item pool in 314 nurses and allied health professionals was used for item selection and for identification and corroboration of subscales by explorative and confirmatory factor analysis. The study results in the Nurses Work Functioning Questionnaire (NWFQ), a 50-item self-report questionnaire consisting of seven subscales: cognitive aspects of task execution, impaired decision making, causing incidents at work, avoidance behavior, conflicts and irritations with colleagues, impaired contact with patients and their family, and lack of energy and motivation. The questionnaire has a proven high content validity. All subscales have good or acceptable internal consistency. The Nurses Work Functioning Questionnaire gives insight into precise and concrete aspects of impaired work functioning of nurses and allied health professionals. The scores can be used as a starting point for purposeful interventions.
Decisions that Make a Difference in Detecting Differential Item Functioning

ERIC Educational Resources Information Center

Sireci, Stephen G.; Rios, Joseph A.

2013-01-01

There are numerous statistical procedures for detecting items that function differently across subgroups of examinees that take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions…
Examining Differential Math Performance by Gender and Opportunity to Learn

ERIC Educational Resources Information Center

Albano, Anthony D.; Rodriguez, Michael C.

2013-01-01

Although a substantial amount of research has been conducted on differential item functioning in testing, studies have focused on detecting differential item functioning rather than on explaining how or why it may occur. Some recent work has explored sources of differential functioning using explanatory and multilevel item response models. This…

Conditional recall and the frequency effect in the serial recall task: an examination of item-to-item associativity.

PubMed

Miller, Leonie M; Roodenrys, Steven

2012-11-01

The frequency effect in short-term serial recall is influenced by the composition of lists. In pure lists, a robust advantage in the recall of high-frequency (HF) words is observed, yet in alternating mixed lists, HF and low-frequency (LF) words are recalled equally well. It has been argued that the preexisting associations between all list items determine a single, global level of supportive activation that assists item recall. Preexisting associations between items are assumed to be a function of language co-occurrence; HF-HF associations are high, LF-LF associations are low, and mixed associations are intermediate in activation strength. This account, however, is based on results when alternating lists with equal numbers of HF and LF words were used. It is possible that directional association between adjacent list items is responsible for the recall patterns reported. In the present experiment, the recall of three forms of mixed lists-those with equal numbers of HF and LF items and pure lists-was examined to test the extent to which item-to-item associations are present in serial recall. Furthermore, conditional probabilities were used to examine more closely the evidence for a contribution, since correct-in-position scoring may mask recall that is dependent on the recall of prior items. The results suggest that an item-to-item effect is clearly present for early but not late list items, and they implicate an additional factor, perhaps the availability of resources at output, in the recall of late list items.
Differential Item Functioning by Gender on a Large-Scale Science Performance Assessment: A Comparison across Grade Levels.

ERIC Educational Resources Information Center

Holweger, Nancy; Taylor, Grace

The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…
Development of a self-report physical function instrument for disability assessment: item pool construction and factor analysis.

PubMed

McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K

2013-09-01

To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Measuring grief and loss after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Grief and Loss item bank and short form

PubMed Central

Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.

2015-01-01

Objective To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design A literature review guided framework development of grief/loss. New items were created from focus groups. Items were revised based on expert review and patient feedback and were then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI answered the grief and loss questions. Results The final calibrated item bank resulted in 17 retained items. A unidimensional model was observed (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range between −1.48 and 2.48). Ten items were flagged for DIF; however, examination of effect sizes found the DIF to be negligible, with little practical impact on score estimates. Conclusions This study indicates that the SCI-QOL Grief and Loss item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010969
Measuring psychological trauma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Psychological Trauma item bank and short form

PubMed Central

Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.

2015-01-01

Objective To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants A total of 716 individuals with SCI completed the trauma items. Results The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model

ERIC Educational Resources Information Center

Wang, Wen-Chung; Wilson, Mark

2005-01-01

This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…
The Effects of Testlets on Reliability and Differential Item Functioning

ERIC Educational Resources Information Center

Teker, Gulsen Tasdelen; Dogan, Nuri

2015-01-01

Reliability and differential item functioning (DIF) analyses were conducted on testlets displaying local item dependence in this study. The data set employed in the research was obtained from the answers given by 1,500 students to the 20 items included in six testlets given in English Proficiency Exam by the School of Foreign Languages of a state…
MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

ERIC Educational Resources Information Center

Wang, Wen-Chung; Shih, Ching-Lin

2010-01-01

Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…
Identifying Differential Item Functioning of Rating Scale Items with the Rasch Model: An Introduction and an Application

ERIC Educational Resources Information Center

Myers, Nicholas D.; Wolfe, Edward W.; Feltz, Deborah L.; Penfield, Randall D.

2006-01-01

This study (a) provided a conceptual introduction to differential item functioning (DIF), (b) introduced the multifaceted Rasch rating scale model (MRSM) and an associated statistical procedure for identifying DIF in rating scale items, and (c) applied this procedure to previously collected data from American coaches who responded to the coaching…
Differential Item Functioning Analysis of the 2003-04 NHANES Physical Activity Questionnaire

ERIC Educational Resources Information Center

Gao, Yong; Zhu, Weimo

2011-01-01

Using differential item functioning (DIF) analyses, this study examined whether there were any DIF items in the National Health and Nutrition Examination Survey (NHANES) physical activity (PA) questionnaire. A subset of adult data from the 2003-04 NHANES study (n = 3,083) was used. PA items related to respondents' occupational, transportation,…
Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis; Li, Johnson

2013-01-01

The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…
Mixture Item Response Theory-MIMIC Model: Simultaneous Estimation of Differential Item Functioning for Manifest Groups and Latent Classes

ERIC Educational Resources Information Center

Bilir, Mustafa Kuzey

2009-01-01

This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
A Comparison of Two Area Measures for Detecting Differential Item Functioning.

ERIC Educational Resources Information Center

Kim, Seock-Ho; Cohen, Allan S.

1991-01-01

The exact and closed-interval area measures for detecting differential item functioning are compared for actual data from 1,000 African-American and 1,000 white college students taking a vocabulary test with items intentionally constructed to favor 1 set of examinees. No real differences in detection of biased items were found. (SLD)
Impressions of functional food consumers.

PubMed

Saher, Marieke; Arvola, Anne; Lindeman, Marjaana; Lähteenmäki, Liisa

2004-02-01

Functional foods provide a new way of expressing healthiness in food choices. The objective of this study was to apply an indirect measure to explore what kind of impressions people form of users of functional foods. Respondents (n=350) received one of eight versions of a shopping list and rated the buyer of the foods on 66 bipolar attributes on 7-point scales. The shopping lists had either healthy or neutral background items and conventional or functional target items, and the buyer was described as either a 40-year-old woman or man. The attribute ratings revealed three factors: disciplined, innovative and gentle. Buyers with healthy background items were perceived as more disciplined than those having neutral items on the list. Users of functional foods were rated as more disciplined than users of conventional target items only when the background list consisted of neutral items. Buyers of functional foods were regarded as more innovative and less gentle, but gender affected the ratings on the gentle dimension. The impressions of functional food users clearly differ from those formed of users of conventional foods with a healthy image. The shopping list method performed well as an indirect method, but further studies are required to test its feasibility in measuring other food-related impressions.
Health measurement using the ICF: Test-retest reliability study of ICF codes and qualifiers in geriatric care

PubMed Central

Okochi, Jiro; Utsunomiya, Sakiko; Takahashi, Tai

2005-01-01

Background The International Classification of Functioning, Disability and Health (ICF) was published by the World Health Organization (WHO) to standardize descriptions of health and disability. Little is known about the reliability and clinical relevance of measurements using the ICF and its qualifiers. This study examines the test-retest reliability of ICF codes, and the rate of immeasurability in long-term care settings of the elderly to evaluate the clinical applicability of the ICF and its qualifiers, and the ICF checklist. Methods Reliability of 85 body function (BF) items and 152 activity and participation (AP) items of the ICF was studied using a test-retest procedure with a sample of 742 elderly persons from 59 institutional and at home care service centers. Test-retest reliability was estimated using the weighted kappa statistic. The clinical relevance of the ICF was estimated by calculating immeasurability rate. The effect of the measurement settings and evaluators' experience was analyzed by stratification of these variables. The properties of each item were evaluated using both the kappa statistic and immeasurability rate to assess the clinical applicability of WHO's ICF checklist in the elderly care setting. Results The median of the weighted kappa statistics of 85 BF and 152 AP items were 0.46 and 0.55 respectively. The reproducibility statistics improved when the measurements were performed by experienced evaluators. Some chapters such as genitourinary and reproductive functions in the BF domain and major life area in the AP domain contained more items with lower test-retest reliability measures and rated as immeasurable than in the other chapters. Some items in the ICF checklist were rated as unreliable and immeasurable. Conclusion The reliability of the ICF codes when measured with the current ICF qualifiers is relatively low. 
The increase in reliability with evaluators' experience suggests that proper education would help raise reliability. The ICF checklist contains some items that are difficult to apply in geriatric care settings. Improvements should be achieved by selecting the most relevant items for each measurement and by developing appropriate qualifiers for each code according to the interest of the users. PMID:16050960
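The weighted kappa statistic used above for test-retest reliability can be sketched as follows. The abstract does not say whether linear or quadratic disagreement weights were used, so the `weights` parameter, like the function name and argument layout, is an assumption made for illustration.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Weighted kappa for two sets of ordinal ratings of the same subjects.

    ratings_a, ratings_b: ratings from the first and second occasion
    categories:           ordered list of all possible rating values
    weights:              "linear" or "quadratic" disagreement weights
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (first rating, second rating)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for maximal distance
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_disagreement = sum(w(i, j) * observed[i][j] for i, j in cells)
    exp_disagreement = sum(w(i, j) * row[i] * col[j] for i, j in cells)
    if exp_disagreement == 0:
        return float("nan")  # all ratings fall in a single category
    return 1 - obs_disagreement / exp_disagreement
```

A value of 1 indicates perfect agreement between the two occasions and a value of 0 indicates agreement no better than chance, which gives some context for the reported medians of 0.46 and 0.55.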
The shortened food expectations--Long-term care questionnaire: Assessing nursing home residents' satisfaction with food and food service.

PubMed

Crogan, Neva L; Evans, Bronwynne C

2006-11-01

Lack of nursing home resident satisfaction with meals often results in reduced food intake, leading to poor nutritional status, weight loss, functional decline, and depression. The purpose of this article is to describe the development and initial testing of the 28-item revised Food Expectations-Long-Term Care (FoodEx-LTC) questionnaire with a convenience sample of nursing home residents (N = 61). Because of possible respondent burden, the original 44-item, five-domain FoodEx-LTC was revised, resulting in the deletion of 16 redundant items and those with inter-item correlations less than .25. Coefficient alpha scores ranged from .65 to .82, and test-retest correlations ranged from .79 to .88, dependent on domain. This revised instrument has good initial validity and reliability, resulting in a shorter instrument that accurately assesses nursing home resident satisfaction with food and food service.
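For readers unfamiliar with the coefficient alpha values reported above, a minimal sketch of the standard Cronbach's alpha computation follows. The function name and the data layout (one list of item scores per respondent) are assumptions; the sketch also assumes complete data and a non-zero variance of the total scores.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    item_scores: list of respondents, each a list of k item scores
                 (k >= 2, at least two respondents, no missing values).
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return k / (k - 1) * (1 - sum_item_var / total_var)
```

In a multi-domain instrument like the one described, alpha would typically be computed separately for the items of each domain.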
Plasma Interactions With Spacecraft

DTIC Science & Technology

2009-04-01

software core 3 Table 2. N2kDB classes 8 Table 3. N2kDB Application Programmer Interface 11 Table 4. How to get number of items from N2kDB 14 Table 5...grid, timesteps, and pages of particles. Table 4 specifies how these functions are used to get useful quantities. The Getcount function gets the...number of items with data item names that start with the specified string. 13 Table 4. How to get number of items from N2kDB. Function Specifics
Readability and Comprehension of the Geriatric Depression Scale and PROMIS® Physical Function Items in Older African Americans and Latinos.

PubMed

Paz, Sylvia H; Jones, Loretta; Calderón, José L; Hays, Ron D

2017-02-01

Depression and physical function are particularly important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) physical function item bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. The aim of this study was to estimate the readability of the GDS and PROMIS® physical function items and to assess their comprehensibility using a sample of African American and Latino elderly. Readability was estimated using the Flesch-Kincaid and Flesch Reading Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS® items by minority elderly was evaluated with 30 cognitive interviews. Readability estimates of a number of items in English and Spanish of the GDS and PROMIS® physical functioning items exceed the U.S. recommended 5th-grade threshold for vulnerable populations, or were rated as 'fairly difficult', 'difficult', or 'very difficult' to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS® items was considered confusing, and interpreting responses was problematic because they were based on using physical aids. Problems with item wording and response options of the GDS and PROMIS® physical function items may reduce reliability and validity of measurement when used with minority elderly.
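The two English-language readability formulae named above can be written out as a short sketch. The coefficients are the standard published ones for English text; the Spanish adaptation uses different coefficients and is not shown. The function names are assumptions, and the caller is assumed to supply word, sentence, and syllable counts, since syllable counting is itself a heuristic step.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease for English text; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: approximate U.S. school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Under these formulae, an item would exceed the 5th-grade threshold mentioned in the abstract when `flesch_kincaid_grade` returns a value above 5.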
Development and validation of a measure of pediatric oral health-related quality of life: the POQL.

PubMed

Huntington, Noelle L; Spetter, Dante; Jones, Judith A; Rich, Sharron E; Garcia, Raul I; Spiro, Avron

2011-01-01

To develop a brief measure of oral health-related quality of life (OHQL) in children and demonstrate its reliability and validity in a diverse population. We administered the initial 20-item Pediatric Oral Health-Related Quality of Life (POQL) to children (Child Self-Report) and parents (Parent Report on Child) from diverse populations in both school-based and clinic-based settings. Clinical oral health status was measured on a subset of children. We used factor analysis to determine the underlying scales and then reduced the measure to 10 items based on several considerations. Multitrait analysis on the resulting 10-item POQL was used to reaffirm the discrimination of scales and assess the measure's internal consistency and interscale correlations. We established discriminant and convergent validity with clinical status, perceived oral health and responses on the PedsQL, and determined sensitivity to change with children undergoing ECC surgical repair. Factor analysis returned a four-scale solution for the initial items--Physical Functioning, Role Functioning, Social Functioning, and Emotional Functioning. The reduced items represented the same four scales--two each on Physical and Role and three each on Social and Emotional. Good reliability and validity were shown for the POQL as a whole and for each of the scales. The POQL is a valid and reliable measure of OHQL for use in preschool and school-aged children, with high utility for both clinical assessments and large-scale population studies.
A Generalized DIF Effect Variance Estimator for Measuring Unsigned Differential Test Functioning in Mixed Format Tests

ERIC Educational Resources Information Center

Penfield, Randall D.; Algina, James

2006-01-01

One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…

Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

ERIC Educational Resources Information Center

Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

2010-01-01

This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
Assessing patient report of function: content validity of the Functional Performance Inventory-Short Form (FPI-SF) in patients with chronic obstructive pulmonary disease (COPD).

PubMed

Leidy, Nancy Kline; Hamilton, Alan; Becker, Karin

2012-01-01

The performance of daily activities is a major challenge for people with chronic obstructive pulmonary disease (COPD). The Functional Performance Inventory (FPI) was developed based on an analytical framework of functional status and qualitative interviews with COPD patients describing these difficulties. The 65-item FPI was reduced to a 32-item short form (SF) through a systematic process of qualitative and quantitative item reduction and formatted for greater clarity and ease of use. This study examined the content validity of the reduced, reformatted form of the instrument, the FPI-SF. Qualitative cognitive interviews were conducted with COPD patients recruited from three geographically diverse pulmonary clinics in the United States. Interviews were designed to assess respondent interpretation of the instrument, evaluate clarity and ease of completion, and identify any new activities participants found important and difficult to perform that were not represented by the existing items. Twenty subjects comprised the sample; 12 (60%) were male, 14 (70%) were Caucasian, the mean age was 63.0 ± 11.3 years, 12 (60%) were retired, the mean forced expiratory volume in 1 second (FEV(1)) was 1.5 ± 0.5 L, and the mean percent predicted FEV(1) was 48.4% ± 13.1%. Participants understood the FPI-SF as intended, including instructions, items, and response options. Two minor formatting changes were suggested to improve clarity of presentation. Participants found the content of the FPI-SF to be comprehensive, with items covering activities they felt were important and often difficult to perform. These results, together with its development history and previously tested quantitative properties, suggest that the FPI-SF is content valid for use in clinical studies of COPD.
Testing parent dyad interchangeability in the parent proxy-report of PedsQL™ 4.0: a differential item functioning analysis.

PubMed

Doostfatemeh, Marziyeh; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

2015-08-01

In child-parent agreement studies in the field of paediatric health-related quality of life (HRQoL), little attention has been paid to the effect of gender in parental proxy rating of children's HRQoL. This study aims to test the potential interchangeability of parent dyads in reporting children's HRQoL on both item and scale levels of the PedsQL™ 4.0 instrument, using the approach of differential item functioning (DIF). The PedsQL™ 4.0 Generic Core Scales were completed by 576 father-and-mother dyads. A polytomous item response theory model, graded response model, was used to detect DIF across fathers and mothers. Assessment at item level showed that fathers and mothers perceived the meaning of items of the PedsQL™ 4.0 consistently. Regarding the scale level, a moderate to high level of agreement was observed between mothers' and fathers' reports on all similar subscales. Although the significant mean score differences in total, physical and emotional functioning indicated that fathers gave higher scores to their children, the small effect size implied that this difference may not be practically meaningful. Our findings revealed that discrepancy in parent dyads in rating children's HRQoL is a "real" difference and not an artefact due to measurement non-invariance. Fathers were seen to have slightly different insights into their children, especially for emotional functioning, but overall the results were not all that different. This suggests that paternal proxy-reports can be included in studies along with maternal proxy-reports, and the two may be combined when looking at parent-child agreement. Parent-child agreement studies in Iran are not affected by parents' gender, and therefore, researchers may rely on the assumption of the interchangeability of fathers and mothers in these studies.
Development of a Symptom-Focused Patient-Reported Outcome Measure for Functional Dyspepsia: The Functional Dyspepsia Symptom Diary (FDSD)

PubMed Central

Taylor, Fiona; Higgins, Sophie; Carson, Robyn T; Eremenco, Sonya; Foley, Catherine; Lacy, Brian E; Parkman, Henry P; Reasner, David S; Shields, Alan L; Tack, Jan; Talley, Nicholas J

2018-01-01

Objectives: The Functional Dyspepsia Symptom Diary (FDSD) was developed to address the lack of symptom-focused, patient-reported outcome (PRO) measures designed for use in functional dyspepsia (FD) patients and meeting Food and Drug Administration recommendations for PRO instrument development. Methods: Concept elicitation interviews were conducted with FD participants to identify symptoms important and relevant to FD patients. A preliminary version of the FDSD was constructed, then completed by FD participants on an electronic device in cognitive interviews to evaluate the readability, comprehensibility, relevance, and comprehensiveness of the FDSD, and to preliminarily evaluate its measurement properties. Results: During concept elicitation interviews, 45 participants spontaneously reported 19 symptom concepts. Of those, seven symptoms were selected for assessment by the eight-item FDSD. Cognitive interviews with 57 participants confirmed that participants were able to comprehend and provide meaningful responses to the FDSD, and that the handheld electronic FDSD format was suitable for use in the target population. Scores of the FDSD were well-distributed among response options, item discrimination indices suggested that the FDSD items differentiate among patients with varying degrees of FD severity, and inter-item correlations suggested that no items of the FDSD were capturing redundant information. Internal consistency estimates (0.87) and construct-related validity estimates using known-groups methods were within acceptable ranges. Conclusions: The FDSD is a content-valid PRO measure, with preliminary psychometric evidence providing support for the FDSD’s items and total score. Further psychometric evaluations are recommended to more fully test the FDSD’s score performance and other measurement properties in the target patient population. PMID:28925989
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.

PubMed

McCabe, Erin; Gross, Douglas P; Bulut, Okan

2018-06-07

The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlation with legacy measure scores, respectively). We demonstrated feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward, and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
Statistical power as a function of Cronbach alpha of instrument questionnaire items.

PubMed

Heo, Moonseong; Kim, Namhee; Faith, Myles S

2015-10-14

In countless clinical trials, outcome measurement relies on questionnaire items, which often suffer from measurement error that in turn affects the statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. Scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume a fixed true score variance, as opposed to the usual fixed total variance. That assumption is critical and practically relevant to show that measurement error is inversely associated with inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing significance of C(α). Closed-form power functions and sample size determination formulas are derived in terms of C(α), for all of the aforementioned comparisons. Power functions are shown to be an increasing function of C(α), regardless of comparison of interest. The derived power functions are well validated by simulation studies that show that the magnitudes of theoretical power are virtually identical to those of the empirical power.
Regardless of research designs or settings, in order to increase statistical power, development and use of instruments with greater C(α), or equivalently with greater inter-item correlations, is crucial for trials that intend to use questionnaire items for measuring research outcomes. Further development of the power functions for binary or ordinal item scores and under more general item correlation structures reflecting more real world situations would be a valuable future study.
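For readers unfamiliar with the coefficient discussed above, the following is a minimal sketch of the standard sample formula for Cronbach's alpha, C(α) = k/(k − 1) × (1 − Σ item variances / variance of the sum score), where k is the number of items. It does not reproduce the power functions derived in the article. The function name and the toy data are illustrative assumptions, not taken from the source.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; uses sample variances (n - 1 denominator).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the sum score across respondents.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (assumed): three items that rise together perfectly across respondents.
parallel_scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha_parallel = cronbach_alpha(parallel_scores)

# Toy data (assumed): two items that agree only partly.
noisy_scores = [[1, 1], [2, 3], [3, 2]]
alpha_noisy = cronbach_alpha(noisy_scores)
```

In the first toy data set the items are perfectly correlated and alpha equals 1; in the second the weaker inter-item correlation gives a lower alpha, which is the direction of association the abstract relies on.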
Using the Oxford Foot Model to determine the association between objective measures of foot function and results of the AOFAS Ankle-Hindfoot Scale and the Foot Function Index: a prospective gait analysis study in Germany

PubMed Central

Kostuj, Tanja; Stief, Felix; Hartmann, Kirsten Anna; Schaper, Katharina; Arabmotlagh, Mohammad; Baums, Mike H; Meurer, Andrea; Krummenauer, Frank; Lieske, Sebastian

2018-01-01

Objective After cross-cultural adaption for the German translation of the Ankle-Hindfoot Scale of the American Orthopaedic Foot and Ankle Society (AOFAS-AHS) and agreement analysis with the Foot Function Index (FFI-D), the following gait analysis study using the Oxford Foot Model (OFM) was carried out to show which of the two scores better correlates with objective gait dysfunction. Design and participants Results of the AOFAS-AHS and FFI-D, as well as data from three-dimensional gait analysis were collected from 20 patients with mild to severe ankle and hindfoot pathologies. Kinematic and kinetic gait data were correlated with the results of the total AOFAS scale and FFI-D as well as the results of those items representing hindfoot function in the AOFAS-AHS assessment. With respect to the foot disorders in our patients (osteoarthritis and prearthritic conditions), we correlated the total range of motion (ROM) in the ankle and subtalar joints as identified by the OFM with values identified during clinical examination ‘translated’ into score values. Furthermore, reduced walking speed, reduced step length and reduced maximum ankle power generation during push-off were taken into account and correlated to gait abnormalities described in the scores. An analysis of correlations with CIs between the FFI-D and the AOFAS-AHS items and the gait parameters was performed by means of the Jonckheere-Terpstra test; furthermore, exploratory factor analysis was applied to identify common information structures and thereby redundancy in the FFI-D and the AOFAS-AHS items. Results Objective findings for hindfoot disorders, namely a reduced ROM, in the ankle and subtalar joints, respectively, as well as reduced ankle power generation during push-off, showed a better correlation with the AOFAS-AHS total score—as well as AOFAS-AHS items representing ROM in the ankle, subtalar joints and gait function—compared with the FFI-D score. 
Factor analysis, however, could not identify FFI-D items consistently related to these three indicator parameters (pain, disability and function) found in the AOFAS-AHS. Furthermore, factor analysis did not support stratification of the FFI-D into two subscales. Conclusions The AOFAS-AHS showed a good agreement with objective gait parameters and is therefore better suited to evaluate disability and functional limitations of patients suffering from foot and ankle pathologies compared with the FFI-D. PMID:29626046
Development of Elderly Quality of Life Index – Eqoli: Item Reduction and Distribution into Dimensions

PubMed Central

Paschoal, Sérgio Márcio Pacheco; Filho, Wilson Jacob; Litvoc, Júlio

2008-01-01

OBJECTIVE To describe item reduction and its distribution into dimensions in the construction process of a quality of life evaluation instrument for the elderly. METHODS The sampling method was chosen by convenience through quotas, with selection of elderly subjects from four programs to achieve heterogeneity in the “health status”, “functional capacity”, “gender”, and “age” variables. The Clinical Impact Method was used, consisting of the spontaneous and elicited selection by the respondents of relevant items to the construct Quality of Life in Old Age from a previously elaborated item pool. The respondents rated each item’s importance using a 5-point Likert scale. The product of the proportion of elderly selecting the item as relevant (frequency) and the mean importance score they attributed to it (importance) represented the overall impact of that item in their quality of life (impact). The items were ordered according to their impact scores and the top 46 scoring items were grouped in dimensions by three experts. A review of the negative items was performed. RESULTS One hundred and ninety three people (122 women and 71 men) were interviewed. Experts distributed the 46 items into eight dimensions. Closely related items were grouped and dimensions not reaching the minimum expected number of items received additional items resulting in eight dimensions and 43 items. DISCUSSION The sample was heterogeneous and similar to what was expected. The dimensions and items demonstrated the multidimensionality of the construct. The Clinical Impact Method was appropriate to construct the instrument, which was named Elderly Quality of Life Index - EQoLI. An accuracy process will be examined in the future. PMID:18438571
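The impact score described in the methods above (proportion of respondents selecting an item multiplied by the mean importance they gave it) can be sketched as follows. This is an illustrative reading of that description only; the function names, item labels and ratings are hypothetical and are not data from the study.

```python
def impact_scores(n_respondents, ratings_by_item):
    """Impact = frequency x importance for each item.

    ratings_by_item maps an item label to the list of 1-5 importance ratings
    given by the respondents who selected that item as relevant (assumed layout).
    """
    scores = {}
    for item, ratings in ratings_by_item.items():
        frequency = len(ratings) / n_respondents  # proportion selecting the item
        importance = sum(ratings) / len(ratings) if ratings else 0.0  # mean rating
        scores[item] = frequency * importance
    return scores


def top_items(scores, n):
    """Return the n item labels with the highest impact, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical example with 10 respondents and three candidate items.
example = {
    "item_a": [5, 5, 4, 4],   # chosen by 4 of 10, mean importance 4.5
    "item_b": [3] * 8,        # chosen by 8 of 10, mean importance 3.0
    "item_c": [5, 5],         # chosen by 2 of 10, mean importance 5.0
}
scores = impact_scores(10, example)
ranked = top_items(scores, 2)
```

In this toy case the frequently chosen but moderately rated item_b outranks the rarely chosen but highly rated item_c, which shows why the method combines the two quantities instead of ranking on importance alone.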
Effectiveness of Social Behaviors for Autonomous Wheelchair Robot to Support Elderly People in Japan

PubMed Central

Shiomi, Masahiro; Iio, Takamasa; Kamei, Koji; Sharma, Chandraprakash; Hagita, Norihiro

2015-01-01

We developed a wheelchair robot to support the movement of elderly people and specifically implemented two functions to enhance their intention to use it: speaking behavior to convey place/location related information and speed adjustment based on individual preferences. Our study examines how the evaluations of our wheelchair robot differ when compared with human caregivers and a conventional autonomous wheelchair without the two proposed functions in a moving support context. 28 senior citizens participated in the experiment to evaluate three different conditions. Our measurements consisted of questionnaire items and the coding of free-style interview results. Our experimental results revealed that elderly people evaluated our wheelchair robot higher than the wheelchair without the two functions and the human caregivers for some items. PMID:25993038
Detection of Differential Item Functioning with Nonlinear Regression: A Non-IRT Approach Accounting for Guessing

ERIC Educational Resources Information Center

Drabinová, Adéla; Martinková, Patrícia

2017-01-01

In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items with presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of method based on logistic regression. As a non-IRT approach, NLR can…
Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

ERIC Educational Resources Information Center

Magis, David; De Boeck, Paul

2011-01-01

We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

ERIC Educational Resources Information Center

Kim, Jihye; Oshima, T. C.

2013-01-01

In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
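The multiple testing problem raised above can be illustrated with a small sketch. It assumes independent tests for the familywise error rate formula, 1 − (1 − α)^m, and uses the Bonferroni correction only as one familiar example of an adjustment; the truncated abstract does not say which adjustments the study compared, so nothing here should be read as the authors' procedure. Function names and p-values are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_flags(p_values, alpha=0.05):
    """Flag items whose p-value passes the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# With 20 items each tested at alpha = 0.05, the chance of at least one
# false DIF flag is far above 5% (under the independence assumption).
fwer_20 = familywise_error_rate(0.05, 20)

# Hypothetical per-item p-values from a DIF analysis of three items.
flags = bonferroni_flags([0.001, 0.02, 0.04], alpha=0.05)
```

The adjustment lowers the per-item threshold, which controls false flags at the cost of power; that trade-off is the subject of the study.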
A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Doong, Shing H.

2009-01-01

The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…
Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

ERIC Educational Resources Information Center

Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

2012-01-01

Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…
A rasch analysis of the Manchester foot pain and disability index

PubMed Central

Muller, Sara; Roddy, Edward

2009-01-01

Background There is currently no interval-level measure of foot-related disability and this has hampered research in this area. The Manchester Foot Pain and Disability Index (FPDI) could potentially fill this gap. Objective To assess the fit of the three subscales (function, pain, appearance) of the FPDI to the Rasch unidimensional measurement model in order to form interval-level scores. Methods A two-stage postal survey at a general practice in the UK collected data from 149 adults aged 50 years and over with foot pain. The 17 FPDI items, in three subscales, were assessed for their fit to the Rasch model. Checks were carried out for differential item functioning by age and gender. Results The function and pain items fit the Rasch model and interval-level scores can be constructed. There were too few people without extreme scores on the appearance subscale to allow fit to the Rasch model to be tested. Conclusion The items from the FPDI function and pain subscales can be used to obtain interval level scores for these factors for use in future research studies in older adults. Further work is needed to establish the interval nature of these subscale scores in more diverse populations and to establish the measurement properties of these interval-level scores. PMID:19878536
[Development and validation of an inventory of ego functions and self regulation (Hannover Self-Regulation Inventory, HSRI)].

PubMed

Jäger, B; Schmid-Ott, G; Ernst, G; Dölle-Lange, E; Sack, M

2012-06-01

The aim of this study was to construct and validate a short self-rating questionnaire for the assessment of ego functions and ability of self regulation. An item pool of 120 items covering 6 postulated dimensions was reduced by two steps in independent samples (n = 136 + 470) via factor and item analyses to the final version consisting of 35 items. The 5 resulting questionnaire scales "interpersonal disturbances", "frustration tolerance and impulse control", "identity disturbances", "affect differentiation and affect tolerance" and "self-esteem" were well interpretable and showed in confirmatory factor analysis the best fit to the data (CHI²/df = 3.48; RMSEA = 0.73). Total scores were found to differentiate well between diagnostic groups of patients with more or less ego pathology (FANOVA = 9.8; df = 11; p < 0.001), thus proving good concurrent validity. Reliability was shown by testing internal consistency and test-retest correlations. The "Hannover self-regulation questionnaire" (HSRQ) evidently is an appropriate and reliable screening instrument in order to assess ego functions and capacities of self regulation in an economic and user-friendly means. The scale structure allows differentiated diagnostics of weak vs. stable ego functions and may be used for detailed therapy planning. © Georg Thieme Verlag KG Stuttgart · New York.
Varying the valuating function and the presentable bank in computerized adaptive testing.

PubMed

Barrada, Juan Ramón; Abad, Francisco José; Olea, Julio

2011-05-01

In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, valuating the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that the manipulation of the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. It is possible to greatly improve item bank security with much fewer losses in accuracy by selecting several items with the matching criterion. In general, it seems more appropriate not to stratify the bank.
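The contrast between the two valuating functions named above can be sketched in a few lines. The sketch assumes a two-parameter logistic (2PL) model, under which item information is a² P(θ)(1 − P(θ)) and peaks at θ = b; the article's own model and item bank are not given in the abstract, so the model choice, function names and item parameters below are all illustrative assumptions.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_max_information(theta_hat, bank):
    """Pick the item with the largest Fisher information at the current estimate."""
    return max(bank, key=lambda name: fisher_information(theta_hat, *bank[name]))


def select_matching(theta_hat, bank):
    """Pick the item whose information peak (b under 2PL) is closest to the estimate."""
    return min(bank, key=lambda name: abs(bank[name][1] - theta_hat))


# Hypothetical bank: item label -> (discrimination a, difficulty b).
bank = {"item_1": (2.0, 0.5), "item_2": (0.8, 0.0), "item_3": (1.2, -1.0)}

by_information = select_max_information(0.0, bank)
by_matching = select_matching(0.0, bank)
```

At an estimated trait level of 0, the information rule chooses the highly discriminating item_1 even though it is off target, while the matching rule chooses item_2 because its peak sits exactly at the estimate. Because matching ignores discrimination, it spreads exposure across the bank, which is consistent with the security versus accuracy trade-off the abstract describes.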
Development and validation of an item response theory-based Social Responsiveness Scale short form.

PubMed

Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T

2017-09-01

Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy.

PubMed

Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George; Mulcahey, M J

2011-02-01

This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function.
Development of a Multidimensional Functional Health Scale for Older Adults in China.

PubMed

Mao, Fanzhen; Han, Yaofeng; Chen, Junze; Chen, Wei; Yuan, Manqiong; Alicia Hong, Y; Fang, Ya

2016-05-01

A first step to achieve successful aging is assessing functional wellbeing of older adults. This study reports the development of a culturally appropriate brief scale (the Multidimensional Functional Health Scale for Chinese Elderly, MFHSCE) to assess the functional health of Chinese elderly. Through systematic literature review, Delphi method, cultural adaptation, synthetic statistical item selection, Cronbach's alpha and confirmatory factor analysis, we conducted development of item pool, two rounds of item selection, and psychometric evaluation. Synthetic statistical item selection and psychometric evaluation was processed among 539 and 2032 older adults, separately. The MFHSCE consists of 30 items, covering activities of daily living, social relationships, physical health, mental health, cognitive function, and economic resources. The Cronbach's alpha was 0.92, and the comparative fit index was 0.917. The MFHSCE has good internal consistency and construct validity; it is also concise and easy to use in general practice, especially in communities in China.

Development of a Measure of Asthma-Specific Quality of Life among Adults

PubMed Central

Eberhart, Nicole K.; Sherbourne, Cathy D.; Edelen, Maria Orlando; Stucky, Brian D.; Sin, Nancy L.; Lara, Marielena

2014-01-01

Purpose A key goal in asthma treatment is improvement in quality of life (QoL), but existing measures often confound QoL with symptoms and functional impairment. The current study addresses these limitations and the need for valid patient-reported outcome measures by using state-of-the-art methods to develop an item bank assessing QoL in adults with asthma. This article describes the process for developing an initial item pool for field testing. Methods Five focus group interviews were conducted with a total of 50 asthmatic adults. We used “pile sorting/binning” and “winnowing” methods to identify key QoL dimensions and develop a pool of items based on statements made in the focus group interviews. We then conducted a literature review and consulted with an expert panel to ensure that no key concepts were omitted. Finally, we conducted individual cognitive interviews to ensure that items were well understood and inform final item refinement. Results 661 QoL statements were identified from focus group interview transcripts and subsequently used to generate a pool of 112 items in 16 different content areas. Conclusions Items covering a broad range of content were developed that can serve as a valid gauge of individuals’ perceptions of the effects of asthma and its treatment on their lives. These items do not directly measure symptoms or functional impairment, yet they include a broader range of content than most existent measures of asthma-specific QoL. PMID:24062237
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.

PubMed

Eichenbaum, Alexander E; Marcus, David K; French, Brian F

2017-06-01

This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Development of an Item Bank for the Assessment of Knowledge on Biology in Argentine University Students.

PubMed

Cupani, Marcos; Zamparella, Tatiana Castro; Piumatti, Gisella; Vinculado, Grupo

The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. This study aims to develop a bank of items to measure the level of Knowledge on Biology using the Rasch model. The sample consisted of 1219 participants that studied in different faculties of the National University of Cordoba (mean age = 21.85 years, SD = 4.66; 66.9% are women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 questions of knowledge on biology. Evaluation of Rasch model fit (Zstd >|2.0|), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.
Differential item functioning of the patient-reported outcomes information system (PROMIS®) pain interference item bank by language (Spanish versus English).

PubMed

Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D

2017-06-01

About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS® PI scores.
The Communicative Participation Item Bank (CPIB): Item bank calibration and development of a disorder-generic short form

PubMed Central

Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar

2015-01-01

Purpose The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item functioning (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661
Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

PubMed

Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

2015-07-01

The Geriatric Anxiety Scale (GAS; Segal et al. (Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002)) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
Development and validation of a vision-specific quality-of-life questionnaire for Timor-Leste.

PubMed

du Toit, Rènée; Palagyi, Anna; Ramke, Jacqueline; Brian, Garry; Lamoureux, Ecosse L

2008-10-01

To develop and determine the reliability and validity of a vision-specific quality-of-life instrument (TL-VSQOL) designed to assess the impact of distance and near vision impairment in adults living in Timor-Leste. A vision-specific quality-of-life questionnaire was developed, piloted, and administered to 704 Timorese aged ≥40 years during a population-based eye health rapid assessment. Rasch analysis was performed on the data of 457 participants with presenting near vision worse than N8 (78.5%) and/or distance vision worse than 6/18 (69.8%). Unidimensionality, item fit to the model, response category performance, differential item functioning, and targeting of items to participants were assessed. Initially, the questionnaire lacked fit to the Rasch model. Removal of two items concerning emotional well-being resulted in a fit of the data (overall item-trait interaction: χ² (df) = 81 (51); mean (SD) person and item fit residual values: -0.30 (1.02) and -0.32 (1.46)), and good targeting of person ability and item difficulty was evident. Poorer distance and near visual acuities were significantly associated with worse quality-of-life scores (P < 0.001). Person separation reliability was substantial (0.93), indicating that the instrument can discriminate between groups with normal and impaired vision. All 17 items were free of differential item functioning, and there was no evidence of multidimensionality. This 17-item TL-VSQOL has high reliability, construct, and criterion validity and effective targeting. It can effectively assess the impact on quality of life of adult Timorese with distance and near vision impairment. The TL-VSQOL could be adapted for use in other low-resource settings.
Development of a vision-targeted health-related quality of life item measure

PubMed Central

Slotkin, Jerry; McKean-Cowdin, Roberta; Lee, Paul; Owsley, Cynthia; Vitale, Susan; Varma, Rohit; Gershon, Richard; Hays, Ron D.

2013-01-01

Purpose To develop a vision-targeted health-related quality of life (HRQOL) measure for the NIH Toolbox for the Assessment of Neurological and Behavioral Function. Methods We conducted a review of existing vision-targeted HRQOL surveys and identified color vision, low luminance vision, distance vision, general vision, near vision, ocular symptoms, psychosocial well-being, and role performance domains. Items in existing survey instruments were sorted into these domains. We selected non-redundant items and revised them to improve clarity and to limit the number of different response options. We conducted 10 cognitive interviews to evaluate the items. Finally, we revised the items and administered them to 819 individuals to calibrate the items and estimate the measure’s reliability and validity. Results The field test provided support for the 53-item vision-targeted HRQOL measure encompassing 6 domains: color vision, distance vision, near vision, ocular symptoms, psychosocial well-being, and role performance. The domain scores had high levels of reliability (coefficient alphas ranged from 0.848 to 0.940). Validity was supported by high correlations between National Eye Institute Visual Function Questionnaire scales and the new-vision-targeted scales (highest values were 0.771 between psychosocial well-being and mental health, and 0.729 between role performance and role difficulties), and by lower mean scores in those groups self-reporting eye disease (F statistic with p < 0.01 for all comparisons except cataract with ocular symptoms, psychosocial well-being, and role performance scales). Conclusions This vision-targeted HRQOL measure provides a basis for comprehensive assessment of the impact of eye diseases and treatments on daily functioning and well-being in adults. PMID:23475688
Scale invariance and longitudinal stability of the Physical Functioning Western Ontario and MacMaster Universities Osteoarthritis Index using the Rasch model.

PubMed

Ayala, Alba; Bilbao, Amaia; Garcia-Perez, Sonia; Escobar, Antonio; Forjaz, Maria João

2018-03-01

The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) measures the quality of life of patients with osteoarthritis (OA), and there is a specific scale for the physical functioning dimension, the seven-item short version WOMAC-pf. This study describes the application of the Rasch model to explore scale invariance and response stability of the WOMAC-pf short version across affected joint and over time. A sample of 884 patients with OA, from 15 hospitals in Spain, completed the WOMAC-pf before hip or knee surgery (baseline) and at 3, 6 and 12 months post-surgery. Invariance by joint was explored through differential item functioning (DIF) analysis of the Rasch model using baseline data, and time stability (DIF by time) was evaluated in stacked data (each participant is represented four times, once per time point). Mean age of the patients was 69.13 years (SD 10.01), 59.3% of them were women (n = 524), 59.2% had knee OA (n = 523) and 40.8% hip OA (n = 361). The item "putting on socks" showed DIF by joint and time. Fit to the Rasch model using stacked data improved when this item was removed. Good reliability for individual use, local independence and unidimensionality of the models were confirmed. The WOMAC-pf 7-item short version was invariant over time and joint when the item "putting on socks" was removed. Researchers should carefully evaluate this item as it presents problems in scale invariance and stability, which could affect results when comparing data by joint or when computing change scores.
Differential Item Functioning Analysis of the Mental, Emotional, and Bodily Toughness Inventory

ERIC Educational Resources Information Center

Gao, Yong; Mack, Mick G.; Ragan, Moira A.; Ragan, Brian

2012-01-01

In this study the authors used differential item functioning analysis to examine if there were items in the Mental, Emotional, and Bodily Toughness Inventory functioning differently across gender and athletic membership. A total of 444 male (56.3%) and female (43.7%) participants (30.9% athletes and 69.1% non-athletes) responded to the Mental,…
A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

ERIC Educational Resources Information Center

Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

2011-01-01

We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Dimensionality of Helicopter Parenting and Relations to Emotional, Decision-Making, and Academic Functioning in Emerging Adults.

PubMed

Luebbe, Aaron M; Mancini, Kathryn J; Kiel, Elizabeth J; Spangler, Brooke R; Semlak, Julie L; Fussner, Lauren M

2016-08-24

The current study tests the underlying structure of a multidimensional construct of helicopter parenting (HP), assesses reliability of the construct, replicates past relations of HP to poor emotional functioning, and expands the literature to investigate links of HP to emerging adults' decision-making and academic functioning. A sample of 377 emerging adults (66% female; ages 17-30; 88% European American) were administered several items assessing HP as well as measures of other parenting behaviors, depression, anxiety, decision-making style, grade point average, and academic functioning. Exploratory factor analysis results suggested a four-factor, 23-item measure that encompassed varying levels of parental involvement in the personal and professional lives of their children. A bifactor model was also fit to the data and suggested the presence of a reliable overarching HP factor in addition to three reliable subfactors. The fourth subfactor was not reliable and item variances were subsumed by the general HP factor. HP was found to be distinct from, but correlated in expected ways with, other reports of parenting behavior. HP was also associated with poorer emotional, decision-making, and academic functioning. Parents' information-seeking behaviors, when done in the absence of other HP behaviors, were associated with better decision making and academic functioning. © The Author(s) 2016.
Small-Sample DIF Estimation Using Log-Linear Smoothing: A SIBTEST Application. Research Report. ETS RR-07-10

ERIC Educational Resources Information Center

Puhan, Gautam; Moses, Tim P.; Yu, Lei; Dorans, Neil J.

2007-01-01

The purpose of the current study was to examine whether log-linear smoothing of observed score distributions in small samples results in more accurate differential item functioning (DIF) estimates under the simultaneous item bias test (SIBTEST) framework. Data from a teacher certification test were analyzed using White candidates in the reference…
Examining Measurement Properties of an English Self-Efficacy Scale for English Language Learners in Korea

ERIC Educational Resources Information Center

Wang, Chuang; Kim, Do-Hong; Bong, Mimi; Ahn, Hyun Seon

2013-01-01

This study provides evidence for the validity of the Questionnaire of English Self-Efficacy in a sample of 167 college students in Korea. Results show that the scale measures largely satisfy the Rasch model for unidimensionality. The rating scale appeared to function effectively. The item hierarchy was consistent with the expected item order. The…
Neuroimaging Evidence for Agenda-Dependent Monitoring of Different Features during Short-Term Source Memory Tests

ERIC Educational Resources Information Center

Mitchell, Karen J.; Raye, Carol L.; McGuire, Joseph T.; Frankel, Hillary; Greene, Erich J.; Johnson, Marcia K.

2008-01-01

A short-term source monitoring procedure with functional magnetic resonance imaging assessed neural activity when participants made judgments about the format of 1 of 4 studied items (picture, word), the encoding task performed (cost, place), or whether an item was old or new. The results support findings from long-term memory studies showing that…
A Psychometric Evaluation of the DSM-IV Criteria for Antisocial Personality Disorder: Dimensionality, Local Reliability, and Differential Item Functioning Across Gender.

PubMed

Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin

2017-12-01

This study aims at evaluating the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results of the analyses indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had similar latent trait distributions as patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.
Associating a product with a luxury brand label modulates neural reward processing and favors choices in materialistic individuals.

PubMed

Audrin, Catherine; Ceravolo, Leonardo; Chanal, Julien; Brosch, Tobias; Sander, David

2017-11-23

The present study investigated the extent to which luxury vs. non-luxury brand labels (i.e., extrinsic cues) randomly assigned to items and preferences for these items impact choice, and how this impact may be moderated by materialistic tendencies (i.e., individual characteristics). The main objective was to investigate the neural correlates of abovementioned effects using functional magnetic resonance imaging. Behavioural results showed that the more materialistic people are, the more they choose and like items labelled with luxury brands. Neuroimaging results revealed the implication of a neural network including the dorsolateral and ventromedial prefrontal cortex and the orbitofrontal cortex that was modulated by the brand label and also by the participants' preference. Most importantly, items with randomly assigned luxurious brand labels were preferentially chosen by participants and triggered enhanced signal in the caudate nucleus. This effect increased linearly with materialistic tendencies. Our results highlight the impact of brand-item association, although random in our study, and materialism on preference, relying on subparts of the brain valuation system for the integration of extrinsic cues, preferences and individual characteristics.
The Spanish version of the Self-Determination Inventory Student Report: application of item response theory to self-determination measurement.

PubMed

Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A

2018-04-01

A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in the Spanish language. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Using an item response theory approach, this study examined observed item distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
A Note on the Item Information Function of the Four-Parameter Logistic Model

ERIC Educational Resources Information Center

Magis, David

2013-01-01

This article focuses on four-parameter logistic (4PL) model as an extension of the usual three-parameter logistic (3PL) model with an upper asymptote possibly different from 1. For a given item with fixed item parameters, Lord derived the value of the latent ability level that maximizes the item information function under the 3PL model. The…
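As a rough illustration of the quantities this abstract refers to, the sketch below evaluates the 4PL item response function P(θ) = c + (d − c) / (1 + exp(−a(θ − b))) and its item information, I(θ) = a²(P − c)²(d − P)² / ((d − c)² P(1 − P)). It also compares a brute-force grid search with Lord's closed-form 3PL maximizer, θ* = b + (1/a) ln((1 + √(1 + 8c)) / 2). The function names and the grid settings are assumptions made for this example and are not taken from the article; the logistic scaling constant D is taken as 1.

```python
import math


def prob_4pl(theta, a, b, c=0.0, d=1.0):
    """4PL response probability with lower asymptote c and upper asymptote d."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def info_4pl(theta, a, b, c=0.0, d=1.0):
    """Item information under the 4PL model (reduces to 3PL when d = 1)."""
    p = prob_4pl(theta, a, b, c, d)
    return (a ** 2) * (p - c) ** 2 * (d - p) ** 2 / ((d - c) ** 2 * p * (1.0 - p))


def lord_theta_max_3pl(a, b, c):
    """Lord's closed-form ability level maximizing 3PL item information."""
    return b + math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / a


def grid_argmax_info(a, b, c=0.0, d=1.0, lo=-6.0, hi=6.0, steps=12001):
    """Brute-force location of the information maximum on an evenly spaced grid."""
    best_theta, best_info = lo, -1.0
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        value = info_4pl(theta, a, b, c, d)
        if value > best_info:
            best_theta, best_info = theta, value
    return best_theta


if __name__ == "__main__":
    a, b, c = 1.5, 0.0, 0.2
    print("Lord (3PL):", lord_theta_max_3pl(a, b, c))
    print("Grid (3PL):", grid_argmax_info(a, b, c, d=1.0))
    # With an upper asymptote below 1 the maximum generally moves.
    print("Grid (4PL):", grid_argmax_info(a, b, c, d=0.95))
```

The grid search is only a numerical check; it makes no claim about the closed-form 4PL result derived in the article.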
Sex Differences in Item Functioning in the Comprehensive Inventory of Basic Skills-II Vocabulary Assessments

ERIC Educational Resources Information Center

French, Brian F.; Gotch, Chad M.

2013-01-01

The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1st through 6th. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…

Group-Specific Effects of Matching Subtest Contamination on the Identification of Differential Item Functioning

ERIC Educational Resources Information Center

Keiffer, Elizabeth Ann

2011-01-01

A differential item functioning (DIF) simulation study was conducted to explore the type and level of impact that contamination had on type I error and power rates in DIF analyses when the suspect item favored the same or opposite group as the DIF items in the matching subtest. Type I error and power rates were displayed separately for the…
Federal Logistics Information System. FLIS Procedures Manual Publications. Volume 15.

DTIC Science & Technology

1995-01-01

which provides for the processing of adjustments/revisions to established item identifications and characteristics in the FLIS Data Base. Item Logistics...A function in FLIS which provides for the processing of adjustments/revisions to established item identifications and characteristics in the FLIS...the materiel management functions for assigned items. Mechanization of Warehousing and Shipment Processing (MOWASP). A uniform data system designed
Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data.

PubMed

Sharafi, Zahra; Mousavi, Amin; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

2017-01-01

The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed.
Functional recovery is considered the most important target: a survey of dedicated professionals

PubMed Central

2014-01-01

Background The aim of this study was to survey the relative importance of postoperative recovery targets and perioperative care items, as perceived by a large group of international dedicated professionals. Methods A questionnaire with eight postoperative recovery targets and 13 perioperative care items was mailed to participants of the first international Enhanced Recovery After Surgery (ERAS) congress and to authors of papers with a clear relevance to ERAS in abdominal surgery. The responders were divided into categories according to profession and region. Results The recovery targets ‘To be completely free of nausea’, ‘To be independently mobile’ and ‘To be able to eat and drink as soon as possible’ received the highest score irrespective of the responder's profession or region of origin. Equally, the care items ‘Optimizing fluid balance’, ‘Preoperative counselling’ and ‘Promoting early and scheduled mobilisation’ received the highest score across all groups. Conclusions Functional recovery, as in tolerance of food without nausea and regained mobility, was considered the most important target of recovery. There was a consistent uniformity in the way international dedicated professionals scored the relative importance of recovery targets and care items. The relative rating of the perioperative care items was not dependent on the strength of evidence supporting the items. PMID:25089195
Effects of Aging on the Neural Correlates of Successful Item and Source Memory Encoding

PubMed Central

Dennis, Nancy A.; Hayes, Scott M.; Prince, Steven E.; Madden, David J.; Huettel, Scott A.; Cabeza, Roberto

2009-01-01

To investigate the neural basis of age-related source memory (SM) deficits, young and older adults were scanned with fMRI while encoding faces, scenes, and face-scene pairs. Successful encoding activity was identified by comparing encoding activity for subsequently remembered versus forgotten items or pairs. Age deficits in successful encoding activity in hippocampal and prefrontal regions were more pronounced for SM (pairs) compared to item memory (faces and scenes). Age-related reductions were also found in regions specialized in processing faces (fusiform face area) and scenes (parahippocampal place area), but these reductions were similar for item and SM. Functional connectivity between the hippocampus and the rest of the brain was also affected by aging; whereas connections with posterior cortices were weaker in older adults, connections with anterior cortices including prefrontal regions were stronger in older adults. Taken together, the results provide a link between SM deficits in older adults and reduced recruitment of hippocampal and prefrontal regions during encoding. The functional connectivity findings are consistent with a posterior-anterior shift with aging (PASA), previously reported in several cognitive domains and linked to functional compensation. PMID:18605869
A study of the face validity of the 40 item version of the Defense Style Questionnaire (DSQ-40).

PubMed

Chabrol, Henri; Rousseau, Amélie; Rodgers, Rachel; Callahan, Stacey; Pirlot, Gérard; Sztulman, Henri

2005-11-01

There are few studies examining the face validity of the 40-item version of the Defense Style Questionnaire (DSQ-40). Moreover, the existing studies have provided conflicting results. The present study provides an in-depth examination of the face validity of the DSQ-40. Eight clinicians independently attributed each item of the DSQ-40 to a defense mechanism. The defense mechanisms listed in the DSM-IV Defensive Functioning Scale and their definitions were provided as a guide, along with the definition of those defense mechanisms investigated by the DSQ that are not included. It was further specified that the raters could attribute the items to defense mechanisms other than those listed or coping mechanisms. Twelve items out of 40 (30%) were attributed to the defense mechanisms they were supposed to investigate by fewer than four out of the eight raters. This result suggests that a substantial part of the DSQ-40 is lacking in face validity.
A note on monotonicity of item response functions for ordered polytomous item response theory models.

PubMed

Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua

2018-03-08

A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
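The monotonicity claim can be checked numerically for one of the five models. The sketch below covers the graded response model only, under the assumptions of a positive discrimination and strictly increasing thresholds. The function names and parameter values are hypothetical, and this is a numerical illustration rather than the proof given in the note.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded response model item.

    thresholds must be strictly increasing; categories are scored 0..len(thresholds).
    """
    # Cumulative probabilities P*(X >= k), padded with 1 (k = 0) and 0 (k = m + 1).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative += [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_item_score(theta, a, thresholds):
    """Expected item score (the item's contribution to the true score tau)."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    a, thresholds = 1.2, [-1.0, 0.0, 1.5]
    grid = [-4.0 + 0.1 * i for i in range(81)]
    scores = [expected_item_score(t, a, thresholds) for t in grid]
    # Individual category curves rise and fall, but the expected score only rises.
    print(all(s1 < s2 for s1, s2 in zip(scores, scores[1:])))
```

Summing the expected item scores over all items gives the true score τ, so a strictly increasing item-level curve for every item implies a strictly increasing τ in θ.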
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

PubMed Central

Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

PubMed

Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

2018-02-01

The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigation. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected from 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of the FTCD. A Graded Response Model was applied to the FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. The HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, and 5 of the FTCD and for both items of the HSI. The HSI seems well suited to clinical settings that serve heavy smokers, while the FTCD would be better used with smokers whose level of cigarette dependence ranges between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
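For readers unfamiliar with the graded response model mentioned in the abstract above, the following is a minimal sketch of how it assigns probabilities to ordered response categories. It assumes Samejima's logistic form, and the discrimination and threshold values are hypothetical rather than estimates from the FTCD or HSI.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be strictly increasing; an item with K thresholds
    has K + 1 ordered response categories.
    """
    # Cumulative probabilities P(X >= k) for k = 1..K, padded with 1 and 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical four-category item.
probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.2])
```

The returned list has one entry per response category and sums to 1 by construction.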
The PROMIS Physical Function item bank was calibrated to a standardized metric and shown to improve measurement efficiency.

PubMed

Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E

2014-05-01

To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
Methodology for the development and calibration of the SCI-QOL item banks.

PubMed

Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David

2015-05-01

To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury--Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM.
Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study

ERIC Educational Resources Information Center

Sydorenko, Tetyana

2011-01-01

This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…
Development and Validation of Participation and Positive Psychologic Function Measures for Stroke Survivors

PubMed Central

Bode, Rita K.; Heinemann, Allen W.; Butt, Zeeshan; Stallings, Jena; Taylor, Caitlin; Rowe, Morgan; Roth, Elliot J.

2013-01-01

Objective To evaluate the reliability and validity of Neurologic Quality of Life (NeuroQOL) item banks that assess quality-of-life (QOL) domains not typically included in poststroke measures. Design Secondary analysis of item responses to selected NeuroQOL domains. Setting Community. Participants Community-dwelling stroke survivors (n=111) who were at least 12 months poststroke. Interventions Not applicable. Main Outcome Measures Five measures developed for 3 NeuroQOL domains: ability to participate in social activities, satisfaction with participation in social activities, and positive psychologic function. Results A single bank was developed for the positive psychologic function domain, but 2 banks each were developed for the ability-to-participate and satisfaction-with-participation domains. The resulting item banks showed good psychometric properties and external construct validity, with correlations with the legacy instruments ranging from .53 to .71. Using these measures, stroke survivors in this sample reported an overall high level of QOL. Conclusions The NeuroQOL-derived measures are promising and valid methods for assessing aspects of QOL not typically measured in this population. PMID:20801251
Scientific literacy: Factor structure and gender differences

NASA Astrophysics Data System (ADS)

Manhart, James Joseph

The purpose of this study was to investigate the factor structure of scientific literacy and to document any gender differences with respect to each factor. Participants included 1139 students (574 females, 565 males) in grades 9 through 12 who were taking a science class at one of four Midwestern high schools. Based on National Science Education Standards, a 100 item multiple-choice test was constructed to assess scientific literacy. Confirmatory factor analysis of item parcels suggested a three factor model was the best way to explain the data resulting from the administration of this test. The factors were labeled constructs of science, abilities necessary to do scientific inquiry, and social aspects of science. Gender differences with respect to these factors were examined using analysis of variance procedures. Because differential enrollment in science classes could cause gender differences in grades 11 and 12, parallel analyses were conducted on the grades 9 and 10 subsample and the grades 11 and 12 subsample. However, the results of the two analyses were similar. The most consistent gender difference observed was that females performed better than males on the social aspects of science factor. Males tended to perform better than females on the constructs of science factor, although no consistent gender difference was noted for items dealing with life science. With respect to the abilities necessary to do scientific inquiry factor, females tended to perform better than males in grades 9 and 10, while no consistent gender difference was observed in grades 11 and 12. Gender differences were also examined using the Mantel-Haenszel procedure to flag individual items that functioned differently for females and males of the same ability. Twelve items were flagged for grades 9 and 10 (8 in favor of females, 4 in favor of males). Fourteen items were flagged for grades 11 and 12 (7 in favor of females, 7 in favor of males). All of the flagged items exhibited only small to moderate differential item functioning (DIF). Only three items were similarly flagged in both subsamples, one item from each factor.
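The Mantel-Haenszel procedure referred to in the abstract above compares the odds of a correct response in two groups after matching examinees on ability, usually via total-score bands. The sketch below computes the common odds ratio and its conversion to the ETS delta metric. The counts are invented, and the choice of which group serves as reference is an assumption, since the abstract does not specify it.

```python
import math

def mantel_haenszel_dif(strata):
    """Common odds ratio and ETS delta for one item.

    strata: iterable of (ref_correct, ref_wrong, focal_correct, focal_wrong)
    counts, one tuple per matched ability level (e.g. total-score band).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha = num / den  # assumes at least one stratum has b * c > 0
    delta = -2.35 * math.log(alpha)  # negative values favor the reference group
    return alpha, delta

# Hypothetical counts for two score bands.
alpha, delta = mantel_haenszel_dif([(30, 20, 20, 30), (40, 10, 30, 20)])
```

An odds ratio of 1 (delta of 0) indicates no DIF. Classification into small, moderate, and large DIF additionally depends on significance tests that this sketch omits.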
Measurement equivalence and differential item functioning in family psychology.

PubMed

Bingenheimer, Jeffrey B; Raudenbush, Stephen W; Leventhal, Tama; Brooks-Gunn, Jeanne

2005-09-01

Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui and H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (c) 2005 APA, all rights reserved
Examining Power and Type 1 Error for Step and Item Level Tests of Invariance: Investigating the Effect of the Number of Item Score Levels

ERIC Educational Resources Information Center

Ayodele, Alicia Nicole

2017-01-01

Within polytomous items, differential item functioning (DIF) can take on various forms due to the number of response categories. The lack of invariance at this level is referred to as differential step functioning (DSF). The most common DSF methods in the literature are the adjacent category log odds ratio (AC-LOR) estimator and cumulative…
The Value of the Studied Item in the Matching Criterion in Differential Item Functioning (DIF) Analysis. Research Report. ETS RR-10-13

ERIC Educational Resources Information Center

Tan, Xuan; Xiang, Bihua; Dorans, Neil J.; Qu, Yanxuan

2010-01-01

The nature of the matching criterion (usually the total score) in the study of differential item functioning (DIF) has been shown to impact the accuracy of different DIF detection procedures. One of the topics related to the nature of the matching criterion is whether the studied item should be included. Although many studies exist that suggest…
Using the Cumulative Common Log-Odds Ratio to Identify Differential Item Functioning of Rating Scale Items in the Exercise and Sport Sciences

ERIC Educational Resources Information Center

Penfield, Randall D.; Giacobbi, Peter R., Jr.; Myers, Nicholas D.

2007-01-01

One aspect of construct validity is the extent to which the measurement properties of a rating scale are invariant across the groups being compared. An increasingly used method for assessing between-group differences in the measurement properties of items of a scale is the framework of differential item functioning (DIF). In this paper we…
Differential Item Functioning Detection Using the Multiple Indicators, Multiple Causes Method with a Pure Short Anchor

ERIC Educational Resources Information Center

Shih, Ching-Lin; Wang, Wen-Chung

2009-01-01

The multiple indicators, multiple causes (MIMIC) method with a pure short anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when such tests contained as many as 40% DIF items. In general,…

Kernel-Smoothing Estimation of Item Characteristic Functions for Continuous Personality Items: An Empirical Comparison with the Linear and the Continuous-Response Models

ERIC Educational Resources Information Center

Ferrando, Pere J.

2004-01-01

This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…
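Kernel smoothing of an item characteristic function, as mentioned in the abstract above, can be sketched with a Nadaraya-Watson estimator: the expected item score at a given trait level is a weighted average of observed scores, with weights that decay as respondents' trait estimates move away from that level. The Gaussian kernel, the bandwidth, and the data below are illustrative assumptions; the abstract does not state which kernel or bandwidth the study used.

```python
import math

def kernel_smooth_icf(trait, scores, grid, bandwidth):
    """Nadaraya-Watson estimate of the expected item score at each grid point."""
    curve = []
    for t in grid:
        # Gaussian kernel weight for every respondent, centred on t.
        weights = [math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in trait]
        total = sum(weights)
        curve.append(sum(w * y for w, y in zip(weights, scores)) / total)
    return curve

# Hypothetical trait estimates and continuous item scores (0-10 scale).
trait = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
scores = [1.0, 2.5, 3.0, 5.0, 6.5, 7.0, 9.0]
icf = kernel_smooth_icf(trait, scores, grid=[-1.0, 0.0, 1.0], bandwidth=0.5)
```

A smaller bandwidth follows the data more closely at the cost of a noisier curve; a larger one gives a smoother but flatter estimate.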
Differential Item Functioning Analysis of the "Preschool Language Scale-4" between English-Speaking Hispanic and European American Children from Low-Income Families

ERIC Educational Resources Information Center

Qi, Cathy Huaqing; Marley, Scott C.

2009-01-01

The study examined whether item bias is present in the "Preschool Language Scale-4" (PLS-4). Participants were 440 children (3-5 years old; 86% English-speaking Hispanic and 14% European American) who were enrolled in Head Start programs. The PLS-4 items were analyzed for differential item functioning (DIF) using logistic regression and…
Rasch analysis of the UK Functional Assessment Measure in patients with complex disability after stroke.

PubMed

Medvedev, Oleg N; Turner-Stokes, Lynne; Ashford, Stephen; Siegert, Richard J

2018-02-28

To determine whether the UK Functional Assessment Measure (UK FIM+FAM) fits the Rasch model in stroke patients with complex disability and, if so, to derive a conversion table of Rasch-transformed interval level scores. The sample included a UK multicentre cohort of 1,318 patients admitted for specialist rehabilitation following a stroke. Rasch analysis was conducted for the 30-item scale including 3 domains of items measuring physical, communication and psychosocial functions. The fit of items to the Rasch model was examined using 3 different analytical approaches referred to as "pathways". The best fit was achieved in the pathway where responses from motor, communication and psychosocial domains were summarized into 3 super-items and where some items were split because of differential item functioning (DIF) relative to left and right hemisphere location (χ²(10) = 14.48, p = 0.15). Re-scoring of items showing disordered thresholds did not significantly improve the overall model fit. The UK FIM+FAM with domain super-items satisfies expectations of the unidimensional Rasch model without the need for re-scoring. A conversion table was produced to convert the total scale scores into interval-level data based on person estimates of the Rasch model. The clinical benefits of interval-transformed scores require further evaluation.
T111. PANSS NEGATIVE SYMPTOM DIMENSIONS ACROSS GEOGRAPHICAL REGIONS: IMPLICATIONS FOR SOCIAL, LINGUISTIC AND CULTURAL CONSISTENCY

PubMed Central

Khan, Anzalee; Liharska, Lora; Harvey, Philip; Atkins, Alexandra; Keefe, Richard; Ulshen, Danny

2018-01-01

Abstract Background Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are conceptualized across geographical regions may result in better understanding and treatment. The expressive-experiential distinction has been shown to have vast importance in relation to functional outcomes in schizophrenia. Previous studies have shown that the PANSS may not be equivalently rated across countries and cultures, suggesting regional differences in both symptom expression and rater judgment of symptom severity. Items that perform in markedly different ways across demographic, regional, cultural, or clinical severity characteristics may not offer valid representations of the target construct. 1) Will the expressive and experiential dimensions of the PANSS vary over 15 geographical regions and will the item ratings defining each dimension manifest similar reliability across these regions? 2) In large multi-center, international trials where data are combined, which of the two dimensions are disposed to social, linguistic and cultural inconsistency? Methods Data were obtained for the baseline PANSS visits of 6,889 subjects. Using Confirmatory Factor Analysis (CFA), we examined whether the expressive-experiential distinction would be replicated in our sample. We investigated the validity of the expressive-experiential distinction using Differential Item Functioning (DIF; Mantel-Haenszel) across 15 geographical regions (South America-Mexico, Austria-Germany, Belgium-Netherlands, Brazil, Canada, Nordic regions (Denmark, Finland, Norway, Sweden), France, Great Britain, India, Italy, Poland, Eastern Europe (Romania, Slovakia, Ukraine, Croatia, Estonia, Czech Republic), Russia, South Africa, and Spain) as compared to the United States. Results Expressive Deficit: More DIF was observed for items in the Expressive deficit factor than for items relating to experiential deficits. The following regions showed at least moderate to large DIF for all items: Austria-Germany, Nordic, France, and Poland. Of all the items, N3 Poor Rapport showed the most moderate and large DIF (n = 13; 86.67%) across countries, with 7 countries reporting large DIF. Similarly, N6 Lack of Spontaneity and Flow of Conversation showed moderate and large DIF for 66.67% of countries (n = 10). Experiential Deficit: Item G16 Active Social Avoidance reported negligible DIF for 14 of the 15 countries investigated (93.33%). Large DIF was observed for N2 Emotional Withdrawal and N4 Passive Apathetic Social Withdrawal for Brazil and India. Seven regions demonstrated no DIF across all items of the PANSS experiential deficit factor (South America-Mexico, Belgium-Netherlands, Nordic, Great Britain, Eastern Europe, Russia, and Spain). Overall, there were many fewer observed items with large DIF for the PANSS experiential domain. Discussion These results suggest that the PANSS Negative Symptoms Factor can be better represented by a two-factor model than by a single-factor model. Additionally, the results show significant differences in ratings on the PANSS expressive items, but not the experiential items, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences in the interpretation of items, rater training, or understanding of scoring anchors. Knowing which items are challenging for raters across regions can help guide PANSS training to improve results of international clinical trials aimed at negative symptoms.
Independent Orbiter Assessment (IOA): Analysis of the auxiliary power unit

NASA Technical Reports Server (NTRS)

Barnes, J. E.

1986-01-01

The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Auxiliary Power Unit (APU). The APUs are required to provide power to the Orbiter hydraulics systems during ascent and entry flight phases for aerosurface actuation, main engine gimballing, landing gear extension, and other vital functions. For analysis purposes, the APU system was broken down into ten functional subsystems. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode. A preponderance of 1/1 criticality items were related to failures that allowed the hydrazine fuel to escape into the Orbiter aft compartment, creating a severe fire hazard, and failures that caused loss of the gas generator injector cooling system.
Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.

PubMed

Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A

2017-09-01

The aim of this study was to confirm the psychometric properties of Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were used for this analysis using survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

ERIC Educational Resources Information Center

Magis, David; Facon, Bruno

2013-01-01

Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
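The iterative purification loop described in the abstract above can be sketched generically: run a DIF detector, drop the flagged items from the matching criterion, and repeat until the flagged set stops changing. Everything named here is hypothetical. In particular, `flag_dif` stands in for any test-score-based detector, and the toy detector and item shifts exist only to make the example runnable; they are not Angoff's Delta plot.

```python
def purify(items, flag_dif, max_iter=10):
    """Re-run a DIF detector on purified anchors until the flags stabilise.

    flag_dif(items, anchor) is a caller-supplied (hypothetical) function that
    returns the items flagged when only `anchor` defines the matching score.
    """
    flagged = set()
    for _ in range(max_iter):
        anchor = [i for i in items if i not in flagged]
        new_flagged = set(flag_dif(items, anchor))
        if new_flagged == flagged:
            break  # converged: same items flagged twice in a row
        flagged = new_flagged
    return flagged

# Toy detector: flag items whose (invented) group shift departs from the
# anchor's mean shift by more than 0.5.
shifts = {"i1": 0.0, "i2": 0.0, "i3": 0.0, "i4": 1.0, "i5": 0.7}

def toy_flag(items, anchor):
    centre = sum(shifts[i] for i in anchor) / len(anchor)
    return {i for i in items if abs(shifts[i] - centre) > 0.5}

one_pass = toy_flag(list(shifts), list(shifts))
purified = purify(list(shifts), toy_flag)
```

In this toy case the first pass flags only `i4`, and purification additionally flags `i5` once the contaminated item no longer inflates the anchor. As the record's title notes, purification does not always help in practice.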
Evaluation of Item Candidates: The PROMIS Qualitative Item Review

PubMed Central

DeWalt, Darren A.; Rothrock, Nan; Yount, Susan; Stone, Arthur A.

2009-01-01

One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures. PMID:17443114
The Cognitive Assessment Interview (CAI): Development and Validation of an Empirically Derived, Brief Interview-Based Measure of Cognition

PubMed Central

Ventura, Joseph; Reise, Steven P.; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert M.

2011-01-01

Background Practical, reliable “real world” measures of cognition are needed to supplement neurocognitive performance data to evaluate possible efficacy of new drugs targeting cognitive deficits associated with schizophrenia. Because interview-based measures of cognition offer one possible approach, data from the MATRICS initiative (n=176) were used to examine the psychometric properties of the Schizophrenia Cognition Rating Scale (SCoRS) and the Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS). Method We used classical test theory methods and item response theory to derive the 10-item Cognitive Assessment Interview (CAI) from the SCoRS and CGI-CogS (“parent instruments”). Sources of information for CAI ratings included the patient and an informant. Validity analyses examined the relationship between the CAI and objective measures of cognitive functioning, intermediate measures of cognition, and functional outcome. Results The rater’s scores from the newly derived CAI (10 items) correlated highly (r = .87) with those from the combined set of the SCoRS and CGI-CogS (41 items). Both the patient (r = .82) and the informant (r = .95) data were highly correlated with the rater’s score. The CAI was modestly correlated with objectively measured neurocognition (r = −.32), functional capacity (r = −.44), and functional outcome (r = −.32), which was comparable to the parent instruments. Conclusions The CAI allows for expert judgment in evaluating a patient’s cognitive functioning and was modestly correlated with neurocognitive functioning, functional capacity, and functional outcome. The CAI is a brief, repeatable, and potentially valuable tool for rating cognition in schizophrenia patients who are participating in clinical trials. PMID:20542412
Effects of aging on neural connectivity underlying selective memory for emotional scenes

PubMed Central

Waring, Jill D.; Addis, Donna Rose; Kensinger, Elizabeth A.

2012-01-01

Older adults show age-related reductions in memory for neutral items within complex visual scenes, but just like young adults, older adults exhibit a memory advantage for emotional items within scenes compared with the background scene information. The present study examined young and older adults’ encoding-stage effective connectivity for selective memory of emotional items versus memory for both the emotional item and its background. In a functional magnetic resonance imaging (fMRI) study, participants viewed scenes containing either positive or negative items within neutral backgrounds. Outside the scanner, participants completed a memory test for items and backgrounds. Irrespective of scene content being emotionally positive or negative, older adults had stronger positive connections among frontal regions and from frontal regions to medial temporal lobe structures than did young adults, especially when items and backgrounds were subsequently remembered. These results suggest there are differences between young and older adults’ connectivity accompanying the encoding of emotional scenes. Older adults may require more frontal connectivity to encode all elements of a scene rather than just encoding the emotional item. PMID:22542836
Effects of aging on neural connectivity underlying selective memory for emotional scenes.

PubMed

Waring, Jill D; Addis, Donna Rose; Kensinger, Elizabeth A

2013-02-01

Older adults show age-related reductions in memory for neutral items within complex visual scenes, but just like young adults, older adults exhibit a memory advantage for emotional items within scenes compared with the background scene information. The present study examined young and older adults' encoding-stage effective connectivity for selective memory of emotional items versus memory for both the emotional item and its background. In a functional magnetic resonance imaging (fMRI) study, participants viewed scenes containing either positive or negative items within neutral backgrounds. Outside the scanner, participants completed a memory test for items and backgrounds. Irrespective of scene content being emotionally positive or negative, older adults had stronger positive connections among frontal regions and from frontal regions to medial temporal lobe structures than did young adults, especially when items and backgrounds were subsequently remembered. These results suggest there are differences between young and older adults' connectivity accompanying the encoding of emotional scenes. Older adults may require more frontal connectivity to encode all elements of a scene rather than just encoding the emotional item. Published by Elsevier Inc.
Diagnostic Utility of Craving in Predicting Nicotine Dependence: Impact of Craving Content and Item Stability

PubMed Central

2013-01-01

Introduction: Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Methods: Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Results: Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Conclusions: Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed. PMID:23817585
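The area under the ROC curve used in the abstract above has a simple rank interpretation: it is the probability that a randomly chosen dependent smoker scores higher on the item than a randomly chosen nondependent smoker, with ties counted as one half. The sketch below computes it directly from that definition. The scores are invented for illustration and are not QSU data.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical mean item scores for dependent and nondependent smokers.
dependent = [5.0, 6.5, 4.0, 7.0]
nondependent = [3.0, 4.0, 2.5]
area = auc(dependent, nondependent)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.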
The Usefulness of Differential Item Functioning Methodology in Longitudinal Intervention Studies

USDA-ARS's Scientific Manuscript database

Perceived self-efficacy (SE) for engaging in physical activity (PA) is a key variable mediating PA change in interventions. The purpose of this study is to demonstrate the usefulness of item response modeling-based (IRM) differential item functioning (DIF) in the investigation of group differences ...
DIFAS: Differential Item Functioning Analysis System. Computer Program Exchange

ERIC Educational Resources Information Center

Penfield, Randall D.

2005-01-01

Differential item functioning (DIF) is an important consideration in assessing the validity of test scores (Camilli & Shepard, 1994). A variety of statistical procedures have been developed to assess DIF in tests of dichotomous (Hills, 1989; Millsap & Everson, 1993) and polytomous (Penfield & Lam, 2000; Potenza & Dorans, 1995) items. Some of these…
41 CFR 101-26.100-1 - Procurement of lowest cost items.

Code of Federal Regulations, 2010 CFR

2010-07-01

... similar items to meet particular end-use requirements under the GSA stock program, special order program... functional end-use procurement needs of the various ordering agencies. Therefore, in submitting requisitions... source from which the lowest cost item can be obtained which will adequately serve the functional end-use...
Does Gender-Specific Differential Item Functioning Affect the Structure in Vocational Interest Inventories?

ERIC Educational Resources Information Center

Beinicke, Andrea; Pässler, Katja; Hell, Benedikt

2014-01-01

The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland's hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization…
Computerized Adaptive Testing Provides Reliable and Efficient Depression Measurement Using the CES-D Scale

PubMed Central

2017-01-01

Background The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered as a CAT in an American sample. Methods We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. Estimates of the level of CES-D trait score provided by the simulated CAT algorithm and the original CES-D trait score derived from original scale were correlated highly. The second CAT simulation conducted using real participant data demonstrated higher precision at the higher levels of depression spectrum. Conclusions Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
Evaluating the Comparability of Paper-and-Pencil and Computerized Versions of a Large-Scale Certification Test. Research Report. ETS RR-05-21

ERIC Educational Resources Information Center

Puhan, Gautam; Boughton, Keith A.; Kim, Sooyeon

2005-01-01

The study evaluated the comparability of two versions of a teacher certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). Standardized mean difference (SMD) and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that effect sizes…
Differential item functioning magnitude and impact measures from item response theory models.

PubMed

Kleinman, Marjorie; Teresi, Jeanne A

2016-01-01

Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
Psychometric evaluation of Persian Nomophobia Questionnaire: Differential item functioning and measurement invariance across gender.

PubMed

Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H

2018-03-01

Background and aims Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results Each factor was unidimensional according to the Rasch findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.

Measuring impairments of functioning and health in patients with axial spondyloarthritis by using the ASAS Health Index and the Environmental Item Set: translation and cross-cultural adaptation into 15 languages.

PubMed

Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J

2016-01-01

The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Translation and cross-cultural adaptation has been carried out following the forward-backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option 'not applicable' was added to two items of the ASAS HI to improve appropriateness. This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures.
Investigation of the effects of mirror therapy on the upper extremity functions of stroke patients using the manual function test.

PubMed

Kim, Hwanhee; Shim, Jemyung

2015-01-01

[Purpose] The purpose of this study was to investigate the effects of mirror therapy on the upper extremity functions of stroke patients. [Subjects] The subjects of this study were 14 hemiplegia patients (8 males, 6 females; 9 infarction, 5 hemorrhage; 8 right hemiplegia, 6 left hemiplegia) who voluntarily consented to participate in the study. [Methods] The Korean version of the manual function test (MFT) was used in this study. The test was performed in the following order: arm movement (4 items), grasp and pinch (2 items), and manipulation (2 items). The experiment was conducted with the subjects sitting in a chair. The mirror was vertically placed in the sagittal plane on the desk. The paretic hand was placed behind the mirror, and the non-paretic hand was placed in front of the mirror so that it was reflected in the mirror. In this position, the subjects completed activities repetitively according to the mirror therapy program over the course of four weeks. [Results] There were significant increases in the grasp-and-pinch score and manipulation score. [Conclusion] In conclusion, the grasp-and-pinch and manipulation functions were improved through mirror therapy.
A Rasch Differential Item Functioning Analysis of the Massachusetts Youth Screening Instrument: Identifying Race and Gender Differential Item Functioning among Juvenile Offenders

ERIC Educational Resources Information Center

Cauffman, Elizabeth; MacIntosh, Randall

2006-01-01

The juvenile justice system needs a tool that can identify and assess mental health problems among youths quickly with validity and reliability. The goal of this article is to evaluate the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model.…
An item-oriented recommendation algorithm on cold-start problem

NASA Astrophysics Data System (ADS)

Qiu, Tian; Chen, Guang; Zhang, Zi-Ke; Zhou, Tao

2011-09-01

Based on a hybrid algorithm incorporating the heat conduction and probability spreading processes (Proc. Natl. Acad. Sci. U.S.A., 107 (2010) 4511), in this letter, we propose an improved method by introducing an item-oriented function, focusing on solving the dilemma of the recommendation accuracy between the cold and popular items. Differently from previous works, the present algorithm does not require any additional information (e.g., tags). Further experimental results obtained in three real datasets, RYM, Netflix and MovieLens, show that, compared with the original hybrid method, the proposed algorithm significantly enhances the recommendation accuracy of the cold items, while it keeps the recommendation accuracy of the overall and the popular items. This work might shed some light on both understanding and designing effective methods for long-tailed online applications of recommender systems.
Is selective attention the basis for selective imitation in infants? An eye-tracking study of deferred imitation with 12-month-olds.

PubMed

Kolling, Thorsten; Oturai, Gabriella; Knopf, Monika

2014-08-01

Infants and children do not blindly copy every action they observe during imitation tasks. Research demonstrated that infants are efficient selective imitators. The impact of selective perceptual processes (selective attention) for selective deferred imitation, however, is still poorly described. The current study, therefore, analyzed 12-month-old infants' looking behavior during demonstration of two types of target actions: arbitrary versus functional actions. A fully automated remote eye tracker was used to assess infants' looking behavior during action demonstration. After a 30-min delay, infants' deferred imitation performance was assessed. Next to replicating a memory effect, results demonstrate that infants do imitate significantly more functional actions than arbitrary actions (functionality effect). Eye-tracking data show that whereas infants do not fixate significantly longer on functional actions than on arbitrary actions, amount of fixations and amount of saccades differ between functional and arbitrary actions, indicating different encoding mechanisms. In addition, item-level findings differ from overall findings, indicating that perceptual and conceptual item features influence looking behavior. Looking behavior on both the overall and item levels, however, does not relate to deferred imitation performance. Taken together, the findings demonstrate that, on the one hand, selective imitation is not explainable merely by selective attention processes. On the other hand, notwithstanding this reasoning, attention processes on the item level are important for encoding processes during target action demonstration. Limitations and future studies are discussed. Copyright © 2014 Elsevier Inc. All rights reserved.
Comparisons of mathematics achievement of grade 8 students in the United States and the Russian Federation.

PubMed

Bazarova, Saodat I; Engelhard, George

2004-01-01

Using the Mantel-Haenszel (MH) Procedure, we analyzed data for 7,087 American and 4,022 Russian Grade 8 students from the Third International Mathematics and Science Study (TIMSS) to compare mathematics achievement in the two countries on each of the 124 multiple-choice items. The results of the analyses indicate that the performance of the students on individual multiple-choice mathematics items varies by country. The results also suggest that the relationship between country and item performance differs as a function of content area. A total score of a country's achievement does not provide the whole picture of achievement dynamics; it averages out potentially important information on student achievement and the causes of their performance relative to other countries. The dynamics of achievement across countries will not be revealed unless the analyses are done at the item level.
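For readers unfamiliar with the Mantel-Haenszel procedure named above, the sketch below shows how its common odds ratio is usually computed for a single item: examinees are matched on total score, a 2 x 2 table (group by correct/incorrect) is formed at each score level, and the tables are pooled. The conversion to the ETS delta scale (-2.35 times the natural log of the odds ratio) is the conventional one. The counts and the function name `mantel_haenszel` are hypothetical and are not taken from the TIMSS analysis.

```python
import math


def mantel_haenszel(strata):
    """Pooled odds ratio and ETS delta for one item.

    strata: one tuple per matched score level, in the order
    (reference correct, reference incorrect, focal correct, focal incorrect).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den                 # MH common odds ratio
    delta = -2.35 * math.log(alpha)   # ETS delta scale; 0 means no DIF
    return alpha, delta


# Hypothetical counts at three score levels (low, middle, high).
strata = [
    (20, 30, 10, 40),
    (30, 20, 20, 30),
    (40, 10, 30, 20),
]
alpha_mh, delta_mh = mantel_haenszel(strata)
print(round(alpha_mh, 2), round(delta_mh, 2))
```

An odds ratio above 1 (a negative delta) means that, among examinees with the same total score, the reference group has higher odds of answering the item correctly than the focal group.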
Development and validation of the Overall Depression Severity and Impairment Scale.

PubMed

Bentley, Kate H; Gallagher, Matthew W; Carl, Jenna R; Barlow, David H

2014-09-01

The need to capture severity and impairment of depressive symptomatology is widespread. Existing depression scales are lengthy and largely focus on individual symptoms rather than resulting impairment. The Overall Depression Severity and Impairment Scale (ODSIS) is a 5-item, continuous measure designed for use across heterogeneous mood disorders and with subthreshold depressive symptoms. This study examined the psychometric properties of the ODSIS in outpatients in a clinic for emotional disorders (N = 100), undergraduate students (N = 566), and community-based adults (N = 189). Internal consistency, latent structure, item response theory, classification accuracy, convergent and discriminant validity, and differential item functioning analyses were conducted. ODSIS scores exhibited excellent internal consistency, and confirmatory factor analyses supported a unidimensional structure. Item response theory results demonstrated that the ODSIS provides more information about individuals with high levels of depression than those with low levels of depression. Responses on the ODSIS discriminated well between individuals with and without a mood disorder and depression-related severity across clinical and subclinical levels. A cut score of 8 correctly classified 82% of outpatients as with or without a mood disorder; it evidenced a favorable balance of sensitivity and specificity and of positive and negative predictive values. The ODSIS demonstrated good convergent and discriminant validity, and results indicate that items function similarly across clinical and nonclinical samples. Overall, findings suggest that the ODSIS is a valid tool for measuring depression-related severity and impairment. The brevity and ease of use of the ODSIS support its utility for screening and monitoring treatment response across a variety of settings. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Development and initial psychometric evaluation of an item bank created to measure upper extremity function in persons with stroke.

PubMed

Higgins, Johanne; Finch, Lois E; Kopec, Jacek; Mayo, Nancy E

2010-02-01

To create and illustrate the development of a method to parsimoniously and hierarchically assess upper extremity function in persons after stroke. Data were analyzed using Rasch analysis. Re-analysis of data from 8 studies involving persons after stroke. Over 4000 patients with stroke who participated in various studies in Montreal and elsewhere in Canada. Data comprised 17 tests or indices of upper extremity function and health-related quality of life, for a total of 99 items related to upper extremity function. Tests and indices included, among others, the Box and Block Test, the Nine-Hole Peg Test and the Stroke Impact Scale. Data were collected at various times post-stroke from 3 days to 1 year. Once the data fit the model, a bank of items measuring upper extremity function with persons and items organized hierarchically by difficulty and ability in log units was produced. This bank forms the basis for eventual computer adaptive testing. The calibration of the items should be tested further psychometrically, as should the interpretation of the metric arising from using the item calibration to measure the upper extremity of individuals.
Translation, Validation, and Reliability of the Dutch Late-Life Function and Disability Instrument Computer Adaptive Test.

PubMed

Arensman, Remco M; Pisters, Martijn F; de Man-van Ginkel, Janneke M; Schuurmans, Marieke J; Jette, Alan M; de Bie, Rob A

2016-09-01

Adequate and user-friendly instruments for assessing physical function and disability in older adults are vital for estimating and predicting health care needs in clinical practice. The Late-Life Function and Disability Instrument Computer Adaptive Test (LLFDI-CAT) is a promising instrument for assessing physical function and disability in gerontology research and clinical practice. The aims of this study were: (1) to translate the LLFDI-CAT to the Dutch language and (2) to investigate its validity and reliability in a sample of older adults who spoke Dutch and dwelled in the community. For the assessment of validity of the LLFDI-CAT, a cross-sectional design was used. To assess reliability, measurement of the LLFDI-CAT was repeated in the same sample. The item bank of the LLFDI-CAT was translated with a forward-backward procedure. A sample of 54 older adults completed the LLFDI-CAT, World Health Organization Disability Assessment Schedule 2.0, RAND 36-Item Short-Form Health Survey physical functioning scale (10 items), and 10-Meter Walk Test. The LLFDI-CAT was repeated in 2 to 8 days (mean=4.5 days). Pearson's r and the intraclass correlation coefficient (ICC) (2,1) were calculated to assess validity, group-level reliability, and participant-level reliability. A correlation of .74 for the LLFDI-CAT function scale and the RAND 36-Item Short-Form Health Survey physical functioning scale (10 items) was found. The correlations of the LLFDI-CAT disability scale with the World Health Organization Disability Assessment Schedule 2.0 and the 10-Meter Walk Test were -.57 and -.53, respectively. The ICC (2,1) of the LLFDI-CAT function scale was .84, with a group-level reliability score of .85. The ICC (2,1) of the LLFDI-CAT disability scale was .76, with a group-level reliability score of .81. The high percentage of women in the study and the exclusion of older adults with recent joint replacement or hospitalization limit the generalizability of the results. 
The Dutch LLFDI-CAT showed strong validity and high reliability when used to assess physical function and disability in older adults dwelling in the community. © 2016 American Physical Therapy Association.
Physical performance testing in mucopolysaccharidosis I: a pilot study.

PubMed

Dumas, Helene M; Fragala, Maria A; Haley, Stephen M; Skrinar, Alison M; Wraith, James E; Cox, Gerald F

2004-01-01

To develop and field-test a physical performance measure (MPS-PPM) for individuals with Mucopolysaccharidosis I (MPS I), a rare genetic disorder. Motor performance and endurance items were developed based on literature review, clinician feedback, feasibility, and equipment and training needs. A standardized testing protocol and scoring rules were created. The MPS-PPM includes: Arm Function (7 items), Leg Function (5 items), and Endurance (2 items). Pilot data were collected for 10 subjects (ages 5-29 years). We calculated Spearman's rho correlations between age, severity and summary z-scores on the MPS-PPM. Subjects had variable presentations, as correlations among the three sub-test scores were not significant. Increasing age was related to greater severity in physical performance (r = 0.72, p<0.05) and lower scores on the Leg Function (r = -0.67, p<0.05) and Endurance (r = -0.65, p<0.05) sub-tests. The MPS-PPM was sensitive to detecting physical performance deficits, as six subjects could not complete the full battery of Arm Function items and eight subjects were unable to complete all Leg Function items. Subjects walked more slowly and expended more energy than typically developing peers. Individuals with MPS I have difficulty with arm and leg function and reduced endurance. The MPS-PPM is a clinically feasible measure that detects limitations in physical performance and may have potential to quantify changes in function following intervention. Copyright 2004 Taylor and Francis Ltd.
Measurement characteristics for two health-related quality of life measures in older adults: The SF-36 and the CDC Healthy Days items.

PubMed

Barile, John P; Horner-Johnson, Willi; Krahn, Gloria; Zack, Matthew; Miranda, David; DeMichele, Kimberly; Ford, Derek; Thompson, William W

2016-10-01

The Short Form Health Survey (SF-36) and the Centers for Disease Control and Prevention (CDC) Healthy Days items are well known measures of health-related quality of life. The validity of the SF-36 for older adults and those with disabilities has been questioned. Assess the extent to which the SF-36 and the Centers for Disease Control and Prevention (CDC) Healthy Days items measure the same aspects of health; whether the SF-36 and the CDC unhealthy days items are invariant across gender, functional status, or the presence of chronic health conditions of older adults; and whether each of the SF-36's eight subscales is independently associated with the CDC Healthy Days items. We analyzed data from 66,269 adult Medicare advantage members age 65 and older. We used confirmatory factor analyses and regression modeling to test associations between the CDC Healthy Days items and subscales of the SF-36. The CDC Healthy Days items were associated with the SF-36 global measures of physical and mental health. The CDC physically unhealthy days item was associated with the SF-36 subscales for bodily pain, physical role limitations, and general health, while the CDC mentally unhealthy days item was associated with the SF-36 subscales for mental health, emotional role limitations, vitality and social functioning. The SF-36 physical functioning subscale was not independently associated with either of the CDC Healthy Days items. The CDC Healthy Days items measure similar domains as the SF-36 but appear to assess HRQOL without regard to limitations in functioning. Copyright © 2016 Elsevier Inc. All rights reserved.
Recent advances in analysis of differential item functioning in health research using the Rasch model.

PubMed

Hagquist, Curt; Andrich, David

2017-09-19

Rasch analysis with a focus on Differential Item Functioning (DIF) is increasingly used for examination of psychometric properties of health outcome measures. To take account of DIF in order to retain precision of measurement, split of DIF-items into separate sample specific items has become a frequently used technique. The purpose of the paper is to present and summarise recent advances of analysis of DIF in a unified methodology. In particular, the paper focuses on the use of analysis of variance (ANOVA) as a method to simultaneously detect uniform and non-uniform DIF, the need to distinguish between real and artificial DIF and the trade-off between reliability and validity. An illustrative example from health research is used to demonstrate how DIF, in this case between genders, can be identified, quantified and under specific circumstances accounted for using the Rasch model. Rasch analyses of DIF were conducted of a composite measure of psychosomatic problems using Swedish data from the Health Behaviour in School-aged Children study for grade 9 students collected during the 1985-2014 time periods. The procedures demonstrate how DIF can be identified efficiently by ANOVA of residuals, and how the magnitude of DIF can be quantified and potentially accounted for by resolving items according to identifiable groups and using principles of test equating on the resolved items. The results of the analysis also show that the real DIF in some items does affect person measurement estimates. Firstly, in order to distinguish between real and artificial DIF, the items showing DIF initially should not be resolved simultaneously but sequentially. Secondly, while resolving instead of deleting a DIF item may retain reliability, both options may affect the content validity negatively. Resolving items with DIF is not justified if the source of the DIF is relevant for the content of the variable; then resolving DIF may deteriorate the validity of the instrument. 
Generally, decisions on resolving items to deal with DIF should also rely on external information.
Assessing items on the SF-8 Japanese version for health-related quality of life: a psychometric analysis based on the nominal categories model of item response theory.

PubMed

Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya

2009-06-01

The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on the nominal categories model. Using data from an adjusted random sample from a nationally representative panel, nominal categories modeling was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions on the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53%, women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption in accurate order with corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems]) cover most of the negative range of theta. Information function for all items combined peaked at -0.7 of theta (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can recognize more effectively persons with relatively poorer QOL than those with relatively better QOL.
Evaluation of the Patient-Reported Outcomes Information System (PROMIS(®)) Spanish-language physical functioning items.

PubMed

Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D

2013-09-01

To evaluate the equivalence of the PROMIS(®) physical functioning item bank by language of administration (English versus Spanish). The PROMIS(®) wave 1 English-language physical functioning bank consists of 124 items, and 114 of these were translated into Spanish. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were calculated. The IRT assumption of unidimensionality was evaluated by fitting a single-factor confirmatory factor analytic model. IRT threshold and discrimination parameters were estimated using Samejima's Graded Response Model. DIF by language of administration was evaluated. Item means ranged from 2.53 (SD = 1.36) to 4.62 (SD = 0.82). Coefficient alpha was 0.99, and item-rest correlations ranged from 0.41 to 0.89. A one-factor model fits the data well (CFI = 0.971, TLI = 0.970, and RMSEA = 0.052). The slope parameters ranged from 0.45 ("Are you able to run 10 miles?") to 4.50 ("Are you able to put on a shirt or blouse?"). The threshold parameters ranged from -1.92 ("How much do physical health problems now limit your usual physical activities (such as walking or climbing stairs)?") to 6.06 ("Are you able to run 10 miles?"). Fifty of the 114 items were flagged for DIF based on an R(2) of 0.02 or above criterion. The expected total score was higher for Spanish- than English-language respondents. English- and Spanish-speaking subjects with the same level of underlying physical function responded differently to 50 of 114 items. This study has important implications in the study of physical functioning among diverse populations.
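As an illustration of the slope and threshold parameters reported above, the following sketch evaluates Samejima's graded response model for a single item: each threshold defines a logistic curve for responding at or above a category, and category probabilities are differences between adjacent curves. The parameter values, the 0-based category scoring, and the function names are assumptions made for the example, not estimates from the PROMIS bank.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be in ascending order; with m thresholds the item
    has m + 1 ordered categories, scored 0..m in this sketch.
    """
    # Cumulative curves P(X >= k), padded with 1 (k = 0) and 0 (k = m + 1).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_item_score(theta, slope, thresholds):
    """Expected score, the quantity compared across groups in DIF magnitude measures."""
    probs = grm_category_probs(theta, slope, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical item: slope 1.0, thresholds symmetric around theta = 0.
probs = grm_category_probs(0.0, 1.0, [-1.0, 0.0, 1.0])
score_mid = expected_item_score(0.0, 1.0, [-1.0, 0.0, 1.0])
score_high = expected_item_score(2.0, 1.0, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs], round(score_mid, 2))
```

Under this model, an item shows DIF by language when respondents at the same theta need different slopes or thresholds in the two groups, which shifts the expected item score for one group relative to the other.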
Internal validity of a household food security scale is consistent among diverse populations participating in a food supplement program in Colombia

PubMed Central

Hackett, Michelle; Melgar-Quinonez, Hugo; Uribe, Martha C Alvarez

2008-01-01

Objective We assessed the validity of a locally adapted Colombian Household Food Security Scale (CHFSS) used as a part of the 2006 evaluation of the food supplement component of the Plan for Improving Food and Nutrition in Antioquia, Colombia (MANA – Plan Departamental de Seguridad Alimentaria y Nutricional de Antioquia). Methods Subjects included low-income families with pre-school age children in MANA that responded affirmatively to at least one CHFSS item (n = 1,319). Rasch Modeling was used to evaluate the psychometric characteristics of the items through measure and INFIT values. Differences in CHFSS performance were assessed by area of residency, socioeconomic status and number of children enrolled in MANA. Unidimensionality of a scale by group was further assessed using Differential Item Functioning (DIF). Results Most CHFSS items presented good fitness with most INFIT values within the adequate range of 0.8 to 1.2. Consistency in item measure values between groups was found for all but two items in the comparison by area of residency. Only two adult items exhibited DIF between urban and rural households. Conclusion The results indicate that the adapted CHFSS is a valid tool to assess the household food security of participants in food assistance programs like MANA. PMID:18500988
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

PubMed

Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

2014-01-01

Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
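The item-selection idea behind CAT described above can be sketched as choosing, at each step, the unadministered item with the greatest Fisher information at the current trait estimate. The sketch assumes a dichotomous two-parameter logistic model for brevity, which is a simplification and not the model used for PROMIS banks; the bank values and the names `item_information` and `next_item` are hypothetical.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Index of the most informative item not yet administered."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


# Hypothetical bank of (slope, location) pairs.
bank = [(1.0, -2.0), (2.0, 0.0), (1.5, 2.0)]
first = next_item(theta=0.0, bank=bank, administered=set())
second = next_item(theta=2.0, bank=bank, administered={first})
```

In a full CAT the trait estimate would be updated after each response before the next item is selected; that step is omitted here.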
Effect Size Measures for Differential Item Functioning in a Multidimensional IRT Model

ERIC Educational Resources Information Center

Suh, Youngsuk

2016-01-01

This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Explaining Crossing DIF in Polytomous Items Using Differential Step Functioning Effects

ERIC Educational Resources Information Center

Penfield, Randall D.

2010-01-01

Crossing, or intersecting, differential item functioning (DIF) is a form of nonuniform DIF that exists when the sign of the between-group difference in expected item performance changes across the latent trait continuum. The presence of crossing DIF presents a problem for many statistics developed for evaluating DIF because positive and negative…
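The definition of crossing DIF given above can be illustrated with a small numerical sketch. It assumes a dichotomous two-parameter logistic item for simplicity (the article itself concerns polytomous items), with hypothetical parameters in which the two groups share a location but differ in slope.

```python
import math


def p_correct(theta, a, b):
    """Expected item score for a dichotomous 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def group_difference(theta, ref, focal):
    """Reference-minus-focal difference in expected item score at theta."""
    return p_correct(theta, *ref) - p_correct(theta, *focal)


# Hypothetical (slope, location) parameters for each group.
ref = (2.0, 0.0)
focal = (1.0, 0.0)

low = group_difference(-1.0, ref, focal)   # negative below the crossing point
high = group_difference(1.0, ref, focal)   # positive above the crossing point
```

With these values the item favors the focal group below theta = 0 and the reference group above it, so the sign of the difference changes along the latent trait continuum.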
Testing for Differential Item Functioning with Measures of Partial Association

ERIC Educational Resources Information Center

Woods, Carol M.

2009-01-01

Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on the construct. There are many methods available for DIF assessment. The present article is focused on indices of partial association. A family of average…
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning

ERIC Educational Resources Information Center

Finch, W. Holmes

2011-01-01

Missing information is a ubiquitous aspect of data analysis, including responses to items on cognitive and affective instruments. Although the broader statistical literature describes missing data methods, relatively little work has focused on this issue in the context of differential item functioning (DIF) detection. Such prior research has…

Semiparametric Item Response Functions in the Context of Guessing

ERIC Educational Resources Information Center

Falk, Carl F.; Cai, Li

2016-01-01

We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)

ERIC Educational Resources Information Center

Arenson, Ethan A.; Karabatsos, George

2017-01-01

Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…
Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models

ERIC Educational Resources Information Center

Woods, Carol M.; Grimm, Kevin J.

2011-01-01

In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination

PubMed Central

Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.

2014-01-01

Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949
Using the Oxford Foot Model to determine the association between objective measures of foot function and results of the AOFAS Ankle-Hindfoot Scale and the Foot Function Index: a prospective gait analysis study in Germany.

PubMed

Kostuj, Tanja; Stief, Felix; Hartmann, Kirsten Anna; Schaper, Katharina; Arabmotlagh, Mohammad; Baums, Mike H; Meurer, Andrea; Krummenauer, Frank; Lieske, Sebastian

2018-04-05

After cross-cultural adaption for the German translation of the Ankle-Hindfoot Scale of the American Orthopaedic Foot and Ankle Society (AOFAS-AHS) and agreement analysis with the Foot Function Index (FFI-D), the following gait analysis study using the Oxford Foot Model (OFM) was carried out to show which of the two scores better correlates with objective gait dysfunction. Results of the AOFAS-AHS and FFI-D, as well as data from three-dimensional gait analysis were collected from 20 patients with mild to severe ankle and hindfoot pathologies. Kinematic and kinetic gait data were correlated with the results of the total AOFAS scale and FFI-D as well as the results of those items representing hindfoot function in the AOFAS-AHS assessment. With respect to the foot disorders in our patients (osteoarthritis and prearthritic conditions), we correlated the total range of motion (ROM) in the ankle and subtalar joints as identified by the OFM with values identified during clinical examination 'translated' into score values. Furthermore, reduced walking speed, reduced step length and reduced maximum ankle power generation during push-off were taken into account and correlated to gait abnormalities described in the scores. An analysis of correlations with CIs between the FFI-D and the AOFAS-AHS items and the gait parameters was performed by means of the Jonckheere-Terpstra test; furthermore, exploratory factor analysis was applied to identify common information structures and thereby redundancy in the FFI-D and the AOFAS-AHS items. Objective findings for hindfoot disorders, namely a reduced ROM, in the ankle and subtalar joints, respectively, as well as reduced ankle power generation during push-off, showed a better correlation with the AOFAS-AHS total score (as well as AOFAS-AHS items representing ROM in the ankle, subtalar joints and gait function) compared with the FFI-D score. Factor analysis, however, could not identify FFI-D items consistently related to these three indicator parameters (pain, disability and function) found in the AOFAS-AHS. Furthermore, factor analysis did not support stratification of the FFI-D into two subscales. The AOFAS-AHS showed a good agreement with objective gait parameters and is therefore better suited to evaluate disability and functional limitations of patients suffering from foot and ankle pathologies compared with the FFI-D.
Work Functioning Among Firefighters: A Comparison Between Self-Reported Limitations and Functional Task Performance.

PubMed

MacDermid, Joy C; Tang, Kenneth; Sinden, Kathryn E; D'Amico, Robert

2018-05-25

Purpose Performance-based and disease indicators have been widely studied in firefighters; self-reported work role limitations have not. The aim of this study was to describe the distributions and correlations of a generic self-reported Work Limitations Questionnaire (WLQ-26) and firefighting-specific task performance-based tests. Methods Active firefighters from the City of Hamilton Fire Services (n = 293) were recruited. Participants completed the WLQ-26 to quantify on-the-job difficulties over five work domains: work scheduling (4 items), output demands (7 items), physical demands (8 items), mental demands (4 items), and social demands (3 items). A subset of participants (n = 149) were also assessed on hose drag and stair climb with a high-rise pack performance-based tests. Descriptive statistics and correlations were used to compare item/subscale performance; and to describe the inter-relationships between tests. Results The mean WLQ-26 item scores (/5) ranged from 4.1 to 4.4 (median = 5 for all items); most firefighters (54.5-80.5%) selected "difficult none of the time" response option on all items. A substantial ceiling effect was observed across all five WLQ-26 subscales as 44.0-55.6% were in the highest category. Subscale means ranged from 61.8 (social demands) to 78.7 (output demands and physical demands). Internal consistency exceeded 0.90 on all subscales. For the hose drag task, the mean time-to-completion was 48.0 s (SD = 14.5; range 20.4-95.0). For the stair climb task, the mean time-to-completion was 76.7 s (SD = 37.2; range 21.0-218.0). There were no significant correlations between self-report work limitations and performance of firefighting tasks. Conclusions The WLQ-26 measured five domains, but had ceiling effects in firefighters. Performance-based testing showed wider score range, lacked ceiling effects and did not correlate to the WLQ-26. A firefighter-specific, self-report role functioning scale may be needed to identify compromised work role capabilities in firefighters.
The effects of relative food item size on optimal tooth cusp sharpness during brittle food item processing

PubMed Central

Berthaume, Michael A.; Dumont, Elizabeth R.; Godfrey, Laurie R.; Grosse, Ian R.

2014-01-01

Teeth are often assumed to be optimal for their function, which allows researchers to derive dietary signatures from tooth shape. Most tooth shape analyses normalize for tooth size, potentially masking the relationship between relative food item size and tooth shape. Here, we model how relative food item size may affect optimal tooth cusp radius of curvature (RoC) during the fracture of brittle food items using a parametric finite-element (FE) model of a four-cusped molar. Morphospaces were created for four different food item sizes by altering cusp RoCs to determine whether optimal tooth shape changed as food item size changed. The morphospaces were also used to investigate whether variation in efficiency metrics (i.e. stresses, energy and optimality) changed as food item size changed. We found that optimal tooth shape changed as food item size changed, but that all optimal morphologies were similar, with one dull cusp that promoted high stresses in the food item and three cusps that acted to stabilize the food item. There were also positive relationships between food item size and the coefficients of variation for stresses in food item and optimality, and negative relationships between food item size and the coefficients of variation for stresses in the enamel and strain energy absorbed by the food item. These results suggest that relative food item size may play a role in selecting for optimal tooth shape, and the magnitude of these selective forces may change depending on food item size and which efficiency metric is being selected. PMID:25320068
Validating Self-Report Measures of Pain and Function in Patients Undergoing Hip or Knee Arthroplasty

PubMed Central

Dogra, Moneet; Woodhouse, Linda; Kennedy, Deborah M.; Spadoni, Greg F.

2009-01-01

ABSTRACT Purpose: To investigate the factorial and construct validity of a four-item pain intensity scale, the P4, in patients awaiting primary total hip or knee arthroplasty secondary to osteoarthritis. Method: A construct validation design was applied to a sample of convenience of 117 patients (mean age 65.6 [SD = 11.2] years) at their preoperative visit. All patients completed the P4 and the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Exploratory and confirmatory factor analyses were used to examine the factorial structure of the P4 and WOMAC. To evaluate construct validity, we examined the correlation between the P4 and WOMAC pain sub-scales and the ability of the P4 to differentiate between patients awaiting hip and knee replacement. Results: Two distinct factors consistent with the themes of pain and function were identified with P4 and WOMAC physical function items, but not with the WOMAC pain and physical function items. The P4 correlates more with the WOMAC pain scores (r = 0.67) than with the WOMAC physical function scores (r = 0.60). Conclusion: The P4's validity was supported in this patient group. The use of the P4 with the WOMAC physical function sub-scale provides a more distinct assessment of pain and function than the WOMAC pain and physical function scales. PMID:20808479
Use of NON-PARAMETRIC Item Response Theory to develop a shortened version of the Positive and Negative Syndrome Scale (PANSS)

PubMed Central

2011-01-01

Background Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Methods Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. Results The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminate better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation was able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. Conclusions The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and to allow selection of items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity. PMID:22087503
Assessing the Straightforwardly-Worded Brief Fear of Negative Evaluation Scale for Differential Item Functioning Across Gender and Ethnicity.

PubMed

Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael

2015-06-01

The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.
Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test

PubMed Central

Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi

2018-01-01

Objective The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings. PMID:29561879
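The two stopping rules reported in this abstract (reliability above 0.9, or 14 items administered) can be sketched as a simple check. The conversion from a standard error to reliability used here, 1 minus the squared standard error divided by the trait variance, is a common convention and an assumption of this sketch; the abstract does not say how the authors computed reliability. The function names are hypothetical.

```python
def reliability_from_se(se, trait_variance=1.0):
    """Approximate reliability from the standard error of the trait estimate.

    Assumes the latent trait is scaled to the given variance (1.0 by default).
    """
    return 1.0 - (se ** 2) / trait_variance


def should_stop(se, n_administered, min_reliability=0.9, max_items=14):
    """Stop when reliability exceeds the target or the item cap is reached."""
    reached_precision = reliability_from_se(se) > min_reliability
    reached_cap = n_administered >= max_items
    return reached_precision or reached_cap


# Hypothetical states during a test administration.
stop_precise = should_stop(se=0.3, n_administered=5)    # reliability 0.91
keep_going = should_stop(se=0.4, n_administered=5)      # reliability 0.84
stop_at_cap = should_stop(se=0.4, n_administered=14)    # item cap reached
```

For the 56-item bank, the 14-item cap equals 25% of the bank, which matches the lower bound of the reported 75% reduction, and the mean length of 8.5 items is roughly 15% of the bank.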
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities

PubMed Central

Hong, Ickpyo; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L.; Shulman, Lisa M.

2017-01-01

Purpose The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. Methods The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35 item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R2 less than 10 %). Results The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59–0.85) and acceptable internal consistency (Cronbach’s alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. Conclusion The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms. PMID:27048495
Measuring self-esteem after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Self-esteem item bank and short form.

PubMed

Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Measuring resilience after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Resilience item bank and short form.

PubMed

Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF; however, after examining effect sizes, we found the DIF to be negligible, with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

ERIC Educational Resources Information Center

Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol

2016-01-01

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
An enhanced functional ability questionnaire (faVIQ) to measure the impact of rehabilitation services on the visually impaired

PubMed Central

Wolffsohn, James Stuart; Jackson, Jonathan; Hunt, Olivia Anne; Cottriall, Charles; Lindsay, Jennifer; Gilmour, Richard; Sinclair, Anne; Harper, Robert

2014-01-01

AIM To develop a short, enhanced functional ability Quality of Vision (faVIQ) instrument based on previous questionnaires employing comprehensive modern statistical techniques to ensure the use of an appropriate response scale, items and scoring of the visual related difficulties experienced by patients with visual impairment. METHODS Items in current quality-of-life questionnaires for the visually impaired were refined by a multi-professional group and visually impaired focus groups. The resulting 76 items were completed by 293 visually impaired patients with stable vision on two occasions separated by a month. The faVIQ scores of 75 patients with no ocular pathology were compared to 75 age and gender matched patients with visual impairment. RESULTS Rasch analysis reduced the faVIQ items to 27. Correlation to standard visual metrics was moderate (r=0.32-0.46) and to the NEI-VFQ was 0.48. The faVIQ was able to clearly discriminate between age and gender matched populations with no ocular pathology and visual impairment with an index of 0.983 and 95% sensitivity and 95% specificity using a cut off of 29. CONCLUSION The faVIQ allows sensitive assessment of quality-of-life in the visually impaired and should support studies which evaluate the effectiveness of low vision rehabilitation services. PMID:24634868
A comparison of home-based exercise programs with and without self-manual therapy in individuals with knee osteoarthritis in community.

PubMed

Cheawthamai, Kornkamon; Vongsirinavarat, Mantana; Hiengkaew, Vimonwan; Saengrueangrob, Sasithorn

2014-07-01

The present study aimed to compare the effectiveness of the treatment programs of home-based exercise with and without self-manual therapy in individuals with knee osteoarthritis (knee OA) in community. Forty-three participants with knee OA were randomly assigned in groups. All participants received the same home-based exercise program with or without self-manual therapy over 12 weeks. Outcome measures were pain intensity, range of motions, six-minute walk test distance, the knee injury and osteoarthritis outcome score (KOOS), short-form 36 (SF-36) and satisfaction. The results showed that the self-manual therapy program significantly decreased pain at 4 weeks, increased flexion and extension at 4 and 12 weeks, and improved the KOOS in pain item and SF-36 in physical function and mental health items. The home-based exercise group showed significant increase of the six-minute walk distance at 4 and 12 weeks, improvements in the KOOS in pain and symptom items and SF-36 in the physical function and role-emotional items. Overall, the results favored a combination of self-manual therapy and home-based exercise for patients with knee OA, which apparently showed superior benefits in decreasing pain and improving active knee range of motions.
Assessing Psychopathy Among Justice Involved Adolescents with the PCL: YV: An Item Response Theory Examination Across Gender

PubMed Central

Tsang, Siny; Schmidt, Karen M.; Vincent, Gina M.; Salekin, Randall T.; Moretti, Marlene M.; Odgers, Candice L.

2014-01-01

This study used an item response theory (IRT) model and a large adolescent sample of justice involved youth (N = 1,007, 38% female) to examine the item functioning of the Psychopathy Checklist – Youth Version (PCL: YV). Items that were most discriminating (or most sensitive to changes) of the latent trait (thought to be psychopathy) among adolescents included “Glibness/superficial charm”, “Lack of remorse”, and “Need for stimulation”, whereas items that were least discriminating included “Pathological lying”, “Failure to accept responsibility”, and “Lacks goals.” The items “Impulsivity” and “Irresponsibility” were the most likely to be rated high among adolescents, whereas “Parasitic lifestyle”, and “Glibness/superficial charm” were the most likely to be rated low. Evidence of differential item functioning (DIF) on four of the 13 items was found between boys and girls. “Failure to accept responsibility” and “Impulsivity” were endorsed more frequently to describe adolescent girls than boys at similar levels of the latent trait, and vice versa for “Grandiose sense of self-worth” and “Lacks goals.” The DIF findings suggest that four PCL: YV items function differently between boys and girls. PMID:25580672
Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis.

PubMed

Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus

2009-05-01

The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD = 14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
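As a reading aid for the fit criterion quoted above (infit < 1.3), the following is a minimal sketch of the dichotomous Rasch model and the infit mean-square statistic for a single item. It is illustrative only: the study used 5-point Likert items, which call for a polytomous Rasch model, and the function names, person measures, and responses below are hypothetical, not taken from the paper.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def infit_mean_square(responses, thetas, b):
    """Information-weighted (infit) mean square for one item.

    responses: 0/1 answers, one per person
    thetas:    person measures in logits (assumed already estimated)
    b:         item difficulty in logits (assumed already estimated)
    """
    squared_residuals = 0.0
    total_variance = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        squared_residuals += (x - p) ** 2   # observed minus expected, squared
        total_variance += p * (1.0 - p)     # model variance of the response
    return squared_residuals / total_variance

# Hypothetical toy data: five persons answering one item of difficulty 0 logits
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [0, 0, 1, 1, 1]
infit = infit_mean_square(responses, thetas, b=0.0)
print(f"infit mean square = {infit:.2f}")
```

Values near 1 indicate responses about as noisy as the model expects; a cut-off such as 1.3 flags items whose responses are noisier than that.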
Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.

PubMed

Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A

2018-03-01

This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. The common-person equating method was applied to a retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% of the variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
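The "person strata" figure can be related to the more familiar separation and reliability indices. The sketch below assumes the conventional Rasch relationships, strata H = (4G + 1)/3 and reliability R = G²/(1 + G²), where G is the separation index. The abstract does not say how its strata value was computed, so this is an interpretation aid rather than a reproduction of the authors' calculation, and the function names are hypothetical.

```python
def strata_from_separation(g):
    """Number of statistically distinct strata implied by separation G."""
    return (4.0 * g + 1.0) / 3.0

def separation_from_strata(h):
    """Inverse of strata_from_separation."""
    return (3.0 * h - 1.0) / 4.0

def reliability_from_separation(g):
    """Separation reliability implied by separation G."""
    return g * g / (1.0 + g * g)

# Person strata reported in the abstract above
g = separation_from_strata(6.3)
r = reliability_from_separation(g)
print(f"separation = {g:.3f}, reliability = {r:.3f}")
```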

Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

PubMed

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
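The three-parameter logistic (3PL) model mentioned above can be sketched as follows. The item parameters are hypothetical and chosen only to show what DIF looks like in this model: the same item is given a larger difficulty for one group, so respondents of equal ability have different probabilities of a correct answer. None of the numbers come from the VETcar study.

```python
import math

def three_pl(theta, a, b, c):
    """3PL item response function.

    theta: ability, a: discrimination, b: difficulty, c: guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters: identical item, but harder (larger b) for the focal group
reference = dict(a=1.2, b=0.0, c=0.25)
focal = dict(a=1.2, b=0.5, c=0.25)

for theta in (-1.0, 0.0, 1.0):
    gap = three_pl(theta, **reference) - three_pl(theta, **focal)
    print(f"theta={theta:+.1f}  P_ref - P_focal = {gap:.3f}")
```

Because only the difficulty differs, the gap has the same sign at every ability level, which is the pattern usually called uniform DIF.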
The neural correlates of gist-based true and false recognition

PubMed Central

Gutchess, Angela H.; Schacter, Daniel L.

2012-01-01

When information is thematically related to previously studied information, gist-based processes contribute to false recognition. Using functional MRI, we examined the neural correlates of gist-based recognition as a function of increasing numbers of studied exemplars. Sixteen participants incidentally encoded small, medium, and large sets of pictures, and we compared the neural response at recognition using parametric modulation analyses. For hits, regions in middle occipital, middle temporal, and posterior parietal cortex linearly modulated their activity according to the number of related encoded items. For false alarms, visual, parietal, and hippocampal regions were modulated as a function of the encoded set size. The present results are consistent with prior work in that the neural regions supporting veridical memory also contribute to false memory for related information. The results also reveal that these regions respond to the degree of relatedness among similar items, and implicate perceptual and constructive processes in gist-based false memory. PMID:22155331
False-positive tangible outcomes of functional analyses.

PubMed

Rooker, Griffin W; Iwata, Brian A; Harper, Jill M; Fahmie, Tara A; Camp, Erin M

2011-01-01

Functional analysis (FA) methodology is the most precise method for identifying variables that maintain problem behavior. Occasionally, however, results of an FA may be influenced by idiosyncratic sensitivity to aspects of the assessment conditions. For example, data from several studies suggest that inclusion of a tangible condition during an FA may be prone to a false-positive outcome, although the extent to which tangible reinforcement routinely produces such outcomes is unknown. We examined susceptibility to tangible reinforcement by determining whether a new response was acquired more readily when exposed to a tangible contingency relative to others commonly used in an FA (Study 1), and whether problem behavior known not to have a social function nevertheless emerged when exposed to tangible reinforcement (Study 2). Results indicated that inclusion of items in the tangible condition should be done with care and that selection should be based on those items typically found in the individual's environment.
Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life.

PubMed

Paap, Muirne C S; Kroeze, Karel A; Terwee, Caroline B; van der Palen, Job; Veldkamp, Bernard P

2017-11-01

Examining item usage is an important step in evaluating the performance of a computerized adaptive test (CAT). We study item usage for a newly developed multidimensional CAT which draws items from three PROMIS domains, as well as a disease-specific one. The multidimensional item bank used in the current study contained 194 items from four domains: the PROMIS domains fatigue, physical function, and ability to participate in social roles and activities, and a disease-specific domain (the COPD-SIB). The item bank was calibrated using the multidimensional graded response model and data of 795 patients with chronic obstructive pulmonary disease. To evaluate the item usage rates of all individual items in our item bank, CAT simulations were performed on responses generated based on a multivariate uniform distribution. The outcome variables included active bank size and item overuse (usage rate larger than the expected item usage rate). For average θ-values, the overall active bank size was 9-10%; this number quickly increased as θ-values became more extreme. For values of -2 and +2, the overall active bank size equaled 39-40%. There was 78% overlap between overused items and active bank size for average θ-values. For more extreme θ-values, the overused items made up a much smaller part of the active bank size: here the overlap was only 35%. Our results strengthen the claim that relatively short item banks may suffice when using polytomous items (and no content constraints/exposure control mechanisms), especially when using MCAT.
Dimensionality of the Knee Numeric-Entity Evaluation Score (KNEES-ACL): a condition-specific questionnaire.

PubMed

Comins, J D; Krogsgaard, M R; Kreiner, S; Brodersen, J

2013-10-01

The benefit of anterior cruciate ligament (ACL) reconstruction has been questioned based on patient-reported outcome measures (PROMs). Valid interpretation of such results requires confirmation of the psychometric properties of the PROM. Rasch analysis is the gold standard for validation of PROMs, yet PROMs used for ACL reconstruction have not been validated using Rasch analysis. We used Rasch analysis to investigate the psychometric properties of the Knee Numeric-Entity Evaluation Score (KNEES-ACL), a newly developed PROM for patients treated for ACL deficiency. Two-hundred forty-two patients pre- and post-ACL reconstruction completed the pilot PROM. Rasch models were used to assess the psychometric properties (e.g., unidimensionality, local response dependency, and differential item functioning). Forty-one items distributed across seven unidimensional constructs measuring impairment, functional limitations, and psychosocial consequences were confirmed to fit Rasch models. Fourteen items were removed because of statistical lack of fit and inadequate face validity. Local response dependency and differential item functioning were identified and adjusted. The KNEES-ACL is the first Rasch-validated condition-specific PROM constructed for patients with ACL deficiency and patients with ACL reconstruction. Thus, this instrument can be used for within- and between-group comparisons. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Measurement of overgeneral autobiographical memory: Psychometric properties of the autobiographical memory test in young and older populations

PubMed Central

Romero, Dulce; Ricarte, Jorge J.; Serrano, Juan P.; Nieto, Marta; Latorre, Jose M.

2018-01-01

The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used. PMID:29672583
Measurement of overgeneral autobiographical memory: Psychometric properties of the autobiographical memory test in young and older populations.

PubMed

Ros, Laura; Romero, Dulce; Ricarte, Jorge J; Serrano, Juan P; Nieto, Marta; Latorre, Jose M

2018-01-01

The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used.
Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

PubMed

Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

2017-09-16

This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
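Several records in this listing report internal consistency as Cronbach's α. A minimal sketch of the standard formula, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), is given below. The response matrix is invented for illustration and has no connection to the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses: 5 children, 3 items
data = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.2f}")
```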
Optimal lot sizing in screening processes with returnable defective items

NASA Astrophysics Data System (ADS)

Vishkaei, Behzad Maleki; Niaki, S. T. A.; Farhangi, Milad; Rashti, Mehdi Ebrahimnezhad Moghadam

2014-07-01

This paper is an extension of Hsu and Hsu (Int J Ind Eng Comput 3(5):939-948, 2012) aiming to determine the optimal order quantity of product batches that contain defective items with percentage nonconforming following a known probability density function. The orders are subject to a 100% screening process at a rate higher than the demand rate. Shortage is backordered, and defective items in each ordering cycle are stored in a warehouse to be returned to the supplier when a new order is received. Although the retailer does not sell defective items at a lower price and only trades perfect items (to avoid loss), a higher holding cost is incurred to store defective items. Using the renewal-reward theorem, the optimal order and shortage quantities are determined. Some numerical examples are solved at the end to clarify the applicability of the proposed model and to compare the new policy to an existing one. The results show that the new policy provides better expected profit per time.
Comparison of the functional rating index and the 18-item Roland-Morris Disability Questionnaire: responsiveness and reliability.

PubMed

Chansirinukor, Wunpen; Maher, Christopher G; Latimer, Jane; Hush, Julia

2005-01-01

Retrospective design. To compare the responsiveness and test-retest reliability of the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire in detecting change in disability in patients with work-related low back pain. Many low back pain-specific disability questionnaires are available, including the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire. No previous study has compared the responsiveness and reliability of these questionnaires. Files of patients who had been treated for work-related low back pain at a physical therapy clinic were reviewed, and those containing initial and follow-up Functional Rating Index and 18-item Roland-Morris Disability Questionnaires were selected. The responsiveness of both questionnaires was compared using two different methods. First, using the assumption that patients receiving treatment improve over time, various responsiveness coefficients were calculated. Second, using change in work status as an external criterion to identify improved and nonimproved patients, Spearman's rho and receiver operating characteristic curves were calculated. Reliability was estimated from the subset of patients who reported no change in their condition over this period and expressed with the intraclass correlation coefficient and the minimal detectable change. One hundred and forty-three patient files were retrieved. The responsiveness coefficients for the Functional Rating Index were greater than for the 18-item Roland-Morris Disability Questionnaire. The intraclass correlation coefficient values for both questionnaires calculated from 96 patient files were similar, but the minimal detectable change for the Functional Rating Index was less than for the 18-item Roland-Morris Disability Questionnaire. The Functional Rating Index seems preferable to the 18-item Roland-Morris Disability Questionnaire for use in clinical trials and clinical practice.
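The abstract does not state how the minimal detectable change (MDC) was derived from the intraclass correlation coefficient (ICC). A common convention, assumed here, is SEM = SD · √(1 − ICC) and MDC = z · √2 · SEM, with z = 1.96 for a 95% confidence level. The standard deviation and ICC in the sketch are hypothetical and are not the study's values.

```python
import math

def minimal_detectable_change(sd, icc, z=1.96):
    """MDC from the baseline SD and test-retest ICC (assumed convention)."""
    sem = sd * math.sqrt(1.0 - icc)    # standard error of measurement
    return z * math.sqrt(2.0) * sem    # sqrt(2): two measurements are compared

# Hypothetical inputs: SD of 5 scale points, ICC of 0.90
mdc = minimal_detectable_change(sd=5.0, icc=0.90)
print(f"MDC95 = {mdc:.2f} points")
```

Under this convention, two questionnaires with similar ICCs can still differ in MDC when their score variability differs, since the MDC scales with the SD.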
An initial psychometric evaluation of the German PROMIS v1.2 Physical Function item bank in patients with a wide range of health conditions.

PubMed

Liegl, Gregor; Rose, Matthias; Correia, Helena; Fischer, H Felix; Kanlidere, Sibel; Mierke, Annett; Obbarius, Alexander; Nolte, Sandra

2018-01-01

To translate the PROMIS Physical Function (PF) item bank version 1.2 into German and to investigate psychometric properties of resulting full bank and seven derived short forms. Cross-sectional psychometric study. Inpatient and outpatient clinics of the Department of Psychosomatic Medicine at Charité-Universitätsmedizin Berlin, Germany. A total of 10 adult patients with various chronic diseases participated in cognitive debriefing interviews. The final item bank was administered to n = 266 adult patients with a broad range of medical conditions. Patient-reported outcome assessment as part of routine care. PROMIS v1.2 PF bank; MOS SF-36 PF scale (PF-10). Cross-cultural adaptation of the item bank followed established guidelines. For the final German translation, the corrected item-total correlations ranged from 0.44 to 0.84. Cronbach's alpha was high for each PROMIS PF short form (α = 0.88-0.96). The full PROMIS PF bank and most short forms correlated highly with the SF-36 PF-10 (r = 0.85-0.90), with the exception of PROMIS Upper Extremity (r = 0.64). PROMIS Upper Extremity showed ceiling effects and lower agreement with the full bank than other short forms. Unidimensionality was supported for all PROMIS PF measures using traditional factor analysis and nonparametric item response theory. The German PROMIS PF bank was found to be conceptually equivalent to the English version and fulfilled the psychometric requirements for use of short forms in clinical practice. Future studies should pay particular attention to samples with upper extremity functional limitations to further investigate the dimensional structure of PF as conceptualized according to PROMIS.
Development of the NIH PROMIS® Sexual Function and Satisfaction measures in patients with cancer.

PubMed

Flynn, Kathryn E; Lin, Li; Cyranowski, Jill M; Reeve, Bryce B; Reese, Jennifer Barsky; Jeffery, Diana D; Smith, Ashley Wilder; Porter, Laura S; Dombeck, Carrie B; Bruner, Deborah Watkins; Keefe, Francis J; Weinfurt, Kevin P

2013-02-01

We describe the development and validation of the Patient-Reported Outcomes Measurement Information System® Sexual Function and Satisfaction (PROMIS® SexFS; National Institutes of Health) measures, version 1.0, for cancer populations. To develop a customizable self-report measure of sexual function and satisfaction as part of the U.S. National Institutes of Health PROMIS Network. Our multidisciplinary working group followed a comprehensive protocol for developing psychometrically robust patient-reported outcome measures including qualitative (scale development) and quantitative (psychometric evaluation) development. We performed an extensive literature review, conducted 16 focus groups with cancer patients and multiple discussions with clinicians, and evaluated candidate items in cognitive testing with patients. We administered items to 819 cancer patients. Items were calibrated using item-response theory and evaluated for reliability and validity. The PROMIS SexFS measures, version 1.0, include 81 items in 11 domains: Interest in Sexual Activity, Lubrication, Vaginal Discomfort, Erectile Function, Global Satisfaction with Sex Life, Orgasm, Anal Discomfort, Therapeutic Aids, Sexual Activities, Interfering Factors, and Screener Questions. In addition to content validity (patients indicate that items cover important aspects of their experiences) and face validity (patients indicate that items measure sexual function and satisfaction), the measure shows evidence for discriminant validity (domains discriminate between groups expected to be different) and convergent validity (strong correlations between scores on PROMIS and scores on conceptually similar older measures of sexual function), as well as favorable test-retest reliability among people not expected to change (interclass correlations from two administrations of the instrument, 1 month apart).
The PROMIS SexFS offers researchers a reliable and valid set of tools to measure self-reported sexual function and satisfaction among diverse men and women. The measures are customizable; researchers can select the relevant domains and items comprising those domains for their study. © 2013 International Society for Sexual Medicine.
A measure of early physical functioning (EPF) post-stroke.

PubMed

Finch, Lois E; Higgins, Johanne; Wood-Dauphinee, Sharon; Mayo, Nancy E

2008-07-01

To develop a comprehensive measure of Early Physical Functioning (EPF) post-stroke quantified through Rasch analysis and conceptualized using the International Classification of Functioning Disability and Health (ICF). An observational cohort study. A cohort of 262 subjects (mean age 71.6 (standard deviation 12.5) years) hospitalized post-acute stroke. Functional assessments were made within 3 days of stroke with items from valid and reliable indices commonly utilized to evaluate stroke survivors. Information on important variables was also collected. Principal component and Rasch analysis confirmed the factor structure, and dimensionality of the measure. Rasch analysis combined items across ICF components to develop the measure. Items were deleted iteratively, those retained fit the model and were related to the construct; reliability and validity were assessed. A 38-item unidimensional measure of the EPF met all Rasch model requirements. The item difficulty matched the person ability (mean person measure: -0.31; standard error 0.37 logits), reliability of the person-item-hierarchy was excellent at 0.97. Initial validity was adequate. The 38-item EPF measure was developed. It expands the range of assessment post acute stroke; it covers a broad spectrum of difficulty with good initial psychometric properties that, once revalidated, can assist in planning and evaluating early interventions.
Rasch analysis of the carers quality of life questionnaire for parkinsonism.

PubMed

Pillas, Marios; Selai, Caroline; Schrag, Anette

2017-03-01

To assess the psychometric properties of the Carers Quality of Life Questionnaire for Parkinsonism using a Rasch modeling approach and determine the optimal cut-off score. We performed a Rasch analysis of the survey answers of 430 carers of patients with atypical parkinsonism. All of the scale items demonstrated acceptable goodness of fit to the Rasch model. The scale was unidimensional and no notable differential item functioning was detected in the items regarding age and disease type. Rating categories were functioning adequately in all scale items. The scale had high reliability (.95) and construct validity and a high degree of precision, distinguishing between 5 distinct groups of carers with different levels of quality of life. A cut-off score of 62 was found to have the optimal screening accuracy based on Hospital Anxiety and Depression Scale subscores. The results suggest that the Carers Quality of Life Questionnaire for Parkinsonism is a useful scale to assess carers' quality of life and allows analyses requiring interval scaling of variables. © 2016 International Parkinson and Movement Disorder Society.
Detection of Uniform and Nonuniform Differential Item Functioning by Item-Focused Trees

ERIC Educational Resources Information Center

Berger, Moritz; Tutz, Gerhard

2016-01-01

Detection of differential item functioning (DIF) by use of the logistic modeling approach has a long tradition. One big advantage of the approach is that it can be used to investigate nonuniform (NUDIF) as well as uniform DIF (UDIF). The classical approach allows one to detect DIF by distinguishing between multiple groups. We propose an…
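The logistic modeling approach referred to above is commonly written as the model below. This is the textbook form, stated here as an assumption; the truncated abstract does not show the authors' own notation. $Y_i$ is the response to item $i$, $S$ is the matching variable (for example the total test score), and $g$ is a 0/1 group indicator.

```latex
\operatorname{logit} P(Y_i = 1 \mid S, g)
  = \beta_{0i} + \beta_{1i}\, S + \beta_{2i}\, g + \beta_{3i}\, (S \cdot g)
% Uniform DIF (UDIF):     \beta_{2i} \neq 0 and \beta_{3i} = 0
%   (the group effect is constant across score levels)
% Nonuniform DIF (NUDIF): \beta_{3i} \neq 0
%   (the group effect changes with the score level)
% No DIF:                 \beta_{2i} = \beta_{3i} = 0
```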
Examining Differential Item Functioning: IRT-Based Detection in the Framework of Confirmatory Factor Analysis

ERIC Educational Resources Information Center

Dimitrov, Dimiter M.

2017-01-01

This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.
Parent Ratings of ADHD Symptoms: Generalized Partial Credit Model Analysis of Differential Item Functioning across Gender

ERIC Educational Resources Information Center

Gomez, Rapson

2012-01-01

Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…
Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.

ERIC Educational Resources Information Center

Muraki, Eiji

1999-01-01

Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

ERIC Educational Resources Information Center

Lee, Yi-Hsuan; Zhang, Jinming

2017-01-01

Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Semi-Parametric Item Response Functions in the Context of Guessing. CRESST Report 844

ERIC Educational Resources Information Center

Falk, Carl F.; Cai, Li

2015-01-01

We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…

Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example

ERIC Educational Resources Information Center

Li, Xiaomin; Wang, Wen-Chung

2015-01-01

The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

ERIC Educational Resources Information Center

Sachse, Karoline A.; Haag, Nicole

2017-01-01

Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

ERIC Educational Resources Information Center

Martinková, Patricia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.

2017-01-01

We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because…
Psychometric Evaluation of the HIV Disclosure Belief Scale: A Rasch Model Approach.

PubMed

Hu, Jinxiang; Serovich, Julianne M; Chen, Yi-Hsin; Brown, Monique J; Kimberly, Judy A

2017-01-01

This study provides psychometric assessment of an HIV disclosure belief scale (DBS) among men who have sex with men (MSM). This study used baseline data from a clinical trial evaluating the effectiveness of an HIV serostatus disclosure intervention of 338 HIV-positive MSM. The Rasch model was used after unidimensionality and local independence assumptions were tested for application of the model. Results suggest that there was only one item that did not fit the model well. After removing the item, the DBS showed good model-data fit and high item and person reliabilities. This instrument showed measurement invariance across two different age groups, but some items showed differential item functioning between Caucasian and other minority groups. The findings suggest that the DBS is suitable for measuring HIV disclosure beliefs, but caution is warranted when the DBS is used to compare disclosure beliefs between different racial/ethnic groups.
Irrational Delay Revisited: Examining Five Procrastination Scales in a Global Sample

PubMed Central

Svartdal, Frode; Steel, Piers

2017-01-01

Scales attempting to measure procrastination focus on different facets of the phenomenon, yet they share a common understanding of procrastination as an unnecessary, unwanted, and disadvantageous delay. The present paper examines in a global sample (N = 4,169) five different procrastination scales – Decisional Procrastination Scale (DPS), Irrational Procrastination Scale (IPS), Pure Procrastination Scale (PPS), Adult Inventory of Procrastination Scale (AIP), and General Procrastination Scale (GPS), focusing on factor structures and item functioning using Confirmatory Factor Analysis and Item Response Theory. The results indicated that the PPS (12 items selected from DPS, AIP, and GPS) measures different facets of procrastination even better than the three scales it is based on. An even shorter version of the PPS (5 items focusing on irrational delay) corresponds well to the nine-item IPS. Both scales demonstrate good psychometric properties and appear to be better measures of core procrastination attributes than alternative procrastination scales. PMID:29163302
Robust Measurement via A Fused Latent and Graphical Item Response Theory Model.

PubMed

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang

2018-03-12

Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate the item-level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.
The Comparability of English, French and Dutch Scores on the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F): An Assessment of Differential Item Functioning in Patients with Systemic Sclerosis

PubMed Central

Kwakkenbos, Linda; Willems, Linda M.; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H. M.; Thombs, Brett D.

2014-01-01

Objective The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. Methods The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. Results A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. Conclusions There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics. PMID:24638101
The Mindful Attention Awareness Scale: Further Examination of Dimensionality, Reliability, and Concurrent Validity Estimates.

PubMed

Osman, Augustine; Lamis, Dorian A; Bagge, Courtney L; Freedenthal, Stacey; Barnes, Sean M

2016-01-01

We examined the factor structure and psychometric properties of the Mindful Attention Awareness Scale (MAAS) in a sample of 810 undergraduate students. Using common exploratory factor analysis (EFA), we obtained evidence for a 1-factor solution (41.84% common variance). To confirm unidimensionality of the 15-item MAAS, we conducted a 1-factor confirmatory factor analysis (CFA). Results of the EFA and CFA, respectively, provided support for a unidimensional model. Using differential item functioning analysis methods within item response theory modeling (IRT-based DIF), we found that individuals with high and low levels of nonattachment responded similarly to the MAAS items. Following a detailed item analysis, we proposed a 5-item short version of the instrument and present descriptive statistics and composite score reliability for the short and full versions of the MAAS. Finally, correlation analyses showed that scores on the full and short versions of the MAAS were associated with measures assessing related constructs. The 5-item MAAS is as useful as the original MAAS in enhancing our understanding of the mindfulness construct.
The Health and Functioning ICF-60: Development and Psychometric Properties

PubMed Central

Tutelyan, V A; Chatterji, S; Baturin, A K; Pogozheva, A V; Kishko, O N; Akolzina, S E

2014-01-01

Background This paper describes the development and psychometric properties of the Health and Functioning ICF-60 (HF-ICF-60) measure, based on the World Health Organization (WHO) ‘International Classification of Functioning, Disability and Health: ICF’ (2001). The aims of the present study were to test psychometric properties of the HF-ICF-60, developed as a measure that would be responsive to change in functioning through changes in health and nutritional status, as a prospective measure to monitor health and nutritional status of populations and to explore the relationship of the HF-ICF-60 with quality of life measures such as the World Health Organization WHOQOL-BREF quality of life assessment in relation to non-communicable diseases. Methods The HF-ICF-60 measure consists of 60 items selected from the ICF by an expert panel, which included 18 items that cover Body Functions, 21 items that cover Activities and Participation, rated on five-point scales, and 21 items that cover Environmental Factors (seven items cover Individual Environmental Factors and 14 items cover Societal Environmental Factors), rated on nine-point scales. The HF-ICF-60 measure was administered to the Russian nationally representative sample within the Russian National Population Quality of Life, Health and Nutrition Survey, in 2004 (n = 9807) and 2005 (n = 9560), as part of the two waves of the Russian Longitudinal Monitoring Survey (RLMS). The statistical analyses were carried out with the use of both classical and modern psychometric methods, such as factor analysis, and based on Item Response Theory, respectively. Results The HF-ICF-60 questionnaire is a new measure derived directly from the ICF and covers the ICF components as follows: Body Functions, Activities and Participation, and Environmental Factors (Individual Environmental Factors and Societal Environmental Factors). 
The results from the factor analyses (both Exploratory Factor Analyses and Confirmatory Factor Analyses) show good support for the proposed structure together with an overall higher-order factor for each scale of the measure. The measure has good reliability and validity, and sensitivity to change in the health and nutritional status of respondents over time. Normative values were developed for the Russian adult population. Conclusions The HF-ICF-60 has shown good psychometric properties in the two waves of the nationally representative RLMS, which provided considerable support to using the HF-ICF-60 data as the normative health and functioning values for the Russian population. Similarly, the administration of the WHOQOL-BREF in the same two waves of the nationally representative RLMS has allowed the normative quality of life values for the Russian population to be obtained. Therefore, the objective assessment of health and functioning of the HF-ICF-60 could be mapped onto the subjective evaluation of quality of life of the WHOQOL-BREF to increase the potential usefulness of the surveys in relation to non-communicable diseases. © 2014 The Authors. Clinical Psychology & Psychotherapy. Published by John Wiley & Sons, Ltd. Key Practitioner Message The HF-ICF-60 offers a new perspective in measuring change in functioning through changes in lifestyle and diet. The HF-ICF-60 can be combined with the WHOQOL-BREF to map the objective assessment of health and functioning onto the subjective evaluation of quality of life. Combined use of the HF-ICF-60 and the WHOQOL-BREF can be especially useful for national and global monitoring and surveillance of implementation of measures to reduce risk factors of non-communicable diseases and to promote healthy lifestyles and healthy diets. PMID:24931300
Language production in a shared task: Cumulative Semantic Interference from self- and other-produced context words.

PubMed

Hoedemaker, Renske S; Ernst, Jessica; Meyer, Antje S; Belke, Eva

2017-01-01

This study assessed the effects of semantic context in the form of self-produced and other-produced words on subsequent language production. Pairs of participants performed a joint picture naming task, taking turns while naming a continuous series of pictures. In the single-speaker version of this paradigm, naming latencies have been found to increase for successive presentations of exemplars from the same category, a phenomenon known as Cumulative Semantic Interference (CSI). As expected, the joint-naming task showed a within-speaker CSI effect, such that naming latencies increased as a function of the number of category exemplars named previously by the participant (self-produced items). Crucially, we also observed an across-speaker CSI effect, such that naming latencies slowed as a function of the number of category members named by the participant's task partner (other-produced items). The magnitude of the across-speaker CSI effect did not vary as a function of whether or not the listening participant could see the pictures their partner was naming. The observation of across-speaker CSI suggests that the effect originates at the conceptual level of the language system, as proposed by Belke's (2013) Conceptual Accumulation account. Whereas self-produced and other-produced words both resulted in a CSI effect on naming latencies, post-experiment free recall rates were higher for self-produced than other-produced items. Together, these results suggest that both speaking and listening result in implicit learning at the conceptual level of the language system but that these effects are independent of explicit learning as indicated by item recall. Copyright © 2016 Elsevier B.V. All rights reserved.
Maximum Marginal Likelihood Estimation of a Monotonic Polynomial Generalized Partial Credit Model with Applications to Multiple Group Analysis.

PubMed

Falk, Carl F; Cai, Li

2016-06-01

We present a semi-parametric approach to estimating item response functions (IRF) useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model at the lowest order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and other non-parametric and semi-parametric alternatives.
Differential Gender Effects in the Relationship between Perceived Immune Functioning and Autistic Traits.

PubMed

Mackus, Marlou; Kruijff, Deborah de; Otten, Leila S; Kraneveld, Aletta D; Garssen, Johan; Verster, Joris C

2017-04-12

Altered immune functioning has been demonstrated in individuals with autism spectrum disorder (ASD). The current study explores the relationship between perceived immune functioning and experiencing ASD traits in healthy young adults. N = 410 students from Utrecht University completed a survey on immune functioning and autistic traits. In addition to a 1-item perceived immune functioning rating, the Immune Function Questionnaire (IFQ) was completed to assess perceived immune functioning. The Dutch translation of the Autism-Spectrum Quotient (AQ) was completed to examine variation in autistic traits, including the domains "social insights and behavior", "difficulties with change", "communication", "phantasy and imagination", and "detail orientation". The 1-item perceived immune functioning score did not significantly correlate with the total AQ score. However, a significant negative correlation was found between perceived immune functioning and the AQ subscale "difficulties with change" (r = -0.119, p = 0.019). In women, 1-item perceived immune functioning correlated significantly with the AQ subscales "difficulties with change" (r = -0.149, p = 0.029) and "communication" (r = -0.145, p = 0.032). In men, none of the AQ subscales significantly correlated with 1-item perceived immune functioning. In conclusion, a modest relationship between perceived immune functioning and several autistic traits was found.
Development and initial validation of the assessment of caregiver experience with neuromuscular disease.

PubMed

Matsumoto, Hiroko; Clayton-Krasinski, Debora A; Klinge, Stephen A; Gomez, Jaime A; Booker, Whitney A; Hyman, Joshua E; Roye, David P; Vitale, Michael G

2011-01-01

Orthopaedic intervention can have a wide range of functional and psychosocial effects on children with neuromuscular disease (NMD). In the multihandicapped child (Gross Motor Classification System IV/V), functional status, pain, psychosocial function, and health-related quality of life also have effects on the families of these children. The purpose of this study is to report the development and initial validation of an outcomes instrument specifically designed to assess the caregiver impact experienced by parents raising severely affected NMD children: the Assessment of Caregiver Experience with Neuromuscular Disease (ACEND). In the first part of this prospective study, 61 children with NMD and their parents were administered a range of earlier validated pediatric health measures. A framework technique was used to select the most appropriate and relevant subset of questions from this large set. Sensitivity analyses guided the development of a master question list measuring caregiver impact, excluding items with low relevance, and modifying unclear questions. In the second part of the study, the ACEND was administered to the caregivers of 46 children with moderate-to-severe NMD. Statistical analyses were conducted to determine validity of the instrument. The resulting ACEND instrument included 2 domains, 7 subdomains, and 41 items. Domain 1, examining physical impact, included 4 subdomains: feeding/grooming/dressing (6 items), sitting/play (5 items), transfers (5 items), and mobility (7 items). Domain 2, which examines general caregiver impact, included 3 subdomains: time (4 items), emotion (9 items), and finance (5 items). Mean overall relevance rating was 6.21 ± 0.37 and clarity rating was 6.68 ± 0.52 (scale 0 to 7). Multiple floor effects in patients with GMFCS V and ceiling effects in patients with GMFCS III were identified almost exclusively in motor-based items. 
Virtually no floor or ceiling effects were identified in the time, emotion or finance domains across GMFCS level. The initial validation demonstrated that ACEND is a valid, disease-specific measure to quantify experience on caregivers of children with NMD. Larger groups of patients across NMD disease type are currently being tested to strengthen validity findings. Additionally, the ACEND is now being administered before and after orthopaedic interventions to determine responsiveness, which is critical to health outcomes research. LEVEL OF EVIDENCE/RELEVANCE: IIc.
Further evidence that similar principles govern recall from episodic and semantic memory: the Canadian prime ministerial serial position function.

PubMed

Neath, Ian; Saint-Aubin, Jean

2011-06-01

The serial position function, with its characteristic primacy and recency effects, is one of the most ubiquitous findings in episodic memory tasks. In contrast, there are only two demonstrations of such functions in tasks thought to tap semantic memory. Here, we provide a third demonstration, showing that free recall of the prime ministers of Canada also results in a serial position function. Scale Independent Memory, Perception, and Learning (SIMPLE), a local distinctiveness model of memory that was designed to account for serial position effects in episodic memory, fit the data. According to SIMPLE, serial position functions observed in episodic and semantic memory all reflect the relative distinctiveness principle: items will be well remembered to the extent that they are more distinct than competing items at the time of retrieval. (PsycINFO Database Record (c) 2011 APA, all rights reserved).
Using existing questionnaires in latent class analysis: should we use summary scores or single items as input? A methodological study using a cohort of patients with low back pain.

PubMed

Nielsen, Anne Molgaard; Vach, Werner; Kent, Peter; Hestbaek, Lise; Kongsted, Alice

2016-01-01

Latent class analysis (LCA) is increasingly being used in health research, but optimal approaches to handling complex clinical data are unclear. One issue is that commonly used questionnaires are multidimensional, but expressed as summary scores. Using the example of low back pain (LBP), the aim of this study was to explore and descriptively compare the application of LCA when using questionnaire summary scores and when using single items to subgroup patients based on multidimensional data. Baseline data from 928 LBP patients in an observational study were classified into four health domains (psychology, pain, activity, and participation) using the World Health Organization's International Classification of Functioning, Disability, and Health framework. LCA was performed within each health domain using the strategies of summary-score and single-item analyses. The resulting subgroups were descriptively compared using statistical measures and clinical interpretability. For each health domain, the preferred model solution ranged from five to seven subgroups for the summary-score strategy and seven to eight subgroups for the single-item strategy. There was considerable overlap between the results of the two strategies, indicating that they were reflecting the same underlying data structure. However, in three of the four health domains, the single-item strategy resulted in a more nuanced description, in terms of more subgroups and more distinct clinical characteristics. In these data, application of both the summary-score strategy and the single-item strategy in the LCA subgrouping resulted in clinically interpretable subgroups, but the single-item strategy generally revealed more distinguishing characteristics. 
These results 1) warrant further analyses in other data sets to determine the consistency of this finding, and 2) warrant investigation in longitudinal data to test whether the finer detail provided by the single-item strategy results in improved prediction of outcomes and treatment response.
Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.

PubMed

Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J

2018-02-01

Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items; these items were removed in stages, creating an 8-item and a 3-item Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
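To make the notion of an item information function in the record above more concrete, here is a minimal Python sketch using the dichotomous two-parameter logistic (2PL) model. This is a simplification chosen for illustration: the study's actual model is not stated in the abstract, reading the reported "information parameter" range (1.63 to 4.52) as discrimination values is an assumption, and the item location of 0.0 and the function name `item_information` are hypothetical.

```python
import math


def item_information(theta: float, discrimination: float, location: float) -> float:
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))
    return discrimination ** 2 * p * (1.0 - p)


# Endpoints of the range reported in the abstract, treated here (by assumption)
# as 2PL discrimination parameters; the item location 0.0 is hypothetical.
for a in (1.63, 4.52):
    for theta in (-1.0, 0.0, 1.0):
        info = item_information(theta, a, 0.0)
        print(f"a={a:.2f}  theta={theta:+.1f}  information={info:.3f}")
```

In this simplified model the information peaks at the item location with value a²/4, which is one way to see why items with larger parameters contribute more measurement precision per item administered.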
Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.

PubMed

Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj

2016-12-01

The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit the GRM. Reliability was 0.93. Items 8 and 9 (of the 11- and 12-item versions, respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. © The Author(s), 2016.
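The graded response model named in the record above can be sketched in a few lines of Python. The sketch below computes category probabilities for a single polytomous item as differences of adjacent cumulative logistic curves. The function name `grm_category_probs`, the discrimination value, and the thresholds are hypothetical and are not estimates from the study.

```python
import math


def grm_category_probs(theta: float, discrimination: float, thresholds: list) -> list:
    """Category probabilities for one item under the graded response model.

    thresholds must be strictly increasing; K thresholds give K + 1 categories.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-discrimination * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item with four thresholds (five response categories).
probs = grm_category_probs(theta=0.3, discrimination=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, the cumulative curves never cross, so every category probability is non-negative and the probabilities sum to one.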
Rapid Forgetting Results From Competition Over Time Between Items in Visual Working Memory

PubMed Central

2016-01-01

Working memory is now established as a fundamental cognitive process across a range of species. Loss of information held in working memory has the potential to disrupt many aspects of cognitive function. However, despite its significance, the mechanisms underlying rapid forgetting remain unclear, with intense recent debate as to whether it is interference between stored items that leads to loss of information or simply temporal decay. Here we show that both factors are essential and interact in a highly specific manner. Although a single item can be maintained in memory with high fidelity, multiple items compete in working memory, progressively degrading each other’s representations as time passes. Specifically, interaction between items is associated with both worsening precision and increased reporting errors of object features over time. Importantly, during the period of maintenance, although items are no longer visible, maintenance resources can be selectively redeployed to protect the probability to recall the correct feature and the precision with which cued items can be recalled, as if it was the only item in memory. These findings reveal that the biased competition concept could be applied not only to perceptual processes but also to active maintenance of working memory representations over time. PMID:27668485
Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis

PubMed Central

Pallant, Julie F; Miller, Renée L; Tennant, Alan

2006-01-01

Background The Edinburgh Postnatal Depression Scale (EPDS) is a 10-item self-rating post-natal depression scale which has seen widespread use in epidemiological and clinical studies. Concern has been raised over the validity of the EPDS as a single summed scale, with suggestions that it measures two separate aspects, one of depressive feelings, the other of anxiety. Methods As part of a larger cross-sectional study conducted in Melbourne, Australia, a community sample (324 women, ranging in age from 18 to 44 years: mean = 32 yrs, SD = 4.6), was obtained by inviting primiparous women to participate voluntarily in this study. Data from the EPDS were fitted to the Rasch measurement model and tested for appropriate category ordering, for item bias through Differential Item Functioning (DIF) analysis, and for unidimensionality through tests of the assumption of local independence. Results Rasch analysis of the data from the ten-item scale initially demonstrated a lack of fit to the model with a significant Item-Trait Interaction total chi-square (chi-square = 82.8, df = 40; p < .001). Removal of two items (items 7 and 8) resulted in a non-significant Item-Trait Interaction total chi-square with a residual mean value for items of -0.467 with a standard deviation of 0.850, showing fit to the model. No DIF existed in the final 8-item scale (EPDS-8) and all items showed fit to model expectations. Principal Components Analysis of the residuals supported the local independence assumption, and unidimensionality of the revised EPDS-8 scale. Revised cut points were identified for EPDS-8 to maintain the case identification of the original scale. Conclusion The results of this study suggest that EPDS, in its original 10-item form, is not a viable scale for the unidimensional measurement of depression. Rasch analysis suggests that a revised eight-item version (EPDS-8) would provide a more psychometrically robust scale. 
The revised cut points of 7/8 and 9/10 for the EPDS-8 show high levels of agreement with the original case identification for the EPDS-10. PMID:16768803

Item Response Theory Using Hierarchical Generalized Linear Models

ERIC Educational Resources Information Center

Ravand, Hamdollah

2015-01-01

Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
The Brief Impairment Scale (Bis): A Multidimensional Scale of Functional Impairment for Children and Adolescents.

ERIC Educational Resources Information Center

Bird, Hector R.; Canino, Glorisa J.; Davies, Mark; Ramirez, Rafael; Chavez, Ligia; Duarte, Cristiane; Shen, Sa

2005-01-01

Objective: This article provides the results of the psychometric testing of the Brief Impairment Scale (BIS). The BIS is a 23-item instrument that evaluates three domains of functioning: interpersonal relations, school/work functioning, and self-care/self-fulfilment. It capitalizes on the strengths of existing global measures while addressing some…
Validation of Gujarati Version of ABILOCO-Kids Questionnaire

PubMed Central

Diwan, Jasmin; Patel, Pankaj; Bansal, Ankita B.

2015-01-01

Background ABILOCO-Kids is a measure of locomotion ability for children with cerebral palsy (CP) aged 6 to 15 years and is available in English and French. Aim To validate the Gujarati version of the ABILOCO-Kids questionnaire for use in clinical research in the Gujarati population. Materials and Methods The ABILOCO-Kids questionnaire was translated from English into Gujarati using the forward-backward-forward method. To ensure face and content validity of the Gujarati version using the group consensus method, each item was examined by a group of experts with a mean experience of 24.62 years in the fields of paediatrics and paediatric physiotherapy. Each item was analysed for content, meaning, wording, format, ease of administration, and scoring. The expert group scored each item as accepted, rejected, or accepted with modification. The procedure was continued until 80% consensus was reached for all items. Concurrent validity was examined in 55 children with cerebral palsy (6-15 years) of all Gross Motor Function Classification System (GMFCS) levels and all clinical types by correlating ABILOCO-Kids scores with the Gross Motor Function Measure (GMFM) and the GMFCS. Results In phase 1 of validation, 16 items were accepted as is, 22 items were accepted with modification, and 3 items went on to phase 2 validation. For concurrent validity, a highly significant positive correlation was found between the ABILOCO-Kids score and total GMFM (r=0.713, p<0.005), and a highly significant negative correlation was found with GMFCS (r= -0.778, p<0.005). Conclusion The Gujarati version of the ABILOCO-Kids questionnaire has good face, content, and concurrent validity and can be used to measure caregiver-reported locomotion ability in children with CP. PMID:26557603
Clinical characteristics of patients with major depressive disorder with and without hypothyroidism: a comparative study.

PubMed

Mowla, Arash; Kalantarhormozi, Mohammad Reza; Khazraee, Samaneh

2011-01-01

Differentiating major depressive disorder (MDD) without hypothyroidism from MDD associated with hypothyroidism can be challenging. Therefore some authors have suggested that thyroid function should be tested in all depressed patients. This study compared the clinical characteristics of patients with MDD associated with hypothyroidism with those of patients with MDD without hypothyroidism. Thyroid function tests were administered to 75 patients (60 female and 15 male) who met DSM-IV criteria for MDD. The 15 patients with hypothyroidism (8 with subclinical hypothyroidism and 7 with overt hypothyroidism) were compared with the other 60 patients with regard to depressive characteristics. The primary measure of depressive signs and symptoms used to assess depression severity and symptoms was the Hamilton Rating Scale for Depression, first 17 items (Ham-D-17). Baseline demographic data, including age and sex, were also compared. The two groups did not differ significantly in severity of overall depression at baseline, as measured by total score on the Ham-D-17 (P=0.471, Z=0.970). Patients with MDD without hypothyroidism had worse scores on item 1 (depressed mood), item 2 (feelings of guilt), item 3 (suicidality), item 6 (late insomnia), and item 16 (loss of weight). In contrast, depressed patients with hypothyroidism had more severe anxiety symptoms and greater agitation (items 9, 10, and 11). Our results may help clinicians differentiate MDD associated with hypothyroidism from MDD without hypothyroidism. Depressed patients with hypothyroidism had more anxiety symptoms and greater agitation, but they had fewer severe core depressive symptoms and biological signs of MDD. (Journal of Psychiatric Practice. 2011;17:67-71).
Mayo-Portland adaptability inventory: comparing psychometrics in cerebrovascular accident to traumatic brain injury.

PubMed

Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon

2012-12-01

(1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Retrospective psychometric analysis of rating scale data. Home- and community-based brain injury rehabilitation program. Individuals post-CVA (n=861) and individuals with TBI (n=603). Not applicable. MPAI-4. Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients; whereas, self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R=.92; R(2)=.85) and, at most, a 3.7 point difference in raw scores. The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
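The reliability/separation pairs reported in this abstract can be cross-checked with the standard Rasch-measurement identity R = G^2 / (1 + G^2), where G is the separation index and R the reliability. The sketch below is only an arithmetic check; the function name is illustrative and does not come from the paper or from any Rasch software.

```python
def reliability_from_separation(separation: float) -> float:
    """Convert a Rasch separation index G into reliability R = G^2 / (1 + G^2)."""
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


# Separation values as reported in the abstract (persons 3.16, items 23.64).
person_reliability = reliability_from_separation(3.16)
item_reliability = reliability_from_separation(23.64)

# Rounded to two decimals, these can be compared with the reported .91 and 1.00.
print(round(person_reliability, 2), round(item_reliability, 2))
```

The same kind of check applies to the DTF association: squaring the reported R of .92 gives approximately .85, matching the reported R(2).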
Development of Evaluation Methods for Lower Limb Function between Aged and Young Using Principal Component Analysis

NASA Astrophysics Data System (ADS)

Nomoto, Yohei; Yamashita, Kazuhiko; Ohya, Tetsuya; Koyama, Hironori; Kawasumi, Masashi

There is increasing societal concern about preventing falls among the aged. Improving the lower-limb muscular strength, postural control, and walking ability of aged people is important for quality of life and fall prevention. The aim of this study was to develop multiple evaluation methods for advising on the improvement and maintenance of lower limb function in aged and young people. The subjects were 16 healthy young volunteers (mean ± S.D.: 19.9 ± 0.6 years) and 10 healthy aged volunteers (mean ± S.D.: 80.6 ± 6.1 years). Measurement items related to lower limb function were selected from items we had used previously. The selected items were distance of extroversion of the toe, angle of flexion of the toe, maximum width of step, knee elevation, moving distance of the greater trochanter, walking balance, toe-gap force, and rotation range of the ankle joint. Principal component analysis summarized the measurement items into lower limb evaluation measures covering walking ability, muscle strength of the lower limb, and flexibility of the ankle. The assessment score of the young group was greater than that of the aged group by a factor of 1.6 for walking ability, 1.4 for muscle strength of the lower limb, and 1.2 for flexibility of the ankle. The results suggest that it is possible to assess the lower limb function of aged and young people numerically and to advise them on their foot function.
Person Response Functions and the Definition of Units in the Social Sciences

ERIC Educational Resources Information Center

Engelhard, George, Jr.; Perkins, Aminah F.

2011-01-01

Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…
Differential Item Functioning Detection Across Two Methods of Defining Group Comparisons

PubMed Central

Sari, Halil Ibrahim

2014-01-01

This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF studies. In this study, a simulation was conducted based on data from a 60-item ACT Mathematics test (ACT; Hanson & Béguin). The unsigned area measure method (Raju) was used as the DIF detection method. An application to operational data was also completed in the study, as well as a comparison of observed Type I error rates and false discovery rates across the two methods of defining groups. Results indicate that the amount of flagged DIF or interpretations about DIF in all conditions were not the same across the two methods, and there may be some benefits to using composite group approaches. The results are discussed in connection to differing definitions of fairness. Recommendations for practice are made. PMID:29795837
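As a rough illustration of the unsigned area idea named in this abstract, the sketch below numerically integrates the absolute difference between the item response functions of a reference and a focal group. It is not Raju's closed-form expression and not the study's code: the two-parameter logistic model, the integration range, and the item parameters are all assumptions chosen for illustration.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic item response function (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def unsigned_area(ref, focal, lo=-20.0, hi=20.0, steps=40000):
    """Trapezoidal approximation of the integral of |P_ref - P_focal| over theta.

    ref and focal are (a, b) tuples of hypothetical item parameters.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        diff = abs(p_2pl(theta, *ref) - p_2pl(theta, *focal))
        weight = 0.5 if i in (0, steps) else 1.0  # end points count half
        total += weight * diff
    return total * h


# With equal discriminations the area reduces to |b_focal - b_ref|.
ua = unsigned_area(ref=(1.2, 0.0), focal=(1.2, 0.5))
print(ua)
```

When the two groups share the same discrimination, the curves differ only by a horizontal shift, so the area equals the absolute difference in difficulty; this gives a simple sanity check for the numerical routine.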
Investigating Linguistic Sources of Differential Item Functioning Using Expert Think-Aloud Protocols in Science Achievement Tests

NASA Astrophysics Data System (ADS)

Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye

2013-03-01

Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF, and if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. Think-aloud protocols were conducted with expert reviewers to compare the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.
Does Linking Mixed-Format Tests Using a Multiple-Choice Anchor Produce Comparable Results for Male and Female Subgroups? Research Report. ETS RR-11-44

ERIC Educational Resources Information Center

Kim, Sooyeon; Walker, Michael E.

2011-01-01

This study examines the use of subpopulation invariance indices to evaluate the appropriateness of using a multiple-choice (MC) item anchor in mixed-format tests, which include both MC and constructed-response (CR) items. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using an MC-only anchor set for 4…
Validation of a Health Literacy Measure for Adolescents and Young Adults Diagnosed with Cancer.

PubMed

McDonald, Fiona E J; Patterson, Pandora; Costa, Daniel S J; Shepherd, Heather L

2016-03-01

Health literacy can influence long-term health outcomes. This study aimed to validate an adapted version of the Functional, Communicative and Critical Health Literacy measure for adolescent and young adult (AYA) cancer patients and survivors (N = 105; age 12-24 years). Exploratory factor analysis was used to validate the measure, and indicated that a slightly modified item structure better fit the results. Furthermore, item response theory analysis highlighted location and discrimination parameter differences among items. Acceptability of the measure was high. This is the first validation of a health literacy measure among AYAs with an illness such as cancer.
Behavioral decoding of working memory items inside and outside the focus of attention.

PubMed

Mallett, Remington; Lewis-Peacock, Jarrod A

2018-03-31

How we attend to our thoughts affects how we attend to our environment. Holding information in working memory can automatically bias visual attention toward matching information. By observing attentional biases on reaction times to visual search during a memory delay, it is possible to reconstruct the source of that bias using machine learning techniques and thereby behaviorally decode the content of working memory. Can this be done when more than one item is held in working memory? There is some evidence that multiple items can simultaneously bias attention, but the effects have been inconsistent. One explanation may be that items are stored in different states depending on the current task demands. Recent models propose functionally distinct states of representation for items inside versus outside the focus of attention. Here, we use behavioral decoding to evaluate whether multiple memory items, including temporarily irrelevant items outside the focus of attention, exert biases on visual attention. Only the single item in the focus of attention was decodable. The other item showed a brief attentional bias that dissipated until it returned to the focus of attention. These results support the idea of dynamic, flexible states of working memory across time and priority. © 2018 New York Academy of Sciences.
Proposing Electronic Health Record Usability Requirements Based on Enriched ISO 9241 Metric Usability Model

PubMed Central

Farzandipour, Mehrdad; Riazi, Hossein; Jabali, Monireh Sadeqi

2018-01-01

Introduction: System usability assessment is among the important aspects in assessing the quality of clinical information technology, especially when the end users of the system are concerned. This study aims at providing a comprehensive list of system usability. Methods: This research is a descriptive cross-sectional one conducted using Delphi technique in three phases in 2013. After experts’ ideas were concluded, the final version of the questionnaire including 163 items in three phases was presented to 40 users of information systems in hospitals. The grading ranged from 0-4. Data analysis was conducted using SPSS software. Those requirements with a mean point of three or higher were finally confirmed. Results: The list of system usability requirements for electronic health record was designed and confirmed in nine areas including suitability for the task (24 items), self-descriptiveness (22 items), controllability (19 questions), conformity with user expectations (25 items), error tolerance (21 items), suitability for individualization (7 items), suitability for learning (19 items), visual clarity (18 items) and auditory presentation (8 items). Conclusion: A relatively comprehensive model including useful requirements for using EHR was presented which can increase functionality, effectiveness and users’ satisfaction. Thus, it is suggested that the present model be adopted by system designers and healthcare system institutions to assess those systems. PMID:29719310
Validation of a condition-specific measure for women having an abnormal screening mammography.

PubMed

Brodersen, John; Thorsen, Hanne; Kreiner, Svend

2007-01-01

The aim of this study is to assess the validity of a new condition-specific instrument measuring psychosocial consequences of abnormal screening mammography (PCQ-DK33). The draft version of the PCQ-DK33 was completed on two occasions by 184 women who had received an abnormal screening mammography and on one occasion by 240 women who had received a normal screening result. Item Response Theories and Classical Test Theories were used to analyze data. Construct validity, concurrent validity, known group validity, objectivity and reliability were established by item analysis examining the fit between item responses and Rasch models. Six dimensions covering anxiety, behavioral impact, sense of dejection, impact on sleep, breast examination, and sexuality were identified. One item belonging to the dejection dimension had uniform differential item functioning. Two items not fitting the Rasch models were retained because of high face validity. A sick leave item added useful information when measuring side effects and socioeconomic consequences of breast cancer screening. Five "poor items" were identified and should be deleted from the final instrument. Preliminary evidence for a valid and reliable condition-specific measure for women having an abnormal screening mammography was established. The measure includes 27 "good" items measuring different attributes of the same overall latent structure-the psychosocial consequences of abnormal screening mammography.
Differential Item Functioning in the SF-36 Physical Functioning and Mental Health Sub-Scales: A Population-Based Investigation in the Canadian Multicentre Osteoporosis Study.

PubMed

Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard

2016-01-01

Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
Screening for depression in arthritis populations: an assessment of differential item functioning in three self-reported questionnaires.

PubMed

Hu, Jinxiang; Ward, Michael M

2017-09-01

To determine if persons with arthritis differ systematically from persons without arthritis in how they respond to questions on three depression questionnaires, which include somatic items such as fatigue and sleep disturbance. We extracted data on the Centers for Epidemiological Studies Depression (CES-D) scale, the Patient Health Questionnaire-9 (PHQ-9), and the Kessler-6 (K-6) scale from three large population-based national surveys. We assessed items on these questionnaires for differential item functioning (DIF) between persons with and without self-reported physician-diagnosed arthritis using multiple indicator multiple cause models, which controlled for the underlying level of depression and important confounders. We also examined if DIF by arthritis status was similar between women and men. Although five items of the CES-D, one item of the PHQ-9, and five items of the K-6 scale had evidence of DIF based on statistical comparisons, the magnitude of each difference was less than the threshold of a small effect. The statistical differences were a function of the very large sample sizes in the surveys. Effect sizes for DIF were similar between women and men except for two items on the Patient Health Questionnaire-9. For each questionnaire, DIF accounted for 8% or less of the arthritis-depression association, and excluding items with DIF did not reduce the difference in depression scores between those with and without arthritis. Persons with arthritis respond to items on the CES-D, PHQ-9, and K-6 depression scales similarly to persons without arthritis, despite the inclusion of somatic items in these scales.
Rasch analysis of the Trypophobia Questionnaire.

PubMed

Imaizumi, Shu; Tanno, Yoshihiko

2018-02-14

This study aimed to assess Rasch-based psychometric properties of the Trypophobia Questionnaire measuring proneness to trypophobia, which refers to disgust and unpleasantness induced by the observation of clusters of objects (e.g., lotus seed pods). Rasch analysis was performed on data from 582 healthy Japanese adults. The results suggested that Trypophobia Questionnaire has a unidimensional structure with ordered response categories and sufficient person and item reliabilities, and that it does not have differential item functioning across sexes and age groups, whereas the targeting of the scale leaves room for improvements. When items that did not fit the Rasch model were removed, the shortened version showed slightly improved psychometric properties. However, results were not conclusive in determining whether the full or shortened version is better for practical use. Further assessment and validation are needed.
A longitudinal evaluation of the Center for Epidemiologic Studies-Depression scale (CES-D) in a Rheumatoid Arthritis Population using Rasch Analysis

PubMed Central

Covic, Tanya; Pallant, Julie F; Conaghan, Philip G; Tennant, Alan

2007-01-01

Background The aim of this study was to test the internal validity of the total Center for Epidemiologic Studies-Depression (CES-D) scale using Rasch analysis in a rheumatoid arthritis (RA) population. Methods CES-D was administered to 157 patients with RA over three time points within a 12 month period. Rasch analysis was applied using RUMM2020 software to assess the overall fit of the model, the response scale used, individual item fit, differential item functioning (DIF) and person separation. Results Pooled data across three time points was shown to fit the Rasch model with removal of seven items from the original 20-item CES-D scale. It was necessary to rescore the response format from four to three categories in order to improve the scale's fit. Two items demonstrated some DIF for age and gender but were retained within the 13-item CES-D scale. A new cut point for depression score of 9 was found to correspond to the original cut point score of 16 in the full CES-D scale. Conclusion This Rasch analysis of the CES-D in a longstanding RA cohort resulted in the construction of a modified 13-item scale with good internal validity. Further validation of the modified scale is recommended particularly in relation to the new cut point for depression. PMID:17629902
The functional assessment measure (FAM) in closed traumatic brain injury outpatients: a Rasch-based psychometric study.

PubMed

Tesio, L; Cantagallo, A

1998-01-01

The Functional Assessment Measure (FAM) has been proposed as a measure of disability in post-acute Traumatic Brain Injury (TBI) outpatients. It comprises the 18 items of the Functional Independence Measure (FIM(SM)), scored in terms of dependence, and 12 newly designed items, scored in terms of dependence (7 items) or performance (5 items). The FIM covers the domains of self-care, sphincter management, mobility, locomotion, communication and social cognition. The 12 new items explore the domains of community integration, emotional status, orientation, attention, reading/writing skills, swallowing and speech intelligibility. By addressing a set of problems quite specific to TBI outpatients, the FAM was intended to raise the ceiling of the FIM and to allow a more precise estimate of their disability. These claims, however, were never supported in previous studies. We administered the FAM to 60 TBI outpatients, 2-88 months (median 16) from trauma. Rasch analysis (rating scale model) was adopted to test the psychometric properties of the scale. The FAM was reliable (Rasch item and person reliability 0.91 and 0.93, respectively). Two of the 12 FAM-specific items were severely misfitting with the general construct, and were deleted. Within the 28-item refined FAM scale, 4 new items and 2 FIM items still retained signs of misfit. The FAM was on average too easy. The most difficult item (a new one, Employability) did not attain the average ability of the subjects. Also, it was only slightly more difficult than the most difficult FIM item (Memory). The FAM does not seem to improve on the FIM as far as the assessment of TBI outpatients is concerned.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars

PubMed Central

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
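For readers unfamiliar with the model named in this abstract, the three-parameter logistic (3PL) item response function can be sketched as follows. The parameter names (a, b, c) follow common IRT convention, the scaling constant D is omitted, and all numeric values are hypothetical; none of them are VETcar estimates.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response.

    theta: subject ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). No scaling constant D is applied.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: at theta == b the probability is halfway between c and 1.
p_mid = p_3pl(theta=0.0, a=1.5, b=0.0, c=0.2)

# Uniform DIF pictured as a difficulty shift between two groups of equal
# ability (values invented for illustration only).
p_group_1 = p_3pl(theta=0.5, a=1.5, b=0.0, c=0.2)
p_group_2 = p_3pl(theta=0.5, a=1.5, b=0.4, c=0.2)
print(p_mid, p_group_1 > p_group_2)
```

In this picture, an item shows DIF when two subjects with the same ability but different group membership have different probabilities of answering correctly, which is what the difficulty shift produces.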

Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

PubMed

Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco.
Scale refinement and initial evaluation of a behavioral health function measurement tool for work disability evaluation.

PubMed

Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M

2013-09-01

To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Cross-sectional survey followed by IRT calibration data simulations. Community. Sample of individuals applying for Social Security Administration disability benefits: claimants (n=1015) and a normative comparative sample of U.S. adults (n=1000). None. SSA-BH measurement instrument. IRT analyses supported the unidimensionality of 4 SSA-BH scales: mood and emotions (35 items), self-efficacy (23 items), social interactions (6 items), and behavioral control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item computer adaptive tests with the full item bank indicated robust ability of the computer adaptive testing approach to comprehensively characterize behavioral health function along 4 distinct dimensions. Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all 4 scales. Behavioral function profiles of Social Security Administration claimants were generated and compared with age- and sex-matched norms along 4 scales: mood and emotions, behavioral control, social interactions, and self-efficacy. Using the computer adaptive test-based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the Social Security Administration's work disability programs. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

ERIC Educational Resources Information Center

Polak, Marike; De Rooij, Mark; Heiser, Willem J.

2012-01-01

In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Rasch Measurement and Item Banking: Theory and Practice.

ERIC Educational Resources Information Center

Nakamura, Yuji

The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
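The relationship described in this record is commonly written as the dichotomous Rasch model. The notation below is an assumption for illustration (the record itself gives no symbols): \theta_n is the ability of candidate n and b_i is the difficulty of item i.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

When ability equals difficulty (\theta_n = b_i), the probability of a correct response is 0.5; it rises toward 1 as ability exceeds difficulty and falls toward 0 as difficulty exceeds ability.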
Item Response Theory Models for Wording Effects in Mixed-Format Scales

ERIC Educational Resources Information Center

Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu

2015-01-01

Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…
Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions.

PubMed

Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike

2018-01-01

To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
Development and Evaluation of the PROMIS® Pediatric Positive Affect Item Bank, Child-Report and Parent-Proxy Editions.

PubMed

Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B

2018-03-01

The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
Measurement properties of painDETECT: Rasch analysis of responses from community-dwelling adults with neuropathic pain.

PubMed

Packham, Tara L; Cappelleri, Joseph C; Sadosky, Alesia; MacDermid, Joy C; Brunner, Florian

2017-03-04

painDETECT (PD-Q) is a self-reported assessment of pain qualities developed as a screening tool for pain of neuropathic origin. Rasch analysis is a strategy for examining the measurement characteristics of a scale using a form of item response theory. We conducted a Rasch analysis to consider if the scoring and measurement properties of PD-Q would support its use as an outcome measure. Rasch analysis was conducted on PD-Q scores drawn from a cross-sectional study of the burden and costs of NeP. The analysis followed an iterative process based on recommendations in the literature, including examination of sequential scoring categories, unidimensionality, reliability and differential item function. Data from 624 persons with a diagnosis of painful diabetic polyneuropathy, small fibre neuropathy, and neuropathic pain associated with chronic low back pain, spinal cord injury, HIV-related pain, or chronic post-surgical pain was used for this analysis. PD-Q demonstrated fit to the Rasch model after adjustments of scoring categories for four items, and omission of the time course and radiating questions. The resulting seven-item scale of pain qualities demonstrated good reliability with a person-separation index of 0.79. No scoring bias (differential item functioning) was found for this version. Rasch modelling suggests the seven pain-qualities items from PD-Q may be used as an outcome measure. Further research is required to confirm validity and responsiveness in a clinical setting.
Opposing effects of negative emotion on amygdalar and hippocampal memory for items and associations

PubMed Central

Horner, Aidan J.; Hørlyck, Lone D.; Burgess, Neil

2016-01-01

Although negative emotion can strengthen memory of an event it can also result in memory disturbances, as in post-traumatic stress disorder (PTSD). We examined the effects of negative item content on amygdalar and hippocampal function in memory for the items themselves and for the associations between them. During fMRI, we examined encoding and retrieval of paired associates made up of all four combinations of neutral and negative images. At test, participants were cued with an image and, if recognised, had to retrieve the associated (target) image. The presence of negative images increased item memory but reduced associative memory. At encoding, subsequent item recognition correlated with amygdala activity, while subsequent associative memory correlated with hippocampal activity. Hippocampal activity was reduced by the presence of negative images, during encoding and correct associative retrieval. In contrast, amygdala activity increased for correctly retrieved negative images, even when cued by a neutral image. Our findings support a dual representation account, whereby negative emotion up-regulates the amygdala to strengthen item memory but down-regulates the hippocampus to weaken associative representations. These results have implications for the development and treatment of clinical disorders in which diminished associations between emotional stimuli and their context contribute to negative symptoms, as in PTSD. PMID:26969864
Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: a comparison of worked examples.

PubMed

Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D

2015-01-01

To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development, namely classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT), in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.
Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children

PubMed Central

2012-01-01

Background Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that successive response categories for all items were not located in the expected order. Conclusions This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
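As an illustration of the rating scale model named in the record above, the following minimal sketch computes category probabilities for one item with five ordered response categories. The function name, the person location `theta`, the item difficulty `delta`, and the threshold values are all hypothetical; none of them are estimates from the study.

```python
import math

def rsm_category_probs(theta, delta, thresholds):
    """Category probabilities under a Rasch rating scale model.

    theta: person location; delta: item difficulty;
    thresholds: step parameters shared by all items (one per step).
    """
    # Category k has log-numerator sum_{j<=k} (theta - delta - tau_j);
    # category 0 has an empty sum, i.e. 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical, ordered thresholds for five response categories.
probs = rsm_category_probs(theta=0.0, delta=0.0,
                           thresholds=[-1.5, -0.5, 0.5, 1.5])
```

When the estimated thresholds are not in increasing order, some categories are never the most probable response at any person location, which is one way the category disordering reported in the abstract can show up.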
Application of Think Aloud Protocols for Examining and Confirming Sources of Differential Item Functioning Identified by Expert Reviews

ERIC Educational Resources Information Center

Ercikan, Kadriye; Arim, Rubab; Law, Danielle; Domene, Jose; Gagnon, France; Lacroix, Serge

2010-01-01

This paper demonstrates and discusses the use of think aloud protocols (TAPs) as an approach for examining and confirming sources of differential item functioning (DIF). The TAPs are used to investigate to what extent surface characteristics of the items that are identified by expert reviews as sources of DIF are supported by empirical evidence…
Do Items that Measure Self-Perceived Physical Appearance Function Differentially across Gender Groups? An Application of the MACS Model

ERIC Educational Resources Information Center

Gonzalez-Roma, Vicente; Tomas, Ines; Ferreres, Doris; Hernandez, Ana

2005-01-01

The aims of this study were to investigate whether the 6 items of the Physical Appearance Scale (Marsh, Richards, Johnson, Roche, & Tremayne, 1994) show differential item functioning (DIF) across gender groups of adolescents, and to show how this can be done using the multigroup mean and covariance structure (MG-MACS) analysis model. Two samples…
A Comparison of Methods for Estimating Conditional Item Score Differences in Differential Item Functioning (DIF) Assessments. Research Report. ETS RR-10-15

ERIC Educational Resources Information Center

Moses, Tim; Miao, Jing; Dorans, Neil

2010-01-01

This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…
Content Validity of Patient-Reported Outcome Instruments used with Pediatric Patients with Facial Differences: A Systematic Review.

PubMed

Wickert, Natasha M; Wong Riff, Karen W Y; Mansour, Mark; Forrest, Christopher R; Goodacre, Timothy E E; Pusic, Andrea L; Klassen, Anne F

2018-01-01

Objective The aim of this systematic review was to identify patient-reported outcome (PRO) instruments used in research with children/youth with conditions associated with facial differences to identify the health concepts measured. Design MEDLINE, EMBASE, CINAHL, and PsycINFO were searched from 2004 to 2016 to identify PRO instruments used in acne vulgaris, birthmarks, burns, ear anomalies, facial asymmetries, and facial paralysis patients. We performed a content analysis whereby the items were coded to identify concepts and categorized as positive or negative content or phrasing. Results A total of 7,835 articles were screened; 6 generic and 11 condition-specific PRO instruments were used in 96 publications. Condition-specific instruments were for acne (four), oral health (two), dermatology (one), facial asymmetries (two), microtia (one), and burns (one). The PRO instruments provided 554 items (295 generic; 259 condition specific) that were sorted into 4 domains, 11 subdomains, and 91 health concepts. The most common domain was psychological (n = 224 items). Of the identified items, 76% had negative content or phrasing (e.g., "Because of the way my face looks I wish I had never been born"). Given the small number of items measuring facial appearance (n = 19) and function (n = 22), the PRO instruments reviewed lacked content validity for patients whose condition impacted facial function and/or appearance. Conclusions Treatments can change facial appearance and function. This review draws attention to a problem with content validity in existing PRO instruments. Our team is now developing a new PRO instrument called FACE-Q Kids to address this problem.
Validating a Cantonese short version of the Zarit Burden Interview (CZBI-Short) for dementia caregivers.

PubMed

Tang, Jennifer Yee-Man; Ho, Andy Hau-Yan; Luo, Hao; Wong, Gloria Hoi-Yan; Lau, Bobo Hi-Po; Lum, Terry Yat-Sang; Cheung, Karen Siu-Lan

2016-09-01

The present study aimed to develop and validate a Cantonese short version of the Zarit Burden Interview (CZBI-Short) for Hong Kong Chinese dementia caregivers. The 12-item Zarit Burden Interview (ZBI) was translated into spoken Cantonese and back-translated by two bilingual research assistants and face validated by a panel of experts. Five hundred Chinese dementia caregivers showing signs of stress reported their burden using the translated ZBI and rated their depressive symptoms, overall health, and care recipients' physical functioning and behavioral problems. The factor structure of the translated scale was identified using principal component analysis and confirmatory factor analysis; internal consistency and item-total correlations were assessed; and concurrent validity was tested by correlating the ZBI with depressive symptoms, self-rated health, and care recipients' physical functioning and behavioral problems. The principal component analysis resulted in 11 items loading on a three-factor model comprising role strain, self-criticism, and negative emotion, which accounted for 59% of the variance. The confirmatory factor analysis supported the three-factor model (CZBI-Short) that explained 61% of the total variance. Cronbach's alpha (0.84) and item-total correlations (rho = 0.39-0.71) indicated CZBI-Short had good reliability. CZBI-Short showed correlations with depressive symptoms (r = 0.50), self-rated health (r = -0.26) and care recipients' physical functioning (r = 0.18-0.26) and disruptive behaviors (r = 0.36). The 12-item CZBI-Short is a concise, reliable, and valid instrument to assess burden in Chinese dementia caregivers in clinical and social care settings.
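The internal-consistency statistic reported in the record above, Cronbach's alpha, can be sketched as follows. This is a minimal illustration using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the toy response matrix are made up and are not study data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Made-up responses: three respondents, three items.
toy_rows = [[1, 2, 1], [2, 1, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_rows)
```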
Identifying Country-Specific Cultures of Physics Education: A differential item functioning approach

NASA Astrophysics Data System (ADS)

Mesic, Vanes

2012-11-01

In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide specific diagnostic information which is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina was found to be mainly associated with the topic area 'Electricity and magnetism'. Classes of items which required the knowledge of experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.
26 CFR 1.985-5 - Adjustments required upon change in functional currency.

Code of Federal Regulations, 2011 CFR

2011-04-01

... property and the new functional currency amount of liabilities and any other relevant items (e.g., items... adjusted basis or amount multiplied by the new functional currency/old functional currency spot exchange rate on the last day of the taxable year ending before the year of change (spot rate). (d) Step 3A...
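The arithmetic visible in the excerpt above (an old-currency amount multiplied by the new/old spot exchange rate) can be sketched as below. The excerpt is truncated, so this covers only that single multiplication; the function name, parameter names, and figures are hypothetical.

```python
from decimal import Decimal

def translate_amount(old_currency_amount, spot_rate_new_per_old):
    """Restate an amount in the new functional currency.

    spot_rate_new_per_old: units of new currency per one unit of old
    currency on the relevant spot-rate date (assumed to be given).
    """
    return old_currency_amount * spot_rate_new_per_old

# Hypothetical figures: an adjusted basis of 1,000 old-currency units
# at a spot rate of 1.25 new units per old unit.
new_basis = translate_amount(Decimal("1000"), Decimal("1.25"))
```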
26 CFR 1.985-5 - Adjustments required upon change in functional currency.

Code of Federal Regulations, 2013 CFR

2013-04-01

... property and the new functional currency amount of liabilities and any other relevant items (e.g., items... adjusted basis or amount multiplied by the new functional currency/old functional currency spot exchange rate on the last day of the taxable year ending before the year of change (spot rate). (d) Step 3A...
26 CFR 1.985-5 - Adjustments required upon change in functional currency.

Code of Federal Regulations, 2010 CFR

2010-04-01

... property and the new functional currency amount of liabilities and any other relevant items (e.g., items... adjusted basis or amount multiplied by the new functional currency/old functional currency spot exchange rate on the last day of the taxable year ending before the year of change (spot rate). (d) Step 3A...

26 CFR 1.985-5 - Adjustments required upon change in functional currency.

Code of Federal Regulations, 2012 CFR

2012-04-01

... property and the new functional currency amount of liabilities and any other relevant items (e.g., items... adjusted basis or amount multiplied by the new functional currency/old functional currency spot exchange rate on the last day of the taxable year ending before the year of change (spot rate). (d) Step 3A...
26 CFR 1.985-5 - Adjustments required upon change in functional currency.

Code of Federal Regulations, 2014 CFR

2014-04-01

... property and the new functional currency amount of liabilities and any other relevant items (e.g., items... adjusted basis or amount multiplied by the new functional currency/old functional currency spot exchange rate on the last day of the taxable year ending before the year of change (spot rate). (d) Step 3A...
Independent Orbiter Assessment (IOA): Analysis of the Electrical Power Distribution and Control Subsystem, Volume 2

NASA Technical Reports Server (NTRS)

Schmeckpeper, K. R.

1987-01-01

The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Electrical Power Distribution and Control (EPD&C) hardware. The EPD&C hardware performs the functions of distributing, sensing, and controlling 28 volt DC power and of inverting, distributing, sensing, and controlling 117 volt 400 Hz AC power to all Orbiter subsystems from the three fuel cells in the Electrical Power Generation (EPG) subsystem. Volume 2 continues the presentation of IOA analysis worksheets and contains the potential critical items list.
Visual acuity and contrast sensitivity are two important factors affecting vision-related quality of life in advanced age-related macular degeneration

PubMed Central

Selivanova, Alexandra; Shin, Hyun Joon; Miller, Joan W.; Jackson, Mary Lou

2018-01-01

Purpose Vision loss from age-related macular degeneration (AMD) has a profound effect on vision-related quality of life (VRQoL). The purpose of this study is to identify clinical factors associated with VRQoL using the Rasch-calibrated NEI VFQ-25 scales in bilateral advanced AMD patients. Methods We retrospectively reviewed 47 patients (mean age 83.2 years) with bilateral advanced AMD. Clinical assessment included age, gender, type of AMD, high contrast visual acuity (VA), history of medical conditions, contrast sensitivity (CS), central visual field loss, report of Charles Bonnet Syndrome, current treatment for AMD and Rasch-calibrated NEI VFQ-25 visual function and socioemotional function scales. The NEI VFQ visual function scale includes items of general vision, peripheral vision, distance vision and near vision-related activity while the socioemotional function scale includes items of vision-related social functioning, role difficulties, dependency, and mental health. Multiple regression analysis (structural regression model) was performed using fixed item parameters obtained from the one-parameter item response theory model. Results Multivariate analysis showed that high contrast VA and CS were two factors influencing the VRQoL visual function scale (β = -0.25, 95% CI -0.37 to -0.12, p<0.001 and β = 0.35, 95% CI 0.25 to 0.46, p<0.001) and socioemotional functioning scale (β = -0.2, 95% CI -0.37 to -0.03, p = 0.023, and β = 0.3, 95% CI 0.18 to 0.43, p = 0.001). Central visual field loss was not associated with either the VRQoL visual or socioemotional functioning scale (β = -0.08, 95% CI -0.28 to 0.12, p = 0.44 and β = -0.09, 95% CI -0.03 to 0.16, p = 0.50, respectively). Conclusion In patients with vision impairment secondary to bilateral advanced AMD, high contrast VA and CS are two important factors affecting VRQoL. PMID:29746512
Using the International Classification of Functioning, Disability and Health (ICF) to describe children referred to special care or paediatric dental services.

PubMed

Faulks, Denise; Norderyd, Johanna; Molina, Gustavo; Macgiolla Phadraig, Caoimhin; Scagnet, Gabriela; Eschevins, Caroline; Hennequin, Martine

2013-01-01

Children in dentistry are traditionally described in terms of medical diagnosis and prevalence of oral disease. This approach gives little information regarding a child's capacity to maintain oral health or regarding the social determinants of oral health. The biopsychosocial approach, embodied in the International Classification of Functioning, Disability and Health - Child and Youth version (ICF-CY) (WHO), provides a wider picture of a child's real-life experience, but practical tools for the application of this model are lacking. This article describes the preliminary empirical study necessary for development of such a tool - an ICF-CY Core Set for Oral Health. An ICF-CY questionnaire was used to identify the medical, functional, social and environmental context of 218 children and adolescents referred to special care or paediatric dental services in France, Sweden, Argentina and Ireland (mean age 8 years ± 3.6 yrs). International Classification of Disease (ICD-10) diagnoses included disorders of the nervous system (26.1%), Down syndrome (22.0%), mental retardation (17.0%), autistic disorders (16.1%), and dental anxiety alone (11.0%). The most frequently impaired items in the ICF Body functions domain were 'Intellectual functions', 'High-level cognitive functions', and 'Attention functions'. In the Activities and Participation domain, participation restriction was frequently reported for 25 items including 'Handling stress', 'Caring for body parts', 'Looking after one's health' and 'Speaking'. In the Environment domain, facilitating items included 'Support of friends', 'Attitude of friends' and 'Support of immediate family'. One item was reported as an environmental barrier - 'Societal attitudes'. The ICF-CY can be used to highlight common profiles of functioning, activities, participation and environment shared by children in relation to oral health, despite widely differing medical, social and geographical contexts. The results of this empirical study might be used to develop an ICF-CY Core Set for Oral Health - a holistic but practical tool for clinical and epidemiological use.
Refining the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) item candidates: interpretation of a self-reported outcome measure of functional performance by young people with neurodevelopmental disabilities.

PubMed

Kramer, Jessica M; Schwartz, Ariel

2017-10-01

This study examined the item interpretability and rating scale use of the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) by young people with developmental disabilities. The PEDI-PRO assesses the functional performance of discrete functional tasks in the context of everyday life situations. A two-phase cognitive interview design was implemented with a convenience sample of 37 young people (mean age 19y, SD 2y 5mo; 13 males and 24 females; 68% with intellectual disability) with developmental disabilities. In phase I, 182 item candidates were each reviewed by an average of four young people. In phase II, 103 items were carried forward or revised and each reviewed by an average of seven additional young people. Two raters coded responses for intended item interpretation and performance quality; codes were analysed using descriptive statistics. Qualitative analysis explored young people's self-evaluation process. Items were interpreted as intended by most young people (mean 86%). Young people can use PEDI-PRO response categories appropriately to describe their performance: 94% of positive performance descriptions coincided with a positive response category choice; 73% of negative descriptions coincided with a negative response category choice. Young people interpreted items in a literal manner, and their self-evaluation incorporated the use of supports that facilitate functional performance. The PEDI-PRO's measurement framework appears to support the self-evaluation of functional performance of young people with developmental disabilities. © 2017 Mac Keith Press.
Screening for elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient therapy.

PubMed

Hart, Dennis L; Werneke, Mark W; George, Steven Z; Matheson, James W; Wang, Ying-Chih; Cook, Karon F; Mioduski, Jerome E; Choi, Seung W

2009-08-01

Screening people for elevated levels of fear-avoidance beliefs is uncommon, but elevated levels of fear could worsen outcomes. Developing short screening tools might reduce the data collection burden and facilitate screening, which could prompt further testing or management strategy modifications to improve outcomes. The purpose of this study was to develop efficient yet accurate screening methods for identifying elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient rehabilitation. A secondary analysis of data collected prospectively from people with a variety of common neuromusculoskeletal diagnoses was conducted. Intake Fear-Avoidance Beliefs Questionnaire (FABQ) data were collected from 17,804 people who had common neuromusculoskeletal conditions and were receiving outpatient rehabilitation in 121 clinics in 26 states (in the United States). Item response theory (IRT) methods were used to analyze the FABQ data, with particular emphasis on differential item functioning among clinically logical groups of subjects, and to identify screening items. The accuracy of screening items for identifying subjects with elevated levels of fear was assessed with receiver operating characteristic analyses. Three items for fear of physical activities and 10 items for fear of work activities represented unidimensional scales with adequate IRT model fit. Differential item functioning was negligible for variables known to affect functional status outcomes: sex, age, symptom acuity, surgical history, pain intensity, condition severity, and impairment. Items that provided maximum information at the median for the FABQ scales were selected as screening items to dichotomize subjects by high versus low levels of fear. The accuracy of the screening items was supported for both scales. This study represents a retrospective analysis, which should be replicated using prospective designs. Future prospective studies should assess the reliability and validity of using one FABQ item to screen people for high levels of fear-avoidance beliefs. The lack of differential item functioning in the FABQ scales in the sample tested in this study suggested that FABQ screening could be useful in routine clinical practice and allowed the development of single-item screening for fear-avoidance beliefs that accurately identified subjects with elevated levels of fear. Because screening was accurate and efficient, single IRT-based FABQ screening items are recommended to facilitate improved evaluation and care of heterogeneous populations of people receiving outpatient rehabilitation.
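The selection rule described in the record above, choosing the item that provides maximum information at a target point on the scale, can be sketched as follows. This is a simplified illustration that assumes dichotomous items under a two-parameter logistic model, which may differ from the IRT model used in the study; the item names and parameter values are hypothetical and are not FABQ estimates.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def most_informative_item(items, theta):
    """Return the name of the item with the highest information at theta."""
    return max(items, key=lambda name: item_information(theta, *items[name]))

# Hypothetical (discrimination, difficulty) pairs for three items.
items = {"item_a": (1.2, -1.0), "item_b": (1.5, 0.1), "item_c": (0.8, 1.5)}
# theta = 0.0 stands in for the scale median in this sketch.
screening_item = most_informative_item(items, theta=0.0)
```

Under this model an item is most informative where theta equals its difficulty, so items with difficulty near the target point and high discrimination tend to be selected.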
Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

PubMed Central

Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

2004-01-01

Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
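One of the strategies listed in the record above, treating 'not applicable' responses as if the items had never been offered, can be sketched for the one-parameter logistic model as follows. This is a minimal illustration in which missing responses are coded as `None` and skipped in the log-likelihood; the function names and the coding convention are assumptions, not the authors' implementation.

```python
import math

def p_1pl(theta, b):
    """Probability of a positive response under the one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, difficulties, responses):
    """Log-likelihood of one respondent's answers at ability theta.

    responses holds 1, 0, or None; None marks a 'not applicable' answer
    and is skipped, as if that item had never been offered.
    """
    ll = 0.0
    for b, x in zip(difficulties, responses):
        if x is None:
            continue  # item contributes nothing to the likelihood
        p = p_1pl(theta, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

# Hypothetical difficulties; the second and third items are 'not applicable'.
ll_with_missing = log_likelihood(0.0, [0.0, 1.0, 2.0], [1, None, None])
ll_single_item = log_likelihood(0.0, [0.0], [1])
```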
Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

PubMed

de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

2018-07-01

This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
Methodology for developing and evaluating the PROMIS smoking item banks.

PubMed

Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando

2014-09-01

This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of the PROMIS health expectancies of smoking item banks.

PubMed

Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cerully, Jennifer; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

Smokers' health-related outcome expectancies are associated with a number of important constructs in smoking research, yet there are no measures currently available that focus exclusively on this domain. This paper describes the development and evaluation of item banks for assessing the health expectancies of smoking. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of health expectancies items for daily and nondaily smokers. We also evaluated the performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess health expectancies. A total of 24 items were included in the Health Expectancies item banks; 13 items are common across daily and nondaily smokers, 6 are unique to daily, and 5 are unique to nondaily. For both daily and nondaily smokers, the Health Expectancies item banks are unidimensional, reliable (reliability = 0.95 and 0.96, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.87). Results from simulated CATs showed that health expectancies can be assessed with good precision with an average of 5-6 items adaptively selected from the item banks. Health expectancies of smoking can be assessed on the basis of these item banks via SFs, CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of the PROMIS nicotine dependence item banks.

PubMed

Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

2014-09-01

Nicotine dependence is a core construct important for understanding cigarette smoking and smoking cessation behavior. This article describes analyses conducted to develop and evaluate item banks for assessing nicotine dependence among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N =1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of nicotine dependence items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess dependence. A total of 32 items were included in the Nicotine Dependence item banks; 22 items are common across daily and nondaily smokers, 5 are unique to daily smokers, and 5 are unique to nondaily smokers. For both daily and nondaily smokers, the Nicotine Dependence item banks are strongly unidimensional, highly reliable (reliability = 0.97 and 0.97, respectively), and perform similarly across gender, age, and race/ethnicity groups. SFs common to daily and nondaily smokers consist of 8 and 4 items (reliability = 0.91 and 0.81, respectively). Results from simulated CATs showed that dependence can be assessed with very good precision for most respondents using fewer than 6 items adaptively selected from the item banks. Nicotine dependence on cigarettes can be assessed on the basis of these item banks via one of the SFs, by using CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of the PROMIS negative psychosocial expectancies of smoking item banks.

PubMed

Stucky, Brian D; Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Cerully, Jennifer; Kuhfeld, Megan; Hansen, Mark; Cai, Li

2014-09-01

Negative psychosocial expectancies of smoking include aspects of social disapproval and disappointment in oneself. This paper describes analyses conducted to develop and evaluate item banks for assessing psychosocial expectancies among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N =1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of psychosocial expectancies items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess psychosocial expectancies. A total of 21 items were included in the Psychosocial Expectancies item banks: 14 items are common across daily and nondaily smokers, 6 are unique to daily, and 1 is unique to nondaily. For both daily and nondaily smokers, the Psychosocial Expectancies item banks are strongly unidimensional, highly reliable (reliability = 0.95 and 0.93, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.85). Results from simulated CATs showed that, on average, fewer than 8 items are needed to assess psychosocial expectancies with adequate precision when using the item banks. Psychosocial expectancies of smoking can be assessed on the basis of these item banks via the SF, by using CAT, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Boundary curves of individual items in the distribution of total depressive symptom scores approximate an exponential pattern in a general population

PubMed Central

Kawasaki, Yohei; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

2016-01-01

Background Previously, we proposed a model for ordinal scale scoring in which individual thresholds for each item constitute a distribution by each item. This led us to hypothesize that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores follow a common mathematical model, which is expressed as the product of the frequency of the total depressive symptom scores and the probability of the cumulative distribution function of each item threshold. To verify this hypothesis, we investigated the boundary curves of the distribution of total depressive symptom scores in a general population. Methods Data collected from 21,040 subjects who had completed the Center for Epidemiologic Studies Depression Scale (CES-D) questionnaire as part of a national Japanese survey were analyzed. The CES-D consists of 20 items (16 negative items and four positive items). The boundary curves of adjacent item scores in the distribution of total depressive symptom scores for the 16 negative items were analyzed using log-normal scales and curve fitting. Results The boundary curves of adjacent item scores for a given symptom approximated a common linear pattern on a log-normal scale. Curve fitting showed that an exponential fit had a markedly higher coefficient of determination than either linear or quadratic fits. With negative affect items, the gap between the total score curve and boundary curve continuously increased with increasing total depressive symptom scores on a log-normal scale, whereas the boundary curves of positive affect items, which are not considered manifest variables of the latent trait, did not exhibit such increases in this gap.
Discussion The results of the present study support the hypothesis that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores commonly follow the predicted mathematical model, which was verified to approximate an exponential mathematical pattern. PMID:27761346
Priming in Episodic and Semantic Memory.

ERIC Educational Resources Information Center

McKoon, Gail; Ratcliff, Roger

1979-01-01

Four experiments examined priming between newly learned paired associates through two procedures, lexical decision and item recognition. Results argue against a functional separation of the semantic and episodic memory systems. (Author/AM)
Federal Logistics Information System (FLIS) Procedures Manual, Volume 4. Item Identification.

DTIC Science & Technology

1995-01-01

Functional I DRMS Defense Reutilization 1,15 Description and Marketing FDM Full Descriptive 2 Service Method (Item DPSC Defense Personnel 2,13,14...under DIC KRE, return code ment or segment mix of FLIS data. For interna- AU. tional cataloging, only one Output Data RequestV Code may be used per...Screening Results) with KMR (Matching NATO Maintenance and Supply Agency (NAMSA), Reference-Screening) and either KFC (File Data the custodian for control
A Preliminary Analysis of Teaching Improvisation with the Picture Exchange Communication System to Children with Autism

PubMed Central

Marckel, Julie M; Neef, Nancy A; Ferreri, Summer J

2006-01-01

Two young boys with autism who used the picture exchange communication system were taught to solve problems (improvise) by using descriptors (functions, colors, and shapes) to request desired items for which specific pictures were unavailable. The results of a multiple baseline across descriptors showed that training increased the number of improvised requests, and that these skills generalized to novel items, and across settings and listeners in the natural environment. PMID:16602390
Frontoparietal network involved in successful retrieval from episodic memory. Spatial and temporal analyses using fMRI and ERP.

PubMed

Iidaka, Tetsuya; Matsumoto, Atsushi; Nogawa, Junpei; Yamamoto, Yukiko; Sadato, Norihiro

2006-09-01

The neural basis for successful recognition of previously studied items, referred to as "retrieval success," has been investigated using either neuroimaging or brain potentials; however, few studies have used both modalities. Our study combined event-related functional magnetic resonance imaging (fMRI) and event-related potential (ERP) in separate groups of subjects. The neural responses were measured while the subjects performed an old/new recognition task with pictures that had been previously studied in either a deep- or shallow-encoding condition. The fMRI experiment showed that among the frontoparietal regions involved in retrieval success, the inferior frontal gyrus and intraparietal sulcus were crucial to conscious recollection because the activity of these regions was influenced by the depth of memory at encoding. The activity of the right parietal region in response to a repeated item was modulated by the repetition lag, indicating that this area would be critical to familiarity-based judgment. The results of structural equation modeling revealed that the functional connectivity among the regions in the left hemisphere was more significant than that in the right hemisphere. The results of the ERP experiment and independent component analysis paralleled those of the fMRI experiment and demonstrated that the repeated item produced an earlier peak than the hit item by approximately 50 ms.
Development and initial validation of a brief self-report measure of cognitive dysfunction in fibromyalgia.

PubMed

Kratz, Anna L; Schilling, Stephen G; Goesling, Jenna; Williams, David A

2015-06-01

Pain is often the focus of research and clinical care in fibromyalgia (FM); however, cognitive dysfunction is also a common, distressing, and disabling symptom in FM. Current efforts to address this problem are limited by the lack of a comprehensive, valid measure of subjective cognitive dysfunction in FM that is easily interpretable, accessible, and brief. The purpose of this study was to leverage cognitive functioning item banks that were developed as part of the Patient Reported Outcomes Measurement Information System (PROMIS) to devise a 10-item short form measure of cognitive functioning for use in FM. In study 1, a nationwide (U.S.) sample of 1,035 adults with FM (age range = 18-82, 95.2% female) completed 2 cognitive item pools. Factor analyses and item response theory analyses were used to identify dimensionality and optimally performing items. A recommended 10-item measure, called the Multidimensional Inventory of Subjective Cognitive Impairment (MISCI) was created. In study 2, 232 adults with FM completed the MISCI and a legacy measure of cognitive functioning that is used in FM clinical trials, the Multiple Ability Self-Report Questionnaire (MASQ). The MISCI showed excellent internal reliability, low ceiling/floor effects, and good convergent validity with the MASQ (r = -.82). This paper presents the MISCI, a 10-item measure of cognitive dysfunction in FM, developed through classical test theory and item response theory. This brief but comprehensive measure shows evidence of excellent construct validity through large correlations with a lengthy legacy measure of cognitive functioning. Copyright © 2015 American Pain Society. Published by Elsevier Inc. All rights reserved.

Item Reliabilities for a Family of Answer-Until-Correct (AUC) Scoring Rules.

ERIC Educational Resources Information Center

Kane, Michael T.; Moloney, James M.

The Answer-Until-Correct (AUC) procedure has been proposed in order to increase the reliability of multiple-choice items. A model for examinees' behavior when they must respond to each item until they answer it correctly is presented. An expression for the reliability of AUC items, as a function of the characteristics of the item and the scoring…
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.

PubMed

Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro

2013-01-01

The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of the questionnaire were investigated through Rasch analysis. Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the response categories were collapsed from 7 to 3 levels and the items fit the model well, except for the following/leading item, for which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items has to be further investigated.
A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing

PubMed Central

Huang, Wenhao; Chapman-Novakofski, Karen M

2017-01-01

Background The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. Objective The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps’ educational quality and technical functionality. Methods Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Results Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis was good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. 
App purpose split-half reliability was .65. Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Conclusions Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps’ qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. PMID:29079554
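
The internal consistency and split-half statistics reported in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook formulas for Cronbach's alpha and the Spearman-Brown correction; the function names and the toy data are assumptions for this example and do not come from the AQEL study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def spearman_brown(r_half):
    """Step up a correlation between two test halves to full-length reliability."""
    return 2 * r_half / (1 + r_half)

# Toy data (hypothetical): three respondents, three perfectly consistent items.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
full_length = spearman_brown(0.5)
```

With perfectly consistent items the item variances account for exactly one third of the total-score variance, so alpha reaches its maximum of 1; real instruments fall below that.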
Lawton IADL scale in dementia: can item response theory make it more informative?

PubMed

McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M

2014-07-01

Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. The objective was to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar-Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Samejima Items in Multiple-Choice Tests: Identification and Implications

ERIC Educational Resources Information Center

Rahman, Nazia

2013-01-01

Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…
An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).

PubMed

Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul

2018-03-01

Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
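
For readers unfamiliar with the Rasch (1PL) model mentioned in the abstract above, the following minimal sketch shows its item response function and the corresponding item information. The function names and the example values of ability (theta) and item difficulty (b) are assumptions for illustration and are not taken from the EAT-10 analysis.

```python
import math

def rasch_probability(theta, b):
    """Probability of endorsing an item under the Rasch (1PL) model.

    theta: person location on the latent trait; b: item difficulty.
    Both are on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at theta: p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)

# Hypothetical values: a person whose trait level equals the item difficulty.
p_match = rasch_probability(0.0, 0.0)
info_match = item_information(0.0, 0.0)
```

Information peaks where theta equals b, which is why items matched to a respondent's trait level contribute most to measurement precision.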
Evidence for parallel consolidation of motion direction and orientation into visual short-term memory.

PubMed

Rideaux, Reuben; Apthorp, Deborah; Edwards, Mark

2015-02-12

Recent findings have indicated the capacity to consolidate multiple items into visual short-term memory in parallel varies as a function of the type of information. That is, while color can be consolidated in parallel, evidence suggests that orientation cannot. Here we investigated the capacity to consolidate multiple motion directions in parallel and reexamined this capacity using orientation. This was achieved by determining the shortest exposure duration necessary to consolidate a single item, then examining whether two items, presented simultaneously, could be consolidated in that time. The results show that parallel consolidation of direction and orientation information is possible, and that parallel consolidation of direction appears to be limited to two. Additionally, we demonstrate the importance of adequate separation between feature intervals used to define items when attempting to consolidate in parallel, suggesting that when multiple items are consolidated in parallel, as opposed to serially, the resolution of representations suffer. Finally, we used facilitation of spatial attention to show that the deterioration of item resolution occurs during parallel consolidation, as opposed to storage. © 2015 ARVO.
Space shuttle/food system study

NASA Technical Reports Server (NTRS)

1974-01-01

The functional, physical, and performance interface requirements are studied between the space shuttle orbiter and the galley water system, the orbiter and the galley electrical system, and the orbiter and the galley structural system. Control of the configuration and design of the applicable interfacing items is intended to maintain compatibility between co-functioning and physically mating items and to assure those performance criteria that are dependent upon the interfacing items.
Binary Logistic Regression Analysis for Detecting Differential Item Functioning: Effectiveness of R[superscript 2] and Delta Log Odds Ratio Effect Size Measures

ERIC Educational Resources Information Center

Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.

2014-01-01

The authors analyze the effectiveness of the R[superscript 2] and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Differential item functioning by sex and race in the Hogan Personality Inventory.

PubMed

Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M; Dai, Guangdong; King, Daniel W

2006-12-01

The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories. Although the group mean differences were trivial, more than a third of the items showed DIF by sex (38.4%) and by race (37.3%). A content analysis of potentially biased items indicated that the themes of items displaying DIF were slightly more cohesive for sex than for race. The authors discuss possible explanations for differing clustering tendencies of items displaying DIF and some practical and theoretical implications of DIF in the development and interpretation of personality inventories.
Development and psychometric characteristics of the SCI-QOL Ability to Participate and Satisfaction with Social Roles and Activities item banks and short forms.

PubMed

Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S

2015-05-01

To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. The setting was five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten-item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that are comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.
Gender fairness within the Force Concept Inventory

NASA Astrophysics Data System (ADS)

Traxler, Adrienne; Henderson, Rachel; Stewart, John; Stewart, Gay; Papak, Alexis; Lindell, Rebecca

2018-01-01

Research on the test structure of the Force Concept Inventory (FCI) has largely ignored gender, and research on FCI gender effects (often reported as "gender gaps") has seldom interrogated the structure of the test. These rarely crossed streams of research leave open the possibility that the FCI may not be structurally valid across genders, particularly since many reported results come from calculus-based courses where 75% or more of the students are men. We examine the FCI considering both psychometrics and gender disaggregation (while acknowledging this as a binary simplification), and find several problematic questions whose removal decreases the apparent gender gap. We analyze three samples (total Npre = 5391, Npost = 5769) looking for gender asymmetries using classical test theory, item response theory, and differential item functioning. The combination of these methods highlights six items that appear substantially unfair to women and two items biased in favor of women. No single physical concept or prior experience unifies these questions, but they are broadly consistent with problematic items identified in previous research. Removing all significantly gender-unfair items halves the gender gap in the main sample in this study. We recommend that instructors using the FCI report the reduced-instrument score as well as the 30-item score, and that credit or other benefits to students not be assigned using the biased items.
Rapid forgetting results from competition over time between items in visual working memory.

PubMed

Pertzov, Yoni; Manohar, Sanjay; Husain, Masud

2017-04-01

Working memory is now established as a fundamental cognitive process across a range of species. Loss of information held in working memory has the potential to disrupt many aspects of cognitive function. However, despite its significance, the mechanisms underlying rapid forgetting remain unclear, with intense recent debate as to whether it is interference between stored items that leads to loss of information or simply temporal decay. Here we show that both factors are essential and interact in a highly specific manner. Although a single item can be maintained in memory with high fidelity, multiple items compete in working memory, progressively degrading each other's representations as time passes. Specifically, interaction between items is associated with both worsening precision and increased reporting errors of object features over time. Importantly, during the period of maintenance, although items are no longer visible, maintenance resources can be selectively redeployed to protect the probability to recall the correct feature and the precision with which cued items can be recalled, as if it was the only item in memory. These findings reveal that the biased competition concept could be applied not only to perceptual processes but also to active maintenance of working memory representations over time.
The Modified Checklist for Autism in Toddlers in extremely low gestational age newborns: individual items associated with motor, cognitive, vision and hearing limitations.

PubMed

Luyster, Rhiannon J; Kuban, Karl C K; O'Shea, T Michael; Paneth, Nigel; Allred, Elizabeth N; Leviton, Alan

2011-07-01

The Modified Checklist for Autism in Toddlers (M-CHAT) has yielded elevated rates of screening failure for children born preterm or with low birthweight. We extended these findings with a detailed examination of M-CHAT items in a large sample of children born at extremely low gestational age. The sample was grouped according to children's current limitations and degree of impairment. The aim was to better understand how disabilities might influence M-CHAT scores. Fourteen participating institutions of the Extremely Low Gestational Age Newborns (ELGAN) Study prospectively collected information about 1086 infants who were born before the 28th week of gestation and had an assessment at age 24-months. The 24-month visit included a neurological assessment, the Bayley Scales of Infant Development, Second edition (BSID-II), M-CHAT and a medical history form. Outcome measures included the distribution of failed M-CHAT items among groups classified according to cerebral palsy diagnosis, gross motor function, BSID-II scores and vision or hearing impairments. M-CHAT items were failed more frequently by children with concurrently identified impairments (motor, cognitive, vision and hearing). In addition, the frequency of item failure increased with the severity of impairment. The failed M-CHAT items were often, but not consistently, related to children's specific impairments. Importantly, four of the six M-CHAT 'critical items' were commonly affected by presence and severity of concurrent impairments. The strong association between impaired sensory or motor function and M-CHAT results among extremely low gestational age children suggests that such impairments might give rise to false positive M-CHAT screening.
[Differential item functioning: a bibliometric analysis of journals published in Spanish].

PubMed

Guilera, Georgina; Gómez, Juana; Hidalgo, M Dolores

2006-11-01

This study aims to provide an overview of scientific productivity with respect to articles published in Spanish on the issue of DIF. The documents included in the study were identified using the Psicodoc database, as well as the Science Citation Index and Social Science Citation Index from the Web of Science. The analyses carried out are focused mainly on presenting the frequencies and percentages of publications with respect to various bibliometric indicators. The results reveal that interest in the issue of DIF has increased, and that the universities are the most productive institutions. The majority of articles have been published in the journal Psicothema.
Development of a PROMIS item bank to measure pain interference.

PubMed

Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei

2010-07-01

This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item functioning (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available.
Development and Validation of a Six-Item Version of the Interpersonal Dependency Inventory.

PubMed

McClintock, Andrew S; McCarrick, Shannon M; Anderson, Timothy; Himawan, Lina; Hirschfeld, Robert

2017-04-01

The Interpersonal Dependency Inventory (IDI) is a frequently used, 48-item measure of maladaptive dependency. Our goal was to develop and psychometrically evaluate a very brief version of the IDI. An exploratory factor analysis of the IDI in Study 1 (N = 838) yielded a six-item IDI (IDI-6), with three items loading on an emotional dependency factor (IDI-6-ED), and the other three items loading on a functional dependency factor (IDI-6-FD). This factor solution was validated by confirmatory factor analysis in Study 2 (N = 916). The IDI-6-ED and IDI-6-FD demonstrated good convergent and divergent validity in Study 3 (N = 100). In Study 4 (N = 22-43), the IDI-6-ED and IDI-6-FD were generally stable over 4-week and 8-week intervals and were found to be responsive to the effects of psychological treatment. These results have implications for dependency conceptualizations and support the IDI-6 as a brief, psychometrically sound instrument.
A symptom profile of depression among Asian Americans: is there evidence for differential item functioning of depressive symptoms?

PubMed

Kalibatseva, Z; Leong, F T L; Ham, E H

2014-09-01

Theoretical and clinical publications suggest the existence of cultural differences in the expression and experience of depression. Measurement non-equivalence remains a potential methodological explanation for the lower prevalence of depression among Asian Americans compared to European Americans. This study compared DSM-IV depressive symptoms among Asian Americans and European Americans using secondary data analysis of the Collaborative Psychiatric Epidemiology Surveys (CPES). The Composite International Diagnostic Interview (CIDI) was used for the assessment of depressive symptoms. Of the entire sample, 310 Asian Americans and 1974 European Americans reported depressive symptoms and were included in the analyses. Measurement variance was examined with an item response theory differential item functioning (IRT DIF) analysis. χ2 analyses indicated that, compared to Asian Americans, European American participants more frequently endorsed affective symptoms such as 'feeling depressed', 'feeling discouraged' and 'cried more often'. The IRT analysis detected DIF for four out of the 15 depression symptom items. At equal levels of depression, Asian Americans endorsed feeling worthless and appetite changes more easily than European Americans, and European Americans endorsed feeling nervous and crying more often than Asian Americans. Asian Americans did not seem to over-report somatic symptoms; however, European Americans seemed to report more affective symptoms than Asian Americans. The results suggest that there was measurement variance in a few of the depression items.
Measuring Psychobiosocial States in Sport: Initial Validation of a Trait Measure

PubMed Central

Bertollo, Maurizio; Ruiz, Montse C.; Bortoli, Laura

2016-01-01

We examined the item characteristics, the factor structure, and the concurrent validity of a trait measure of psychobiosocial states. In Study 1, Italian athletes (N = 342, 228 men, 114 women, Mage = 23.93, SD = 6.64) rated the intensity, the frequency, and the perceived impact dimensions of a psychobiosocial states scale, trait version (PBS-ST), which is composed of 20 items (10 functional and 10 dysfunctional) referring to how they usually felt before an important competition. In Study 2, the scale was cross-validated in an independent sample (N = 251, 181 men, 70 women, Mage = 24.35, SD = 7.25). The concurrent validity of the PBS-ST scale scores was also examined in comparison with two sport-specific emotion-related measures and a general measure of affect. Exploratory structural equation modeling and confirmatory factor analysis of the data of Study 1 showed that a 2-factor, 15-item solution of the PBS-ST scale (8 functional items and 7 dysfunctional items) reached satisfactory fit indices for the three dimensions (i.e., intensity, frequency, and perceived impact). Results of Study 2 provided evidence of substantial measurement and structural invariance of all dimensions across samples. The low association of the PBS-ST scale with other measures suggests that the scale taps unique constructs. Findings of the two studies offer initial validity evidence for a sport-specific tool to measure psychobiosocial states. PMID:27907111
Individualism and the Extended-Self: Cross-Cultural Differences in the Valuation of Authentic Objects

PubMed Central

Gjersoe, Nathalia L.; Newman, George E.; Chituc, Vladimir; Hood, Bruce

2014-01-01

The current studies examine how valuation of authentic items varies as a function of culture. We find that U.S. respondents value authentic items associated with individual persons (a sweater or an artwork) more than Indian respondents, but that both cultures value authentic objects not associated with persons (a dinosaur bone or a moon rock) equally. These differences cannot be attributed to more general cultural differences in the value assigned to authenticity. Rather, the results support the hypothesis that individualistic cultures place a greater value on objects associated with unique persons and in so doing, offer the first evidence for how valuation of certain authentic items may vary cross-culturally. PMID:24658437

Expectations for Visual Function: An Initial Evaluation of a New Clinical Instrument.

ERIC Educational Resources Information Center

Corn, Anne L.; Webne, Steve L.

2001-01-01

A study explored the internal consistency of items in a visual screening instrument developed by Project PAVE: Expectations for Visual Functioning (EVF). The test includes 20 items that evaluate a child's functional use of vision. A pilot test involving 129 teachers indicates the EVF is internally consistent. (Contains three references.) (CR)
A Study on Detecting of Differential Item Functioning of PISA 2006 Science Literacy Items in Turkish and American Samples

ERIC Educational Resources Information Center

Çikirikçi Demirtasli, Nükhet; Ulutas, Seher

2015-01-01

Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…
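The definition above (different probabilities of a correct response at the same skill level) can be made concrete with a small sketch. The abstract is truncated and does not say which detection method was used, so the Mantel-Haenszel common odds ratio shown here is only one widely used option, not necessarily the authors' procedure. Matching examinees on total score strata, the function names, and the example counts are all assumptions made for illustration.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio pooled over matched ability strata.

    Each stratum is a tuple
    (ref_correct, ref_wrong, focal_correct, focal_wrong)
    counting examinees who share the same matching score.
    A value near 1.0 means the two groups have similar odds of a
    correct answer once skill level is held constant.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_correct, ref_wrong, focal_correct, focal_wrong in strata:
        n = ref_correct + ref_wrong + focal_correct + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_correct * focal_wrong / n
        denominator += ref_wrong * focal_correct / n
    return numerator / denominator


def mh_delta(odds_ratio):
    """Rescale the odds ratio to the delta metric (-2.35 * ln(alpha))."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts: both groups answer correctly at the same rate
# within each stratum, so the pooled odds ratio is 1 (no DIF signal).
no_dif = [(30, 10, 15, 5), (20, 20, 10, 10)]
print(mantel_haenszel_odds_ratio(no_dif))
```

A negative delta indicates that the item is relatively harder for the focal group after matching; the thresholds used to flag an item vary by testing program and are not given in the abstract.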
Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd

The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…
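As a rough illustration of an area-based index of the kind attributed to Rudner above, the sketch below integrates the unsigned gap between two item characteristic curves, one per group. The abstract is truncated and does not state the curve family or the integration range, so the two-parameter logistic form, the interval from -8 to 8, and all function and parameter names are assumptions.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (assumed form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def unsigned_area_between_iccs(ref, focal, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoidal estimate of the area between two groups' curves.

    ref and focal are (a, b) tuples: discrimination and difficulty
    estimated separately for the reference and focal groups.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * width
        gap = abs(icc_2pl(theta, *ref) - icc_2pl(theta, *focal))
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * gap
    return total * width


# Hypothetical parameters: equal discrimination, difficulty shifted by 0.5.
print(round(unsigned_area_between_iccs((1.0, 0.0), (1.0, 0.5)), 3))
```

When the two curves share the same discrimination, the area over the whole ability axis equals the absolute difference in difficulty, which gives a quick sanity check on the numerical result.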
What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

ERIC Educational Resources Information Center

Banerjee, Jayanti; Papageorgiou, Spiros

2016-01-01

The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
An Alternative Methodology for Creating Parallel Test Forms Using the IRT Information Function.

ERIC Educational Resources Information Center

Ackerman, Terry A.

The purpose of this paper is to report results on the development of a new computer-assisted methodology for creating parallel test forms using the item response theory (IRT) information function. Recently, several researchers have approached test construction from a mathematical programming perspective. However, these procedures require…
Monitoring population health for Healthy People 2020: evaluation of the NIH PROMIS® Global Health, CDC Healthy Days, and satisfaction with life instruments

PubMed Central

Barile, John P.; Reeve, Bryce B.; Smith, Ashley Wilder; Zack, Matthew M.; Mitchell, Sandra A.; Kobau, Rosemarie; Cella, David F.; Luncheon, Cecily; Thompson, William W.

2015-01-01

Purpose Healthy People 2020 identified health-related quality of life and well-being (WB) as indicators of population health for the next decade. This study examined the measurement properties of the NIH PROMIS® Global Health Scale, the CDC Healthy Days items, and associations with the Satisfaction with Life Scale. Methods A total of 4,184 adults completed the Porter Novelli's HealthStyles mailed survey. Physical and mental health (9 items from PROMIS Global Scale and 3 items from CDC Healthy days measure), and 4 WB factor items were tested for measurement equivalence using multiple-group confirmatory factor analysis. Results The CDC items accounted for similar variance as the PROMIS items on physical and mental health factors; both factors were moderately correlated with WB. Measurement invariance was supported across gender and age; the magnitude of some factor loadings differed between those with and without a chronic medical condition. Conclusions The PROMIS, CDC, and WB items all performed well. The PROMIS items captured a broad range of functioning across the entire continuum of physical and mental health, while the CDC items appear appropriate for assessing burden of disease for chronic conditions and are brief and easily interpretable. All three measures under study appear to be appropriate measures for monitoring several aspects of the Healthy People 2020 goals and objectives. PMID:23404737
The hierarchy of the activities of daily living in the Katz index in residents of skilled nursing facilities.

PubMed

Gerrard, Paul

2013-01-01

Nursing facility patients are a population that has not previously been well studied with regard to functional status and independence. As such, the manner in which activities of daily living (ADL) relate to one another is not well understood in this population. An understanding of ADL difficulty ordering has helped to devise systems of functional independence grading in other populations, which have value in understanding patients' global levels of independence and providing expectations regarding changes in function. This study seeks to examine the hierarchy of ADL in the nursing facility population. Data were analyzed from the 2004 National Nursing Home Survey, a cross-sectional data set of 13,507 skilled nursing facility subjects with functional independence items. The ADL difficulty hierarchy was determined using Rasch analysis. Item fit values for the Rasch model using Mean-Square infit statistics were also determined. The robustness of the hierarchy was tested for each ADL. Two grading systems were devised from the results of the item difficulty ordering. One assigned a grade based on the most difficult item that a subject could perform, and the other assigned a grade based on the least difficult item that a subject could not perform. A total of 13,113 patients were included in this analysis, the majority of whom were female and white. They had an average age of 81 years. An ordered hierarchy of ADL was found with eating being the easiest and bathing the most difficult. All items in the Katz index fit the Rasch model adequately well. The majority of patients able to perform any particular ADL were also able to perform all easier ADL. Cohen's κ for the 2 grading systems was 0.73. This study is the first to show the expected hierarchy of difficulty of the 6 activities of daily living proposed in the Katz index in the nursing facility population. The hierarchy found in this population matches the original hierarchy found in older adults in the community and acute care settings. It is also similar to the hierarchy found in the inpatient rehabilitation setting. Patients would be expected to lose or gain function based on the order of difficulty, but this remains to be confirmed. Among the 6 activities of daily living tested here, their order from easiest to most difficult is eating, maintaining continence, transferring, toileting, dressing, and bathing. In addition, the index formed by these 6 items has construct validity in the nursing facility population.
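The two grading systems described in this abstract can be sketched as follows. The difficulty order is taken from the abstract; the numeric grade encoding (0 to 6), the dictionary input format, and the function names are assumptions, since the abstract does not say how the grades were labeled.

```python
# Easiest to most difficult, as reported in the abstract.
KATZ_ORDER = [
    "eating", "continence", "transferring",
    "toileting", "dressing", "bathing",
]


def grade_by_hardest_performed(can_do):
    """Grade = rank of the most difficult ADL the subject can perform."""
    grade = 0  # 0 means no ADL performed (assumed encoding)
    for rank, adl in enumerate(KATZ_ORDER, start=1):
        if can_do.get(adl, False):
            grade = rank
    return grade


def grade_by_easiest_failed(can_do):
    """Grade = number of ADLs below the least difficult one not performed."""
    for rank, adl in enumerate(KATZ_ORDER, start=1):
        if not can_do.get(adl, False):
            return rank - 1
    return len(KATZ_ORDER)  # every ADL performed


# A subject who follows the hierarchy gets the same grade from both systems.
consistent = {"eating": True, "continence": True, "transferring": True}
print(grade_by_hardest_performed(consistent), grade_by_easiest_failed(consistent))

# A subject who breaks the hierarchy is graded differently by each system.
inconsistent = {"eating": True, "dressing": True}
print(grade_by_hardest_performed(inconsistent), grade_by_easiest_failed(inconsistent))
```

The two systems disagree only for subjects whose response pattern departs from the difficulty order, which is consistent with the abstract reporting substantial but imperfect agreement between them.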
Seeking missing pieces in science concept assessments: Reevaluating the Brief Electricity and Magnetism Assessment through Rasch analysis

NASA Astrophysics Data System (ADS)

Ding, Lin

2014-02-01

Discipline-based science concept assessments are powerful tools to measure learners' disciplinary core ideas. Among many such assessments, the Brief Electricity and Magnetism Assessment (BEMA) has been broadly used to gauge student conceptions of key electricity and magnetism (E&M) topics in college-level introductory physics courses. Differing from typical concept inventories that focus only on one topic of a subject area, BEMA covers a broad range of topics in the electromagnetism domain. In spite of this fact, prior studies exclusively used a single aggregate score to represent individual students' overall understanding of E&M without explicating the construct of this assessment. Additionally, BEMA has been used to compare traditional physics courses with a reformed course entitled Matter and Interactions (M&I). While prior findings were in favor of M&I, no empirical evidence was sought to rule out possible differential functioning of BEMA that may have inadvertently advantaged M&I students. In this study, we used Rasch analysis to seek two missing pieces regarding the construct and differential functioning of BEMA. Results suggest that although BEMA items generally can function together to measure the same construct of application and analysis of E&M concepts, several items may need further revision. Additionally, items that demonstrate differential functioning for the two courses are detected. Issues such as item contextual features and student familiarity with question settings may underlie these findings. This study highlights often overlooked threats in science concept assessments and provides an exemplar for using evidence-based reasoning to make valid inferences and arguments.
A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses.

PubMed

Massof, Robert W

2014-10-01

A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equals the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing.
A Review of ETS Differential Item Functioning Assessment Procedures: Flagging Rules, Minimum Sample Size Requirements, and Criterion Refinement. Research Report. ETS RR-12-08

ERIC Educational Resources Information Center

Zwick, Rebecca

2012-01-01

Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
Quantifying traditional Chinese medicine patterns using modern test theory: an example of functional constipation.

PubMed

Shen, Minxue; Cui, Yuanwu; Hu, Ming; Xu, Linyong

2017-01-13

The study aimed to validate a scale to assess the severity of the "Yin deficiency, intestine heat" pattern of functional constipation based on modern test theory. Pooled longitudinal data of 237 patients with the "Yin deficiency, intestine heat" pattern of constipation from a prospective cohort study were used to validate the scale. Exploratory factor analysis was used to examine the common factors of items. A multidimensional item response model was used to assess the scale in the presence of multidimensionality. Cronbach's alpha ranged from 0.79 to 0.89, and the split-half reliability ranged from 0.67 to 0.79 at different measurements. Exploratory factor analysis identified two common factors, and all items had cross factor loadings. The bidimensional model had better goodness of fit than the unidimensional model. The multidimensional item response model showed that all items had moderate to high discrimination parameters. Parameters indicated that the first latent trait signified intestine heat, while the second trait characterized Yin deficiency. The information function showed that items demonstrated the highest discrimination power among patients with moderate to high levels of disease severity. Multidimensional item response theory provides a useful and rational approach in validating scales for assessing the severity of patterns in traditional Chinese medicine.
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.

ERIC Educational Resources Information Center

Kim, Seock-Ho; Cohen, Allan S.

Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
A Methodology for Zumbo's Third Generation DIF Analyses and the Ecology of Item Responding

ERIC Educational Resources Information Center

Zumbo, Bruno D.; Liu, Yan; Wu, Amery D.; Shear, Benjamin R.; Olvera Astivia, Oscar L.; Ark, Tavinder K.

2015-01-01

Methods for detecting differential item functioning (DIF) and item bias are typically used in the process of item analysis when developing new measures; adapting existing measures for different populations, languages, or cultures; or more generally validating test score inferences. In 2007 in "Language Assessment Quarterly," Zumbo…
Multidimensional Extension of Multiple Indicators Multiple Causes Models to Detect DIF

ERIC Educational Resources Information Center

Lee, Soo; Bulut, Okan; Suh, Youngsuk

2017-01-01

A number of studies have found multiple indicators multiple causes (MIMIC) models to be an effective tool in detecting uniform differential item functioning (DIF) for individual items and item bundles. A recently developed MIMIC-interaction model is capable of detecting both uniform and nonuniform DIF in the unidimensional item response theory…
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory

ERIC Educational Resources Information Center

Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.

2016-01-01

Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
An in-depth psychometric analysis of the Connor-Davidson Resilience Scale: calibration with Rasch-Andrich model.

PubMed

Arias González, Víctor B; Crespo Sierra, María Teresa; Arias Martínez, Benito; Martínez-Molina, Agustín; Ponce, Fernando P

2015-09-23

The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory. This paper calibrates the CD-RISC with a nonclinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the item level. Two items showed misfit to the model and were eliminated. The remaining 22 items form an essentially unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories were functioning properly. Two of the items showed differential item functioning. The CD-RISC has an obvious ceiling effect, which suggests that more difficult items should be included in future versions of the scale.
Grouping in decomposition method for multi-item capacitated lot-sizing problem with immediate lost sales and joint and item-dependent setup cost

NASA Astrophysics Data System (ADS)

Narenji, M.; Fatemi Ghomi, S. M. T.; Nooraie, S. V. R.

2011-03-01

This article examines a dynamic and discrete multi-item capacitated lot-sizing problem in a completely deterministic production or procurement environment with limited production/procurement capacity where lost sales (the loss of customer demand) are permitted. There is no inventory space capacity and the production activity incurs a fixed charge linear cost function. Similarly, the inventory holding cost and the cost of lost demand are both associated with a linear no-fixed charge function. For the sake of simplicity, a unit of each item is assumed to consume one unit of production/procurement capacity. We analyse a different version of setup costs incurred by a production or procurement activity in a given period of the planning horizon. In this version, called the joint and item-dependent setup cost, an additional item-dependent setup cost is incurred separately for each produced or ordered item on top of the joint setup cost.
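The joint and item-dependent setup cost described above can be written down for a single period as a minimal sketch. The function name, argument names, and example figures are hypothetical, and the sketch covers only the setup component, not the linear production, holding, or lost-sales costs mentioned in the abstract.

```python
def period_setup_cost(quantities, joint_setup, item_setups):
    """Setup cost incurred in one period of the planning horizon.

    quantities:  amount produced or ordered of each item in the period
    joint_setup: charge paid once if any item at all is produced
    item_setups: additional charge per item, paid only for produced items
    """
    produced = [i for i, qty in enumerate(quantities) if qty > 0]
    if not produced:
        return 0  # nothing produced, so no setup of either kind
    return joint_setup + sum(item_setups[i] for i in produced)


# Items 0 and 2 are produced: joint charge 100 plus item charges 10 and 30.
print(period_setup_cost([5, 0, 3], joint_setup=100, item_setups=[10, 20, 30]))
```

Because the joint charge is shared, producing several items in the same period is cheaper in setup terms than spreading them over separate periods, which is the trade-off a lot-sizing model of this kind has to balance against capacity and holding costs.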
Effects of aging and divided attention on memory for items and their contexts.

PubMed

Craik, Fergus I M; Luo, Lin; Sakuta, Yuiko

2010-12-01

It is commonly found that memory for context declines disproportionately with aging, arguably due to a general age-related deficit in associative memory processes. One possible mechanism for such deficits is an age-related reduction in available processing resources. In two experiments we compared the effects of aging to the effects of division of attention in younger adults on memory for items and context. Using a technique proposed by Craik (1989), linear functions relating memory performance for items and their contexts were derived for a Young Full Attention group, a Young Divided Attention group, and an Older Adult group. Results suggested that the Old group showed an additional deficit in associative memory that was not mimicked by divided attention. It is speculated that both divided attention and aging are associated with a loss of available processing resources that may reflect inefficient frontal lobe functioning, whereas the additional age-related decrement in associative memory may reflect inefficient processing in medial-temporal regions. (c) 2010 APA, all rights reserved.
[Evaluation of the factorial and metric equivalence of the Sexual Assertiveness Scale (SAS) by sex].

PubMed

Sierra, Juan Carlos; Santos-Iglesias, Pablo; Vallejo-Medina, Pablo

2012-05-01

Sexual assertiveness refers to the ability to initiate sexual activity, refuse unwanted sexual activity, and use contraceptive methods to avoid sexually transmitted diseases, thereby developing healthy sexual behaviors. The Sexual Assertiveness Scale (SAS) assesses these three dimensions. The purpose of this study is to evaluate, using structural equation modeling and differential item functioning, the equivalence of the scale between men and women. Standard scores are also provided. A total of 4,034 participants from 21 Spanish provinces took part in the study. A quota sampling method was used. Results indicate strictly equivalent dimensionality of the Sexual Assertiveness Scale across sexes. One item was flagged for differential item functioning, although this does not affect the scale. Therefore, there is no significant bias in the scale when comparing across sexes. Standard scores show similar Initiation assertiveness scores for men and women, and higher scores on Refusal and Sexually Transmitted Disease Prevention for women. This scale can be used with men and women with sufficient psychometric guarantees.
