Differential Item Functioning Analysis Using Rasch Item Information Functions
ERIC Educational Resources Information Center
Wyse, Adam E.; Mapuranga, Raymond
2009-01-01
Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…
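As a rough illustration of the quantity the ISI is built on, the sketch below evaluates the Rasch item information function, I(θ) = P(θ)(1 − P(θ)), for one item calibrated separately in two groups. The truncated abstract does not give the ISI formula, so no index is computed here; the two difficulty values and the ability grid are hypothetical.

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_information(theta: float, b: float) -> float:
    """Rasch item information: I(theta) = P(theta) * (1 - P(theta))."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

# Hypothetical difficulties for the same item calibrated in two groups.
b_reference, b_focal = 0.0, 0.5

thetas = [x / 2 for x in range(-6, 7)]  # ability grid from -3 to 3
for theta in thetas:
    i_ref = rasch_information(theta, b_reference)
    i_foc = rasch_information(theta, b_focal)
    print(f"theta={theta:+.1f}  ref={i_ref:.3f}  focal={i_foc:.3f}")
```

Each curve peaks at 0.25 where ability equals the item difficulty, so a difficulty shift between groups moves the whole information curve along the ability axis.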
Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert
2016-01-01
Background The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education. 
PMID:25160692
Rasch validation of the Arabic version of the lower extremity functional scale.
Alnahdi, Ali H
2018-02-01
The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of the 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing the item scoring structure, removing misfitting items, and addressing bias caused by response dependency between items and by differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected, and the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirements of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirements of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.
Bravini, Elisabetta; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano
2017-04-01
To investigate dimensionality and the measurement properties of the Italian Lower Extremity Functional Scale using both classical test theory and Rasch analysis methods, and to provide insights for an improved version of the questionnaire. Rasch analysis of individual patient data. Rehabilitation centre. A total of 135 patients with musculoskeletal diseases of the lower limb. Patients were assessed with the Lower Extremity Functional Scale before and after the rehabilitation. Rasch analysis showed some problems related to rating scale category functioning, item fit, and item redundancy. After an iterative process, which resulted in the reduction of rating scale categories from 5 to 4 and in the deletion of 5 items, the psychometric properties of the Italian Lower Extremity Functional Scale improved. The retained 15 items with a 4-level response format fitted the Rasch model (internal construct validity), and demonstrated unidimensionality and good reliability indices (person-separation reliability 0.92; Cronbach's alpha 0.94). The analysis then showed differential item functioning for six of the retained items. The sensitivity to change of the Italian 15-item Lower Extremity Functional Scale was nearly equal to that of the original version (effect size: 0.93 and 0.98; standardized response mean: 1.20 and 1.28, respectively for the 15-item and 20-item versions). The Italian Lower Extremity Functional Scale had unsatisfactory measurement properties. However, removing five items and simplifying the scoring from 5 to 4 levels resulted in a more valid measure with good reliability and sensitivity to change.
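The two sensitivity-to-change indices quoted above can be sketched as follows. The abstract does not state which formulas were used, so this assumes the conventional definitions: effect size as mean change divided by the baseline standard deviation, and standardized response mean as mean change divided by the standard deviation of the change scores. The scores in the usage lines are hypothetical, not study data.

```python
from statistics import mean, stdev

def effect_size(before, after):
    """Mean change divided by the SD of the baseline scores (assumed definition)."""
    change = [a - b for a, b in zip(after, before)]
    return mean(change) / stdev(before)

def standardized_response_mean(before, after):
    """Mean change divided by the SD of the change scores (assumed definition)."""
    change = [a - b for a, b in zip(after, before)]
    return mean(change) / stdev(change)

# Hypothetical pre- and post-rehabilitation scores for six patients.
before = [30, 42, 35, 50, 28, 44]
after = [45, 50, 52, 61, 40, 58]

print(f"effect size = {effect_size(before, after):.2f}")
print(f"SRM         = {standardized_response_mean(before, after):.2f}")
```

Because the two indices share a numerator and differ only in the denominator, the SRM exceeds the effect size whenever change scores vary less than baseline scores, which matches the pattern of the reported values.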
A confirmative clinimetric analysis of the 36-item Family Assessment Device.
Timmerby, Nina; Cosci, Fiammetta; Watson, Maggie; Csillag, Claudio; Schmitt, Florence; Steck, Barbara; Bech, Per; Thastum, Mikael
2018-02-07
The Family Assessment Device (FAD) is a 60-item questionnaire widely used to evaluate self-reported family functioning. However, the factor structure as well as the number of items has been questioned. A shorter and more user-friendly version of the original FAD-scale, the 36-item FAD, has therefore previously been proposed, based on findings in a nonclinical population of adults. We aimed in this study to evaluate the brief 36-item version of the FAD in a clinical population. Data from a European multinational study, examining factors associated with levels of family functioning in adult cancer patients' families, were used. Both healthy and ill parents completed the 60-item version FAD. The psychometric analyses conducted were Principal Component Analysis and Mokken-analysis. A total of 564 participants were included. Based on the psychometric analysis we confirmed that the 36-item version of the FAD has robust psychometric properties and can be used in clinical populations. The present analysis confirmed that the 36-item version of the FAD (18 items assessing 'well-being' and 18 items assessing 'dysfunctional' family function) is a brief scale where the summed total score is a valid measure of the dimensions of family functioning. This shorter version of the FAD is, in accordance with the concept of 'measurement-based care', an easy to use scale that could be considered when the aim is to evaluate self-reported family functioning.
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models
ERIC Educational Resources Information Center
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol
2016-01-01
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
A Quasi-Parametric Method for Fitting Flexible Item Response Functions
ERIC Educational Resources Information Center
Liang, Longjuan; Browne, Michael W.
2015-01-01
If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure
McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.
2013-01-01
Background Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342
Detection of Differential Item Functioning Using the Lasso Approach
ERIC Educational Resources Information Center
Magis, David; Tuerlinckx, Francis; De Boeck, Paul
2015-01-01
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Tarrant, Marie; Ware, James; Mohammed, Ahmed M
2009-07-07
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
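The reported figures can be cross-checked with a few lines of arithmetic, and the stated definition of a non-functioning option can be written as a small rule. The function name and its threshold argument are illustrative choices, not taken from the study; the numbers are those given in the abstract.

```python
# Figures reported in the abstract.
share_by_functioning = {0: 0.123, 1: 0.348, 2: 0.391, 3: 0.138}
total_items = 514
total_distractors = 1542
functioning = 805
never_chosen = 158

# Three distractors per four-option item.
distractors_per_item = total_distractors // total_items

# Mean number of functioning distractors implied by the item proportions.
mean_functioning = sum(k * p for k, p in share_by_functioning.items())

pct_functioning = round(100 * functioning / total_distractors, 1)
pct_never_chosen = round(100 * never_chosen / total_distractors, 1)

def is_functioning(choice_share: float, discrimination: float,
                   min_share: float = 0.05) -> bool:
    """A distractor functions if at least 5% of examinees choose it
    and its option discrimination statistic is not positive."""
    return choice_share >= min_share and discrimination <= 0

print(distractors_per_item, round(mean_functioning, 2),
      pct_functioning, pct_never_chosen)
```

The implied mean agrees with the reported average of 1.54 functioning distractors per item, and the two percentages agree with the reported 52.2% and 10.2%.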
Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei
2013-07-01
To introduce Item Function Analysis (IFA) of the Chinese version of the Quality of Life-Alzheimer's Disease (QOL-AD) scale and to explore the feasibility of its application in Chinese patients with AD. Two hundred AD patients, selected by stratified cluster sampling, were interviewed and assessed with the QOL-AD. Multilog 7.03 was used for the Item Function Analysis. The discrimination parameter (a), difficulty parameter (b), and Item Characteristic Curve (ICC) of each item of the QOL-AD were estimated. The discrimination parameters of items 1 and 7 were below 0.6, while all the others were above 0.6. As for the ICCs, for all items except items 1 and 7 the first and last category curves were monotonic, while the two in between had an inverted V shape with very steep slopes. Results from the IFA showed that the QOL-AD is applicable to Chinese patients with AD.
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
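For orientation, the traditional Mantel-Haenszel baseline mentioned above can be sketched as below; this is not the classification-tree procedure the article proposes. Examinees are stratified by total score, a 2×2 table (group by correct/incorrect) is formed in each stratum, and the common odds ratio is pooled across strata. The conversion to the ETS delta scale uses the conventional factor of −2.35. The function names and the counts are hypothetical.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across score strata.

    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

def mh_delta(alpha: float) -> float:
    """ETS delta metric: negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha)

# Hypothetical counts at three total-score levels.
strata = [
    (30, 20, 20, 30),
    (40, 10, 30, 20),
    (45, 5, 40, 10),
]
alpha = mantel_haenszel_odds_ratio(strata)
print(f"alpha_MH = {alpha:.3f}, delta_MH = {mh_delta(alpha):.3f}")
```

An odds ratio of 1 (delta of 0) means that, at matched score levels, the two groups have equal odds of answering the item correctly.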
ERIC Educational Resources Information Center
Martinková, Patricia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because…
Differential Item Functioning Analysis of the Mental, Emotional, and Bodily Toughness Inventory
ERIC Educational Resources Information Center
Gao, Yong; Mack, Mick G.; Ragan, Moira A.; Ragan, Brian
2012-01-01
In this study the authors used differential item functioning analysis to examine if there were items in the Mental, Emotional, and Bodily Toughness Inventory functioning differently across gender and athletic membership. A total of 444 male (56.3%) and female (43.7%) participants (30.9% athletes and 69.1% non-athletes) responded to the Mental,…
Tian, Feng; Ni, Pengsheng; Mulcahey, M J; Hambleton, Ronald K; Tulsky, David; Haley, Stephen M; Jette, Alan M
2014-11-01
To use item response theory (IRT) methods to link scores from 2 recently developed contemporary functional outcome measures, the adult Spinal Cord Injury-Functional Index (SCI-FI) and the Pedi SCI (both the parent version and the child version). Secondary data analysis of the physical functioning items of the adult SCI-FI and the Pedi SCI instruments. We used a nonequivalent group design with items common to both instruments and the Stocking-Lord method for the linking. Linking was conducted so that the adult SCI-FI and Pedi SCI scaled scores could be compared. Community. This study included a total sample of 1558 participants. Pedi SCI items were administered to a sample of children (n=381) with SCI aged 8 to 21 years, and of parents/caregivers (n=322) of children with SCI aged 4 to 21 years. Adult SCI-FI items were administered to a sample of adults (n=855) with SCI aged 18 to 92 years. Not applicable. Five scales common to both instruments were included in the analysis: Wheelchair, Daily Routine/Self-care, Daily Routine/Fine Motor, Ambulation, and General Mobility functioning. Confirmatory factor analysis and exploratory factor analysis results indicated that the 5 scales are unidimensional. A graded response model was used to calibrate the items. Misfitting items were identified and removed from the item banks. Items that function differently between the adult and child samples (ie, exhibit differential item functioning) were identified and removed from the common items used for linking. Domain scores from the Pedi SCI instruments were transformed onto the adult SCI-FI metric. This IRT linking allowed estimation of adult SCI-FI scale scores based on Pedi SCI scale scores and vice versa; therefore, it provides clinicians with a means of tracking long-term functional data for children with an SCI across their entire lifespan. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Wang, Wen-Chung
2004-01-01
Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Hamilton, Clayon B; Chesworth, Bert M
2013-11-01
The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0-100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity.
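The minimal detectable change figure can be related to the standard error of measurement with the conventional formulas MDC = z × √2 × SEM and SEM = SD × √(1 − ICC). The abstract does not say which formula the authors used, so the sketch below is an assumption; the implied SEM and SD it prints are back-calculated from the reported values, not results from the study.

```python
import math

Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution

def sem_from_sd(sd: float, icc: float) -> float:
    """Standard error of measurement from score SD and test-retest ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdc(sem: float, z: float = Z_90) -> float:
    """Minimal detectable change for a difference between two measurements."""
    return z * math.sqrt(2.0) * sem

def sem_from_mdc(mdc_value: float, z: float = Z_90) -> float:
    """Invert the MDC formula to recover the implied SEM."""
    return mdc_value / (z * math.sqrt(2.0))

# Reported values: MDC at 90% confidence = 8.1, ICC(2,1) = 0.95.
implied_sem = sem_from_mdc(8.1)
implied_sd = implied_sem / math.sqrt(1.0 - 0.95)

print(f"implied SEM = {implied_sem:.2f} points on the 0-100 scale")
print(f"implied SD  = {implied_sd:.2f} points (under the assumed formula)")
```

Read this way, a change smaller than 8.1 points on the 0-100 UEFI-15 scale cannot be distinguished from measurement error at the 90% confidence level for an individual patient.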
Chesworth, Bert M.
2013-01-01
Background The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. Objective The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. Design This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Methods Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. Results A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0–100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Limitations Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Conclusion Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. 
Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity. PMID:23813086
Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models
ERIC Educational Resources Information Center
Liu, Qian
2011-01-01
For this dissertation, four item purification procedures were implemented onto the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…
ERIC Educational Resources Information Center
Fukuhara, Hirotaka; Kamata, Akihito
2011-01-01
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
2018-02-02
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Further evaluation of leisure items in the attention condition of functional analyses.
Roscoe, Eileen M; Carreau, Abbey; MacDonald, Jackie; Pence, Sacha T
2008-01-01
Research suggests that including leisure items in the attention condition of a functional analysis may produce engagement that masks sensitivity to attention. In this study, 4 individuals' initial functional analyses indicated that behavior was maintained by nonsocial variables (n = 3) or by attention (n = 1). A preference assessment was used to identify items for subsequent functional analyses. Four conditions were compared, attention with and without leisure items and control with and without leisure items. Following this, either high- or low-preference items were included in the attention condition. Problem behavior was more probable during the attention condition when no leisure items or low-preference items were included, and lower levels of problem behavior were observed during the attention condition when high-preference leisure items were included. These findings suggest how preferred items may hinder detection of behavioral function.
Item Response Theory analysis of Fagerström Test for Cigarette Dependence.
Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl
2018-02-01
The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigation. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of the FTCD. A Graded Response Model was applied to the FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed, and item statistics, the Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity, and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. The HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, and 5 of the FTCD and for both items of the HSI. The HSI seems highly recommended in clinical settings addressed to heavy smokers, while the FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2017-01-01
This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.
Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.
ERIC Educational Resources Information Center
Muraki, Eiji
1999-01-01
Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…
Estabrook, Ryne; Sadler, Michael E; McGue, Matt
2015-12-01
A long-standing and critical problem in the study of aging and depression is the comparability of measurement across age groups. While psychological measures of depression typically show increased incidence of symptoms with increasing age, rates of depression diagnosis do not show the same age trend. This analysis presents tests of differential item functioning on the depression section of the CAMDEX interview schedule, using factor analysis-derived affective and somatic subscales (McGue & Christensen, 1997). Results for the affective subscale show significant differences in item functioning in the majority of the affective items as a function of age (items "Happy Life," "Lonely," "Nervous," "Worthless," and "Future": χ²(6) = [30.193, 255.971] across items, all p < .0001). Analyses for the somatic subscale show differential item functioning is limited to a single item relating to coping (χ²(6) = 180.754, p < .0001). These results indicate that differences in depression symptoms across age groups are not entirely consistent with a unidimensional depression trait, and that the measurement structure of depression varies over the life span. (c) 2015 APA, all rights reserved.
ERIC Educational Resources Information Center
Mitchelson, Jacqueline K.; Wicher, Eliza W.; LeBreton, James M.; Craig, S. Bartholomew
2009-01-01
The current study evaluates the measurement precision of the Abridged Big Five Circumplex (AB5C) of personality traits by identifying those items that demonstrate differential item functioning by gender and ethnicity. Differential item functioning is found in 33 of 45 (73%) of the AB5C scales, across gender and ethnic groups (Caucasian vs. African…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
Goetz, Christopher G; Liu, Yuanyuan; Stebbins, Glenn T; Wang, Lu; Tilley, Barbara C; Teresi, Jeanne A; Merkitch, Douglas; Luo, Sheng
2016-12-01
Assess MDS-UPDRS items for gender-, age-, and race/ethnicity-based differential item functioning. Assessing differential item functioning is a core rating scale validation step. For the MDS-UPDRS, differential item functioning occurs if item-score probability among people with similar levels of parkinsonism differs according to selected covariates (gender, age, race/ethnicity). If the magnitude of differential item functioning is clinically relevant, item-score interpretation must consider influences by these covariates. Differential item functioning can be nonuniform (covariate variably influences an item-score across different levels of parkinsonism) or uniform (covariate influences an item-score consistently over all levels of parkinsonism). Using the MDS-UPDRS translation database of more than 5,000 PD patients from 14 languages, we tested gender-, age-, and race/ethnicity-based differential item functioning. To designate an item as having clinically relevant differential item functioning, we required statistical confirmation by 2 independent methods, along with a McFadden pseudo-R² magnitude statistic greater than "negligible." Most items showed no gender-, age- or race/ethnicity-based differential item functioning. When differential item functioning was identified, the magnitude statistic was always in the "negligible" range, and the scale-level impact was minimal. The absence of clinically relevant differential item functioning across all items and all parts of the MDS-UPDRS is strong evidence that the scale can be used confidently. As studies of Parkinson's disease increasingly involve multinational efforts and the MDS-UPDRS has several validated non-English translations, the findings support the scale's broad applicability in populations with varying gender, age, and race/ethnicity distributions. © 2016 International Parkinson and Movement Disorder Society.
Examination of the PROMIS upper extremity item bank.
Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R
Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Differential Item Functioning Analysis of the 2003-04 NHANES Physical Activity Questionnaire
ERIC Educational Resources Information Center
Gao, Yong; Zhu, Weimo
2011-01-01
Using differential item functioning (DIF) analyses, this study examined whether there were any DIF items in the National Health and Nutrition Examination Survey (NHANES) physical activity (PA) questionnaire. A subset of adult data from the 2003-04 NHANES study (n = 3,083) was used. PA items related to respondents' occupational, transportation,…
ERIC Educational Resources Information Center
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R² and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
A measure of early physical functioning (EPF) post-stroke.
Finch, Lois E; Higgins, Johanne; Wood-Dauphinee, Sharon; Mayo, Nancy E
2008-07-01
To develop a comprehensive measure of Early Physical Functioning (EPF) post-stroke quantified through Rasch analysis and conceptualized using the International Classification of Functioning Disability and Health (ICF). An observational cohort study. A cohort of 262 subjects (mean age 71.6 (standard deviation 12.5) years) hospitalized post-acute stroke. Functional assessments were made within 3 days of stroke with items from valid and reliable indices commonly utilized to evaluate stroke survivors. Information on important variables was also collected. Principal component and Rasch analysis confirmed the factor structure, and dimensionality of the measure. Rasch analysis combined items across ICF components to develop the measure. Items were deleted iteratively, those retained fit the model and were related to the construct; reliability and validity were assessed. A 38-item unidimensional measure of the EPF met all Rasch model requirements. The item difficulty matched the person ability (mean person measure: -0.31; standard error 0.37 logits), reliability of the person-item-hierarchy was excellent at 0.97. Initial validity was adequate. The 38-item EPF measure was developed. It expands the range of assessment post acute stroke; it covers a broad spectrum of difficulty with good initial psychometric properties that, once revalidated, can assist in planning and evaluating early interventions.
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection
ERIC Educational Resources Information Center
Kim, Jihye; Oshima, T. C.
2013-01-01
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
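The multiple-testing problem described in this abstract can be illustrated with a short sketch. The abstract is truncated and does not name the adjustment procedures studied, so the two shown here (Bonferroni and Benjamini-Hochberg) are assumptions chosen only because they are standard options; the function names and the example p-values are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-item DIF p-values, one significance test per item.
    item_pvals = [0.001, 0.02, 0.03, 0.04, 0.5]
    print("Bonferroni:", bonferroni(item_pvals))
    print("Benjamini-Hochberg:", benjamini_hochberg(item_pvals))
```

An item would then be flagged for DIF when its adjusted p-value falls below the chosen alpha level. Bonferroni controls the chance of at least one Type I error and is the more conservative of the two, which is the power trade-off the abstract alludes to.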
CTTITEM: SAS macro and SPSS syntax for classical item analysis.
Lei, Pui-Wa; Wu, Qiong
2007-08-01
This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
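The statistics listed in this abstract can be sketched in a few lines. This is not the CTTITEM macro or syntax; it is a minimal standard-library illustration of Cronbach's alpha, the item difficulty index (item mean), and a corrected item-total correlation. It assumes a complete persons-by-items score matrix with no missing data, and all function names are hypothetical.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation; for a 0/1 item this equals the point-biserial correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(scores):
    """scores: list of rows (persons), each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance(col) for col in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_statistics(scores):
    """Difficulty (item mean) and corrected item-total correlation for each item."""
    stats = []
    for j, col in enumerate(zip(*scores)):
        # Rest score: total score with the studied item removed.
        rest = [sum(row) - row[j] for row in scores]
        stats.append({"difficulty": mean(col), "item_rest_r": pearson(col, rest)})
    return stats


if __name__ == "__main__":
    # Hypothetical responses: 6 persons x 4 dichotomous items.
    data = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
    ]
    print("alpha:", cronbach_alpha(data))
    for j, s in enumerate(item_statistics(data), start=1):
        print("item", j, s)
```

The correlation is computed against the rest score rather than the full total so that an item is not correlated with itself, which would otherwise inflate the discrimination estimate on short tests.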
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.
McCabe, Erin; Gross, Douglas P; Bulut, Okan
2018-06-07
The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlation with legacy measure scores, respectively). We demonstrated feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward, and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
Female Sexual Function Index Short Version: A MsFLASH Item Response Analysis.
Carpenter, Janet S; Jones, Salene M W; Studts, Christina R; Heiman, Julia R; Reed, Susan D; Newton, Katherine M; Guthrie, Katherine A; Larson, Joseph C; Cohen, Lee S; Freeman, Ellen W; Jane Lau, R; Learman, Lee A; Shifren, Jan L
2016-11-01
The Female Sexual Function Index (FSFI) is a psychometrically sound and popular 19-item self-report measure, but its length may preclude its use in studies with multiple outcome measures, especially when sexual function is not a primary endpoint. Only one attempt has been made to create a shorter scale, resulting in the Italian FSFI-6, later translated into Spanish and Korean without further psychometric analysis. Our study evaluated whether a subset of items on the 19-item English-language FSFI would perform as well as the full-length FSFI in peri- and postmenopausal women. We used baseline data from 898 peri- and postmenopausal women recruited from multiple communities, ages 42-62 years, and enrolled in randomized controlled trials for vasomotor symptom management. Goals were to (1) create a psychometrically sound, shorter version of the FSFI for use in peri- and postmenopausal women as a continuous measure and (2) compare it to the Italian FSFI-6. Results indicated that a 9-item scale provided more information than the FSFI-6 across a spectrum of sexual functioning, was able to capture sample variability, and showed sufficient range without floor or ceiling effects. All but one of the items from the Italian 6-item version were included in the 9-item version. Most omitted FSFI items focused on frequency of events or experiences. When assessment of sexual function is a secondary endpoint and subject burden related to questionnaire length is a priority, the 9-item FSFI may provide important information about sexual function in English-speaking peri- and postmenopausal women.
ERIC Educational Resources Information Center
Zwick, Rebecca
2012-01-01
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
DIFAS: Differential Item Functioning Analysis System. Computer Program Exchange
ERIC Educational Resources Information Center
Penfield, Randall D.
2005-01-01
Differential item functioning (DIF) is an important consideration in assessing the validity of test scores (Camilli & Shepard, 1994). A variety of statistical procedures have been developed to assess DIF in tests of dichotomous (Hills, 1989; Millsap & Everson, 1993) and polytomous (Penfield & Lam, 2000; Potenza & Dorans, 1995) items. Some of these…
ERIC Educational Resources Information Center
Beinicke, Andrea; Pässler, Katja; Hell, Benedikt
2014-01-01
The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland's hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization…
Deepak, Kishore K; Al-Umran, Khalid Umran; Al-Sheikh, Mona H; Dkoli, B V; Al-Rubaish, Abdullah
2015-01-01
The functionality of distracters in a multiple choice question plays a very important role. We examined the frequency and impact of functioning and non-functioning distracters on psychometric properties of 5-option items in clinical disciplines. We analyzed item statistics of 1115 multiple choice questions from 15 summative assessments of undergraduate medical students and classified the items into five groups by their number of non-functioning distracters. We analyzed the effect of varying degrees of non-functionality, ranging from 0 to 4, on test reliability, difficulty index, discrimination index and point biserial correlation. The non-functionality of distracters inversely affected the test reliability and quality of items in a predictable manner. The non-functioning distracters made the items easier and lowered the discrimination index significantly. Three non-functional distracters in a 5-option MCQ significantly affected all psychometric properties (p < 0.5). The corrected point biserial correlation revealed that the items with 3 functional options were psychometrically as effective as 5-option items. Our study reveals that a multiple choice question with 3 functional options represents the lowermost limit of an item format with adequate psychometric properties. Tests containing items with fewer functioning options have significantly lower reliability. Distracter function analysis and revision of non-functioning distracters can serve as important methods to improve the psychometrics and reliability of assessment.
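The classification step described in this abstract, grouping items by their number of non-functioning distracters, can be sketched as follows. The abstract does not say how "non-functioning" was defined; the 5% selection threshold used here is an assumption based on a commonly used convention, and the function name and response data are hypothetical.

```python
from collections import Counter


def nonfunctioning_distracters(responses, key, options="ABCDE", threshold=0.05):
    """Return the distracters chosen by fewer than `threshold` of examinees.

    responses: iterable of selected options, one per examinee.
    key: the correct option, which is never counted as a distracter.
    threshold: assumed cut-off (5%), not taken from the abstract.
    """
    counts = Counter(responses)
    n = sum(counts.values())
    return [
        opt
        for opt in options
        if opt != key and counts[opt] / n < threshold
    ]


if __name__ == "__main__":
    # Hypothetical item answered by 100 examinees; "A" is the keyed answer.
    answers = "A" * 60 + "B" * 25 + "C" * 12 + "D" * 3
    weak = nonfunctioning_distracters(answers, key="A")
    print("non-functioning distracters:", weak)
    print("group (number of non-functioning distracters):", len(weak))
```

Applying this to every item gives the 0 to 4 grouping variable, against which reliability, difficulty and discrimination can then be compared.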
Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches
ERIC Educational Resources Information Center
Kopf, Julia; Zeileis, Achim; Strobl, Carolin
2015-01-01
Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…
ERIC Educational Resources Information Center
Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.
Current interest in the assessment of measurement equivalence emphasizes two methods of analysis, linear, and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…
Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
Bravini, Elisabetta; Franchignoni, Franco; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano; Foti, Calogero
2015-01-01
To perform a comprehensive analysis of the psychometric properties and dimensionality of the Upper Limb Functional Index (ULFI) using both classical test theory and Rasch analysis (RA). Prospective, single-group observational design. Freestanding rehabilitation center. Convenience sample of Italian-speaking subjects with upper limb musculoskeletal disorders (N=174). Not applicable. The Italian version of the ULFI. Data were analyzed using parallel analysis, exploratory factor analysis, and RA for evaluating dimensionality, functioning of rating scale categories, item fit, hierarchy of item difficulties, and reliability indices. Parallel analysis revealed 2 factors explaining 32.5% and 10.7% of the response variance. RA confirmed the failure of the unidimensionality assumption, and 6 items out of the 25 misfitted the Rasch model. When the analysis was rerun excluding the misfitting items, the scale showed acceptable fit values, loading meaningfully to a single factor. Item separation reliability and person separation reliability were .98 and .89, respectively. Cronbach alpha was .92. RA revealed weakness of the scale concerning dimensionality and internal construct validity. However, a set of 19 ULFI items defined through the statistical process demonstrated a unidimensional structure, good psychometric properties, and clinical meaningfulness. These findings represent a useful starting point for further analyses of the tool (based on modern psychometric approaches and confirmatory factor analysis) in larger samples, including different patient populations and nationalities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Evaluating linguistic equivalence of patient-reported outcomes in a cancer clinical trial.
Hahn, Elizabeth A; Bode, Rita K; Du, Hongyan; Cella, David
2006-01-01
In order to make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL) or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural/linguistic groups, it is important to determine whether this reflects real group differences, or is the result of systematic measurement variability. To investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial. Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis was conducted to compare HRQL between treatment arms using items without evidence of differential functioning. Among 27 items, nine (33%) did not exhibit any evidence of differential functioning in both linguistic comparisons (English versus French, English versus German). Although 18 items functioned differently, there was no evidence of systematic bias. In a sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation of clinical trial treatment arm differences. Sufficient sample sizes were available for only three of the eight language groups. Identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive. Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. Systematic variability in HRQL across different groups can be evaluated for its effect upon clinical trial results; a practice recommended when data are pooled across cultural or linguistic groups to make conclusions about treatment effects.
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning
ERIC Educational Resources Information Center
Finch, W. Holmes
2011-01-01
Missing information is a ubiquitous aspect of data analysis, including responses to items on cognitive and affective instruments. Although the broader statistical literature describes missing data methods, relatively little work has focused on this issue in the context of differential item functioning (DIF) detection. Such prior research has…
ERIC Educational Resources Information Center
Tan, Xuan; Xiang, Bihua; Dorans, Neil J.; Qu, Yanxuan
2010-01-01
The nature of the matching criterion (usually the total score) in the study of differential item functioning (DIF) has been shown to impact the accuracy of different DIF detection procedures. One of the topics related to the nature of the matching criterion is whether the studied item should be included. Although many studies exist that suggest…
ERIC Educational Resources Information Center
Qi, Cathy Huaqing; Marley, Scott C.
2009-01-01
The study examined whether item bias is present in the "Preschool Language Scale-4" (PLS-4). Participants were 440 children (3-5 years old; 86% English-speaking Hispanic and 14% European American) who were enrolled in Head Start programs. The PLS-4 items were analyzed for differential item functioning (DIF) using logistic regression and…
Medvedev, Oleg N; Turner-Stokes, Lynne; Ashford, Stephen; Siegert, Richard J
2018-02-28
To determine whether the UK Functional Assessment Measure (UK FIM+FAM) fits the Rasch model in stroke patients with complex disability and, if so, to derive a conversion table of Rasch-transformed interval level scores. The sample included a UK multicentre cohort of 1,318 patients admitted for specialist rehabilitation following a stroke. Rasch analysis was conducted for the 30-item scale including 3 domains of items measuring physical, communication and psychosocial functions. The fit of items to the Rasch model was examined using 3 different analytical approaches referred to as "pathways". The best fit was achieved in the pathway where responses from motor, communication and psychosocial domains were summarized into 3 super-items and where some items were split because of differential item functioning (DIF) relative to left and right hemisphere location (χ²(10) = 14.48, p = 0.15). Re-scoring of items showing disordered thresholds did not significantly improve the overall model fit. The UK FIM+FAM with domain super-items satisfies expectations of the unidimensional Rasch model without the need for re-scoring. A conversion table was produced to convert the total scale scores into interval-level data based on person estimates of the Rasch model. The clinical benefits of interval-transformed scores require further evaluation.
Higgins, Johanne; Finch, Lois E; Kopec, Jacek; Mayo, Nancy E
2010-02-01
To create and illustrate the development of a method to parsimoniously and hierarchically assess upper extremity function in persons after stroke. Data were analyzed using Rasch analysis. Re-analysis of data from 8 studies involving persons after stroke. Over 4000 patients with stroke who participated in various studies in Montreal and elsewhere in Canada. Data comprised 17 tests or indices of upper extremity function and health-related quality of life, for a total of 99 items related to upper extremity function. Tests and indices included, among others, the Box and Block Test, the Nine-Hole Peg Test and the Stroke Impact Scale. Data were collected at various times post-stroke from 3 days to 1 year. Once the data fit the model, a bank of items measuring upper extremity function with persons and items organized hierarchically by difficulty and ability in log units was produced. This bank forms the basis for eventual computer adaptive testing. The calibration of the items should be tested further psychometrically, as should the interpretation of the metric arising from using the item calibration to measure the upper extremity of individuals.
McDonough, Christine M.; Jette, Alan M.; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M.; Rasch, Elizabeth K.
2014-01-01
Objectives To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Design Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. Setting In-person and semi-structured interviews; internet and telephone surveys. Participants A sample of 1,017 SSA claimants, and a normative sample of 999 adults from the US general population. Interventions Not Applicable. Main Outcome Measure Model fit statistics Results The final item pool consisted of 139 items. Within the claimant sample 58.7% were white; 31.8% were black; 46.6% were female; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution which included more items and allowed separate characterization of: 1) Changing and Maintaining Body Position, 2) Whole Body Mobility, 3) Upper Body Function and 4) Upper Extremity Fine Motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples respectively were: Comparative Fit Index = 0.93 and 0.98; Tucker-Lewis Index = 0.92 and 0.98; Root Mean Square Error Approximation = 0.05 and 0.04. Conclusions The factor structure of the Physical Function item pool closely resembled the hypothesized content model. The four scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. PMID:23542402
McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K
2013-09-01
To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Cauffman, Elizabeth; MacIntosh, Randall
2006-01-01
The juvenile justice system needs a tool that can identify and assess mental health problems among youths quickly with validity and reliability. The goal of this article is to evaluate the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model.…
Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.
Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A
2017-09-01
The aim of this study was to confirm the psychometric properties of Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were used for this analysis using survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Gomez, Rapson
2012-01-01
Objective: The generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.) inattention (IA) and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…
Item Response Theory Using Hierarchical Generalized Linear Models
ERIC Educational Resources Information Center
Ravand, Hamdollah
2015-01-01
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D
2017-07-01
The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
An analysis of the DuPage County Regional Office of Education physics exam
NASA Astrophysics Data System (ADS)
Muehsler, Hans
In 2009, the DuPage County Regional Office of Education (ROE) tasked volunteer physics teachers with creating a basic skills physics exam reflecting what the participants valued and shared in common across curricula. Mechanics, electricity & magnetism (E&M), and wave phenomena emerged as the primary constructs. The resulting exam was intended for first-exposure physics students. The most recently completed version was psychometrically assessed for unidimensionality within the constructs using a robust WLS structural equation model and for reliability. An item analysis using a 3-PL IRT model was performed on the mechanics items and a 2-PL IRT model was performed on the E&M and waves items; a distractor analysis was also performed on all items. Lastly, differential item functioning (DIF) and differential test functioning (DTF) analyses, using the Mantel-Haenszel procedure, were performed using gender, ethnicity, year in school, ELL, physics level, and math level as groupings.
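The Mantel-Haenszel procedure named in this abstract can be sketched as follows. This is a minimal illustration, not the author's analysis: the function name `mantel_haenszel_dif`, the input layout, and the example counts are assumptions made here. Examinees are matched on total score, a 2x2 table (group by correct/incorrect) is formed at each score level, and the common odds ratio is pooled across levels. The -2.35 scaling to the ETS delta metric is the conventional one.

```python
import math

def mantel_haenszel_dif(strata):
    """Pooled Mantel-Haenszel odds ratio and ETS delta for one item.

    strata: iterable of (a, b, c, d) counts, one tuple per matched score level
        a = reference group correct, b = reference group incorrect
        c = focal group correct,     d = focal group incorrect
    Returns (alpha_mh, delta_mh). A negative delta means the item is
    harder for the focal group after matching on total score.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these counts")
    alpha = num / den
    delta = -2.35 * math.log(alpha)
    return alpha, delta

# Hypothetical counts for three score levels (illustrative only).
example = [(40, 10, 30, 20), (30, 20, 20, 30), (10, 40, 5, 45)]
alpha_mh, delta_mh = mantel_haenszel_dif(example)
```

Pooling within score levels is what separates DIF from a plain difference in group means: two groups can differ in overall proficiency without any single item functioning differently once examinees are matched.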
ERIC Educational Resources Information Center
Alavi, Seyed Mohammad; Bordbar, Soodeh
2017-01-01
Differential Item Functioning (DIF) analysis is a key element in evaluating educational test fairness and validity. One of the frequently cited sources of construct-irrelevant variance is gender which has an important role in the university entrance exam; therefore, it causes bias and consequently undermines test validity. The present study aims…
Development and initial evaluation of the SCI-FI/AT
Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-01-01
Objectives: To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design: Cross-sectional survey followed by computerized adaptive test (CAT) simulations. Setting: Inpatient and community settings. Participants: A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion: With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.
Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-05-01
To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Cross-sectional survey followed by computerized adaptive test (CAT) simulations. Inpatient and community settings. A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: none. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI.
Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.
Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A
2018-03-01
This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. A common-person equating method was applied to a retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% of the variance. The item bank covered the range of θ from -1.50 to 1.26 (item) and -3.57 to 4.21 (person), with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
Global, Local, and Graphical Person-Fit Analysis Using Person-Response Functions
ERIC Educational Resources Information Center
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R.
2005-01-01
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the…
Development of the Computer-Adaptive Version of the Late-Life Function and Disability Instrument
Tian, Feng; Kopits, Ilona M.; Moed, Richard; Pardasaney, Poonam K.; Jette, Alan M.
2012-01-01
Background. Having psychometrically strong disability measures that minimize response burden is important in assessing older adults. Methods. Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive tests (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects. Results. Confirmatory factor analysis revealed a bifactor structure, and multi-dimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error of approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error of approximation = 0.05). Correlations of 10-item CATs with full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction: r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer. Conclusions. The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision. 
The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults. PMID:22546960
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
The four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach to validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and the local dependency (LD) statistic. The graded response model was fitted to the data. The contribution of each item to the scale was assessed by the item information function (IIF). Reliability of the scale was assessed by the test information function (TIF). Differential item functioning (DIF) across gender was identified by the Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30% of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items, suggesting that they contributed little information to the scale and that their removal had little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. 
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Glas, Cees A W; Vonkeman, Harald E; Taal, Erik; Krishnan, Eswar; Bernelot Moens, Hein J; Boers, Maarten; Terwee, Caroline B; van Riel, Piet L C M; van de Laar, Mart A F J
2015-12-01
To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domains are featured in the full PROMIS physical function item bank compared with 13 and 8 for the HAQ-DI and PF-10, respectively. As hypothesized, all three physical function instruments were highly intercorrelated (r 0.74-0.84), moderately correlated with disease activity measures (r 0.44-0.63) and weakly correlated with age (rs 0.07-0.14). Item response theory-based analysis revealed that a 20-item PROMIS physical function short form covered a wider range of physical function levels than the HAQ-DI or PF-10. The PROMIS physical function item bank demonstrated excellent measurement properties in RA. A content-driven 20-item short form may be a useful tool for assessing physical function in RA. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
ERIC Educational Resources Information Center
Gonzalez-Roma, Vicente; Tomas, Ines; Ferreres, Doris; Hernandez, Ana
2005-01-01
The aims of this study were to investigate whether the 6 items of the Physical Appearance Scale (Marsh, Richards, Johnson, Roche, & Tremayne, 1994) show differential item functioning (DIF) across gender groups of adolescents, and to show how this can be done using the multigroup mean and covariance structure (MG-MACS) analysis model. Two samples…
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.
Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro
2013-01-01
The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of the questionnaire were investigated through Rasch analysis. The data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the response categories were collapsed from 7 to 3 levels and the items fit the model well, except for the following/leading item, for which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty need to be created, and the differential functioning of some items has to be further investigated.
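The infit and outfit statistics cited in this abstract can be illustrated with a short sketch. It assumes the dichotomous Rasch model for simplicity, whereas the study analyzed three-category items, and the function names and data are hypothetical. Outfit is the unweighted mean of squared standardized residuals; infit weights each squared residual by its model variance, so it is less sensitive to unexpected responses from persons far from the item's difficulty.

```python
import math

def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_item_fit(responses, thetas, b):
    """Infit and outfit mean squares for one item.

    responses: 0/1 scores on the item, one per person
    thetas:    person measures in logits, same order as responses
    b:         item difficulty in logits
    Returns (infit_mnsq, outfit_mnsq); values near 1 indicate good fit.
    """
    sum_sq_resid = 0.0   # sum of squared raw residuals
    sum_var = 0.0        # sum of model variances
    sum_z2 = 0.0         # sum of squared standardized residuals
    n = 0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        sq = (x - p) ** 2
        sum_sq_resid += sq
        sum_var += var
        sum_z2 += sq / var
        n += 1
    return sum_sq_resid / sum_var, sum_z2 / n

# Hypothetical data: one able person unexpectedly fails an easy item.
infit, outfit = rasch_item_fit([0, 1, 0, 1], [3.0, 0.0, 0.0, 0.0], 0.0)
```

In this toy example both statistics exceed the 1.4 cutoff mentioned in the abstract, and outfit exceeds infit because the single surprising response comes from a person far from the item's difficulty.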
Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee
2013-07-01
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
Development of a Multidimensional Functional Health Scale for Older Adults in China.
Mao, Fanzhen; Han, Yaofeng; Chen, Junze; Chen, Wei; Yuan, Manqiong; Alicia Hong, Y; Fang, Ya
2016-05-01
A first step toward achieving successful aging is assessing the functional wellbeing of older adults. This study reports the development of a culturally appropriate brief scale (the Multidimensional Functional Health Scale for Chinese Elderly, MFHSCE) to assess the functional health of Chinese elderly. Using systematic literature review, the Delphi method, cultural adaptation, synthetic statistical item selection, Cronbach's alpha and confirmatory factor analysis, we developed an item pool, performed two rounds of item selection, and carried out a psychometric evaluation. Synthetic statistical item selection and psychometric evaluation were carried out among 539 and 2032 older adults, respectively. The MFHSCE consists of 30 items, covering activities of daily living, social relationships, physical health, mental health, cognitive function, and economic resources. The Cronbach's alpha was 0.92, and the comparative fit index was 0.917. The MFHSCE has good internal consistency and construct validity; it is also concise and easy to use in general practice, especially in communities in China.
Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S
2013-12-01
To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Questionnaire development. A representative patient sample of VI children and young people aged 10 to 15 years, visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by presence of missing data and individual item response pattern. Exploratory factor analysis (FA) and parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for high percentage of missing data, 4 for skewness, and 1 for inadequate item infit and outfit values in RA, 3 having shown differential item functioning across age groups and 1 across gender in RA. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. 
The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. We have developed a novel, psychometrically robust self-report questionnaire for children and young people, the FVQ_CYP, that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Health- and vision-related quality of life in intellectually disabled children.
Cui, Yu; Stapleton, Fiona; Suttle, Catherine; Bundy, Anita
2010-01-01
To investigate the psychometric properties of instruments for the assessment of self-reported functional vision performance and health-related quality of life in children with intellectual disabilities (IDs). Two instruments [Autoquestionnaire Enfant Image (AUQUEI), LV Prasad-Functional Vision Questionnaire (LVP-FVQ)] designed for the assessment of functional vision and health-related quality of life were adapted and administered to 168 school children with ID, aged 8 to 18 years. Rasch analysis was used to determine the appropriateness of the rating scales of these instruments and to identify any redundant items. Redundant items were excluded based on descriptive statistics and Rasch analysis, leaving 17 of 23 items in the revised AUQUEI and 16 of 22 in the LVP-FVQ. The AUQUEI items showed disordered thresholds on the rating scale. A modified step calibration (collapsed from four categories to three categories) resulted in ordered response thresholds for all items. The adjusted instrument produced an overall fit to the model (mean item infit = 1.06, SD = 0.32; mean item outfit = 1.11, SD = 0.35), indicating good construct validity. After Rasch analysis, the AUQUEI showed good content validity (person separation = 2.18; item reliability = 0.99; Cronbach alpha = 0.89). Increased similarity of person and item means and SDs on the logit scale after modification would indicate that the instrument was more applicable to the target population in its modified form. In contrast, the LVP-FVQ had a low person separation (1.35), suggesting that a more appropriate instrument is needed for assessment of vision-related quality of life in children with ID. The psychometric properties of two instruments were explored using Rasch analysis. By rescaling and reduction of items, the instruments were modified for use in a population of children with at least mild to moderate ID. 
However, an alternative instrument is needed for the assessment of vision-related quality of life in intellectually disabled children with normal vision or mild visual abnormalities.
Lerdal, Anners; Kottorp, Anders; Gay, Caryl; Aouizerat, Bradley E; Lee, Kathryn A; Miaskowski, Christine
2016-06-01
To accurately investigate diurnal variations in fatigue, a measure needs to be psychometrically sound and demonstrate stable item function in relationship to time of day. Rasch analysis is a modern psychometric approach that can be used to evaluate these characteristics. To evaluate, using Rasch analysis, the psychometric properties of the Lee Fatigue Scale (LFS) in a sample of oncology patients. The sample comprised 587 patients (mean age 57.3 ± 11.9 years, 80% women) undergoing chemotherapy for breast, gastrointestinal, gynecological, or lung cancer. Patients completed the 13-item LFS within 30 minutes of awakening (i.e., morning fatigue) and before going to bed (i.e., evening fatigue). Rasch analysis was used to assess validity and reliability. In initial analyses of differential item function, eight of the 13 items functioned differently depending on whether the LFS was completed in the morning or in the evening. Subsequent analyses were conducted separately for the morning and evening fatigue assessments. Nine of the morning fatigue items and 10 of the evening fatigue items demonstrated acceptable goodness-of-fit to the Rasch model. Principal components analyses indicated that both morning and evening assessments demonstrated unidimensionality. Person-separation indices indicated that both morning and evening fatigue scales were able to distinguish four distinct strata of fatigue severity. Excluding four items from the morning fatigue scale and three items from the evening fatigue scale improved the psychometric properties of the LFS for assessing diurnal variations in fatigue severity in oncology patients. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
A Methodology for Zumbo's Third Generation DIF Analyses and the Ecology of Item Responding
ERIC Educational Resources Information Center
Zumbo, Bruno D.; Liu, Yan; Wu, Amery D.; Shear, Benjamin R.; Olvera Astivia, Oscar L.; Ark, Tavinder K.
2015-01-01
Methods for detecting differential item functioning (DIF) and item bias are typically used in the process of item analysis when developing new measures; adapting existing measures for different populations, languages, or cultures; or more generally validating test score inferences. In 2007 in "Language Assessment Quarterly," Zumbo…
Souza, Mariana Angélica Peixoto; Coster, Wendy Jane; Mancini, Marisa Cotta; Dutra, Fabiana Caetano Martins Silva; Kramer, Jessica; Sampaio, Rosana Ferreira
2017-12-08
A person's participation is acknowledged as an important outcome of the rehabilitation process. The Participation Scale (P-Scale) is an instrument that was designed to assess the participation of individuals with a health condition or disability. The scale was developed in an effort to better describe the participation of people living in middle-income and low-income countries. The aim of this study was to use Rasch analysis to examine whether the Participation Scale is suitable to assess the perceived ability to take part in participation situations by patients with diverse levels of function. The sample comprised 302 patients from a public rehabilitation services network. Participants had orthopaedic or neurological health conditions, were at least 18 years old, and completed the Participation Scale. Rasch analysis was conducted using the Winsteps software. The mean age of all participants was 45.5 years (standard deviation = 14.4), 52% were male, 86% had orthopaedic conditions, and 52% had chronic symptoms. Rasch analysis was performed using a dichotomous rating scale, and only one item showed misfit. Dimensionality analysis supported the existence of only one Rasch dimension. The person separation index was 1.51, and the item separation index was 6.38. Items N2 and N14 showed Differential Item Functioning between men and women. Items N6 and N12 showed Differential Item Functioning between acute and chronic conditions. The item difficulty range was -1.78 to 2.09 logits, while the sample ability range was -2.41 to 4.61 logits. The P-Scale was found to be useful as a screening tool for participation problems reported by patients in a rehabilitation context, despite some issues that should be addressed to further improve the scale.
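The separation indices reported in this abstract can be related to reliability and to the number of statistically distinct strata using the standard Rasch relationships R = G² / (1 + G²) and H = (4G + 1) / 3. The sketch below applies them to the two reported values. The derived figures are a reader's illustration under those formulas, not results reported by the authors, and the function names are hypothetical.

```python
def separation_to_reliability(g):
    """Reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def separation_to_strata(g):
    """Number of statistically distinct strata: H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

# Separation indices as reported in the abstract.
person_separation = 1.51
item_separation = 6.38

person_reliability = separation_to_reliability(person_separation)
item_reliability = separation_to_reliability(item_separation)
person_strata = separation_to_strata(person_separation)
```

Under these formulas a person separation of 1.51 corresponds to a reliability of roughly 0.70 and a little over two strata, which is consistent with the abstract's description of the scale as a screening tool rather than a fine-grained measure.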
Latent Class Analysis of Differential Item Functioning on the Peabody Picture Vocabulary Test-III
ERIC Educational Resources Information Center
Webb, Mi-young Lee; Cohen, Allan S.; Schwanenflugel, Paula J.
2008-01-01
This study investigated the use of latent class analysis for the detection of differences in item functioning on the Peabody Picture Vocabulary Test-Third Edition (PPVT-III). A two-class solution for a latent class model appeared to be defined in part by ability because Class 1 was lower in ability than Class 2 on both the PPVT-III and the…
Chuang, I-Ching; Lin, Keh-Chung; Wu, Ching-Yi; Hsieh, Yu-Wei; Liu, Chien-Ting; Chen, Chia-Ling
2017-10-01
The Motor Activity Log (MAL) and Lower-Functioning MAL (LF-MAL) are used to assess the amount of use of the more impaired arm and the quality of movement during activities in real-life situations for patients with stroke. This study used Rasch analysis to examine the psychometric properties of the MAL and LF-MAL in patients with stroke. This is a methodological study. The MAL and LF-MAL include 2 scales: the amount of use (AOU) and the quality of movement (QOM). Rasch analysis was used to examine the unidimensionality, item difficulty hierarchy, targeting, reliability, and differential item functioning (DIF) of the MAL and LF-MAL. A total of 403 patients with mild or moderate stroke completed the MAL, and 134 patients with moderate/severe stroke finished the LF-MAL. Evidence of disordered thresholds and poor model fit was found in both the MAL and LF-MAL. After the rating categories were collapsed and misfit items were deleted, all items of the revised MAL and LF-MAL exhibited ordering and constituted unidimensional constructs. The person-item map showed that these assessments were difficult for our participants. The person reliability coefficients of these assessments ranged from .79 to .87. No items in the revised MAL and LF-MAL exhibited bias related to patients' characteristics. One limitation is that the patients recruited for the LF-MAL had relatively high functional ability. The revised MAL and LF-MAL are unidimensional scales and have good reliability. The categories function well, and responses to all items in these assessments are not biased by patients' characteristics. However, the revised MAL and LF-MAL both showed a floor effect. Further study might add easy items for assessing the performance of activity in real-life situations for patients with stroke. © 2017 American Physical Therapy Association
Factorial and Item-Level Invariance of a Principal Perspectives Survey: German and U.S. Principals.
Wang, Chuang; Hancock, Dawson R; Muller, Ulrich
This study examined the factorial and item-level invariance of a survey of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal with a sample of US principals and another sample of German principals. Confirmatory factor analysis (CFA) and differential item functioning (DIF) analysis were employed at the test and item level, respectively. A single group CFA was conducted first, and the model was found to fit the data collected. The factorial invariance between the German and the US principals was tested through three steps: (a) configural invariance; (b) measurement invariance; and (c) structural invariance. The results suggest that the survey is a viable measure of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal because principals from two different cultures shared a similar pattern on all three constructs. The DIF analysis further revealed that 22 out of the 28 items functioned similarly between German and US principals.
Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M
2018-03-01
Purpose To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and expand the content coverage to include aspects of cognition & communication function. Methods Data were collected from a random, stratified sample of 1695 claimants applying for the SSA work disability benefits, and a general population sample of 2025 working age adults. 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.
Chen, Senlin; Zhu, Xihe; Kang, Minsoo
2017-05-01
A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot results and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winsteps 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The items had large variability ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
Examining the validity and reliability of the Taita symptom checklist using Rasch analysis.
Chen, Yun-Ling; Pan, Ay-Woan; Chung, LyInn; Chen, Tsyr-Jang
2015-03-01
The Taita symptom checklist (TSCL) is a standardized self-rating psychiatric symptom scale for outpatients with mental illness in Taiwan. This study aimed to examine the validity and reliability of the TSCL using Rasch analysis. The TSCL was given to 583 healthy people and 479 people with mental illness. Rasch analysis was used to examine the appropriateness of the rating scale, the unidimensionality of the scale, the differential item functioning across sex and diagnosis, and the Rasch cut-off score of the scale. Rasch analysis confirmed that the revised 37 items with a three-point rating scale of the TSCL demonstrated good internal consistency and met criteria for unidimensionality. The person and item reliability indices were high. The TSCL could reliably measure healthy participants and patients with mental illness. Differential item functioning due to sex or psychiatric diagnosis was evident for three items. A Rasch cut-off score for TSCL was produced for detecting participants' psychiatric symptoms based on an eight-level classification. The TSCL is a reliable and valid assessment to evaluate the participants' perceived disturbance of psychiatric symptoms based on Rasch analysis. Copyright © 2013. Published by Elsevier B.V.
A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis
ERIC Educational Resources Information Center
Cao, Mengyang; Tay, Louis; Liu, Yaowu
2017-01-01
This study examined the performance of a proposed iterative Wald approach for detecting differential item functioning (DIF) between two groups when preknowledge of anchor items is absent. The iterative approach utilizes the Wald-2 approach to identify anchor items and then iteratively tests for DIF items with the Wald-1 approach. Monte Carlo…
Psychological distress in cancer survivors: the further development of an item bank.
Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P
2013-02-01
Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.
Shen, Minxue; Cui, Yuanwu; Hu, Ming; Xu, Linyong
2017-01-13
The study aimed to validate a scale to assess the severity of the "Yin deficiency, intestine heat" pattern of functional constipation based on modern test theory. Pooled longitudinal data of 237 patients with the "Yin deficiency, intestine heat" pattern of constipation from a prospective cohort study were used to validate the scale. Exploratory factor analysis was used to examine the common factors of items. A multidimensional item response model was used to assess the scale in the presence of multidimensionality. The Cronbach's alpha ranged from 0.79 to 0.89, and the split-half reliability ranged from 0.67 to 0.79 at different measurements. Exploratory factor analysis identified two common factors, and all items had cross factor loadings. The bidimensional model had better goodness of fit than the unidimensional model. The multidimensional item response model showed that all items had moderate to high discrimination parameters. Parameters indicated that the first latent trait signified intestine heat, while the second trait characterized Yin deficiency. The information function showed that items demonstrated the highest discrimination power among patients with moderate to high levels of disease severity. Multidimensional item response theory provides a useful and rational approach in validating scales for assessing the severity of patterns in traditional Chinese medicine.
Shen, Minxue; Hu, Ming; Sun, Zhenqiu
2017-01-01
Objectives To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting Middle schools in Hunan province. Participants 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Wang, Zonghua; Zhou, Juan; Luo, Xingli; Xu, Yan; She, Xi; Chen, Ling; Yin, Honghua; Wang, Xianyuan
2015-01-01
The impact of strabismus on visual function, self-image, self-esteem, and social interactions decreases health-related quality of life (HRQoL). The purpose of this study was to evaluate and refine the adult strabismus quality of life questionnaire (AS-20) by using Rasch analysis among Chinese adult patients with strabismus. We evaluated the fit of the AS-20 to the Rasch model in a Chinese population by assessing unidimensionality, infit and outfit, person and item separation index and reliability, response ordering, targeting and differential item functioning (DIF). The overall AS-20 did not demonstrate unidimensionality; however, it was achieved separately in the two Rasch-revised subscales: the psychosocial subscale (11 items) and the function subscale (9 items). The features of good targeting, optimal item infit and outfit, and no notable local dependence were found for each of the subscales. The rating scale was appropriate for the psychosocial subscale but a reduction to four response categories was required for the function subscale. No significant DIF was revealed for any demographic and clinical factors (e.g., age, gender, and strabismus types). The AS-20 was demonstrated by Rasch analysis to be a rigorous instrument for measuring health-related quality of life in Chinese strabismus patients if some revisions were made regarding the subscale construct and response options.
Using Mixed Methods to Interpret Differential Item Functioning
ERIC Educational Resources Information Center
Benítez, Isabel; Padilla, José-Luis; Hidalgo Montesinos, María Dolores; Sireci, Stephen G.
2016-01-01
Analysis of differential item functioning (DIF) is often used to determine if cross-lingual assessments are equivalent across languages. However, evidence on the causes of cross-lingual DIF is still evasive. Expert appraisal is a qualitative method useful for obtaining detailed information about problematic elements in the different linguistic…
Hagquist, Curt; Andrich, David
2017-09-19
Rasch analysis with a focus on Differential Item Functioning (DIF) is increasingly used for examination of psychometric properties of health outcome measures. To take account of DIF in order to retain precision of measurement, split of DIF-items into separate sample specific items has become a frequently used technique. The purpose of the paper is to present and summarise recent advances of analysis of DIF in a unified methodology. In particular, the paper focuses on the use of analysis of variance (ANOVA) as a method to simultaneously detect uniform and non-uniform DIF, the need to distinguish between real and artificial DIF and the trade-off between reliability and validity. An illustrative example from health research is used to demonstrate how DIF, in this case between genders, can be identified, quantified and under specific circumstances accounted for using the Rasch model. Rasch analyses of DIF were conducted of a composite measure of psychosomatic problems using Swedish data from the Health Behaviour in School-aged Children study for grade 9 students collected during the 1985-2014 time periods. The procedures demonstrate how DIF can be identified efficiently by ANOVA of residuals, and how the magnitude of DIF can be quantified and potentially accounted for by resolving items according to identifiable groups and using principles of test equating on the resolved items. The results of the analysis also show that the real DIF in some items does affect person measurement estimates. Firstly, in order to distinguish between real and artificial DIF, the items showing DIF initially should not be resolved simultaneously but sequentially. Secondly, while resolving instead of deleting a DIF item may retain reliability, both options may affect the content validity negatively. Resolving items with DIF is not justified if the source of the DIF is relevant for the content of the variable; then resolving DIF may deteriorate the validity of the instrument. 
Generally, decisions on resolving items to deal with DIF should also rely on external information.
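The idea of detecting DIF through an analysis of variance of residuals can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' procedure: it covers only the group main effect (uniform DIF) for a single dichotomous item, whereas the abstract describes an ANOVA that also detects non-uniform DIF, which would need an additional ability-level factor and its interaction with group. The function names, the item difficulty, and the person data are all hypothetical.

```python
import math
from statistics import mean


def standardized_residual(response, ability, difficulty):
    """Standardized residual of a 0/1 response under the dichotomous Rasch model."""
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return (response - p) / math.sqrt(p * (1.0 - p))


def one_way_f(groups):
    """F statistic of a one-way ANOVA over lists of residuals, one list per group."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical data for one item: (group, person ability in logits, response).
persons = [
    ("girls", -1.0, 0), ("girls", 0.0, 1), ("girls", 1.0, 1), ("girls", 0.5, 1),
    ("boys", -1.0, 0), ("boys", 0.0, 0), ("boys", 1.0, 1), ("boys", 0.5, 0),
]
item_difficulty = 0.2  # hypothetical item location in logits

residuals = {}
for group, ability, response in persons:
    z = standardized_residual(response, ability, item_difficulty)
    residuals.setdefault(group, []).append(z)

f_stat = one_way_f(list(residuals.values()))
print("F for group main effect:", round(f_stat, 3))
```

A large F for the group factor would suggest that, at equal ability, one group scores systematically above model expectation on the item, which is the pattern referred to as uniform DIF.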
ERIC Educational Resources Information Center
Robitzsch, Alexander; Rupp, Andre A.
2009-01-01
This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
Bode, Rita K; Lai, Jin-shei; Dineen, Kelly; Heinemann, Allen W; Shevrin, Daniel; Von Roenn, Jamie; Cella, David
2006-01-01
We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT.
Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M
2016-01-01
Our objective was to examine dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and Item Response Theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) Pooled Resources Open-Access ALS Clinical Trials (PRO-ACT) database with complete ALSFRS-R data (n = 888) at three time-points (Time 0, Time 1 (6-months), Time 2 (1-year)). Results demonstrated that in this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a 4-individual-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time-points, particularly in the gross motor and respiratory domain items. Results indicate that many of the items of the ALSFRS-R may sub-optimally distinguish among varying levels of disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved across time as patient disability severity increased. In conclusion, modifications to select ALSFRS-R items may improve the instrument's specificity to disability level and sensitivity to treatment effects.
Harris, K K; Price, A J; Beard, D J; Fitzpatrick, R; Jenkinson, C; Dawson, J
2014-11-01
The objective of this study was to explore dimensionality of the Oxford Hip Score (OHS) and examine whether self-reported pain and functioning can be distinguished in the form of subscales. This was a secondary data analysis of the UK NHS hospital episode statistics/patient-reported outcome measures dataset containing pre-operative OHS scores on 97 487 patients who were undergoing hip replacement surgery. The proposed number of factors to extract depended on the method of extraction employed. Velicer's Minimum Average Partial test and the Parallel Analysis suggested one factor, the Cattell's scree test and Kaiser-over-1 rule suggested two factors. Exploratory factor analysis demonstrated that the two-factor OHS had most of the items saliently loading either of the two factors. These factors were named 'Pain' and 'Function' and their respective subscales were created. There was some cross-loading of items: 8 (pain on standing up from a chair) and 11 (pain during work). These items were assigned to the 'Pain' subscale. The final 'Pain' subscale consisted of items 1, 8, 9, 10, 11 and 12. The 'Function' subscale consisted of items 2, 3, 4, 5, 6 and 7, with the recommended scoring of the subscales being from 0 (worst) to 100 (best). Cronbach's alpha was 0.855 for the 'Pain' subscale and 0.861 for the 'Function' subscale. A confirmatory factor analysis demonstrated that the two-factor model of the OHS had a better fit. However, none of the one-factor or two-factor models was rejected. Factor analyses demonstrated that, in addition to current usage as a single summary scale, separate information on pain and self-reported function can be extracted from the OHS in a meaningful way in the form of subscales. Cite this article: Bone Joint Res 2014;3:305-9. ©2014 The British Editorial Society of Bone & Joint Surgery.
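Cronbach's alpha, reported above for the 'Pain' and 'Function' subscales, can be computed from item-level scores with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The following Python sketch is only an illustration; the function name and the response data are hypothetical and are not taken from the OHS dataset.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents, 3 items scored 0-4.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [0, 1, 1],
]
alpha = cronbach_alpha(responses)
print("alpha:", round(alpha, 3))
```

Alpha approaches 1 when the items rise and fall together across respondents, and it tends to increase with the number of items, so values for subscales of different lengths are not directly comparable.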
Rasch measurement: the Arm Activity measure (ArmA) passive function sub-scale.
Ashford, Stephen; Siegert, Richard J; Alexandrescu, Roxana
2016-01-01
To evaluate the conformity of the Arm Activity measure (ArmA) passive function sub-scale to the Rasch model. A consecutive cohort of patients (n = 92) undergoing rehabilitation, including upper limb rehabilitation and spasticity management, at two specialist rehabilitation units were included. Rasch analysis was used to examine scaling and conformity to the model. Responses were analysed using Rasch unidimensional measurement models (RUMM 2030). The following aspects were considered: overall model and individual item fit statistics and fit residuals, internal reliability, item response threshold ordering, item bias, local dependency and unidimensionality. ArmA contains both active and passive function sub-scales, but in this analysis only the passive function sub-scale was considered. Four of the seven items in the ArmA passive function sub-scale initially had disordered thresholds. These items were rescored to four response options, which resulted in ordered thresholds for all items. Once the items with disordered thresholds had been rescored, item bias was not identified for age, global disability level or diagnosis, but with a small difference in difficulty between males and females for one item of the scale. Local dependency was not observed and the unidimensionality of the sub-scale was supported and good fit to the Rasch model was identified. The person separation index (PSI) was 0.95 indicating that the scale is able to reliably differentiate at least two groups of patients. The ArmA passive function sub-scale was shown in this evaluation to conform to the Rasch model once disordered thresholds had been addressed. Using the logit scores produced by the Rasch model it was possible to convert this back to the original scale range. 
Implications for Rehabilitation The ArmA passive function sub-scale was shown, in this evaluation, to conform to the Rasch model once disordered thresholds had been addressed and therefore to be a clinically applicable and potentially useful hierarchical measure. Using Rasch logit scores it has been possible to convert back to the original ordinal scale range and provide an indication of real change to enable evaluation of clinical outcome of importance to patients and clinicians.
Psychometric properties of a revised version of the Assisting Hand Assessment (Kids-AHA 5.0).
Holmefur, Marie M; Krumlinde-Sundholm, Lena
2016-06-01
The aim of this study was to scrutinize the Assisting Hand Assessment (AHA) version 4.4 for possible improvements and to evaluate the psychometric properties regarding internal scale validity and aspects of reliability of a revised version of the AHA. In collaboration with experts, scoring criteria were changed for four items, and one fully new item was constructed. Twenty-two original, one new, and four revised items were scored for 164 assessments of children with unilateral cerebral palsy aged 18 months to 12 years. Rasch measurement analysis was used to evaluate internal scale validity by exploring rating-scale functioning, item and person goodness-of-fit, and principal component analysis. Targeting and scale reliability were also evaluated. After removal of misfitting items, a 20-item scale showed satisfactory goodness-of-fit. Unidimensionality was confirmed by principal component analysis. The rating scale functioned well for the 20 items, and the item difficulty was well suited to the ability level of the sample. The person reliability coefficient was 0.98, indicating high separation ability of the scale. A conversion table of AHA scores between the previous version (4.4) and the new version (5.0) was constructed. The new, 20-item version of the Kids-AHA (version 5.0), demonstrated excellent internal scale validity, suggesting improved responsiveness to changes and shortened scoring time. For comparison of scores from version 4.4 to 5.0, a transformation table is presented. © 2015 Mac Keith Press.
Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment
ERIC Educational Resources Information Center
Alsadaawi, Abdullah Saleh
2017-01-01
The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…
Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea
2016-12-01
To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, unidimensionality, and no differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. 
Published by Elsevier Inc. All rights reserved.
Rasch model based analysis of the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Planinic, Maja; Ivanjek, Lana; Susac, Ana
2010-06-01
The Force Concept Inventory (FCI) is an important diagnostic instrument which is widely used in the field of physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One such tool is the stochastic Rasch model, which enables construction of linear measures for persons and items from raw test scores and which can provide important insight into the structure and functioning of the test (how item difficulties are distributed within the test, how well the items fit the model, and how well the items work together to define the underlying construct). The data for the Rasch analysis come from the large-scale research conducted in 2006-07, which investigated Croatian high school students’ conceptual understanding of mechanics on a representative sample of 1676 students (age 17-18 years). The instrument used in the research was the FCI. The average FCI score for the whole sample was found to be (27.7±0.4)%, indicating that most of the students were still non-Newtonians at the end of high school, despite the fact that physics is a compulsory subject in Croatian schools. The large set of obtained data was analyzed with the Rasch measurement computer software WINSTEPS 3.66. Since the FCI is routinely used as pretest and post-test on two very different types of population (non-Newtonian and predominantly Newtonian), an additional predominantly Newtonian sample (N=141, average FCI score of 64.5%) of first year students enrolled in an introductory physics course at the University of Zagreb was also analyzed. The Rasch model based analysis suggests that the FCI has succeeded in defining a sufficiently unidimensional construct for each population. The analysis of fit of data to the model found no grossly misfitting items which would degrade measurement. Some items with larger misfit and items with significantly different difficulties in the two samples of students do require further examination. 
The analysis revealed some problems with item distribution in the FCI and suggested that the FCI may function differently in non-Newtonian and predominantly Newtonian population. Some possible improvements of the test are suggested.
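The dichotomous Rasch model referred to above expresses the probability of a correct response as a logistic function of the difference between person ability and item difficulty, both in logits. The following Python sketch is a minimal illustration of that relationship and of how it links linear measures to expected raw scores. The function names and the three item difficulties are hypothetical; they are not FCI estimates or WINSTEPS output.

```python
import math
from typing import Sequence


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_raw_score(ability: float, difficulties: Sequence[float]) -> float:
    """Expected number of correct answers for a person on a set of items."""
    return sum(rasch_probability(ability, d) for d in difficulties)


# Hypothetical item difficulties in logits (easy, medium, hard).
item_difficulties = [-1.0, 0.0, 1.5]

for theta in (-1.0, 0.0, 1.0):
    score = expected_raw_score(theta, item_difficulties)
    print("ability", theta, "expected raw score", round(score, 3))
```

When ability equals difficulty the probability is 0.5, and the expected raw score rises monotonically but non-linearly with ability, which is why raw percentages and logit measures are not interchangeable at the extremes of a test.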
Examination of a Social-Networking Site Activities Scale (SNSAS) Using Rasch Analysis
ERIC Educational Resources Information Center
Alhaythami, Hassan; Karpinski, Aryn; Kirschner, Paul; Bolden, Edward
2017-01-01
This study examined the psychometric properties of a social-networking site (SNS) activities scale (SNSAS) using Rasch Analysis. Items were also examined with Rasch Principal Components Analysis (PCA) and Differential Item Functioning (DIF) across groups of university students (i.e., males and females from the United States [US] and Europe; N =…
Generalized Full-Information Item Bifactor Analysis
Cai, Li; Yang, Ji Seung; Hansen, Mark
2011-01-01
Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682
Packham, Tara L; Cappelleri, Joseph C; Sadosky, Alesia; MacDermid, Joy C; Brunner, Florian
2017-03-04
painDETECT (PD-Q) is a self-reported assessment of pain qualities developed as a screening tool for pain of neuropathic origin. Rasch analysis is a strategy for examining the measurement characteristics of a scale using a form of item response theory. We conducted a Rasch analysis to consider if the scoring and measurement properties of PD-Q would support its use as an outcome measure. Rasch analysis was conducted on PD-Q scores drawn from a cross-sectional study of the burden and costs of NeP. The analysis followed an iterative process based on recommendations in the literature, including examination of sequential scoring categories, unidimensionality, reliability and differential item functioning. Data from 624 persons with a diagnosis of painful diabetic polyneuropathy, small fibre neuropathy, and neuropathic pain associated with chronic low back pain, spinal cord injury, HIV-related pain, or chronic post-surgical pain were used for this analysis. PD-Q demonstrated fit to the Rasch model after adjustments of scoring categories for four items, and omission of the time course and radiating questions. The resulting seven-item scale of pain qualities demonstrated good reliability with a person-separation index of 0.79. No scoring bias (differential item functioning) was found for this version. Rasch modelling suggests the seven pain-qualities items from PD-Q may be used as an outcome measure. Further research is required to confirm validity and responsiveness in a clinical setting.
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
2018-06-01
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
A Rasch Analysis of the Junior Metacognitive Awareness Inventory with Singapore Students
ERIC Educational Resources Information Center
Ning, Hoi Kwan
2018-01-01
The psychometric properties of the 2 versions of the Junior Metacognitive Awareness Inventory were examined with Singapore student samples. Other than 2 misfitting items and an underutilized response scale, Rasch analysis demonstrated that the instruments have good measurement precision, and no differential item functioning was detected across…
Bai, Mei; Dixon, Jane K
2014-01-01
The purpose of this study was to reexamine the factor pattern of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale (FACIT-Sp-12) using exploratory factor analysis in people newly diagnosed with advanced cancer. Principal components analysis (PCA) and 3 common factor analysis methods were used to explore the factor pattern of the FACIT-Sp-12. Factorial validity was assessed in association with quality of life (QOL). Principal factor analysis (PFA), iterative PFA, and maximum likelihood suggested retrieving 3 factors: Peace, Meaning, and Faith. Both Peace and Meaning positively related to QOL, whereas only Peace uniquely contributed to QOL. This study supported the 3-factor model of the FACIT-Sp-12. Suggestions for revision of items and further validation of the identified factor pattern were provided.
Development and validation of a measure of pediatric oral health-related quality of life: the POQL
Huntington, Noelle L; Spetter, Dante; Jones, Judith A.; Rich, Sharon E.; Garcia, Raul I.; Spiro, Avron
2011-01-01
Objective: To develop a brief measure of oral health-related quality of life in children and demonstrate its reliability and validity in a diverse population. Methods: We administered the initial 20-item POQL to children (Child Self-Report) and parents (Parent Report on Child) from diverse populations in both school-based and clinic-based settings. Clinical oral health status was measured on a subset of children. We used factor analysis to determine the underlying scales and then reduced the measure to 10 items based on several considerations. Multitrait analysis on the resulting 10-item POQL was used to reaffirm the discrimination of scales and assess the measure’s internal consistency and interscale correlations. We established discriminant and convergent validity with clinical status, perceived oral health and responses on the PedsQL and determined sensitivity to change with children undergoing ECC surgical repair. Results: Factor analysis returned a four-scale solution for the initial items – Physical Functioning, Role Functioning, Social Functioning and Emotional Functioning. The reduced items represented the same four scales – two each on Physical and Role and three each on Social and Emotional. Good reliability and validity were shown for the POQL as a whole and for each of the scales. Conclusions: The POQL is a valid and reliable measure of oral health-related quality of life for use in pre-school and school-aged children, with high utility for both clinical assessments and large-scale population studies. PMID:21972458
Cluster Analysis for Cognitive Diagnosis: Theory and Applications
ERIC Educational Resources Information Center
Chiu, Chia-Yi; Douglas, Jeffrey A.; Li, Xiaodong
2009-01-01
Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and…
Mueller, Evelyn A; Bengel, Juergen; Wirtz, Markus A
2013-12-01
This study aimed to develop a self-description assessment instrument to measure work performance in patients with musculoskeletal diseases. In terms of the International Classification of Functioning, Disability and Health (ICF), work performance is defined as the degree of meeting the work demands (activities) at the actual workplace (environment). To account for the fact that work performance depends on the work demands of the job, we strived to develop item banks that allow a flexible use of item subgroups depending on the specific work demands of the patients' jobs. Item development included the collection of work tasks from literature and content validation through expert surveys and patient interviews. The resulting 122 items were answered by 621 patients with musculoskeletal diseases. Exploratory factor analysis to ascertain dimensionality and Rasch analysis (partial credit model) for each of the resulting dimensions were performed. Exploratory factor analysis resulted in four dimensions, and subsequent Rasch analysis led to the following item banks: 'impaired productivity' (15 items), 'impaired cognitive performance' (18), 'impaired coping with stress' (13) and 'impaired physical performance' (low physical workload 20 items, high physical workload 10 items). The item banks exhibited person separation indices (reliability) between 0.89 and 0.96. The assessment of work performance adds the activities component to the more commonly employed participation component of the ICF-model. The four item banks can be adapted to specific jobs where necessary without losing comparability of person measures, as the item banks are based on Rasch analysis.
Is Going Beyond Rasch Analysis Necessary to Assess the Construct Validity of a Motor Function Scale?
Guillot, Tiffanie; Roche, Sylvain; Rippert, Pascal; Hamroun, Dalil; Iwaz, Jean; Ecochard, René; Vuillerot, Carole
2018-04-03
To examine whether a Rasch analysis is sufficient to establish the construct validity of the Motor Function Measure (MFM) and discuss whether weighting the MFM item scores would improve the MFM construct validity. Observational cross-sectional multicenter study. Twenty-three physical medicine departments, neurology departments, or reference centers for neuromuscular diseases. Patients (N=911) aged 6 to 60 years with Charcot-Marie-Tooth disease (CMT), facioscapulohumeral dystrophy (FSHD), or myotonic dystrophy type 1 (DM1). None. Comparison of the goodness-of-fit of the confirmatory factor analysis (CFA) model vs that of a modified multidimensional Rasch model on MFM item scores in each considered disease. The CFA model showed good fit to the data and significantly better goodness of fit than the modified multidimensional Rasch model regardless of the disease (P<.001). Statistically significant differences in item standardized factor loadings were found between DM1, CMT, and FSHD in only 6 of 32 items (items 6, 27, 2, 7, 9 and 17). For multidimensional scales designed to measure patient abilities in various diseases, a Rasch analysis might not be the most convenient, whereas a CFA is able to establish the scale construct validity and provide weights to adapt the item scores to a specific disease.
Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.
2008-01-01
Objective: To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting: This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results: DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion: This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
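The likelihood ratio test named in this abstract compares a model in which an item's parameters are constrained equal across groups with one in which they are freed. The following is a minimal sketch of the final comparison step only, assuming the two log-likelihoods have already been obtained from IRT software. The function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


def lr_dif_test(loglik_constrained, loglik_free, df):
    """Return (G2, p) for a nested-model comparison on one studied item."""
    g2 = -2.0 * (loglik_constrained - loglik_free)
    g2 = max(g2, 0.0)  # guard against tiny negative values from rounding
    return g2, chi2_sf(g2, df)


# Hypothetical log-likelihoods; df = 2 if a slope and a location are freed.
g2, p = lr_dif_test(loglik_constrained=-5234.8, loglik_free=-5230.1, df=2)
print(round(g2, 2), round(p, 4))
```

With many items tested, the resulting p-values would normally be adjusted for multiple comparisons before any item is flagged.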
Ros, Laura; Romero, Dulce; Ricarte, Jorge J.; Serrano, Juan P.; Nieto, Marta; Latorre, Jose M.
2018-01-01
The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used. PMID:29672583
Effect of Purification Procedures on DIF Analysis in IRTPRO
ERIC Educational Resources Information Center
Fikis, David R. J.; Oshima, T. C.
2017-01-01
Purification of the test has been a well-accepted procedure in enhancing the performance of tests for differential item functioning (DIF). As defined by Lord, purification requires reestimation of ability parameters after removing DIF items before conducting the final DIF analysis. IRTPRO 3 is a recently updated program for analyses in item…
Osman, Augustine; Lamis, Dorian A; Bagge, Courtney L; Freedenthal, Stacey; Barnes, Sean M
2016-01-01
We examined the factor structure and psychometric properties of the Mindful Attention Awareness Scale (MAAS) in a sample of 810 undergraduate students. Using common exploratory factor analysis (EFA), we obtained evidence for a 1-factor solution (41.84% common variance). To confirm unidimensionality of the 15-item MAAS, we conducted a 1-factor confirmatory factor analysis (CFA). Results of the EFA and CFA, respectively, provided support for a unidimensional model. Using differential item functioning analysis methods within item response theory modeling (IRT-based DIF), we found that individuals with high and low levels of nonattachment responded similarly to the MAAS items. Following a detailed item analysis, we proposed a 5-item short version of the instrument and present descriptive statistics and composite score reliability for the short and full versions of the MAAS. Finally, correlation analyses showed that scores on the full and short versions of the MAAS were associated with measures assessing related constructs. The 5-item MAAS is as useful as the original MAAS in enhancing our understanding of the mindfulness construct.
Differential item functioning by sex and race in the Hogan Personality Inventory.
Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M; Dai, Guangdong; King, Daniel W
2006-12-01
The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories. Although the group mean differences were trivial, more than a third of the items showed DIF by sex (38.4%) and by race (37.3%). A content analysis of potentially biased items indicated that the themes of items displaying DIF were slightly more cohesive for sex than for race. The authors discuss possible explanations for differing clustering tendencies of items displaying DIF and some practical and theoretical implications of DIF in the development and interpretation of personality inventories.
Terluin, Berend; Brouwers, Evelien P M; Marchand, Miquelle A G; de Vet, Henrica C W
2018-05-01
Many paper-and-pencil (P&P) questionnaires have been migrated to electronic platforms. Differential item and test functioning (DIF and DTF) analysis constitutes a superior research design to assess measurement equivalence across modes of administration. The purpose of this study was to demonstrate an item response theory (IRT)-based DIF and DTF analysis to assess the measurement equivalence of a Web-based version and the original P&P format of the Four-Dimensional Symptom Questionnaire (4DSQ), measuring distress, depression, anxiety, and somatization. The P&P group (n = 2031) and the Web group (n = 958) consisted of primary care psychology clients. Unidimensionality and local independence of the 4DSQ scales were examined using IRT and Yen's Q3. Bifactor modeling was used to assess the scales' essential unidimensionality. Measurement equivalence was assessed using IRT-based DIF analysis using a 3-stage approach: linking on the latent mean and variance, selection of anchor items, and DIF testing using the Wald test. DTF was evaluated by comparing expected scale scores as a function of the latent trait. The 4DSQ scales proved to be essentially unidimensional in both modalities. Five items, belonging to the distress and somatization scales, displayed small amounts of DIF. DTF analysis revealed that the impact of DIF on the scale level was negligible. IRT-based DIF and DTF analysis is demonstrated as a way to assess the equivalence of Web-based and P&P questionnaire modalities. Data obtained with the Web-based 4DSQ are equivalent to data obtained with the P&P version.
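The Wald test used in the final DIF-testing stage can be illustrated for a single item parameter. This is a deliberately simplified sketch: it assumes the two group estimates are independent and tests one parameter at a time, whereas IRT software typically tests several parameters jointly using their full covariance matrix. The function name and the numbers are hypothetical.

```python
import math


def wald_dif(est_ref, se_ref, est_focal, se_focal):
    """Wald chi-square (df = 1) for a between-group difference in one parameter."""
    variance = se_ref ** 2 + se_focal ** 2  # assumes independent estimates
    if variance <= 0:
        raise ValueError("standard errors must not both be zero")
    w = (est_focal - est_ref) ** 2 / variance
    p = math.erfc(math.sqrt(w / 2.0))  # upper tail of chi-square with df = 1
    return w, p


# Hypothetical threshold estimates for the paper (reference) and Web (focal) groups.
w, p = wald_dif(est_ref=0.50, se_ref=0.08, est_focal=0.80, se_focal=0.10)
print(round(w, 3), round(p, 4))
```

A statistically significant result says nothing about magnitude, which is why the abstract also evaluates the impact of DIF at the scale level.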
Hart, Dennis L; Werneke, Mark W; George, Steven Z; Matheson, James W; Wang, Ying-Chih; Cook, Karon F; Mioduski, Jerome E; Choi, Seung W
2009-08-01
Screening people for elevated levels of fear-avoidance beliefs is uncommon, but elevated levels of fear could worsen outcomes. Developing short screening tools might reduce the data collection burden and facilitate screening, which could prompt further testing or management strategy modifications to improve outcomes. The purpose of this study was to develop efficient yet accurate screening methods for identifying elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient rehabilitation. A secondary analysis of data collected prospectively from people with a variety of common neuromusculoskeletal diagnoses was conducted. Intake Fear-Avoidance Beliefs Questionnaire (FABQ) data were collected from 17,804 people who had common neuromusculoskeletal conditions and were receiving outpatient rehabilitation in 121 clinics in 26 states (in the United States). Item response theory (IRT) methods were used to analyze the FABQ data, with particular emphasis on differential item functioning among clinically logical groups of subjects, and to identify screening items. The accuracy of screening items for identifying subjects with elevated levels of fear was assessed with receiver operating characteristic analyses. Three items for fear of physical activities and 10 items for fear of work activities represented unidimensional scales with adequate IRT model fit. Differential item functioning was negligible for variables known to affect functional status outcomes: sex, age, symptom acuity, surgical history, pain intensity, condition severity, and impairment. Items that provided maximum information at the median for the FABQ scales were selected as screening items to dichotomize subjects by high versus low levels of fear. The accuracy of the screening items was supported for both scales. This study represents a retrospective analysis, which should be replicated using prospective designs. Future prospective studies should assess the reliability and validity of using one FABQ item to screen people for high levels of fear-avoidance beliefs. The lack of differential item functioning in the FABQ scales in the sample tested in this study suggested that FABQ screening could be useful in routine clinical practice and allowed the development of single-item screening for fear-avoidance beliefs that accurately identified subjects with elevated levels of fear. Because screening was accurate and efficient, single IRT-based FABQ screening items are recommended to facilitate improved evaluation and care of heterogeneous populations of people receiving outpatient rehabilitation.
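Selecting the item that provides maximum information at a chosen trait level, as described above, can be sketched with item information functions. The example assumes a dichotomous two-parameter logistic model for simplicity, although the FABQ items are rated on more than two categories. The item names and parameters are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def most_informative_item(items, theta):
    """Name of the item with the largest information at the given theta."""
    return max(items, key=lambda name: item_information(theta, *items[name]))


# Hypothetical (discrimination, location) pairs for three candidate items.
items = {"item_A": (1.0, -1.0), "item_B": (1.5, 0.1), "item_C": (2.0, 1.5)}
best = most_informative_item(items, theta=0.0)  # theta = 0.0 stands in for the median
print(best)
```

Under the Rasch model (a = 1 for every item) the information reduces to P(1 - P), which peaks at 0.25 where theta equals the item location.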
Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E
2015-08-01
The Quality-of-life (QOL) Disease Impact Scale (QDIS®) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
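The non-compensatory DIF index referred to above is generally defined as the mean squared difference between an item's expected scores under focal-group and reference-group parameters, averaged over focal-group trait estimates. The following is a minimal sketch assuming a graded response model with five categories scored 0 to 4. The parameter values and theta estimates are hypothetical, and only the 0.096 cutoff is taken from the abstract.

```python
import math


def grm_expected_score(theta, a, thresholds):
    """Expected item score under a graded response model, categories 0..m.

    Uses E[X | theta] = sum over k = 1..m of P(X >= k | theta).
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def ncdif(focal_thetas, focal_params, reference_params):
    """Mean squared expected-score difference over focal-group thetas."""
    a_f, b_f = focal_params
    a_r, b_r = reference_params
    squared = [
        (grm_expected_score(t, a_f, b_f) - grm_expected_score(t, a_r, b_r)) ** 2
        for t in focal_thetas
    ]
    return sum(squared) / len(squared)


# Hypothetical focal-group theta estimates and (slope, thresholds) per calibration.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
reference = (1.8, [-1.5, -0.5, 0.5, 1.5])
focal = (1.4, [-1.4, -0.5, 0.6, 1.5])
value = ncdif(thetas, focal, reference)
exceeds_cutoff = value > 0.096  # cutoff quoted in the abstract
print(round(value, 4), exceeds_cutoff)
```

Because the differences are squared before averaging, DIF in opposite directions at different trait levels cannot cancel within an item, which is what "non-compensatory" refers to.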
DIF Detection Using Multiple-Group Categorical CFA with Minimum Free Baseline Approach
ERIC Educational Resources Information Center
Chang, Yu-Wei; Huang, Wei-Kang; Tsai, Rung-Ching
2015-01-01
The aim of this study is to assess the efficiency of using the multiple-group categorical confirmatory factor analysis (MCCFA) and the robust chi-square difference test in differential item functioning (DIF) detection for polytomous items under the minimum free baseline strategy. While testing for DIF items, despite the strong assumption that all…
Item Analysis and Differential Item Functioning of a Brief Conduct Problem Screen
ERIC Educational Resources Information Center
Wu, Johnny; King, Kevin M.; Witkiewitz, Katie; Racz, Sarah Jensen; McMahon, Robert J.
2012-01-01
Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item…
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.
Gibbons, C J; Skevington, S M
2018-04-01
Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF resulted in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify a lack of measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
Nielsen, Marie Germund; Ørnbøl, Eva; Vestergaard, Mogens; Bech, Per; Christensen, Kaj Sparle
2017-06-01
We aimed to assess the measurement properties of the ten-item Major Depression Inventory when used on clinical suspicion in general practice by performing a Rasch analysis. General practitioners asked consecutive persons to respond to the web-based Major Depression Inventory on clinical suspicion of depression. We included 22 practices and 245 persons. Rasch analysis was performed using RUMM2030 software. The Rasch model fit suggests that all items contribute to a single underlying trait (defined as internal construct validity). Mokken analysis was used to test dimensionality and scalability. Our Rasch analysis showed misfit concerning the sleep and appetite items (items 9 and 10). The response categories were disordered for eight items. After modifying the original six-point to a four-point scoring system for all items, we achieved ordered response categories for all ten items. The person separation reliability was acceptable (0.82) for the initial model. Dimensionality testing did not support combining the ten items to create a total score. The scale appeared to be well targeted to this clinical sample. No significant differential item functioning was observed for gender, age, work status and education. The Rasch and Mokken analyses revealed two dimensions, but the Major Depression Inventory showed fit to one scale if items 9 and 10 were excluded. Our study indicated scalability problems in the current version of the Major Depression Inventory. The conducted analysis revealed better statistical fit when items 9 and 10 were excluded.
Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica
2018-02-01
The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
ERIC Educational Resources Information Center
Ögretmen, Tuncay
2015-01-01
The purpose of this study is to carry out differential item functioning (DIF) analysis for content areas of a reading comprehension subtest using four area indices within Item Response Theory (IRT) framework. The differences in the magnitudes of the area indices were compared based on the subject areas. The DIF analysis was carried out across…
Assessing psychological well-being: self-report instruments for the NIH Toolbox.
Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David
2014-02-01
Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item functioning (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert
2011-01-01
We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
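The simulated adaptive testing mentioned above can be illustrated with a toy example. This is a sketch under strong assumptions: it uses a dichotomous Rasch model rather than the polytomous model that rating-scale interview items would need, a hypothetical evenly spaced item bank, and expected a posteriori (EAP) scoring on a fixed grid with a standard normal prior. The names `simulate_cat` and `eap` are illustrative only.

```python
import math
import random

# Quadrature points for theta, from -4.0 to 4.0 in steps of 0.1.
GRID = [g / 10.0 for g in range(-40, 41)]


def prob(theta, b):
    """Rasch probability of endorsing an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap(responses):
    """EAP estimate and posterior SD under a N(0, 1) prior.

    responses is a list of (difficulty, answer) pairs with answer 0 or 1.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised normal prior
        for b, x in responses:
            p = prob(t, b)
            w *= p if x else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def simulate_cat(true_theta, bank, max_items, se_target, rng):
    """Administer items adaptively until max_items or se_target is reached."""
    remaining = list(bank)
    responses = []
    theta, se = eap(responses)
    while remaining and len(responses) < max_items and se > se_target:
        # Rasch information p(1 - p) peaks where difficulty equals theta,
        # so the closest remaining item is the most informative one.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        x = 1 if rng.random() < prob(true_theta, b) else 0
        responses.append((b, x))
        theta, se = eap(responses)
    return theta, se, len(responses)


bank = [i / 4.0 - 3.0 for i in range(25)]  # hypothetical difficulties -3..3
print(simulate_cat(0.5, bank, max_items=12, se_target=0.3, rng=random.Random(0)))
```

The stopping rule combines a maximum test length with a standard-error target, which is the usual way to ask how many items are needed for a given level of precision.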
Mulcahey, M J; Merenda, Lisa; Tian, Feng; Kozin, Scott; James, Michelle; Gogola, Gloria; Ni, Pengsheng
2013-01-01
This study examined the psychometric properties of item pools relevant to upper-extremity function and activity performance and evaluated simulated 5-, 10-, and 15-item computer adaptive tests (CATs). In a multicenter, cross-sectional study of 200 children and youth with brachial plexus birth palsy (BPBP), parents responded to upper-extremity (n = 52) and activity (n = 34) items using a 5-point response scale. We used confirmatory and exploratory factor analysis, ordinal logistic regression, item maps, and standard errors to evaluate the psychometric properties of the item banks. Validity was evaluated using analysis of variance and Pearson correlation coefficients. Results show that the two item pools have acceptable model fit, scaled well for children and youth with BPBP, and had good validity, content range, and precision. Simulated CATs performed comparably to the full item banks, suggesting that a reduced number of items provide similar information to the entire set of items. Copyright © 2013 by the American Occupational Therapy Association, Inc.
Using a Mixture IRT Model to Understand English Learner Performance on Large-Scale Assessments
ERIC Educational Resources Information Center
Shea, Christine A.
2013-01-01
The purpose of this study was to determine whether an eighth grade state-level math assessment contained items that function differentially (DIF) for English Learner students (EL) as compared to English Only students (EO) and if so, what factors might have caused DIF. To determine this, Differential Item Functioning (DIF) analysis was employed.…
Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R
2015-11-05
We investigated whether individual items on the nine-item William's Perceived Everyday Discrimination Scale (EDS) functioned differently by age (<45 vs ≥ 45) within five racial groups in the United States: Asians (n=2,017); Hispanics (n=2,688); Black Caribbeans (n=1,377); African Americans (n=3,434); and Whites (n=854). We used data from the 2001-2003 National Survey of American Lives and the 2001-2003 National Latino and Asian Studies. Multiple-indicator, multiple-cause models (MIMIC) were used to examine differential item functioning (DIF) on the EDS by age within each racial/ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged <45 years) reported less discrimination than the older respondents (aged ≥ 45 years). In terms of age by race/ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS scale should be used with caution in multi-age, multi-racial/ethnic samples.
The Development of a Nystagmus-Specific Quality-of-Life Questionnaire.
McLean, Rebecca J; Maconachie, Gail D E; Gottlob, Irene; Maltby, John
2016-09-01
To develop a nystagmus-specific quality-of-life (QOL) questionnaire derived from patient concerns based on eudaimonic aspects of well-being. Cross-sectional study. A total of 206 participants with nystagmus for factor analysis phase and an additional 42 participants with nystagmus for construct validity phase. Questionnaire items were written on the basis of the 6 domains of everyday living affected by nystagmus that were elicited by previous semistructured interviews conducted with 21 people with nystagmus. After consultation with 8 nystagmus experts, 37 items were administered to 206 people with nystagmus. Factor analysis was used to identify latent factors among the items and identify items to propose new nystagmus QOL scales. Cronbach's alpha was used to assess the internal reliability of the new scales. To assess for discriminant and concurrent validity between the new nystagmus scales and an existing vision-related QOL tool, the Visual Function Questionnaire-25 (VFQ-25) was administered to 42 additional participants. Questionnaire response scores on nystagmus-specific QOL items. The factor analysis revealed the retention of 29 items to form a measure comprising 2 distinct subscales reflecting "personal and social" and "physical and environmental" functioning as relating to nystagmus-specific QOL. The Cronbach's alpha coefficients for the "personal and social" functioning scale and "physical and environmental" functioning were 0.95 and 0.93, respectively. Tests for validity of the measure, consistent with a priori predictions, when compared with the VFQ-25, revealed the "physical and environmental" subscale showed concurrent validity (0.88), whereas the "personal and social" subscale was demonstrated to have discriminative validity (0.81). We have developed a 29-item, nystagmus-specific QOL questionnaire (NYS-29) based on eudaimonic aspects of well-being with subscales that address not only physical functioning but also psycho-social issues.
The NYS-29 is grounded in the perspectives and concerns of those who have nystagmus and can be used to determine the impact of nystagmus on daily living in terms of both physical and psychosocial aspects. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
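Cronbach's alpha, used above to assess internal reliability, can be computed directly from a respondents-by-items score table. The sketch below uses the standard formula with sample variances; the function name `cronbach_alpha` and the small score table are made up for illustration and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items on a 1-5 scale.
toy = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 5], [3, 3, 4]]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises with both the number of items and their average inter-correlation, so a high value on a long subscale does not by itself show that the subscale is unidimensional.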
ERIC Educational Resources Information Center
Balboni, Giulia; Tasso, Alessandra; Muratori, Filippo; Cubelli, Roberto
2016-01-01
We investigated which item subsets of the Vineland-II can discriminate low-functioning preschoolers with ASD from matched peers with other neurodevelopmental disorders, using a regression analysis derived from a normative sample to account for cognitive and linguistic competencies. At variance with the typical profile, a pattern with Communication…
Methodology for the development and calibration of the SCI-QOL item banks
Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David
2015-01-01
Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM. PMID:26010963
The second version of the L. V. Prasad-functional vision questionnaire.
Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K
2012-11-01
The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of LVP-FVQ (LVP-FVQ II). Development of LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and focus group discussions of children with VI and their parents that resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire underwent pilot testing in 25 such children, following which a 27-item LVP-FVQ II emerged, and this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. Participants used the rating scale as intended. Four mobility-related items required deletion, as these did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it is effectively able to discriminate between visual disability of school children in India, and is valid across age, gender, duration of VI, and location of residence. Given the superior measurement properties and the interval-level scores, the LVP-FVQ II appears to offer advantages over LVP-FVQ in assessment of difficulties in performing daily tasks in this population. 
It can be adapted for use in other developing countries.
Detecting Gender Bias Through Test Item Analysis
NASA Astrophysics Data System (ADS)
González-Espada, Wilson J.
2009-03-01
Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
Pasternak, Amy; Sideridis, Georgios; Fragala-Pinkham, Maria; Glanzman, Allan M; Montes, Jacqueline; Dunaway, Sally; Salazar, Rachel; Quigley, Janet; Pandya, Shree; O'Riley, Susan; Greenwood, Jonathan; Chiriboga, Claudia; Finkel, Richard; Tennekoon, Gihan; Martens, William B; McDermott, Michael P; Fournier, Heather Szelag; Madabusi, Lavanya; Harrington, Timothy; Cruz, Rosangel E; LaMarca, Nicole M; Videon, Nancy M; Vivo, Darryl C De; Darras, Basil T
2016-12-01
In this study we evaluated the suitability of a caregiver-reported functional measure, the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT), for children and young adults with spinal muscular atrophy (SMA). PEDI-CAT Mobility and Daily Activities domain item banks were administered to 58 caregivers of children and young adults with SMA. Rasch analysis was used to evaluate test properties across SMA types. Unidimensional content for each domain was confirmed. The PEDI-CAT was most informative for type III SMA, with ability levels distributed close to 0.0 logits in both domains. It was less informative for types I and II SMA, especially for mobility skills. Item and person abilities were not distributed evenly across all types. The PEDI-CAT may be used to measure functional performance in SMA, but additional items are needed to identify small changes in function and best represent the abilities of all types of SMA. Muscle Nerve 54: 1097-1107, 2016. © 2016 Wiley Periodicals, Inc.
Garcia-Barrera, Mauricio A; Karr, Justin E; Duran, Victor; Direnfeld, Esther; Pineda, David A
2015-12-01
Garcia-Barrera, Kamphaus, and Bandalos (2011) derived a 25-item executive functioning screener from the Behavior Assessment System for Children (BASC), measuring 4 latent executive constructs: problem solving, attentional control, behavioral control, and emotional control. The current study included a cross-cultural examination of this screener in Colombian children with and without attention-deficit/hyperactivity disorder (ADHD). BASC teacher ratings were collected for Colombian children ages 6-11 years (848 healthy children [53% boys] and 155 children with ADHD [76% boys]). To examine the psychometric properties of the screener, a multistep procedure was implemented, including (a) confirmatory factor analysis (CFA) and factorial invariance testing across gender, age group (6-8 years, 9-11 years), and ADHD status to replicate and extend the original derivation; (b) item response theory (IRT) analysis to evaluate the information provided by individual items; and (c) given IRT results, a repeated CFA and invariance testing after the exclusion of 1 item from the problem-solving factor. The 24-item 4-factor model fit was adequate for controls and for ADHD participants. Results support the use of the 24-item executive functioning screener in a cross-cultural context. In turn, in supplemental material, normative data for the Colombian sample are reported along with bilingual guidelines (i.e., Spanish/English) for implementing the screener in clinical practice. Even though the screener is useful when examining executive functions, it was not designed as a diagnostic measure for developmental disorders such as ADHD; as such, it should only inform about status of executive functioning. (c) 2015 APA, all rights reserved.
Deutscher, Daniel; Hart, Dennis L; Crane, Paul K; Dickstein, Ruth
2010-12-01
Comparative effectiveness research across cultures requires unbiased measures that accurately detect clinical differences between patient groups. The purpose of this study was to assess the presence and impact of differential item functioning (DIF) in knee functional status (FS) items administered using computerized adaptive testing (CAT) as a possible cause for observed differences in outcomes between 2 cultural patient groups in a polyglot society. This study was a secondary analysis of prospectively collected data. We evaluated data from 9,134 patients with knee impairments from outpatient physical therapy clinics in Israel. Items were analyzed for DIF related to sex, age, symptom acuity, surgical history, exercise history, and language used to complete the functional survey (Hebrew versus Russian). Several items exhibited DIF, but unadjusted FS estimates and FS estimates that accounted for DIF were essentially equal (intraclass correlation coefficient [2,1]>.999). No individual patient had a difference between unadjusted and adjusted FS estimates as large as the median standard error of the unadjusted estimates. Differences between groups defined by any of the covariates considered were essentially unchanged when using adjusted instead of unadjusted FS estimates. The greatest group-level impact was <0.3% of 1 standard deviation of the unadjusted FS estimates. Complete data where patients answered all items in the scale would have been preferred for DIF analysis, but only CAT data were available. Differences in FS outcomes between groups of patients with knee impairments who answered the knee CAT in Hebrew or Russian in Israel most likely reflected true differences that may reflect societal disparities in this health outcome.
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)
Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel
2014-01-01
We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
ERIC Educational Resources Information Center
Finch, Holmes
2011-01-01
Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…
ERIC Educational Resources Information Center
Siefert, Caleb J.; Sinclair, Samuel J.; Kehl-Fie, Kendra A.; Blais, Mark A.
2009-01-01
Multi-item multiscale self-report measures are increasingly used in inpatient assessments. When considering a measure for this setting, it is important to evaluate the psychometric properties of the clinical scales and items to ensure that they are functioning as intended in a highly distressed clinical population. The present study examines scale…
ERIC Educational Resources Information Center
Deville, Craig W.; Chalhoub-Deville, Micheline
A study demonstrated the utility of item analyses to investigate which items function well or poorly in a second language reading recall protocol instrument. Data were drawn from a larger study of 56 learners of German as a second language at various proficiency levels. Pausal units of scored recall protocols were analyzed using both classical…
Tang, Jennifer Yee-Man; Ho, Andy Hau-Yan; Luo, Hao; Wong, Gloria Hoi-Yan; Lau, Bobo Hi-Po; Lum, Terry Yat-Sang; Cheung, Karen Siu-Lan
2016-09-01
The present study aimed to develop and validate a Cantonese short version of the Zarit Burden Interview (CZBI-Short) for Hong Kong Chinese dementia caregivers. The 12-item Zarit Burden Interview (ZBI) was translated into spoken Cantonese and back-translated by two bilingual research assistants and face validated by a panel of experts. Five hundred Chinese dementia caregivers showing signs of stress reported their burden using the translated ZBI and rated their depressive symptoms, overall health, and care recipients' physical functioning and behavioral problems. The factor structure of the translated scale was identified using principal component analysis and confirmatory factor analysis; internal consistency and item-total correlations were assessed; and concurrent validity was tested by correlating the ZBI with depressive symptoms, self-rated health, and care recipients' physical functioning and behavioral problems. The principal component analysis resulted in 11 items loading on a three-factor model comprised role strain, self-criticism, and negative emotion, which accounted for 59% of the variance. The confirmatory factor analysis supported the three-factor model (CZBI-Short) that explained 61% of the total variance. Cronbach's alpha (0.84) and item-total correlations (rho = 0.39-0.71) indicated CZBI-Short had good reliability. CZBI-Short showed correlations with depressive symptoms (r = 0.50), self-rated health (r = -0.26) and care recipients' physical functioning (r = 0.18-0.26) and disruptive behaviors (r = 0.36). The 12-item CZBI-Short is a concise, reliable, and valid instrument to assess burden in Chinese dementia caregivers in clinical and social care settings.
Honda, Yukiko; Meguro, Kenichi; Meguro, Mitsue; Akanuma, Kyoko
2013-01-01
Patients with vascular dementia (VaD) are often isolated, withdrawn from society because of negative symptoms and functional disabilities. The aim of this study was to detect factors associated with social withdrawal in patients with VaD. The participants were 36 institutionalized patients with VaD. Social withdrawal was assessed with the social withdrawal of the Multidimensional Observation Scale for Elderly Subjects (MOSES). Possible explanatory variables were the MOSES items depression and self-care, Cognitive Abilities Screening Instrument (CASI), apathy evaluation scale (AES), and Behavioral Pathology in Alzheimer's Disease Frequency-Weighted Severity Scale (BEHAVE-AD-FW). Multiple regression analyses were conducted for two groups: Analysis 1 was performed in all patients (N = 36) and Analysis 2 was performed in the patients with the ability to move by themselves (i.e., independent walking or independent movement with a cane or a wheelchair; n = 28). In Analysis 1, MOSES item social withdrawal was correlated with AES and MOSES item self-care. In Analysis 2, MOSES item social withdrawal was correlated with AES and CASI domain abstraction and judgment. Decreased social activities of VaD were not related to general cognitive function or depression. Disturbed activities of daily living (ADLs) for self-care may involve decreased frontal lobe function, indicating that comprehensive rehabilitation for both ADL and dementia are needed to improve the social activities of patients with VaD.
Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina
2017-01-01
As there is a lack of evidence attesting to equivalent item functioning across genders for the instruments most often used to measure pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested to the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.
Lebedeva, Elena; Huang, Mei; Koski, Lisa
2016-03-01
The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time.
An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).
Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul
2018-03-01
Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
2015-06-01
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Arias González, Víctor B; Crespo Sierra, María Teresa; Arias Martínez, Benito; Martínez-Molina, Agustín; Ponce, Fernando P
2015-09-23
The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory. The aim of this paper was to calibrate the CD-RISC with a nonclinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the item level. Two items showed misfit to the model and were eliminated. The remaining 22 items form an essentially unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories were functioning properly. Two of the items showed differential item functioning. The CD-RISC has an obvious ceiling effect, which suggests that more difficult items should be included in future versions of the scale.
Marfeo, Elizabeth E.; Ni, Pengsheng; Haley, Stephen M.; Jette, Alan M.; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Brandt, Diane E.; Rasch, Elizabeth K.
2014-01-01
Objectives To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Design Cross-sectional. Setting Community. Participants Item pools of behavioral health functioning were developed, refined, and field-tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working due to mental or both mental and physical conditions. Interventions None. Main Outcome Measure Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Results Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, and social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the four scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these four distinct scales of function. Conclusion This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. PMID:23548542
Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Jette, Alan M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Brandt, Diane E; Rasch, Elizabeth K
2013-09-01
Objectives To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Design Cross-sectional. Setting Community. Participants Item pools of behavioral health functioning were developed, refined, and field tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working because of mental or both mental and physical conditions. Interventions None. Main Outcome Measure Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Results Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the 4 scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these 4 distinct scales of function. Conclusion This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Comins, J D; Krogsgaard, M R; Kreiner, S; Brodersen, J
2013-10-01
The benefit of anterior cruciate ligament (ACL) reconstruction has been questioned based on patient-reported outcome measures (PROMs). Valid interpretation of such results requires confirmation of the psychometric properties of the PROM. Rasch analysis is the gold standard for validation of PROMs, yet PROMs used for ACL reconstruction have not been validated using Rasch analysis. We used Rasch analysis to investigate the psychometric properties of the Knee Numeric-Entity Evaluation Score (KNEES-ACL), a newly developed PROM for patients treated for ACL deficiency. Two-hundred forty-two patients pre- and post-ACL reconstruction completed the pilot PROM. Rasch models were used to assess the psychometric properties (e.g., unidimensionality, local response dependency, and differential item functioning). Forty-one items distributed across seven unidimensional constructs measuring impairment, functional limitations, and psychosocial consequences were confirmed to fit Rasch models. Fourteen items were removed because of statistical lack of fit and inadequate face validity. Local response dependency and differential item functioning were identified and adjusted. The KNEES-ACL is the first Rasch-validated condition-specific PROM constructed for patients with ACL deficiency and patients with ACL reconstruction. Thus, this instrument can be used for within- and between-group comparisons. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya
2009-06-01
The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using a methodology based on the nominal categories model. Using data from an adjusted random sample from a nationally representative panel, nominal categories modeling was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions of the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53% women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption in accurate order with corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems]) cover most of the negative range of theta. The information function for all items combined peaked at -0.7 of theta (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can recognize more effectively persons with relatively poorer QOL than those with relatively better QOL.
Erhart, M; Hagquist, C; Auquier, P; Rajmil, L; Power, M; Ravens-Sieberer, U
2010-07-01
This study compares item reduction analysis based on classical test theory (maximizing Cronbach's alpha - approach A), with analysis based on the Rasch Partial Credit Model item-fit (approach B), as applied to children and adolescents' health-related quality of life (HRQoL) items. The reliability and structural, cross-cultural and known-group validity of the measures were examined. Within the European KIDSCREEN project, 3019 children and adolescents (8-18 years) from seven European countries answered 19 HRQoL items of the Physical Well-being dimension of a preliminary KIDSCREEN instrument. The Cronbach's alpha and corrected item total correlation (approach A) were compared with infit mean squares and the Q-index item-fit derived according to a partial credit model (approach B). Cross-cultural differential item functioning (DIF ordinal logistic regression approach), structural validity (confirmatory factor analysis and residual correlation) and relative validity (RV) for socio-demographic and health-related factors were calculated for approaches (A) and (B). Approach (A) led to the retention of 13 items, compared with 11 items with approach (B). The item overlap was 69% for (A) and 78% for (B). The correlation coefficient of the summated ratings was 0.93. The Cronbach's alpha was similar for both versions [0.86 (A); 0.85 (B)]. Both approaches selected some items that are not strictly unidimensional and items displaying DIF. RV ratios favoured (A) with regard to socio-demographic aspects. Approach (B) was superior in RV with regard to health-related aspects. Both types of item reduction analysis should be accompanied by additional analyses. Neither of the two approaches was universally superior with regard to cultural, structural and known-group validity. However, the results support the usability of the Rasch method for developing new HRQoL measures for children and adolescents.
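The classical test theory side of this comparison (approach A) rests on Cronbach's alpha. The sketch below computes alpha and the alpha obtained when each item is dropped in turn, which is one common way to steer item reduction. It is an illustration under assumptions: the function names are hypothetical, the data layout (one list of item scores per respondent) is assumed, and it does not reproduce the KIDSCREEN procedure itself.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item removed in turn (needs k >= 3)."""
    k = len(scores[0])
    results = []
    for drop in range(k):
        reduced = [[v for i, v in enumerate(row) if i != drop] for row in scores]
        results.append(cronbach_alpha(reduced))
    return results
```

An item whose removal raises alpha is a candidate for deletion under this criterion, although, as the abstract notes, such a rule should be accompanied by additional analyses.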
Phyland, Debra J; Pallant, Julie F; Benninger, Michael S; Thibeault, Susan L; Greenwood, Ken M; Smith, Julian A; Vallance, Neil
2013-07-01
Most voice self-rating tools are disease-specific measures and are not suitable for use with healthy voice users. There is a need for a tool that is sensitive to the subtleties of a singer's voice and to perceived physical changes in the singing voice mechanism as a function of load. The aim of this study was to devise and validate a scale to assess singers' perceptions of the current status of their singing voice. Ninety-five vocal health descriptors were collected from focus group interviews of singers. These were reviewed by 25 currently performing music theater (MT) singers. Based on a consensus technique, the number of descriptors was decreased to 42 items. These were administered to a sample of 284 professional MT singers using an online survey to evaluate their perception of current singing voice status. Principal component analysis identified two subsets of items. Rasch analysis was used to evaluate and refine these sets of items to form two 10-item subscales. Both subscales demonstrated good overall fit to the Rasch model, no differential item functioning by sex or age, and good internal consistency reliability. The two subscales were strongly correlated and subsequent Rasch analysis supported their combination to form a single 20-item scale with good psychometric properties. The Evaluation of the Ability to Sing Easily (EASE) is a concise clinical tool to assess singers' perceptions of the current status of their singing voice with good measurement properties. EASE may prove a useful tool to measure changes in the singing voice as indicators of the effect of vocal load. Furthermore, it may offer a valuable means for the prediction or screening of singers "at risk" of developing voice disorders. Copyright © 2013 The Voice Foundation. All rights reserved.
Claesson, Margareta; Armitage, W John; Byström, Berit; Montan, Per; Samolov, Branka; Stenvi, Ulf; Lundström, Mats
2017-09-01
Catquest-9SF is a 9-item visual disability questionnaire developed for evaluating patient-reported outcome measures after cataract surgery. The aim of this study was to use Rasch analysis to determine the responsiveness of Catquest-9SF for corneal transplant patients. Patients who underwent corneal transplantation primarily to improve vision were included. One group (n = 199) completed the Catquest-9SF questionnaire before corneal transplantation and a second independent group (n = 199) completed the questionnaire 2 years after surgery. All patients were recorded in the Swedish Cornea Registry, which provided clinical and demographic data for the study. Winsteps software v.3.91.0 (Winsteps.com, Beaverton, OR) was used to assess the fit of the Catquest-9SF data to the Rasch model. Rasch analysis showed that Catquest-9SF applied to corneal transplant patients was unidimensional (infit range, 0.73-1.32; outfit range, 0.81-1.35), and therefore, measured a single underlying construct (visual disability). The Rasch model explained 68.5% of raw variance. The response categories of the 9-item questionnaire were ordered, and the category thresholds were well defined. Item difficulty matched the level of patients' ability (0.36 logit difference between the means). Precision in terms of person separation (3.09) and person reliability (0.91) was good. Differential item functioning was notable for only 1 item (satisfaction with vision), which had a differential item functioning contrast of 1.08 logit. Rasch analysis showed that Catquest-9SF is a valid instrument for measuring visual disability in patients who have undergone corneal transplantation primarily to improve vision.
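The person separation and person reliability figures quoted above are two views of the same quantity. In Rasch measurement, reliability R and separation G are conventionally related by R = G² / (1 + G²). The short sketch below (function names are hypothetical) shows that a separation of 3.09 corresponds to a reliability of about 0.905, consistent with the reported 0.91 after rounding.

```python
import math


def separation_to_reliability(separation):
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


def reliability_to_separation(reliability):
    """Separation implied by a reliability R: sqrt(R / (1 - R))."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1 - reliability))
```

Because the mapping is nonlinear, small differences in reliability near 1 correspond to large differences in separation, which is why both indices are often reported together.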
Independent Orbiter Assessment (IOA): Analysis of the auxiliary power unit
NASA Technical Reports Server (NTRS)
Barnes, J. E.
1986-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Auxiliary Power Unit (APU). The APUs are required to provide power to the Orbiter hydraulics systems during ascent and entry flight phases for aerosurface actuation, main engine gimballing, landing gear extension, and other vital functions. For analysis purposes, the APU system was broken down into ten functional subsystems. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode. A preponderance of 1/1 criticality items were related to failures that allowed the hydrazine fuel to escape into the Orbiter aft compartment, creating a severe fire hazard, and failures that caused loss of the gas generator injector cooling system.
Efficient Algorithms for Segmentation of Item-Set Time Series
NASA Astrophysics Data System (ADS)
Chundi, Parvathi; Rosenkrantz, Daniel J.
We propose a special type of time series, which we call an item-set time series, to facilitate the temporal analysis of software version histories, email logs, stock market data, etc. In an item-set time series, each observed data value is a set of discrete items. We formalize the concept of an item-set time series and present efficient algorithms for segmenting a given item-set time series. Segmentation of a time series partitions the time series into a sequence of segments where each segment is constructed by combining consecutive time points of the time series. Each segment is associated with an item set that is computed from the item sets of the time points in that segment, using a function which we call a measure function. We then define a concept called the segment difference, which measures the difference between the item set of a segment and the item sets of the time points in that segment. The segment difference values are required to construct an optimal segmentation of the time series. We describe novel and efficient algorithms to compute segment difference values for each of the measure functions described in the paper. We outline a dynamic programming based scheme to construct an optimal segmentation of the given item-set time series. We use the item-set time series segmentation techniques to analyze the temporal content of three different data sets—Enron email, stock market data, and a synthetic data set. The experimental results show that an optimal segmentation of item-set time series data captures much more temporal content than a segmentation constructed based on the number of time points in each segment, without examining the item set data at the time points, and can be used to analyze different types of temporal data.
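The dynamic programming scheme outlined above can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the measure function is taken to be set union, and the segment difference is taken to be the summed size of the symmetric difference between the segment's item set and each time point's item set. Both choices, and all function names, are hypothetical. The pairwise costs are computed naively here, whereas the paper describes more efficient algorithms for that step.

```python
def union_measure(points):
    """Assumed measure function: the segment's item set is the union of its points."""
    items = set()
    for p in points:
        items |= p
    return items


def segment_difference(points, measure=union_measure):
    """Assumed segment difference: total symmetric-difference size between
    the segment's item set and each time point's item set."""
    seg_items = measure(points)
    return sum(len(seg_items ^ p) for p in points)


def optimal_segmentation(series, num_segments, measure=union_measure):
    """Split `series` (a list of sets) into `num_segments` contiguous segments
    minimizing the summed segment difference. Returns (cost, [(start, end), ...])
    with half-open index ranges."""
    n = len(series)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(series)")
    # cost[i][j] is the segment difference of series[i:j]
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = segment_difference(series[i:j], measure)
    inf = float("inf")
    # best[k][j]: minimal cost of covering series[:j] with k segments
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost[i][j]
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i
    # Walk the back-pointers from the end to recover segment boundaries.
    bounds = []
    j = n
    for k in range(num_segments, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[num_segments][n], bounds
```

With the series `[{"a"}, {"a"}, {"b"}, {"b"}]` and two segments, the sketch places the boundary between the second and third time points, where the item sets change.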
Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C
2014-12-01
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
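To make the contingency-table idea concrete, the sketch below computes the Mantel-Haenszel common odds ratio for a dichotomously scored item, stratifying respondents by a matching score. Note that this is the simpler dichotomous procedure, not the ordinal Mantel χ² statistic applied to the HADS items in the study; the record layout, group labels, and function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio across strata.

    records: iterable of (matching_score, group, endorsed), where group is
    "ref" or "focal" and endorsed is 0 or 1 (assumed layout)."""
    # Per stratum: [ref endorsed, ref not, focal endorsed, focal not]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, endorsed in records:
        idx = (0 if group == "ref" else 2) + (0 if endorsed else 1)
        tables[score][idx] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return num / den


def mh_delta(odds_ratio):
    """ETS delta-scale effect size: -2.35 * ln(odds ratio)."""
    return -2.35 * math.log(odds_ratio)
```

An odds ratio near 1 (delta near 0) indicates that matched members of the two groups endorse the item at similar rates. As the abstract recommends, the magnitude of such an effect should be weighed alongside statistical significance and investigator judgement.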
Kostuj, Tanja; Stief, Felix; Hartmann, Kirsten Anna; Schaper, Katharina; Arabmotlagh, Mohammad; Baums, Mike H; Meurer, Andrea; Krummenauer, Frank; Lieske, Sebastian
2018-01-01
Objective After cross-cultural adaption for the German translation of the Ankle-Hindfoot Scale of the American Orthopaedic Foot and Ankle Society (AOFAS-AHS) and agreement analysis with the Foot Function Index (FFI-D), the following gait analysis study using the Oxford Foot Model (OFM) was carried out to show which of the two scores better correlates with objective gait dysfunction. Design and participants Results of the AOFAS-AHS and FFI-D, as well as data from three-dimensional gait analysis were collected from 20 patients with mild to severe ankle and hindfoot pathologies. Kinematic and kinetic gait data were correlated with the results of the total AOFAS scale and FFI-D as well as the results of those items representing hindfoot function in the AOFAS-AHS assessment. With respect to the foot disorders in our patients (osteoarthritis and prearthritic conditions), we correlated the total range of motion (ROM) in the ankle and subtalar joints as identified by the OFM with values identified during clinical examination ‘translated’ into score values. Furthermore, reduced walking speed, reduced step length and reduced maximum ankle power generation during push-off were taken into account and correlated to gait abnormalities described in the scores. An analysis of correlations with CIs between the FFI-D and the AOFAS-AHS items and the gait parameters was performed by means of the Jonckheere-Terpstra test; furthermore, exploratory factor analysis was applied to identify common information structures and thereby redundancy in the FFI-D and the AOFAS-AHS items. Results Objective findings for hindfoot disorders, namely a reduced ROM, in the ankle and subtalar joints, respectively, as well as reduced ankle power generation during push-off, showed a better correlation with the AOFAS-AHS total score—as well as AOFAS-AHS items representing ROM in the ankle, subtalar joints and gait function—compared with the FFI-D score. 
Factor analysis, however, could not identify FFI-D items consistently related to these three indicator parameters (pain, disability and function) found in the AOFAS-AHS. Furthermore, factor analysis did not support stratification of the FFI-D into two subscales. Conclusions The AOFAS-AHS showed a good agreement with objective gait parameters and is therefore better suited to evaluate disability and functional limitations of patients suffering from foot and ankle pathologies compared with the FFI-D. PMID:29626046
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
2017-06-01
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
ERIC Educational Resources Information Center
Benítez, Isabel; Padilla, José-Luis
2014-01-01
Differential item functioning (DIF) can undermine the validity of cross-lingual comparisons. While a lot of efficient statistics for detecting DIF are available, few general findings have been found to explain DIF results. The objective of the article was to study DIF sources by using a mixed method design. The design involves a quantitative phase…
ERIC Educational Resources Information Center
Abd-El-Fattah, Sabry M.; AL-Sinani, Yousra; El Shourbagi, Sahar; Fakhroo, Hessa A.
2014-01-01
This study uses the Rasch model technique to examine the dimensionality structure and differential item functioning of the Arabic version of the Perceived Physical Ability Scale for Children (PPASC). A sample of 220 Omani fourth graders (120 males and 100 females) responded to an Arabic translated version of the PPASC. Data on students'…
Dalton, Megan; Davidson, Megan; Keating, Jenny
2011-01-01
Is the Assessment of Physiotherapy Practice (APP) a valid instrument for the assessment of entry-level competence in physiotherapy students? Cross-sectional study with Rasch analysis of initial (n=326) and validation samples (n=318). Students were assessed on completion of 4, 5, or 6-week clinical placements across one university semester. 298 clinical educators and 456 physiotherapy students at nine universities in Australia and New Zealand provided 644 completed APP instruments. APP data in both samples showed overall fit to a Rasch model of expected item functioning for interval scale measurement. Item 6 (Written communication) exhibited misfit in both samples, but was retained as an important element of competence. The hierarchy of item difficulty was the same in both samples with items related to professional behaviour and communication the easiest to achieve and items related to clinical reasoning the most difficult. Item difficulty was well targeted to person ability. No Differential Item Functioning was identified, indicating that the scale performed in a comparable way regardless of the student's age, gender or amount of prior clinical experience, and the educator's age, gender, or experience as an educator, or the type of facility, university, or clinical area. The instrument demonstrated unidimensionality confirming the appropriateness of summing the scale scores on each item to provide an overall score of clinical competence and was able to discriminate four levels of professional competence (Person Separation Index=0.96). Person ability and raw APP scores had a linear relationship (r² = 0.99). Rasch analysis supports the interpretation that a student's APP score is an indication of their underlying level of professional competence in workplace practice. Copyright © 2011 Australian Physiotherapy Association. All rights reserved.
Poulsen, Ingrid; Kreiner, Svend; Engberg, Aase W
2018-02-13
The Early Functional Abilities scale assesses the restoration of brain function after brain injury, based on 4 dimensions. The primary objective of this study was to evaluate the validity, objectivity, reliability and measurement precision of the Early Functional Abilities scale by Rasch model item analysis. A secondary objective was to examine the relationship between the Early Functional Abilities scale and the Functional Independence Measurement™, in order to establish the criterion validity of the Early Functional Abilities scale and to compare the sensitivity of measurements using the 2 instruments. The Rasch analysis was based on the assessment of 408 adult patients at admission to sub-acute rehabilitation in Copenhagen, Denmark after traumatic brain injury. The Early Functional Abilities scale provides valid and objective measurement of vegetative (autonomic), facio-oral, sensorimotor and communicative/cognitive functions. Removal of one item from the sensorimotor scale confirmed unidimensionality for each of the 4 subscales, but not for the entire scale. The Early Functional Abilities subscales are sensitive to differences between patients in ranges in which the Functional Independence Measurement™ has a floor effect. The Early Functional Abilities scale assesses the early recovery of important aspects of brain function after traumatic brain injury, but is not unidimensional. We recommend removal of the "standing" item and calculation of summary subscales for the separate dimensions.
Validation of the Modified Fatigue Impact Scale in Parkinson's disease.
Schiehser, Dawn M; Ayers, Catherine R; Liu, Lin; Lessig, Stephanie; Song, David S; Filoteo, J Vincent
2013-03-01
Fatigue is a common symptom in Parkinson's disease (PD); however, a multidimensional scale that measures the impact of fatigue on functioning has yet to be validated in this population. The aim of this study was to examine the validity of the Modified Fatigue Impact Scale (MFIS), a self-report measure that assesses the effects of fatigue on physical, cognitive, and psychosocial functioning, in a sample of nondemented PD patients. PD patients (N = 100) completed the MFIS, the Positive and Negative Affect Schedule (PANAS-X), and several additional measures of psychosocial, cognitive, and motor functioning. A Principal Component Analysis (PCA) and item analysis using Cronbach's alpha were conducted to determine structural validity and internal consistency of the MFIS. Correlational analyses were performed between the MFIS and the PANAS-X fatigue subscale to evaluate convergent validity and between the MFIS and measures of depression, anxiety, apathy, and disease-related symptoms to determine divergent validity. The PCA identified two viable MFIS subscales: a cognitive subscale and a combination of the original scale's physical and psychosocial subscales as one factor. Item analysis revealed high internal consistency of all 21 items and the items within the two subscales. The MFIS had strong convergent validity with the PANAS-X fatigue subscale and adequate divergent validity with measures of disease stage, motor function, and cognition. Overall, this study demonstrates that the MFIS is a valid multidimensional measure that can be used to evaluate the impact of fatigue on cognitive and physical/social functioning in PD patients without dementia. Published by Elsevier Ltd.
Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike
2018-01-01
To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B
2018-03-01
The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
Sakamoto, Ai; Ukawa, Shigekazu; Okada, Emiko; Sasaki, Sachiko; Zhao, Wenjing; Kishi, Tomoko; Kondo, Katsunori; Tamakoshi, Akiko
2017-10-01
To study the association between the number of area-level and individual-level social participation items and cognitive function in the community-dwelling older populations of three towns in Hokkaido, Japan. A survey on the frequency of social participation was mailed to those in the Japan Gerontological Evaluation Study 2013 who were aged ≥65 years, were not certified as needing long-term care, and lived in Higashikawa, Higashikagura, or Biei. A subset of participants aged 70-74 years completed the Japanese version of the Montreal Cognitive Assessment in a home visit survey. Both the area-level and individual-level social participation and demographic information were obtained on the self-administered questionnaire. A multilevel analysis using a generalized linear mixed-effects model was used to examine the association between variables in the area-level and individual-level social participation items and cognitive function. Out of 4042 respondents, data from 2576 were used in the area-level analysis. Of those, 180 were aged 70-74 years and completed the home visit survey for the individual-level analysis. A greater number of higher social participation items at the individual level was associated with higher cognitive function scores after adjusting for area-level social participation variables and confounders (regression coefficient: 0.19; 95% confidence interval: 0.03, 0.35). There were no significant associations between area-level social participation item averages and individual-level cognitive function scores. Older populations participating in many kinds of social activities exhibited preserved cognitive function even after adjusting for area-level social participation variables. Copyright © 2016 John Wiley & Sons, Ltd.
Babiar, Tasha Calvert
2011-01-01
Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.
ERIC Educational Resources Information Center
White, Pamela; O'Reilly, Mark; Fragale, Christina; Kang, Soyeon; Muhich, Kimberly; Falcomata, Terry; Lang, Russell; Sigafoos, Jeff; Lancioni, Giulio
2011-01-01
Two children with autism who engaged in aggression and stereotypy were assessed using common analogue functional analysis procedures. Aggression was maintained by access to specific preferred items. Data on the rates of stereotypy and appropriate play were collected during an extended functional analysis tangible condition. These data reveal that…
Kostuj, Tanja; Stief, Felix; Hartmann, Kirsten Anna; Schaper, Katharina; Arabmotlagh, Mohammad; Baums, Mike H; Meurer, Andrea; Krummenauer, Frank; Lieske, Sebastian
2018-04-05
After cross-cultural adaptation for the German translation of the Ankle-Hindfoot Scale of the American Orthopaedic Foot and Ankle Society (AOFAS-AHS) and agreement analysis with the Foot Function Index (FFI-D), the following gait analysis study using the Oxford Foot Model (OFM) was carried out to show which of the two scores better correlates with objective gait dysfunction. Results of the AOFAS-AHS and FFI-D, as well as data from three-dimensional gait analysis were collected from 20 patients with mild to severe ankle and hindfoot pathologies. Kinematic and kinetic gait data were correlated with the results of the total AOFAS scale and FFI-D as well as the results of those items representing hindfoot function in the AOFAS-AHS assessment. With respect to the foot disorders in our patients (osteoarthritis and prearthritic conditions), we correlated the total range of motion (ROM) in the ankle and subtalar joints as identified by the OFM with values identified during clinical examination 'translated' into score values. Furthermore, reduced walking speed, reduced step length and reduced maximum ankle power generation during push-off were taken into account and correlated to gait abnormalities described in the scores. An analysis of correlations with CIs between the FFI-D and the AOFAS-AHS items and the gait parameters was performed by means of the Jonckheere-Terpstra test; furthermore, exploratory factor analysis was applied to identify common information structures and thereby redundancy in the FFI-D and the AOFAS-AHS items.
Objective findings for hindfoot disorders, namely a reduced ROM in the ankle and subtalar joints, respectively, as well as reduced ankle power generation during push-off, showed a better correlation with the AOFAS-AHS total score (as well as AOFAS-AHS items representing ROM in the ankle, subtalar joints and gait function) compared with the FFI-D score. Factor analysis, however, could not identify FFI-D items consistently related to these three indicator parameters (pain, disability and function) found in the AOFAS-AHS. Furthermore, factor analysis did not support stratification of the FFI-D into two subscales. The AOFAS-AHS showed a good agreement with objective gait parameters and is therefore better suited to evaluate disability and functional limitations of patients suffering from foot and ankle pathologies compared with the FFI-D. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Lai, Jin-Shei; Hammel, Joy; Jerousek, Sara; Goldsmith, Arielle; Miskovic, Ana; Baum, Carolyn; Wong, Alex W; Dashner, Jessica; Heinemann, Allen W
2016-12-01
To develop a measure of perceived systems, services, and policies facilitators (see Chapter 5 of the International Classification of Functioning, Disability and Health) for people with neurologic disabilities and to evaluate the effect of perceived systems, services, and policies facilitators on health-related quality of life. Qualitative approaches to develop and refine items. Confirmatory factor analysis including 1-factor confirmatory factor analysis and bifactor analysis to evaluate unidimensionality of items. Rasch analysis to identify misfitting items. Correlational and analysis of variance methods to evaluate construct validity. Community-dwelling individuals participated in telephone interviews or traveled to the academic medical centers where this research took place. Participants (N=571) had a diagnosis of spinal cord injury, stroke, or traumatic brain injury. They were 18 years or older and English speaking. Not applicable. An item bank to evaluate environmental access and support levels of services, systems, and policies for people with disabilities. We identified a general factor defined as "access and support levels of the services, systems, and policies at the level of community living" and 3 local factors defined as "health services," "community living," and "community resources." The systems, services, and policies measure correlated moderately with participation measures: Community Participation Indicators (CPI) - Involvement, CPI - Control over Participation, Quality of Life in Neurological Disorders - Ability to Participate, Quality of Life in Neurological Disorders - Satisfaction with Role Participation, Patient-Reported Outcomes Measurement Information System (PROMIS) Ability to Participate, PROMIS Satisfaction with Role Participation, and PROMIS Isolation. The measure of systems, services, and policies facilitators contains items pertaining to health services, community living, and community resources. 
Investigators and clinicians can measure perceptions of systems, services, and policies resources reliably with the items described here. Moderate relations between systems, services, and policies facilitators and PROMIS and CPI variables provide support for the measurement and theory of environmental effects on social functioning related to participation. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment
Lebedeva, Elena; Huang, Mei; Koski, Lisa
2016-01-01
Background The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
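The difficulty comparison described in this abstract can be illustrated with the dichotomous Rasch model, in which the probability of a correct response depends only on the gap between person ability and item difficulty. The sketch below is a minimal illustration, not the authors' analysis; the ability and difficulty values are hypothetical and are not taken from the study.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_information(theta, b):
    """Item information under the Rasch model, P * (1 - P)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


# Hypothetical values for illustration only (not from the study):
# an original item and an alternate item that is one logit harder.
theta = 0.0
original_b = 0.0
alternate_b = 1.0

for label, b in (("original", original_b), ("alternate", alternate_b)):
    p = rasch_probability(theta, b)
    print(f"{label}: P(correct) = {p:.3f}, information = {rasch_information(theta, b):.3f}")
```

Under these assumed values, the same person has a lower chance of passing the harder alternate item, which is the kind of mismatch that could be mistaken for a change in cognitive function when forms are swapped between visits.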
Church, A Timothy; Alvarez, Juan M; Mai, Nhu T Q; French, Brian F; Katigbak, Marcia S; Ortiz, Fernando A
2011-11-01
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%-50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.
NASA Astrophysics Data System (ADS)
Rahmani, B. D.
2018-01-01
The purpose of this paper is to evaluate Indonesian senior high school teachers' pedagogical content knowledge and their perceptions of curriculum change in West Java, Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. Differential Item Functioning (DIF) analysis was used to examine whether item responses differed by gender, educational background, or school location. The results showed no significant differences in item responses by gender or school location, but responses did differ by educational background. In conclusion, teachers' educational background influenced their responses to the questionnaire. Therefore, it is suggested that future questionnaires be constructed with items that account for differences among participants, particularly in educational background.
Bluck, Susan; Alea, Nicole
2011-07-01
Theory suggests that autobiographical remembering serves several functions. This research builds on previous empirical efforts (Bluck, Alea, Habermas, & Rubin, 2005) with the aim of constructing a brief, valid measure of three functions of autobiographical memory. Participants (N=306) completed 28 theoretically derived items concerning the frequency with which they use autobiographical memory to serve a variety of functions. To examine convergent and discriminant validity, participants rated their tendency to think about and talk about the past, and measures of future time orientation, self-concept clarity, and trait personality. Confirmatory factor analysis of the function items resulted in a respecified model with 15 items in three factors. The newly developed Thinking about Life Experiences scale (TALE) shows good internal consistency as well as convergent validity for three subscales: Self-Continuity, Social-Bonding, and Directing-Behaviour. Analyses demonstrate factorial equivalence across age and gender groups. Potential use and limitations of the TALE are discussed.
Factor structure and gender stability in the multidimensional condom attitudes scale.
Starosta, Amy J; Berghoff, Christopher R; Earleywine, Mitch
2015-06-01
Sexually transmitted infections continue to trouble the United States and can be attenuated through increased condom use. Attitudes about condoms are an important multidimensional factor that can affect sexual health choices and have been successfully measured using the Multidimensional Condom Attitudes Scale (MCAS). Such attitudes have the potential to vary between men and women, yet little work has been undertaken to identify if the MCAS accurately captures attitudes without being influenced by underlying gender biases. We examined the factor structure and gender invariance on the MCAS using confirmatory factor analysis and item response theory, within-subscale differential item functioning analyses. More than 770 participants provided data via the Internet. Results of differential item functioning analyses identified three items as differentially functioning between the genders, and removal of these items is recommended. Findings confirmed the previously hypothesized multidimensional nature of condom attitudes and the five-factor structure of the MCAS even after the removal of the three problematic items. In general, comparisons across genders using the MCAS seem reasonable from a methodological standpoint. Results are discussed in terms of improving sexual health research and interventions. © The Author(s) 2014.
Hong, Ickpyo; Reistetter, Timothy A; Díaz-Venegas, Carlos; Michaels-Obregon, Alejandra; Wong, Rebeca
2018-05-10
Cross-national comparisons of patterns of population aging have emerged as comparable national micro-data have become available. This study creates a metric using Rasch analysis and determines the health of American and Mexican older adult populations. Secondary data analysis using representative samples aged 50 and older from 2012 U.S. Health and Retirement Study (n = 20,554); 2012 Mexican Health and Aging Study (n = 14,448). We developed a function measurement scale using Rasch analysis of 22 daily tasks and physical function questions. We tested psychometrics of the scale including factor analysis, fit statistics, internal consistency, and item difficulty. We investigated differences in function using multiple linear regression controlling for demographics. Lastly, we conducted subgroup analyses for chronic conditions. The created common metric demonstrated a unidimensional structure with good item fit, an acceptable precision (person reliability = 0.78), and an item difficulty hierarchy. The American adults appeared less functional than adults in Mexico (β = - 0.26, p < 0.0001) and across two chronic conditions (arthritis, β = - 0.36; lung problems, β = - 0.62; all p < 0.05). However, American adults with stroke were more functional than Mexican adults (β = 0.46, p = 0.047). The Rasch model indicates that Mexican adults were more functional than Americans at the population level and across two chronic conditions (arthritis and lung problems). Future studies would need to elucidate other factors affecting the function differences between the two countries.
Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd
2017-01-01
The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
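As a rough illustration of how an item information function can be derived from a graded response model, the sketch below uses Samejima's logistic formulation with point parameters. It does not reproduce the Bayesian fitting used in the study; the discrimination and threshold values are hypothetical, chosen only to show that a more discriminating item yields more information.

```python
import math


def _logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def _cumulative(theta, a, thresholds):
    # Boundary probabilities P*(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    return [1.0] + [_logistic(a * (theta - b)) for b in thresholds] + [0.0]


def grm_category_probabilities(theta, a, thresholds):
    """Category probabilities for one item; thresholds must be increasing."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info


# Hypothetical parameters (not estimates from the study): two items with
# four response categories, hence three thresholds each.
thresholds = [-1.0, 0.0, 1.0]
for a in (1.0, 2.0):
    print(f"a = {a}: information at theta = 0 is {grm_item_information(0.0, a, thresholds):.3f}")
```

With one threshold the function reduces to the familiar two-parameter logistic information a² P (1 − P), which is a convenient sanity check on the implementation.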
Shedden-Mora, Meike C.; Löwe, Bernd
2017-01-01
Objective The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. Results The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
Tesio, L; Cantagallo, A
1998-01-01
The Functional Assessment Measure (FAM) has been proposed as a measure of disability in post-acute Traumatic Brain Injury (TBI) outpatients. It is comprised of the 18 items of The Functional Independence Measure (FIM℠), scored in terms of dependence, and of 12 newly designed items, scored in terms of dependence (7 items) or performance (5 items). The FIM℠ covers the domains of self-care, sphincter management, mobility, locomotion, communication and social cognition. The 12 new items explore the domains of community integration, emotional status, orientation, attention, reading/writing skills, swallowing and speech intelligibility. By addressing a set of problems quite specific for TBI outpatients the FAM was intended to raise the ceiling of the FIM℠ and to allow a more precise estimate of their disability. These claims, however, were never supported in previous studies. We administered the FAM to 60 TBI outpatients, 2-88 months (median 16) from trauma. Rasch analysis (rating scale model) was adopted to test the psychometric properties of the scale. The FAM was reliable (Rasch item and person reliability 0.91 and 0.93, respectively). Two of the 12 FAM-specific items were severely misfitting with the general construct, and were deleted. Within the 28-item refined FAM scale, 4 new items and 2 FIM℠ items still retained signs of misfit. The FAM was on average too easy. The most difficult item (a new one, Employability) did not attain the average ability of the subjects. Also, it was only slightly more difficult than the most difficult FIM℠ item (Memory). The FAM does not seem to improve the FIM℠ as far as TBI outpatients are to be assessed.
Kramer, Jessica M; Schwartz, Ariel
2017-10-01
This study examined the item interpretability and rating scale use of the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) by young people with developmental disabilities. The PEDI-PRO assesses the functional performance of discrete functional tasks in the context of everyday life situations. A two-phase cognitive interview design was implemented with a convenience sample of 37 young people (mean age 19y, SD 2y 5mo; 13 males and 24 females; 68% with intellectual disability) with developmental disabilities. In phase I, 182 item candidates were each reviewed by an average of four young people. In phase II, 103 items were carried forward or revised and each reviewed by an average of seven additional young people. Two raters coded responses for intended item interpretation and performance quality; codes were analysed using descriptive statistics. Qualitative analysis explored young people's self-evaluation process. Items were interpreted as intended by most young people (mean 86%). Young people can use PEDI-PRO response categories appropriately to describe their performance: 94% of positive performance descriptions coincided with a positive response category choice; 73% of negative descriptions coincided with a negative response category choice. Young people interpreted items in a literal manner, and their self-evaluation incorporated the use of supports that facilitate functional performance. The PEDI-PRO's measurement framework appears to support the self-evaluation of functional performance of young people with developmental disabilities. © 2017 Mac Keith Press.
Nishigami, Tomohiko; Mibu, Akira; Tanaka, Katsuyoshi; Yamashita, Yuh; Watanabe, Akihisa; Tanabe, Akihito
2017-03-01
The Pain Catastrophizing Scale (PCS) is a commonly used measure of pain catastrophizing. The scale comprises 13 items related to magnification, rumination, and helplessness. To facilitate quick screening and to reduce participants' burden, the four-item and six-item short forms of the English version of the PCS were developed. The purpose of the present study was to evaluate the psychometric properties of a Japanese version of the short forms of the PCS using a contemporary approach called Rasch analysis. A total of 216 patients with musculoskeletal disorders were recruited in this study. Participants completed study measures, which included pain intensity, the Pain Catastrophizing Scale (PCS), and the Tampa Scale of Kinesiophobia (TSK). Furthermore, the four-item (items 3, 6, 8, and 11) and six-item (items 4, 5, 6, 10, 11, and 13) short forms of the Japanese version of the PCS were measured. We used Rasch analysis to analyze the psychometric properties of the original, four-item, and six-item short forms of the PCS. Rasch analysis showed that both short forms of the PCS had acceptable internal consistency, unidimensionality, and no notable DIF and were functional on the category rating scale. However, the four-item short form of the PCS had two misfit items. The six-item short form of the PCS has acceptable psychometric properties and is suitable for use in participants with musculoskeletal pain. Thus, the six-item short form can be used as a brief instrument to evaluate pain catastrophizing. Copyright © 2016 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?
ERIC Educational Resources Information Center
Royal, Kenneth D.; Stockdale, Myrah R.
2015-01-01
The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
2015-12-01
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
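The Mantel-Haenszel chi-square test mentioned in this abstract can be sketched for a single dichotomized item as follows. Respondents are stratified by a matching variable (typically total score), and a 2x2 table of group by response is formed in each stratum. This is a generic illustration of the textbook statistic with continuity correction, not the authors' code; the counts are made up, and how the DOCS items were dichotomized or matched is not stated in the abstract.

```python
def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and chi-square (1 df, continuity corrected).

    strata: iterable of (a, b, c, d) counts per score stratum, where
      a = reference group endorsing, b = reference group not endorsing,
      c = focal group endorsing,     d = focal group not endorsing.
    """
    num = den = 0.0              # pieces of the common odds ratio
    sum_a = sum_e = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue             # stratum carries no information
        n_ref, n_foc = a + b, c + d
        m_yes, m_no = a + c, b + d
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m_yes / n
        sum_var += n_ref * n_foc * m_yes * m_no / (n * n * (n - 1))
    odds_ratio = num / den
    chi_square = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_var
    return odds_ratio, chi_square


# Made-up counts for two score strata (illustration only).
example = [(30, 10, 20, 20), (20, 20, 10, 30)]
odds_ratio, chi_square = mantel_haenszel_dif(example)
print(f"common odds ratio = {odds_ratio:.2f}, MH chi-square = {chi_square:.2f}")
```

A common odds ratio near 1 and a chi-square below the critical value for one degree of freedom (3.84 at the 0.05 level) would be consistent with no uniform DIF for that item; the made-up counts above deliberately show the opposite case.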
Gibbons, Chris J; Thornton, Everard W; Ealing, John; Shaw, Pamela J; Talbot, Kevin; Tennant, Alan; Young, Carolyn A
2013-11-15
Social withdrawal is described as the condition in which an individual experiences a desire to make social contact, but is unable to satisfy that desire. It is an important issue for patients with motor neurone disease who are likely to experience severe physical impairment. This study aims to reassess the psychometric and scaling properties of the MND Social Withdrawal Scale (MND-SWS) domains and examine the feasibility of a summary scale, by applying scale data to the Rasch model. The MND Social Withdrawal Scale was administered to 298 patients with a diagnosis of MND, alongside the Hospital Anxiety and Depression Scale. The factor structure of the MND Social Withdrawal Scale was assessed using confirmatory factor analysis. Model fit, category threshold analysis, differential item functioning (DIF), dimensionality and local dependency were evaluated. Factor analysis confirmed the suitability of the four-factor solution suggested by the original authors. Mokken scale analysis suggested the removal of item five. Rasch analysis removed a further three items, from the Community (one item) and Emotional (two items) withdrawal subscales. Following item reduction, each scale exhibited excellent fit to the Rasch model. A 14-item Summary scale was shown to fit the Rasch model after subtesting the items into three subtests corresponding to the Community, Family and Emotional subscales, indicating that items from these three subscales could be summed together to create a total measure for social withdrawal. Removal of four items from the Social Withdrawal Scale led to a four-factor solution with a 14-item hierarchical Summary scale that were all unidimensional, free from DIF and well fitted to the Rasch model. The scale is reliable and allows clinicians and researchers to measure social withdrawal in MND along a unidimensional construct. © 2013. Published by Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Baylor, Carolyn; McAuliffe, Megan J.; Hughes, Louise E.; Yorkston, Kathryn; Anderson, Tim; Jiseon, Kim; Amtmann, Dagmar
2014-01-01
Purpose: To examine the cross-cultural applicability of the Communicative Participation Item Bank (CPIB) through a comparison of respondents with Parkinson's disease (PD) from the United States and New Zealand. Method: A total of 428 respondents--218 from the United States and 210 from New Zealand--completed the self-report CPIB and a series of…
Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S
2011-10-01
We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects) living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank and differential item functioning (DIF) explored for known subgroups. A symptom level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). Therefore the remaining optimal item bank consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items accuracy was still high (AUC = 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
Lawton IADL scale in dementia: can item response theory make it more informative?
McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M
2014-07-01
Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. To use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Kalibatseva, Z; Leong, F T L; Ham, E H
2014-09-01
Theoretical and clinical publications suggest the existence of cultural differences in the expression and experience of depression. Measurement non-equivalence remains a potential methodological explanation for the lower prevalence of depression among Asian Americans compared to European Americans. This study compared DSM-IV depressive symptoms among Asian Americans and European Americans using secondary data analysis of the Collaborative Psychiatric Epidemiology Surveys (CPES). The Composite International Diagnostic Interview (CIDI) was used for the assessment of depressive symptoms. Of the entire sample, 310 Asian Americans and 1974 European Americans reported depressive symptoms and were included in the analyses. Measurement variance was examined with an item response theory differential item functioning (IRT DIF) analysis. χ2 analyses indicated that, compared to Asian Americans, European American participants more frequently endorsed affective symptoms such as 'feeling depressed', 'feeling discouraged' and 'cried more often'. The IRT analysis detected DIF for four out of the 15 depression symptom items. At equal levels of depression, Asian Americans endorsed feeling worthless and appetite changes more easily than European Americans, and European Americans endorsed feeling nervous and crying more often than Asian Americans. Asian Americans did not seem to over-report somatic symptoms; however, European Americans seemed to report more affective symptoms than Asian Americans. The results suggest that there was measurement variance in a few of the depression items.
Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard
2016-01-01
Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S
2015-05-01
To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. Five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that is comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.
Crins, Martine H P; van der Wees, Philip J; Klausch, Thomas; van Dulmen, Simone A; Roorda, Leo D; Terwee, Caroline B
2018-01-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) is a universally applicable set of instruments, including item banks, short forms and computer adaptive tests (CATs), measuring patient-reported health across different patient populations. PROMIS CATs are highly efficient and their use in practice is considered feasible with little administration time, offering standardized and routine patient monitoring. Before an item bank can be used as a CAT, the psychometric properties of the item bank have to be examined. Therefore, the objective was to assess the psychometric properties of the Dutch-Flemish PROMIS Physical Function item bank (DF-PROMIS-PF) in Dutch patients receiving physical therapy. Cross-sectional study. 805 patients >18 years, who received any kind of physical therapy in primary care in the past year, completed the full DF-PROMIS-PF (121 items). Unidimensionality was examined by Confirmatory Factor Analysis and local dependence and monotonicity were evaluated. A Graded Response Model was fitted. Construct validity was examined with correlations between DF-PROMIS-PF T-scores and scores on two legacy instruments (SF-36 Health Survey Physical Functioning scale [SF36-PF10] and the Health Assessment Questionnaire Disability-Index [HAQ-DI]). Reliability (standard errors of theta) was assessed. The results for unidimensionality were mixed (scaled CFI = 0.924, TLI = 0.923, RMSEA = 0.045, first factor explained 61.5% of variance). Some local dependence was found (8.2% of item pairs). The item bank showed a broad coverage of the physical function construct (threshold parameter range: -4.28 to 2.33) and good construct validity (correlation with SF36-PF10 = 0.84 and HAQ-DI = -0.85). Furthermore, the DF-PROMIS-PF showed greater reliability over a broader score range than the SF36-PF10 and HAQ-DI. The psychometric properties of the DF-PROMIS-PF item bank are sufficient. 
The DF-PROMIS-PF can now be used as short forms or CAT to measure the level of physical function of physiotherapy patients.
Functional recovery in patients with schizophrenia: recommendations from a panel of experts.
Lahera, Guillermo; Gálvez, José L; Sánchez, Pedro; Martínez-Roig, Miguel; Pérez-Fuster, J V; García-Portilla, Paz; Herrera, Berta; Roca, Miquel
2018-06-05
The management of schizophrenia is evolving towards a more comprehensive model based on functional recovery. The concept of functional recovery goes beyond clinical remission and encompasses multiple aspects of the patient's life, making it difficult to settle on a definition and to develop reliable assessment criteria. In this consensus process based on a panel of experts in schizophrenia, we aimed to provide useful insights on functional recovery and its involvement in clinical practice and clinical research. After a literature review of functional recovery in schizophrenia, a scientific committee of 8 members prepared a 75-item questionnaire, including 6 sections: (I) the concept of functional recovery (9 items), (II) assessment of functional recovery (23 items), (III) factors influencing functional recovery (16 items), (IV) psychosocial interventions and functional recovery (8 items), (V) pharmacological treatment and functional recovery (14 items), and (VI) the perspective of patients and their relatives on functional recovery (5 items). The questionnaire was sent to a panel of 53 experts, who rated each item on a 9-point Likert scale. Consensus was achieved in a 2-round Delphi dynamics, using the median (interquartile range) scores to consider consensus in either agreement (scores 7-9) or disagreement (scores 1-3). Items not achieving consensus in the first round were sent back to the experts for a second consideration. After the two recursive rounds, consensus was achieved in 64 items (85.3%): 61 items (81.3%) in agreement and 3 (4.0%) in disagreement, all of them from section II (assessment of functional recovery). Items not reaching consensus were related to the concepts of functional recovery (1 item, 1.3%), functional assessment (5 items, 6.7%), factors influencing functional recovery (3 items, 4.0%), and psychosocial interventions (2 items, 5.6%). 
Despite the lack of a well-defined concept of functional recovery, we identified a trend towards a common archetype of the definition and factors associated with functional recovery, as well as its applicability in clinical practice and clinical research.
NASA Astrophysics Data System (ADS)
Ding, Lin
2014-02-01
Discipline-based science concept assessments are powerful tools to measure learners' disciplinary core ideas. Among many such assessments, the Brief Electricity and Magnetism Assessment (BEMA) has been broadly used to gauge student conceptions of key electricity and magnetism (E&M) topics in college-level introductory physics courses. Differing from typical concept inventories that focus only on one topic of a subject area, BEMA covers a broad range of topics in the electromagnetism domain. In spite of this fact, prior studies exclusively used a single aggregate score to represent individual students' overall understanding of E&M without explicating the construct of this assessment. Additionally, BEMA has been used to compare traditional physics courses with a reformed course entitled Matter and Interactions (M&I). While prior findings were in favor of M&I, no empirical evidence was sought to rule out possible differential functioning of BEMA that may have inadvertently advantaged M&I students. In this study, we used Rasch analysis to seek two missing pieces regarding the construct and differential functioning of BEMA. Results suggest that although BEMA items generally can function together to measure the same construct of application and analysis of E&M concepts, several items may need further revision. Additionally, items that demonstrate differential functioning for the two courses are detected. Issues such as item contextual features and student familiarity with question settings may underlie these findings. This study highlights often overlooked threats in science concept assessments and provides an exemplar for using evidence-based reasoning to make valid inferences and arguments.
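As an illustration of what differential functioning means under the dichotomous Rasch model, the following minimal Python sketch computes the probability of a correct response for two groups at the same ability when an item's difficulty differs between them. The function name and all numeric values are hypothetical and are not taken from the BEMA study; the sketch shows the general idea only, not the procedure used by the author.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical values: one ability level, group-specific item difficulties.
theta = 0.5
difficulty_group_a = 0.0
difficulty_group_b = 0.6

p_a = rasch_probability(theta, difficulty_group_a)
p_b = rasch_probability(theta, difficulty_group_b)

# A nonzero gap at equal ability is what a DIF analysis looks for.
dif_gap = p_a - p_b
print(f"P(correct | A) = {p_a:.3f}, P(correct | B) = {p_b:.3f}, gap = {dif_gap:.3f}")
```

In practice the group-specific difficulties would be estimated from response data and the gap tested for statistical and practical significance.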
Massof, Robert W
2014-10-01
A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equal the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing. © The Author(s) 2013. Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
North American Veterinary Licensing Examination pacing study.
Subhiyah, Raja G; Boyce, John R
2010-01-01
The National Board of Veterinary Medical Examiners was interested in the possible effects of word count on the outcomes of the North American Veterinary Licensing Examination. In this study, the authors investigated the effects of increasing word count on the pacing of examinees during each section of the examination and on the performance of examinees on the items. Specifically, the authors analyzed the effect of item word count on the average time spent on each item within a section of the examination, the average number of items omitted at the end of a section, and the average difficulty of items as a function of presentation order. The average word count per item increased from 2001 to 2008. As expected, there was a relationship between word count and time spent on the item. No significant relationship was found between word count and item difficulty, and an analysis of omitted items and pacing patterns showed no indication of overall pacing problems.
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-03-29
To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on a criterion of an R-squared of 0.02 or above. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.
Walton, David M; Beattie, Tyler; Putos, Joseph; MacDermid, Joy C
2016-06-01
The Brief Pain Inventory is composed of two quantifiable scales: pain severity and pain interference. The reported factor structure of the interference subscale is not consistent in the extant literature, with no clear choice between a single- or two-factor structure. Here, we report on the results of Rasch-based analysis of the interference subscale using a large population-based ambulatory patient database (the Quebec Pain Registry). Observational cohort. A total of 1,000 responses were randomly drawn from a total database of 5,654 for this analysis. Both the original 7-item and an expanded 10-item version (Tyler 2002) of the interference subscale were evaluated. Rasch analysis revealed significant misfit of both versions of the scale, with the original 7-item version outperforming the expanded 10-item version. Analysis of dimensionality revealed that both versions showed improved model fit when considered as two subscales (affective and physical interference) with the item on sleep interference removed or considered separately. Additionally, significant uniform differential item functioning was identified for 6 of the 7 original items when the sample was stratified by age above or below 55 years. The interference subscale achieved adequate model fit when considered as two separate subscales with age as a mediator of response, while interpreting the sleep interference item separately. A transformation matrix revealed that in all cases, ordinal-level change at the extreme ends of the scale appears to be more meaningful than does a similar change at the midpoints. The Interference subscale of the BPI should be interpreted as two separate subscales (Affective Interference, Physical Interference) with the sleep item removed or interpreted separately for optimal fit to the Rasch model. Implications for research and clinical use are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
Systems Analysis Directorate Activities Summary August 1977
1977-09-01
are: x a. Cataloging direction b. Requirements computation c. Procurement direction d. Distribution management e. Disposal direction f..."inventory management," as a responsibility of NICP’s, includes cataloging, requirements computation, procurement direction, distribution management, maintenance...functions are cataloging, major item management, secondary item management, procurement direction, distribution management, overhaul and rebuild
Rasch analysis of the carers quality of life questionnaire for parkinsonism.
Pillas, Marios; Selai, Caroline; Schrag, Anette
2017-03-01
To assess the psychometric properties of the Carers Quality of Life Questionnaire for Parkinsonism using a Rasch modeling approach and determine the optimal cut-off score. We performed a Rasch analysis of the survey answers of 430 carers of patients with atypical parkinsonism. All of the scale items demonstrated acceptable goodness of fit to the Rasch model. The scale was unidimensional and no notable differential item functioning was detected in the items regarding age and disease type. Rating categories were functioning adequately in all scale items. The scale had high reliability (.95) and construct validity and a high degree of precision, distinguishing between 5 distinct groups of carers with different levels of quality of life. A cut-off score of 62 was found to have the optimal screening accuracy based on Hospital Anxiety and Depression Scale subscores. The results suggest that the Carers Quality of Life Questionnaire for Parkinsonism is a useful scale to assess carers' quality of life and allows analyses requiring interval scaling of variables. © 2016 International Parkinson and Movement Disorder Society.
Covic, Tanya; Pallant, Julie F; Conaghan, Philip G; Tennant, Alan
2007-01-01
Background The aim of this study was to test the internal validity of the total Center for Epidemiologic Studies-Depression (CES-D) scale using Rasch analysis in a rheumatoid arthritis (RA) population. Methods CES-D was administered to 157 patients with RA over three time points within a 12 month period. Rasch analysis was applied using RUMM2020 software to assess the overall fit of the model, the response scale used, individual item fit, differential item functioning (DIF) and person separation. Results Pooled data across three time points was shown to fit the Rasch model with removal of seven items from the original 20-item CES-D scale. It was necessary to rescore the response format from four to three categories in order to improve the scale's fit. Two items demonstrated some DIF for age and gender but were retained within the 13-item CES-D scale. A new cut point for depression score of 9 was found to correspond to the original cut point score of 16 in the full CES-D scale. Conclusion This Rasch analysis of the CES-D in a longstanding RA cohort resulted in the construction of a modified 13-item scale with good internal validity. Further validation of the modified scale is recommended particularly in relation to the new cut point for depression. PMID:17629902
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities
Hong, Ickpyo; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L.; Shulman, Lisa M.
2017-01-01
Purpose The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. Methods The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35 item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R2 less than 10 %). Results The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59–0.85) and acceptable internal consistency (Cronbach’s alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. Conclusion The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms. PMID:27048495
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.
Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M
2016-09-01
The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35 item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R² less than 10%). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
Validation of a Health Literacy Measure for Adolescents and Young Adults Diagnosed with Cancer.
McDonald, Fiona E J; Patterson, Pandora; Costa, Daniel S J; Shepherd, Heather L
2016-03-01
Health literacy can influence long-term health outcomes. This study aimed to validate an adapted version of the Functional, Communicative and Critical Health Literacy measure for adolescent and young adult (AYA) cancer patients and survivors (N = 105; age 12-24 years). Exploratory factor analysis was used to validate the measure, and indicated that a slightly modified item structure better fit the results. Furthermore, item response theory analysis highlighted location and discrimination parameter differences among items. Acceptability of the measure was high. This is the first validation of a health literacy measure among AYAs with an illness such as cancer.
Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael
2017-04-01
Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize them, account for their impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014, and results were evaluated using classical test theory and item response theory. Items would be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance eight items; monitoring seven items; and management five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05), and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.
Cook, Karon F; Kallen, Michael A; Bombardier, Charles; Bamer, Alyssa M; Choi, Seung W; Kim, Jiseon; Salem, Rana; Amtmann, Dagmar
2017-01-01
To evaluate whether items of three measures of depressive symptoms function differently in persons with spinal cord injury (SCI) than in persons from a primary care sample. This study was a retrospective analysis of responses to the Patient Health Questionnaire depression scale, the Center for Epidemiological Studies Depression scale, and the National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS®) version 1.0 eight-item depression short form 8b (PROMIS-D). The presence of differential item functioning (DIF) was evaluated using ordinal logistic regression. No items of any of the three target measures were flagged for DIF based on standard criteria. In a follow-up sensitivity analysis, the criterion was changed to make the analysis more sensitive to potential DIF. Scores were corrected for DIF flagged under this criterion. Minimal differences were found between the original scores and those corrected for DIF under the sensitivity criterion. The three depression screening measures evaluated in this study did not perform differently in samples of individuals with SCI compared to general and community samples. Transdiagnostic symptoms did not appear to spuriously inflate depression severity estimates when administered to people with SCI.
NASA Technical Reports Server (NTRS)
Schmeckpeper, K. R.
1987-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Electrical Power Distribution and Control (EPD and C) hardware. The EPD and C hardware performs the functions of distributing, sensing, and controlling 28 volt DC power and of inverting, distributing, sensing, and controlling 117 volt 400 Hz AC power to all Orbiter subsystems from the three fuel cells in the Electrical Power Generation (EPG) subsystem. Volume 2 continues the presentation of IOA analysis worksheets and contains the potential critical items list.
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
Cheng, Su-Fen; Lee-Hsieh, Jane; Turton, Michael A; Lin, Kuan-Chia
2014-06-01
Little research has investigated the establishment of norms for nursing students' self-directed learning (SDL) ability, recognized as an important capability for professional nurses. An item response theory (IRT) approach was used to establish norms for SDL abilities valid for the different nursing programs in Taiwan. The purposes of this study were (a) to use IRT with a graded response model to reexamine the SDL instrument, or the SDLI, originally developed by this research team using confirmatory factor analysis and (b) to establish SDL ability norms for the four different nursing education programs in Taiwan. Stratified random sampling with probability proportional to size was used. A minimum of 15% of students from the four different nursing education degree programs across Taiwan was selected. A total of 7,879 nursing students from 13 schools were recruited. The research instrument was the 20-item SDLI developed by Cheng, Kuo, Lin, and Lee-Hsieh (2010). IRT with the graded response model was used with a two-parameter logistic model (discrimination and difficulty) for the data analysis, calculated using MULTILOG. Norms were established using percentile rank. Analysis of item information and test information functions revealed that 18 items exhibited very high discrimination and two items had high discrimination. The test information function was higher in this range of scores, indicating greater precision in the estimate of nursing student SDL. Reliability fell between .80 and .94 for each domain and the SDLI as a whole. The total information function shows that the SDLI is appropriate for all nursing students, except for the top 2.5%. SDL ability norms were established for each nursing education program and for the nation as a whole. IRT is shown to be a potent and useful methodology for scale evaluation. The norms for SDL established in this research will provide practical standards for nursing educators and students in Taiwan.
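To make the graded response model mentioned above more concrete, here is a minimal Python sketch of how category probabilities for a single ordinal item follow from one discrimination parameter and a set of ordered thresholds. The function names, the five-category layout, and all parameter values are hypothetical illustrations; they are not SDLI estimates, and the study itself used MULTILOG rather than code like this.

```python
import math
from typing import List


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta: float, discrimination: float,
                       thresholds: List[float]) -> List[float]:
    """Category probabilities for one item under the graded response model.

    thresholds must be in ascending order; K thresholds give K + 1 categories.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cumulative = ([1.0]
                  + [logistic(discrimination * (theta - b)) for b in thresholds]
                  + [0.0])
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item: discrimination 2.0, four thresholds (five response categories).
probs = grm_category_probs(theta=0.0, discrimination=2.0,
                           thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

A larger discrimination value makes the cumulative curves steeper, so the item separates respondents near its thresholds more sharply, which is the sense in which the abstract describes items as having high or very high discrimination.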
Sheldon, Signy; Levine, Brian
2015-12-01
During autobiographical memory retrieval, the medial temporal lobes (MTL) relate together multiple event elements, including object (within-item relations) and context (item-context relations) information, to create a cohesive memory. There is consistent support for a functional specialization within the MTL according to these relational processes, much of which comes from recognition memory experiments. In this study, we compared brain activation patterns associated with retrieving within-item relations (i.e., associating conceptual and sensory-perceptual object features) and item-context relations (i.e., spatial relations among objects) with respect to naturalistic autobiographical retrieval. We developed a novel paradigm that cued participants to retrieve information about past autobiographical events, non-episodic within-item relations, and non-episodic item-context relations with the perceptuomotor aspects of retrieval equated across these conditions. We used multivariate analysis techniques to extract common and distinct patterns of activity among these conditions within the MTL and across the whole brain, both in terms of spatial and temporal patterns of activity. The anterior MTL (perirhinal cortex and anterior hippocampus) was preferentially recruited for generating within-item relations later in retrieval whereas the posterior MTL (posterior parahippocampal cortex and posterior hippocampus) was preferentially recruited for generating item-context relations across the retrieval phase. These findings provide novel evidence for functional specialization within the MTL with respect to naturalistic memory retrieval. © 2015 Wiley Periodicals, Inc.
Sandilos, Lia E.; Lewis, Kandia; Komaroff, Eugene; Hammer, Carol Scheffner; Scarpino, Shelley E.; Lopez, Lisa; Rodriguez, Barbara; Goldstein, Brian
2015-01-01
The purpose of this study was to investigate the way in which items on the Woodcock-Muñoz Language Survey Revised (WMLS-R) Spanish and English versions function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. Using data from a sample of 324 bilingual Hispanic families and their children living on the United States mainland, differential item functioning (DIF) was conducted to determine if test items in English and Spanish functioned differently for Mexican, Cuban, and Puerto Rican bilingual children. Data on child and parent language characteristics and children’s scores on Picture Vocabulary and Story Recall subtests in English and Spanish were collected. DIF was not detected for items on the Spanish subtests. Results revealed that some items on English subtests displayed statistically and practically significant DIF. The findings indicate that there are differences in the difficulty level of WMLS-R English-form test items depending on the examinees’ ethnic subgroup membership. This outcome suggests that test developers need to be mindful of potential differences in performance based on ethnic subgroup and dialect when developing standardized language assessments that may be administered to bilingual students. PMID:26705400
NASA Astrophysics Data System (ADS)
Dong, Sunghee; Jeong, Jichai
2018-02-01
Objective. Memory is formed by the interaction of various brain functions at the item and task level. Revealing individual and combined effects of item- and task-related processes on retrieving episodic memory is an unsolved problem because of limitations in existing neuroimaging techniques. To investigate these issues, we analyze fast and slow optical signals measured from a custom-built continuous wave functional near-infrared spectroscopy (CW-fNIRS) system. Approach. In our work, we visually encode the words to the subjects and let them recall the words after a short rest. The hemodynamic responses evoked by the episodic memory are compared with those evoked by the semantic memory in retrieval blocks. In the fast optical signal, we compare the effects of old and new items (previously seen and not seen) to investigate the item-related process in episodic memory. The Kalman filter is simultaneously applied to slow and fast optical signals in different time windows. Main results. A significant task-related HbR decrease was observed in the episodic memory retrieval blocks. Mean amplitude and peak latency of a fast optical signal are dependent upon item types and reaction time, respectively. Moreover, task-related hemodynamic and item-related fast optical responses are correlated in the right prefrontal cortex. Significance. We demonstrate that episodic memory is retrieved from the right frontal area by a functional connectivity between the maintained mental state through retrieval and item-related transient activity. To the best of our knowledge, this demonstration of functional NIRS research is the first to examine the relationship between item- and task-related memory processes in the prefrontal area using single modality.
Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L
2017-04-11
The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at item level and opportunities for improvements. We used data from women in long-term care. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. Participants were 86 years old, had more than three comorbidities, and a NHPPT of 19.4. All items had high discrimination and were targeted toward the lower middle range of the performance continuum. After revision, sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. Concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL and explained variability (R²) from 22% to 36% for frailty. NHPPT has good measurement characteristics at the item level. NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved.
Ashford, Stephen; Jackson, Diana; Turner-Stokes, Lynne
2015-03-01
Following stroke or brain injury, goals for rehabilitation of the hemiparetic upper limb include restoring active function if there is return of motor control or, if none is possible, improving passive function, and facilitating care for the limb. To inform development of a new patient reported outcome measure (PROM) of active and passive function in the hemiparetic upper limb, the Arm Activity measure, we examined functional goals for the upper limb, identified during goal setting for spasticity intervention (physical therapy and concomitant botulinum toxin A interventions). Using secondary analysis of a prospective observational cohort study, functional goals determined between patients, their carers and the clinical team were assigned into categories by two raters. Goal category identification, followed by assignment of goals to a category, was undertaken and then confirmed by a second reviewer. Participants comprised nine males and seven females of mean (SD) age 54.5 (15.7) years and their carers. Fifteen had sustained a stroke and one a traumatic brain injury. Goals were used to identify five categories: passive function, active function, symptoms, cosmesis and impairment. Two passive function items not previously identified by a previous systematic review were identified. Analysis of goals important to patients and carers revealed items for inclusion in a new measure of arm function and provide a useful alternative method to involve patients and carers in standardised measure development. Copyright © 2014 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E
2015-01-01
The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.
Examining the Measurement Precision and Invariance of the Revised Get Ready to Read!
Farrington, Amber L.; Lonigan, Christopher J.
2016-01-01
Children's emergent literacy skills are highly predictive of later reading abilities. To determine which children have weaker emergent literacy skills and are in need of intervention, it is necessary to assess emergent literacy skills accurately and reliably. In this study, 1,351 children were administered the Revised Get Ready to Read! (GRTR-R), and an item response theory analysis was used to evaluate the item-level reliability of the measure. Differential item functioning (DIF) analyses were conducted to examine whether items function similarly between subpopulations of children. The GRTR-R had acceptable reliability for children whose ability level was just below the mean. DIF for a small number of items was present for only two comparisons—children who were older versus younger and children who were White versus African American. These results demonstrate that the GRTR-R has acceptable reliability and limited DIF, enabling the screener to identify those at risk for developing reading problems. PMID:23851136
Daker-White, Gavin; Crowley, Tessa
2003-05-01
A cross-sectional questionnaire survey of 216 men and 191 women attending a genitourinary medicine (GUM) clinic was undertaken to explore the relationship between sexual symptoms and quality of sexual life, and to test the psychometric validity of a pilot self-report measure of Sexual Function and Quality of Sexual Life (SFQoSL). Statistical comparisons were made with three reference groups: volunteers attending GUM for psychosexual counselling, outpatients at an Obstetrics and Gynaecology Department, and staff. Exploratory principal components analysis (with varimax rotation) of questionnaire item responses suggested an 11 (in women) and 13 (in men) factor solution, incorporating four multi-item scales. Internal consistency (Cronbach's alpha) of core items was 0.84 in 186 women (19 items) and 0.87 in 210 men (22 items). Construct validity was supported in comparisons with reference groups using one-way analysis of variance and post-hoc Scheffé testing. Overall, 116 (54%) male and 132 (69%) female GUM outpatients had scores indicating sexual dysfunction. Thirty-seven (17%) men reported erectile dysfunction; 54 (28%) women reported vaginal dryness affecting sex; 48 (25%) women reported genital changes affecting sex; 45 (21%) men and 64 (34%) women reported problems reaching orgasm.
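For readers unfamiliar with the internal consistency statistic reported above, a minimal sketch of Cronbach's alpha follows. It uses the standard formula (number of items, sum of item variances, variance of the total score); the function name and the respondents-by-items array layout are assumptions, and the data shown are invented, not taken from the survey.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Invented example: three respondents, two perfectly consistent items
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

The sketch assumes complete data and a total score with non-zero variance; missing responses would need to be handled before calling it.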
Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis
Pallant, Julie F; Miller, Renée L; Tennant, Alan
2006-01-01
Background The Edinburgh Postnatal Depression Scale (EPDS) is a 10 item self-rating post-natal depression scale which has seen widespread use in epidemiological and clinical studies. Concern has been raised over the validity of the EPDS as a single summed scale, with suggestions that it measures two separate aspects, one of depressive feelings, the other of anxiety. Methods As part of a larger cross-sectional study conducted in Melbourne, Australia, a community sample (324 women, ranging in age from 18 to 44 years: mean = 32 yrs, SD = 4.6), was obtained by inviting primiparous women to participate voluntarily in this study. Data from the EPDS were fitted to the Rasch measurement model and tested for appropriate category ordering, for item bias through Differential Item Functioning (DIF) analysis, and for unidimensionality through tests of the assumption of local independence. Results Rasch analysis of the data from the ten item scale initially demonstrated a lack of fit to the model with a significant Item-Trait Interaction total chi-square (chi-square = 82.8, df = 40; p < .001). Removal of two items (items 7 and 8) resulted in a non-significant Item-Trait Interaction total chi-square with a residual mean value for items of -0.467 with a standard deviation of 0.850, showing fit to the model. No DIF existed in the final 8-item scale (EPDS-8) and all items showed fit to model expectations. Principal Components Analysis of the residuals supported the local independence assumption, and unidimensionality of the revised EPDS-8 scale. Revised cut points were identified for EPDS-8 to maintain the case identification of the original scale. Conclusion The results of this study suggest that EPDS, in its original 10 item form, is not a viable scale for the unidimensional measurement of depression. Rasch analysis suggests that a revised eight item version (EPDS-8) would provide a more psychometrically robust scale. The revised cut points of 7/8 and 9/10 for the EPDS-8 show high levels of agreement with the original case identification for the EPDS-10. PMID:16768803
NASA Astrophysics Data System (ADS)
Giraldo, Diana L.; Sijbers, Jan; Romero, Eduardo
2017-11-01
The diagnosis of Alzheimer's disease (AD) and mild cognitive impairment (MCI) is based on neuropsychological evaluation of the patient. Different cognitive and memory functions are assessed by a battery of tests composed of items devised specifically to evaluate such higher functions. This work aims to identify and quantify the factors that determine performance in neuropsychological evaluation by conducting an Exploratory Factor Analysis (EFA). For this purpose, using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), EFA was applied to 67 item scores taken from the baseline neuropsychological battery of the three phases of the ADNI study. The factors found are directly related to specific brain functions such as memory, behavior, orientation, or verbal fluency. The identification of factors is followed by the calculation of factor scores given by weighted linear combinations of the item scores.
Testing for DIF in a Model with Single Peaked Item Characteristic Curves: The PARELLA Model.
ERIC Educational Resources Information Center
Hoijtink, Herbert; Molenaar, Ivo W.
1992-01-01
The PARallELogram Analysis (PARELLA) model is a probabilistic parallelogram model that can be used for the measurement of latent attitudes or latent preferences. A method is presented for testing for differential item functioning (DIF) for the PARELLA model using the approach of D. Thissen and others (1988). (SLD)
ERIC Educational Resources Information Center
Maddox, Bryan; Zumbo, Bruno D.; Tay-Lim, Brenda; Qu, Demin
2015-01-01
This article explores the potential for ethnographic observations to inform the analysis of test item performance. In 2010, a standardized, large-scale adult literacy assessment took place in Mongolia as part of the United Nations Educational, Scientific and Cultural Organization Literacy Assessment and Monitoring Programme (LAMP). In a novel form…
Evaluation of MIMIC-Model Methods for DIF Testing with Comparison to Two-Group Analysis
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for 1 group of people versus another, irrespective of mean differences on the construct. This study focuses on the use of multiple-indicator multiple-cause (MIMIC) structural equation models for DIF testing, parameterized as item…
Developing an African youth psychosocial assessment: an application of item response theory.
Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise
2014-06-01
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd. PMID:24478113
[Differential item functioning: a bibliometric analysis of journals published in Spanish].
Guilera, Georgina; Gómez, Juana; Hidalgo, M Dolores
2006-11-01
This study aims to provide an overview of scientific productivity with respect to articles published in Spanish on the issue of DIF. The documents included in the study were identified using the Psicodoc database, as well as the Science Citation Index and Social Science Citation Index from the Web of Science. The analyses carried out are focused mainly on presenting the frequencies and percentages of publications with respect to various bibliometric indicators. The results reveal that interest in the issue of DIF has increased, and that the universities are the most productive institutions. The majority of articles have been published in the journal Psicothema.
An item response curves analysis of the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Morris, Gary A.; Harshman, Nathan; Branum-Martin, Lee; Mazur, Eric; Mzoughi, Taha; Baker, Stephen D.
2012-09-01
Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields similar conclusions in the analysis of the performance of items on the FCI to the more sophisticated and complex IRT analyses but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers to improve their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels.
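One plausible reading of the item response curve idea described above is to tabulate, for a single item, the fraction of students choosing each answer option at each total-score level. The sketch below does this, using total score as the ability proxy. The data layout, function name, and example responses are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict

def item_response_curves(responses, key, item):
    """Fraction of students choosing each option of `item`, by total score.

    responses: one list of chosen options per student (hypothetical layout)
    key:       list of correct options, one per item
    Returns {total_score: {option: fraction}}.
    """
    by_score = defaultdict(Counter)
    for chosen in responses:
        # Total score = number of items answered correctly
        score = sum(c == k for c, k in zip(chosen, key))
        by_score[score][chosen[item]] += 1
    return {
        score: {opt: n / sum(counts.values()) for opt, n in counts.items()}
        for score, counts in sorted(by_score.items())
    }

# Invented two-item example with four students
curves = item_response_curves(
    [["A", "B"], ["A", "C"], ["B", "C"], ["C", "B"]], key=["A", "B"], item=0
)
```

Plotting each option's fraction against total score then shows which distractors attract students at which ability levels, and which are rarely chosen at any level.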
How do top cable news websites portray cognition as an aging issue?
Vandenberg, Anna E; Price, Anna E; Friedman, Daniela B; Marchman, Graham; Anderson, Lynda A
2012-06-01
We examined messages that the websites of the top cable news companies (MSNBC, FOX, and CNN) conveyed about cognition between January 2007 and March 2010. Drawing on agenda-setting theory, this work assessed the frequency, prominence, and attributes of cognitive topics in messages targeting an aging audience. We used quantitative content analysis to examine the frequency and prominence of cognitive topics and cognitive goals, as well as how the cognitive discussions were framed. Chi-square analyses were conducted to compare cognitive health information discussed in news items that did and did not target an "aging audience." Qualitative analysis of the aging audience subgroup was used to further examine age-associated cognitive messages. Within the 229 cognitive items identified, we found significantly more coverage of cognitive functioning and unspecified dementia and significantly less coverage of cognitive disease not dementia, specified dementia, and accidents or injury for the aging audience. Our qualitative analysis of news items aimed at an aging audience documented a focus on maintaining functioning and avoiding decline through various individual lifestyle behaviors. However, contextual information about level of cognition to be maintained, particular cognitive functions targeted, specific norms about cognitive aging, and how cognitive function is determined was lacking. Our research points to a communication gap in the delivery of academic research findings to a lay audience through online journalism. We suggest more clarity by researchers in defining cognitive concepts and measurement of cognitive function for journalistic translation and public consumption.
An Effect Size Measure for Raju's Differential Functioning for Items and Tests
ERIC Educational Resources Information Center
Wright, Keith D.; Oshima, T. C.
2015-01-01
This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
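For context on the benchmark mentioned in this abstract, the sketch below computes the Mantel-Haenszel common odds ratio across matched score strata and converts it to the ETS delta scale (delta = -2.35 ln alpha). It does not implement the NCDIF effect size itself, which the truncated abstract does not specify. The function names and the tuple layout of each stratum are assumptions.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    strata: iterable of (ref_correct, ref_incorrect,
                         focal_correct, focal_incorrect),
            one tuple per matched ability level (hypothetical layout).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    return num / den

def mh_delta(alpha_mh):
    """ETS delta-scale transform of the common odds ratio."""
    return -2.35 * math.log(alpha_mh)

# Invented counts: one stratum where the reference group is favoured
alpha = mantel_haenszel_odds_ratio([(30, 10, 20, 20)])
delta = mh_delta(alpha)
```

An odds ratio of 1 (delta of 0) indicates no DIF; negative delta values indicate an item that is harder for the focal group after matching on ability.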
NASA Astrophysics Data System (ADS)
Nomoto, Yohei; Yamashita, Kazuhiko; Ohya, Tetsuya; Koyama, Hironori; Kawasumi, Masashi
Society is increasingly concerned with preventing falls among older adults. Improving older adults' lower-limb muscular strength, postural control, and walking ability is important for quality of life and fall prevention. The aim of this study was to develop multiple evaluation methods that support advice on improving and maintaining lower-limb function in aged and young people. The subjects were 16 healthy young volunteers (mean ± S.D.: 19.9 ± 0.6 years) and 10 healthy aged volunteers (mean ± S.D.: 80.6 ± 6.1 years). Measurement items related to lower-limb function were selected from items we had used previously. The selected items were distance of extroversion of the toe, angle of flexion of the toe, maximum width of step, knee elevation, moving distance of the greater trochanter, walking balance, toe-gap force, and rotation range of the ankle joint. Principal component analysis summarized the measurement items into lower-limb ability evaluation methods covering walking ability, lower-limb muscle strength, and ankle flexibility. Compared with the aged group, the young group's assessment score was greater by a factor of 1.6 for walking ability, 1.4 for lower-limb muscle strength, and 1.2 for ankle flexibility. The results suggest that the lower-limb function of aged and young people can be assessed numerically and used to advise on their foot function.
Mouthon, L; Rannou, F; Bérezné, A; Pagnoux, C; Arène, J‐P; Foïs, E; Cabane, J; Guillevin, L; Revel, M; Fermanian, J; Poiraudeau, S
2007-01-01
Objective To develop and assess the reliability and construct validity of a scale assessing disability involving the mouth in systemic sclerosis (SSc). Methods We generated a 34‐item provisional scale from mailed responses of patients (n = 74), expert consensus (n = 10) and literature analysis. A total of 71 other SSc patients were recruited. The test–retest reliability was assessed using the intraclass correlation coefficient and divergent validity using the Spearman correlation coefficient. Factor analysis followed by varimax rotation was performed to assess the factorial structure of the scale. Results The item reduction process retained 12 items with 5 levels of answers (total score range 0–48). The mean total score of the scale was 20.3 (SD 9.7). The test–retest reliability was 0.96. Divergent validity was confirmed for global disability (Health Assessment Questionnaire (HAQ), r = 0.33), hand function (Cochin Hand Function Scale, r = 0.37), inter‐incisor distance (r = −0.34), handicap (McMaster‐Toronto Arthritis questionnaire (MACTAR), r = 0.24), depression (Hospital Anxiety and Depression (HAD); HADd, r = 0.26) and anxiety (HADa, r = 0.17). Factor analysis extracted 3 factors with eigenvalues of 4.26, 1.76 and 1.47, explaining 63% of the variance. These 3 factors could be clinically characterised. The first factor (5 items) represents handicap induced by the reduction in mouth opening, the second (5 items) handicap induced by sicca syndrome and the third (2 items) aesthetic concerns. Conclusion We propose a new scale, the Mouth Handicap in Systemic Sclerosis (MHISS) scale, which has excellent reliability and good construct validity, and assesses specifically disability involving the mouth in patients with SSc. PMID:17502364
Item Information and Discrimination Functions for Trinary PCM Items.
ERIC Educational Resources Information Center
Akkermans, Wies; Muraki, Eiji
1997-01-01
For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal and bimodal are discussed, and the locations and values of maxima are derived. Practical relevance of the results is discussed. (SLD)
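The relation between step parameters and the shape of the information function can be explored numerically. The sketch below evaluates the item information of a three-category partial credit item as the variance of the item score given theta, and counts local maxima on a grid. It is a numerical illustration only, not the analytic derivation in the paper; the function names, grid, and step values are assumptions.

```python
import numpy as np

def pcm_trinary_information(theta, d1, d2):
    """Item information of a 3-category partial credit item.

    d1, d2 are the two step parameters. For the PCM, information
    equals the conditional variance of the item score (0, 1, 2).
    """
    theta = np.asarray(theta, dtype=float)
    # Log-numerators of the three category probabilities
    z = np.stack([np.zeros_like(theta), theta - d1, 2 * theta - d1 - d2])
    z = z - z.max(axis=0)                 # stabilise the exponentials
    p = np.exp(z)
    p = p / p.sum(axis=0)
    k = np.array([0.0, 1.0, 2.0]).reshape(3, *([1] * theta.ndim))
    mean = (k * p).sum(axis=0)
    return (k ** 2 * p).sum(axis=0) - mean ** 2

def count_local_maxima(y):
    """Number of interior grid points strictly above both neighbours."""
    y = np.asarray(y)
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

grid = np.linspace(-8.0, 8.0, 1601)
close_steps = count_local_maxima(pcm_trinary_information(grid, 0.0, 0.0))
far_steps = count_local_maxima(pcm_trinary_information(grid, -4.0, 4.0))
```

Comparing `close_steps` and `far_steps` shows how the distance between the two steps changes the number of peaks in this sketch.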
ERIC Educational Resources Information Center
Thurman, Carol
2009-01-01
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
Development and assessment of floor and ceiling items for the PROMIS physical function item bank
2013-01-01
Introduction Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. 
They will help improve instrument performance in populations with broad functional ranges and those concentrated at one or the other extreme of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring item administration to appropriate individuals. PMID:24286166
Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S
2016-01-01
To report piloting and initial validation of the VQoL_CYP, a novel age-appropriate vision-related quality of life (VQoL) instrument for self-reporting by children with visual impairment (VI). Participants were a random patient sample of children with VI aged 10-15 years. 69 patients, drawn from patient databases at Great Ormond Street Hospital and Moorfields Eye Hospital, United Kingdom, participated in piloting of the draft 47-item VQoL instrument, which enabled preliminary item reduction. Subsequent administration of the instrument, alongside functional vision (FV) and generic health-related quality of life (HRQoL) self-report measures, to 101 children with VI comprising a nationally representative sample enabled further item reduction and evaluation of psychometric properties using Rasch analysis. Construct validity was assessed through Pearson correlation coefficients. Item reduction through piloting (8 items removed for skewness and individual item response pattern) and validation (1 item removed for skewness and 3 for misfit in Rasch) produced a 35-item scale, with fit values within acceptable limits, no notable differential item functioning, good measurement precision, ordered response categories and acceptable targeting in Rasch. The VQoL_CYP showed good construct validity, correlating strongly with HRQoL scores, moderately with FV scores but not with acuity. Robust child-appropriate self-report VQoL measures for children with VI are necessary for understanding the broader impacts of living with a visual disability, distinguishing these from limited functioning per se. Future planned use in larger patient samples will allow further psychometric development of the VQoL_CYP as an adjunct to objective outcomes assessment.
Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias
2018-04-10
To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Zampetakis, Leonidas A.; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G.; Moustakis, Vassilis
2017-01-01
Over the past years the percentage of female entrepreneurs has increased, yet it is still far below that for males. Although various attempts have been made to explain differences in men's and women's entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not yet been considered. The present study utilized Differential Item Functioning (DIF) to compare men's and women's reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. DIF methods applied to a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender. PMID:28386244
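The definition of DIF given in this abstract can be illustrated with a minimal sketch. It assumes a Rasch (one-parameter logistic) model, which the abstract itself does not specify, and the function names and difficulty values are hypothetical, chosen only for illustration.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing an item under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def dif_gap(theta: float, difficulty_reference: float, difficulty_focal: float) -> float:
    """Difference in endorsement probability between two groups
    at the SAME trait level theta. A non-zero gap is what DIF means."""
    return (rasch_probability(theta, difficulty_reference)
            - rasch_probability(theta, difficulty_focal))


# Hypothetical difficulties: the item is harder to endorse for the focal group.
for theta in (-1.0, 0.0, 1.0):
    gap = dif_gap(theta, difficulty_reference=0.0, difficulty_focal=0.5)
    print(f"theta={theta:+.1f}  gap={gap:.3f}")
```

When both groups share the same item difficulty the gap is zero at every trait level, which is the "no DIF" case the study reports for most items.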
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis
Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén
2017-01-01
Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spanish university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each subscale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four subscales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results show the need for, and adequacy of, this psychometric analysis strategy as a complement to CFA for enhancing the instrument. PMID:28298898
Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J
2016-05-11
The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.
Assessing cross-cultural validity of scales: a methodological review and illustrative example.
Beckstead, Jason W; Yang, Chiu-Yueh; Lengacher, Cecile A
2008-01-01
In this article, we assessed the cross-cultural validity of the Women's Role Strain Inventory (WRSI), a multi-item instrument that assesses the degree of strain experienced by women who juggle the roles of working professional, student, wife and mother. Cross-cultural validity is evinced by demonstrating the measurement invariance of the WRSI. Measurement invariance is the extent to which items of multi-item scales function in the same way across different samples of respondents. We assessed measurement invariance by comparing a sample of working women in Taiwan with a similar sample from the United States. Structural equation models (SEMs) were employed to determine the invariance of the WRSI and to estimate the unique validity variance of its items. This article also provides nurse-researchers with the necessary underlying measurement theory and illustrates how SEMs may be applied to assess cross-cultural validity of instruments used in nursing research. Overall performance of the WRSI was acceptable but our analysis showed that some items did not display invariance properties across samples. Item analysis is presented and recommendations for improving the instrument are discussed.
Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J
2016-07-01
Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting illustrating a possible source of measurement invariance in prevalence estimates.
Use of multilevel logistic regression to identify the causes of differential item functioning.
Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores
2010-11-01
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.
Dascălu, Cristina Gena; Antohe, Magda Ecaterina
2009-01-01
Principal component analysis, which is based on the analysis of eigenvalues and eigenvectors, aims to identify, from a set of parameters, the subspace of principal components that is sufficient to characterize the whole set. Interpreting the data as a cloud of points, we find through geometrical transformations the directions in which the cloud's dispersion is maximal: the lines that pass through the cloud's center of gravity and have a maximal density of points around them (obtained by defining an appropriate criterion function and minimizing it). This method can be used to simplify the statistical analysis of questionnaires, because it helps to select from a set of items only the most relevant ones, which cover the variation of the whole data set. For instance, in the presented sample we started from a questionnaire with 28 items and, applying principal component analysis, identified 7 principal components (or main items), which significantly simplifies further statistical analysis of the data.
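The idea of finding the direction of maximal dispersion through the cloud's center can be sketched in a few lines. This is not the authors' procedure: it is a minimal illustration that uses power iteration (a swapped-in technique) to recover only the first principal component of a covariance matrix, and the response data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix; centering moves the origin to the cloud's center."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(cov, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the variance along direction v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


# Hypothetical questionnaire data: 5 respondents x 3 items.
responses = [
    [1, 2, 5],
    [2, 3, 4],
    [3, 4, 4],
    [4, 5, 5],
    [5, 6, 4],
]
cov = covariance_matrix(responses)
eigenvalue, direction = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance
print(f"share of variance on first component: {explained:.2f}")
```

In this toy data the first two items move together, so a single component carries most of the variance; that is the sense in which a few components can stand in for many items. A full analysis would extract all components, typically with a linear algebra library.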
Tulsky, David S.; Jette, Alan; Kisala, Pamela A.; Kalpakjian, Claire; Dijkers, Marcel P.; Whiteneck, Gale; Ni, Pengsheng; Kirshblum, Steven; Charlifue, Susan; Heinemann, Allen W.; Forchheimer, Martin; Slavin, Mary; Houlihan, Bethlyn; Tate, Denise; Dyson-Hudson, Trevor; Fyffe, Denise; Williams, Steve; Zanca, Jeanne
2012-01-01
Objective: To develop a comprehensive set of patient reported items to assess multiple aspects of physical functioning relevant to the lives of people with spinal cord injury (SCI) and to evaluate the underlying structure of physical functioning. Design: Cross-sectional. Setting: Inpatient and community. Participants: Item pools of physical functioning were developed, refined and field tested in a large sample of 855 individuals with traumatic spinal cord injury stratified by diagnosis, severity, and time since injury. Interventions: None. Main Outcome Measure: SCI-FI measurement system. Results: Confirmatory factor analysis (CFA) indicated that a 5-factor model, including basic mobility, ambulation, wheelchair mobility, self care, and fine motor, had the best model fit and was most closely aligned conceptually with feedback received from individuals with SCI and SCI clinicians. When just the items making up basic mobility were tested in CFA, the fit statistics indicate strong support for a unidimensional model. Similar results were demonstrated for each of the other four factors indicating unidimensional models. Conclusions: Though unidimensional or 2-factor (mobility and upper extremity) models of physical functioning make up outcomes measures in the general population, the underlying structure of physical function in SCI is more complex. A 5-factor solution allows for comprehensive assessment of key domain areas of physical functioning. These results informed the structure and development of the SCI-FI measurement system of physical functioning. PMID:22609299
Development and Validation of a Six-Item Version of the Interpersonal Dependency Inventory.
McClintock, Andrew S; McCarrick, Shannon M; Anderson, Timothy; Himawan, Lina; Hirschfeld, Robert
2017-04-01
The Interpersonal Dependency Inventory (IDI) is a frequently used, 48-item measure of maladaptive dependency. Our goal was to develop and psychometrically evaluate a very brief version of the IDI. An exploratory factor analysis of the IDI in Study 1 (N = 838) yielded a six-item IDI (IDI-6), with three items loading on an emotional dependency factor (IDI-6-ED), and the other three items loading on a functional dependency factor (IDI-6-FD). This factor solution was validated by confirmatory factor analysis in Study 2 (N = 916). The IDI-6-ED and IDI-6-FD demonstrated good convergent and divergent validity in Study 3 (N = 100). In Study 4 (N = 22-43), the IDI-6-ED and IDI-6-FD were generally stable over 4-week and 8-week intervals and were found to be responsive to the effects of psychological treatment. These results have implications for dependency conceptualizations and support the IDI-6 as a brief, psychometrically sound instrument.
Levis, Alexander W; Harel, Daphna; Kwakkenbos, Linda; Carrier, Marie-Eve; Mouthon, Luc; Poiraudeau, Serge; Bartlett, Susan J; Khanna, Dinesh; Malcarne, Vanessa L; Sauve, Maureen; van den Ende, Cornelia H M; Poole, Janet L; Schouffoer, Anne A; Welling, Joep; Thombs, Brett D
2016-11-01
To develop and validate a short form of the Cochin Hand Function Scale (CHFS), which measures hand disability, for use in systemic sclerosis, using objective criteria and reproducible techniques. Responses on the 18-item CHFS were obtained from English-speaking patients enrolled in the Scleroderma Patient-Centered Intervention Network Cohort. CHFS unidimensionality was verified using confirmatory factor analysis, and an item response theory model was fit to CHFS items. Optimal test assembly (OTA) methods identified a maximally precise short form for each possible form length between 1 and 17 items. The final short form selected was the form with the least number of items that maintained statistically equivalent convergent validity, compared to the full-length CHFS, with the Health Assessment Questionnaire (HAQ) disability index (DI) and the physical function domain of the 29-item Patient-Reported Outcomes Measurement Information System (PROMIS-29). There were 601 patients included. A 6-item short form of the CHFS (CHFS-6) was selected. The CHFS-6 had a Cronbach's alpha of 0.93. Correlations of the CHFS-6 summed score with HAQ DI (r = 0.79) and PROMIS-29 physical function (r = -0.54) were statistically equivalent to the CHFS (r = 0.81 and r = -0.56). The correlation with the full CHFS was high (r = 0.98). The OTA procedure generated a valid short form of the CHFS with minimal loss of information compared to the full-length form. The OTA method used was based on objective, prespecified criteria, but should be further studied for viability as a general procedure for shortening patient-reported outcome measures in health research. © 2016, American College of Rheumatology.
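The abstract reports a Cronbach's alpha of 0.93 for the CHFS-6. As a minimal sketch of how that statistic is computed, the function below applies the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The scores are hypothetical and are not CHFS data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents x 3 items (illustrative only).
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
    [3, 3, 2],
]
alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

Alpha approaches 1 when items rise and fall together across respondents, which is why a short form built from highly related items can retain high internal consistency.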
Cleanthous, Sophie; Strzok, Sara; Pompilus, Farrah; Cano, Stefan; Marquis, Patrick; Cohan, Stanley; Goldman, Myla D; Kresa-Reahl, Kiren; Petrillo, Jennifer; Castrillo-Viguera, Carmen; Cadavid, Diego; Chen, Shih-Yin
2018-01-01
ABILHAND, a manual ability patient-reported outcome instrument originally developed for stroke patients, has been used in multiple sclerosis clinical trials; however, psychometric analyses indicated the measure's limited measurement range and precision in higher-functioning multiple sclerosis patients. The purpose of this study was to identify candidate items to expand the measurement range of the ABILHAND-56, thus improving its ability to detect differences in manual ability in higher-functioning multiple sclerosis patients. A step-wise mixed methods design strategy was used, comprising two waves of patient interviews, a combination of qualitative (concept elicitation and cognitive debriefing) and quantitative (Rasch measurement theory) analytic techniques, and consultation interviews with three clinical neurologists specializing in multiple sclerosis. Original ABILHAND was well understood in this context of use. Eighty-two new manual ability concepts were identified. Draft supplementary items were generated and refined with patient and neurologist input. Rasch measurement theory psychometric analysis indicated supplementary items improved targeting to higher-functioning multiple sclerosis patients and measurement precision. The final pool of Early Multiple Sclerosis Manual Ability items comprises 20 items. The synthesis of qualitative and quantitative methods used in this study improves the ABILHAND content validity to more effectively identify manual ability changes in early multiple sclerosis and potentially help determine treatment effect in higher-functioning patients in clinical trials.
Catquest-9SF questionnaire: validation of Malay and Chinese-language versions using Rasch analysis.
Adnan, Tassha Hilda; Mohamed Apandi, Mokhlisoh; Kamaruddin, Haireen; Salowi, Mohamad Aziz; Law, Kian Boon; Haniff, Jamaiyah; Goh, Pik Pin
2018-01-05
The Catquest questionnaire was originally developed in Swedish to measure patients' self-assessed visual function in order to evaluate the benefit of cataract surgery. Rasch analysis led to the creation of the nine-item short form of Catquest (Catquest-9SF), which has been translated and validated in English. The aim of this study is therefore to evaluate the Malay and Chinese (Mandarin) translations of the Catquest-9SF questionnaire for measuring patient-reported visual function among the cataract population in Malaysia. The English version of the Catquest-9SF questionnaire was translated and back-translated into the Malay and Chinese languages. The Malay and Chinese translated versions were self-administered by 236 and 202 pre-operative patients, respectively, drawn from a cataract surgery waiting list. The translated Catquest-9SF data and its four response options were assessed for fit to the Rasch model. The Catquest-9SF performed well in the Malay and Chinese translated versions, fulfilling all criteria for valid measurement, as demonstrated by Rasch analysis. Both versions of the questionnaire had ordered response thresholds, with good person separation (Malay 2.84; Chinese 2.59) and patient separation reliability (Malay 0.89; Chinese 0.87). Targeting was 0.30 and -0.11 logits in the Malay and Chinese versions respectively, indicating that the item difficulty was well suited to the visual abilities of the patients. All items fit a single overall construct (Malay infit range 0.85-1.26, outfit range 0.73-1.13; Chinese infit range 0.80-1.51, outfit range 0.71-1.36), and the scale was unidimensional by principal components analysis and free of Differential Item Functioning (DIF). These results support the good overall functioning of the Catquest-9SF in patients with cataract. The Malay and Chinese-language versions of the questionnaire are reliable and valid in measuring visual disability outcomes in the Malaysian cataract population.
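The person separation and separation reliability figures in this abstract are consistent with the conventional Rasch relationship R = G² / (1 + G²), where G is the separation index. The abstract does not state how its values were computed, so the sketch below is an assumption-based check rather than the authors' calculation; the function name is hypothetical.

```python
def reliability_from_separation(separation: float) -> float:
    """Convert a Rasch separation index G to reliability R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


# Separation values as reported in the abstract.
reported_separation = {"Malay": 2.84, "Chinese": 2.59}
for version, g in reported_separation.items():
    print(f"{version}: reliability = {reliability_from_separation(g):.2f}")
# prints "Malay: reliability = 0.89" and "Chinese: reliability = 0.87"
```

Both results match the reliabilities reported in the abstract (0.89 and 0.87) to two decimal places.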
Calibration of the Dutch-Flemish PROMIS Pain Behavior item bank in patients with chronic pain.
Crins, M H P; Roorda, L D; Smits, N; de Vet, H C W; Westhovens, R; Cella, D; Cook, K F; Revicki, D; van Leeuwen, J; Boers, M; Dekker, J; Terwee, C B
2016-02-01
The aims of the current study were to calibrate the item parameters of the Dutch-Flemish PROMIS Pain Behavior item bank using a sample of Dutch patients with chronic pain and to evaluate cross-cultural validity between the Dutch-Flemish and the US PROMIS Pain Behavior item banks. Furthermore, reliability and construct validity of the Dutch-Flemish PROMIS Pain Behavior item bank were evaluated. The 39 items in the bank were completed by 1042 Dutch patients with chronic pain. To evaluate unidimensionality, a one-factor confirmatory factor analysis (CFA) was performed. A graded response model (GRM) was used to calibrate the items. To evaluate cross-cultural validity, Differential item functioning (DIF) for language (Dutch vs. English) was evaluated. Reliability of the item bank was also examined and construct validity was studied using several legacy instruments, e.g. the Roland Morris Disability Questionnaire. CFA supported the unidimensionality of the Dutch-Flemish PROMIS Pain Behavior item bank (CFI = 0.960, TLI = 0.958), the data also fit the GRM, and demonstrated good coverage across the pain behavior construct (threshold parameters range: -3.42 to 3.54). Analysis showed good cross-cultural validity (only six DIF items), reliability (Cronbach's α = 0.95) and construct validity (all correlations ≥0.53). The Dutch-Flemish PROMIS Pain Behavior item bank was found to have good cross-cultural validity, reliability and construct validity. The development of the Dutch-Flemish PROMIS Pain Behavior item bank will serve as the basis for Dutch-Flemish PROMIS short forms and computer adaptive testing (CAT). © 2015 European Pain Federation - EFIC®
Should the SCOPA-COG be modified? A Rasch analysis perspective.
Forjaz, M J; Frades-Payo, B; Rodriguez-Blazquez, C; Ayala, A; Martinez-Martin, P
2010-02-01
The SCales for Outcomes in PArkinson's disease-Cognition (SCOPA-COG) is a specific measure of cognitive function for Parkinson's disease (PD) patients. Previous studies, within the framework of classical test theory, indicate satisfactory psychometric properties. The Rasch model, an item response theory approach, provides new information about the scale and also yields a linear scale. This study aims at analysing the SCOPA-COG according to the Rasch model and, on the basis of the results, suggesting modifications to the SCOPA-COG. Fit to the Rasch model was analysed using a sample of 384 PD patients. A good fit was obtained after rescoring for disordered thresholds. The person separation index, a reliability measure, was 0.83. Differential item functioning was observed by age for three items and by gender for one item. The SCOPA-COG is a unidimensional measure of global cognitive function in PD patients, with good scale targeting and no empirical evidence for use of the subscale scores. Its adequate reliability and internal construct validity were supported. The SCOPA-COG, with the proposed scoring scheme, generates true linear interval scores.
Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie
2009-01-01
Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. 
Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie
2017-05-01
Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form but suggest that caution should be exercised when interpreting responses to individual items. The nature of the item-level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Jafari, Peyman; Bagheri, Zahra; Hashemi, Seyyedeh Zahra; Shalileh, Keivan
2013-06-06
Limited studies have examined the effect of differential item functioning (DIF) on comparing health-related quality of life (HRQoL) scores across child self-reports and parent proxy-reports. This study aims to determine whether parents and children respond differently to the items in the Persian version of the PedsQL™ 4.0 measure. The PedsQL™ 4.0 Generic Core Scales was completed by 938 child-parent dyads. The graded response model (GRM) was used to detect DIF between parents and children. The IRT analyses were conducted using IRTPRO 2.1. On the whole, our findings showed that 50% (4 out of 8) of the items in the physical subscale and 40% (2 out of 5) in both emotional and school subscales were flagged with DIF. Among the DIF items, 62.5% (5 out of 8) were uniform and the remaining 37.5% (3 out of 8) were non-uniform. Parents and children interpret certain items of the PedsQL™ 4.0 in different ways, except for the social subscale. Hence, we should be cautious about using parent proxy-report as a substitute for a child's ratings.
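For readers unfamiliar with the graded response model named above, the following is a minimal sketch of how it turns one ability value into category probabilities for a single ordinal item. The function name, the parameter values, and the logistic parameterization are illustrative assumptions; they are not taken from the study or from IRTPRO.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one ordinal item under a graded response model.

    theta: person ability; a: item discrimination; thresholds: ordered
    boundary locations b_1 < ... < b_{K-1} (all values here are hypothetical).
    """
    b = np.asarray(thresholds, dtype=float)
    # Cumulative probabilities P(Y >= k) for k = 1..K-1
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # P(Y >= 0) = 1 and P(Y >= K) = 0 close the sequence
    upper = np.concatenate(([1.0], cum))
    lower = np.concatenate((cum, [0.0]))
    # Each category probability is the difference of adjacent cumulatives
    return upper - lower

probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

In a DIF analysis of this kind, the item parameters would be estimated separately for the two groups (here, parents and children) and compared; a difference in thresholds alone corresponds to uniform DIF, while a difference in discrimination corresponds to non-uniform DIF.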
Selivanova, Alexandra; Shin, Hyun Joon; Miller, Joan W.; Jackson, Mary Lou
2018-01-01
Purpose Vision loss from age-related macular degeneration (AMD) has a profound effect on vision-related quality of life (VRQoL). The purpose of this study is to identify clinical factors associated with VRQoL using the Rasch-calibrated NEI VFQ-25 scales in bilateral advanced AMD patients. Methods We retrospectively reviewed 47 patients (mean age 83.2 years) with bilateral advanced AMD. Clinical assessment included age, gender, type of AMD, high contrast visual acuity (VA), history of medical conditions, contrast sensitivity (CS), central visual field loss, report of Charles Bonnet Syndrome, current treatment for AMD and Rasch-calibrated NEI VFQ-25 visual function and socioemotional function scales. The NEI VFQ visual function scale includes items of general vision, peripheral vision, distance vision and near vision-related activity while the socioemotional function scale includes items of vision-related social functioning, role difficulties, dependency, and mental health. Multiple regression analysis (structural regression model) was performed using fixed item parameters obtained from the one-parameter item response theory model. Results Multivariate analysis showed that high contrast VA and CS were two factors influencing VRQoL visual function scale (β = -0.25, 95% CI -0.37 to -0.12, p<0.001 and β = 0.35, 95% CI 0.25 to 0.46, p<0.001) and socioemotional functioning scale (β = -0.2, 95% CI -0.37 to -0.03, p = 0.023, and β = 0.3, 95% CI 0.18 to 0.43, p = 0.001). Central visual field loss was not associated with either VRQoL visual or socioemotional functioning scale (β = -0.08, 95% CI -0.28 to 0.12, p = 0.44 and β = -0.09, 95% CI -0.03 to 0.16, p = 0.50, respectively). Conclusion In patients with vision impairment secondary to bilateral advanced AMD, high contrast VA and CS are two important factors affecting VRQoL. PMID:29746512
Roh, Miin; Selivanova, Alexandra; Shin, Hyun Joon; Miller, Joan W; Jackson, Mary Lou
2018-01-01
Vision loss from age-related macular degeneration (AMD) has a profound effect on vision-related quality of life (VRQoL). The purpose of this study is to identify clinical factors associated with VRQoL using the Rasch-calibrated NEI VFQ-25 scales in bilateral advanced AMD patients. We retrospectively reviewed 47 patients (mean age 83.2 years) with bilateral advanced AMD. Clinical assessment included age, gender, type of AMD, high contrast visual acuity (VA), history of medical conditions, contrast sensitivity (CS), central visual field loss, report of Charles Bonnet Syndrome, current treatment for AMD and Rasch-calibrated NEI VFQ-25 visual function and socioemotional function scales. The NEI VFQ visual function scale includes items of general vision, peripheral vision, distance vision and near vision-related activity while the socioemotional function scale includes items of vision-related social functioning, role difficulties, dependency, and mental health. Multiple regression analysis (structural regression model) was performed using fixed item parameters obtained from the one-parameter item response theory model. Multivariate analysis showed that high contrast VA and CS were two factors influencing VRQoL visual function scale (β = -0.25, 95% CI -0.37 to -0.12, p<0.001 and β = 0.35, 95% CI 0.25 to 0.46, p<0.001) and socioemotional functioning scale (β = -0.2, 95% CI -0.37 to -0.03, p = 0.023, and β = 0.3, 95% CI 0.18 to 0.43, p = 0.001). Central visual field loss was not associated with either VRQoL visual or socioemotional functioning scale (β = -0.08, 95% CI -0.28 to 0.12, p = 0.44 and β = -0.09, 95% CI -0.03 to 0.16, p = 0.50, respectively). In patients with vision impairment secondary to bilateral advanced AMD, high contrast VA and CS are two important factors affecting VRQoL.
A mixed-effects regression model for longitudinal multivariate ordinal data.
Liu, Li C; Hedeker, Donald
2006-03-01
A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.
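The role of Gauss-Hermite quadrature in marginal likelihood estimation can be illustrated with a deliberately reduced case. The sketch below assumes a single binary response with one normally distributed random intercept, rather than the three-level multivariate ordinal model of the paper, and integrates the logistic response function over that random effect. The function name and arguments are hypothetical.

```python
import numpy as np

def marginal_prob(intercept, sigma, n_points=20):
    """Approximate P(Y = 1) = integral of expit(intercept + sigma * u) * N(u; 0, 1) du.

    Simplifying assumption: one random intercept u ~ N(0, 1), binary outcome.
    """
    # Nodes and weights for the weight function exp(-x^2)
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variable u = sqrt(2) * x maps exp(-x^2) onto the standard normal
    u = np.sqrt(2.0) * nodes
    p = 1.0 / (1.0 + np.exp(-(intercept + sigma * u)))
    # Dividing by sqrt(pi) normalizes the weights so they sum to one
    return float(np.sum(weights * p) / np.sqrt(np.pi))
```

With several correlated random effects, as in the model described above, the same idea is applied on a grid formed from the product of one-dimensional nodes, which is why the number of quadrature points per dimension has to be kept modest.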
Development and validation of an instrument to assess job satisfaction in eye-care personnel.
Paudel, Prakash; Cronjé, Sonja; O'Connor, Patricia M; Khadka, Jyoti; Rao, Gullapalli N; Holden, Brien A
2017-11-01
The aim was to develop and validate an instrument to measure job satisfaction in eye-care personnel and assess the job satisfaction of one-year trained vision technicians in India. A pilot instrument for assessing job satisfaction was developed, based on a literature review and input from a public health expert panel. Rasch analysis was used to assess psychometric properties and to undertake an iterative item reduction. The instrument was then administered to vision technicians in vision centres of Andhra Pradesh in India. Associations between vision technicians' job satisfaction and factors such as age, gender and experience were analysed using t-test and one-way analysis of variance. Rasch analysis confirmed that the 15-item job satisfaction in eye-care personnel (JSEP) was a unidimensional instrument with good fit statistics, measurement precisions and absence of differential item functioning. Overall, vision technicians reported high rates of job satisfaction (0.46 logits). Age, gender and experience were not associated with high job satisfaction score. Item score analysis showed non-financial incentives, salary and workload were the most important determinants of job satisfaction. The 15-item JSEP instrument is a valid instrument for assessing job satisfaction among eye-care personnel. Overall, vision technicians in India demonstrated high rates of job satisfaction. © 2016 Optometry Australia.
Wong, Eric; Ungvari, Gabor S; Leung, Siu-Kau; Tang, Wai-Kwong
2007-01-01
Catatonic signs and symptoms are frequently observed in patients with chronic schizophrenia. Clinical surveys have suggested that the composition of catatonic syndrome occurring in chronic schizophrenia may be different from what is found in acute psychiatric disorders or medical conditions. Consequently, this patient population may need tailor-made rating instruments for catatonia. The aim of the present study was to examine the suitability and accuracy of using the Bush-Francis Catatonia Rating Scale (BFCRS) in chronic schizophrenia inpatients. The unidimensionality (optimal number of items; item fit), and the scoring scheme (the optimal number of scoring categories) of the BFCRS were determined in a random sample of 225 patients with chronic schizophrenia applying Rasch analysis. In addition, differential item functioning (DIF) analysis was also performed. The BFCRS proved to be unidimensional apart from three misfit and one marginally misfit items. The three misfit items were removed from the scale thereby constructing a revised version called BFCRS-R. Since the original BFCRS (BFCRS-O) showed no increase across items across steep gradients (poor endorsability of step calibrations), in BFCRS-R a binary scale ('absent' versus 'present' choices only) was constructed instead of the scoring scheme of 0-3. The 20-item BFCRS-R showed improved psychometric properties in that it had a higher item separation index than BFCRS-O. BFCRS-R mean logit was closer to zero indicating that the items on the scale and the subjects were better matched than in BFCRS-O. DIF analysis showed that certain items of both versions of BFCRS were influenced by the presence of negative symptoms. BFCRS-R is shorter and simpler than the original version and having better psychometric properties seems to be better suited for identifying and quantifying catatonia in chronic psychotic patients. Copyright (c) 2007 John Wiley & Sons, Ltd.
RhinAsthma patient perspective: A Rasch validation study.
Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara
2018-02-01
In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective-RAPP. The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items and test whether or not patients use the items response scale correctly. 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was used for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed a DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
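The three nested models described in this abstract can be written out as follows. This is a sketch under assumed notation that does not appear in the abstract: Y is the ordinal item response, θ the IRT ability estimate, G the group indicator, and α_k the category intercepts; a cumulative-logit form is also assumed.

```latex
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit} P(Y \ge k \mid \theta, G)
    = \alpha_k + \beta_1 \theta + \beta_2 G + \beta_3\, \theta G \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \ge k \mid \theta, G)
    = \alpha_k + \beta_1 \theta + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \ge k \mid \theta)
    = \alpha_k + \beta_1 \theta
\end{align*}
% Nonuniform DIF: test H_0 : \beta_3 = 0 in Model 1.
% Uniform DIF: compare the ability coefficient across Models 2 and 3,
% for example via the relative change
\[
  \Delta = \left| \frac{\beta_1^{(2)} - \beta_1^{(3)}}{\beta_1^{(3)}} \right| .
\]
```

The abstract does not state what size of change counts as "marked", so no cutoff for Δ is given here.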
Independent Orbiter Assessment (IOA): Analysis of the DPS subsystem
NASA Technical Reports Server (NTRS)
Lowery, H. J.; Haufler, W. A.; Pietz, K. C.
1986-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis/Critical Items List (FMEA/CIL) are presented. The IOA approach features a top-down analysis of the hardware to independently determine failure modes, criticality, and potential critical items. The independent analysis results corresponding to the Orbiter Data Processing System (DPS) hardware are documented. The DPS hardware is required for performing critical functions of data acquisition, data manipulation, data display, and data transfer throughout the Orbiter. Specifically, the DPS hardware consists of the following components: Multiplexer/Demultiplexer (MDM); General Purpose Computer (GPC); Multifunction CRT Display System (MCDS); Data Buses and Data Bus Couplers (DBC); Data Bus Isolation Amplifiers (DBIA); Mass Memory Unit (MMU); and Engine Interface Unit (EIU). The IOA analysis process utilized available DPS hardware drawings and schematics for defining hardware assemblies, components, and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode. Due to the extensive redundancy built into the DPS, the number of critical items is small. Those identified resulted from premature operation and erroneous output of the GPCs.
Correlates of cognitive function scores in elderly outpatients.
Mangione, C M; Seddon, J M; Cook, E F; Krug, J H; Sahagian, C R; Campion, E W; Glynn, R J
1993-05-01
To determine medical, ophthalmologic, and demographic predictors of cognitive function scores as measured by the Telephone Interview for Cognitive Status (TICS), an adaptation of the Folstein Mini-Mental Status Exam. A secondary objective was to perform an item-by-item analysis of the TICS scores to determine which items correlated most highly with the overall scores. Cross-sectional cohort study. The Glaucoma Consultation Service of the Massachusetts Eye and Ear Infirmary. 472 of 565 consecutive patients age 65 and older who were seen at the Glaucoma Consultation Service between November 1, 1987 and October 31, 1988. Each subject had a standard visual examination and review of medical history at entry, followed by a telephone interview that collected information on demographic characteristics, cognitive status, health status, accidents, falls, symptoms of depression, and alcohol intake. A multivariate linear regression model of correlates of TICS score found the strongest correlates to be education, age, occupation, and the presence of depressive symptoms. The only significant ocular condition that correlated with lower TICS score was the presence of surgical aphakia (model R2 = .46). Forty-six percent (216/472) of patients fell below the established definition of normal on the mental status scale. In a logistic regression analysis, the strongest correlates of an abnormal cognitive function score were age, diabetes, educational status, and occupational status. An item analysis using step-wise linear regression showed that 85 percent of the variance in the TICS score was explained by the ability to perform serial sevens and to repeat 10 items immediately after hearing them. Educational status correlated most highly with both of these items (Kendall Tau R = .43 and Kendall Tau R = .30, respectively). Education, occupation, depression, and age were the strongest correlates of the score on this new screening test for assessing cognitive status. 
These factors were stronger correlates of the TICS score than chronic medical conditions, visual loss, or medications. The Telephone Interview for Cognitive Status is a useful instrument, but it may overestimate the prevalence of dementia in studies with a high prevalence of persons with less than a high school education.
Lin, Ching-Hua; Yang, Wei-Cheng
2017-07-01
We aimed to compare the degree of symptom relief to psychosocial functional (abbreviated as "functional") improvement and explore the relationships between symptom relief and functional improvement during acute electroconvulsive therapy for patients with major depressive disorder. Major depressive disorder inpatients (n=130) requiring electroconvulsive therapy were recruited. Electroconvulsive therapy was generally performed for a maximum of 12 treatments. Symptom severity, using the 17-item Hamilton Depression Rating Scale, and psychosocial functioning (abbreviated as "functioning"), using the Modified Work and Social Adjustment Scale, were assessed before electroconvulsive therapy, after every 3 electroconvulsive therapy treatments, and after the final electroconvulsive therapy. Both 17-item Hamilton Depression Rating Scale and Modified Work and Social Adjustment Scale scores were converted to T-score units to compare the degrees of changes between depressive symptoms and functioning after electroconvulsive therapy. Structural equation modeling was used to test the relationships between 17-item Hamilton Depression Rating Scale and Modified Work and Social Adjustment Scale during acute electroconvulsive therapy. One hundred sixteen patients who completed at least the first 3 electroconvulsive therapy treatments entered the analysis. Reduction of 17-item Hamilton Depression Rating Scale T-scores was significantly greater than that of Modified Work and Social Adjustment Scale T-scores at assessments 2, 3, 4, and 5. The model analyzed by structural equation modeling satisfied all indices of goodness-of-fit (chi-square = 32.882, P =.107, TLI = 0.92, CFI = 0.984, RMSEA = 0.057). The 17-item Hamilton Depression Rating Scale change did not predict subsequent Modified Work and Social Adjustment Scale change. Functioning improved less than depressive symptoms during acute electroconvulsive therapy. Symptom reduction did not predict subsequent functional improvement. 
Depressive symptoms and functional impairment are distinct domains and should be assessed independently to accurately reflect the effectiveness of electroconvulsive therapy. © The Author 2017. Published by Oxford University Press on behalf of CINP.
Silva, Soraia Micaela; Corrêa, Fernanda Ishida; Pereira, Gabriela Santos; Faria, Christina Danielli Coelho de Morais; Corrêa, João Carlos Ferrari
2018-01-01
Analyze the construct validity and internal consistency of the Stroke Specific Quality of Life (SS-QOL) items that address the participation component of the ICF as well as analyze the ceiling and floor effects. One hundred subjects were analyzed: 85 community-dwelling and 15 institutionalized individuals. The analysis of construct validity was performed using classic psychometrics: (1) the comparison of known groups (individuals without restriction to participation vs. those with restriction to participation) using the Mann-Whitney test and (2) convergent validity - correlation between the scores on the SS-QOL items that address participation and the subscale scores of measures used to evaluate the similar constructs and concepts [the Short-Form Health Survey (SF-36), Functional Independence Measure (FIM) and grip strength test]. Spearman's correlation coefficients were calculated for this analysis. Cronbach's α was used for the analysis of internal consistency and both the ceiling and floor effects were analyzed. The level of significance for all analyses was α = 0.05. The a priori hypotheses regarding construct validity were partially demonstrated, as only five of the eight domains exhibited positive moderate to strong correlations (r > 0.40) with measures that address constructs similar to those addressed on the SS-QOL questionnaire. The items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The ceiling and floor effects were considered adequate for the total SS-QOL score, but beyond acceptable standards for some domains. The 26 items of the SS-QOL questionnaire measure a multidimensional construct and therefore do not only address participation. However, the items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. 
Implications for rehabilitation The 26 items of the SS-QOL questionnaire demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The present findings can guide healthcare professionals regarding the selection of an assessment tool for the evaluation of post-stroke participation. The findings can lead to consistent and standardized evaluations, which facilitate comparisons and discussion on functional health and social participation after stroke.
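Cronbach's α, used above for internal consistency, is computed from the item variances and the variance of the total score. The following is a minimal sketch assuming a complete respondents-by-items score matrix with no missing data; the function name is illustrative.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)
```

When items measure several distinct dimensions, as the abstract reports for the 26 SS-QOL items, α can still be high, so it is best read as a statement about consistency and not about unidimensionality.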
Prodinger, Birgit; Tennant, Alan; Stucki, Gerold; Cieza, Alarcos; Üstün, Tevfik Bedirhan
2016-10-01
Our aim was to specify the requirements of an architecture to serve as the foundation for standardized reporting of health information and to provide an exemplary application of this architecture. The World Health Organization's International Classification of Functioning, Disability and Health (ICF) served as the conceptual framework. Methods to establish content comparability were the ICF Linking Rules. The Rasch measurement model, as a special case of additive conjoint measurement, which satisfies the required criteria for fundamental measurement, allowed for the development of a common metric foundation for measurement unit conversion. Secondary analysis of data from the North Yorkshire Survey was used to illustrate these methods. Patients completed three instruments and the items were linked to the ICF. The Rasch measurement model was applied, first to each scale, and then to items across scales which were linked to a common domain. Based on the linking of items to the ICF, the majority of items were grouped into two domains, Mobility and Self-care. Analysis of the individual scales and of items linked to a common domain across scales satisfied the requirements of the Rasch measurement model. The measurement unit conversion between items from the three instruments linked to the Mobility and Self-care domains, respectively, was demonstrated. The realization of an ICF-based architecture for information on patients' functioning enables harmonization of health information while allowing clinicians and researchers to continue using their existing instruments. This architecture will facilitate access to comprehensive and consistently reported health information to serve as the foundation for informed decision-making. © The Author(s) 2016.
Screening Test Items for Differential Item Functioning
ERIC Educational Resources Information Center
Longford, Nicholas T.
2014-01-01
A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
Gärtner, F R; Nieuwenhuijsen, K; van Dijk, F J H; Sluiter, J K
2012-02-01
Common mental disorders (CMD) negatively affect work functioning. In the health service sector not only the prevalence of CMDs is high, but work functioning problems are associated with a risk of serious consequences for patients and healthcare providers. If work functioning problems due to CMDs are detected early, timely help can be provided. Therefore, the aim of this study is to develop a detection questionnaire for impaired work functioning due to CMDs in nurses and allied health professionals working in hospitals. First, an item pool was developed by a systematic literature study and five focus group interviews with employees and experts. To evaluate the content validity, additional interviews were held. Second, a cross-sectional assessment of the item pool in 314 nurses and allied health professionals was used for item selection and for identification and corroboration of subscales by explorative and confirmatory factor analysis. The study results in the Nurses Work Functioning Questionnaire (NWFQ), a 50-item self-report questionnaire consisting of seven subscales: cognitive aspects of task execution, impaired decision making, causing incidents at work, avoidance behavior, conflicts and irritations with colleagues, impaired contact with patients and their family, and lack of energy and motivation. The questionnaire has a proven high content validity. All subscales have good or acceptable internal consistency. The Nurses Work Functioning Questionnaire gives insight into precise and concrete aspects of impaired work functioning of nurses and allied health professionals. The scores can be used as a starting point for purposeful interventions.
Khan, Anzalee; Lindenmayer, Jean-Pierre; Opler, Mark; Yavorsky, Christian; Rothman, Brian; Lucic, Luka
2013-10-01
Debate persists with regard to how best to categorize the syndromal dimension of negative symptoms in schizophrenia. The aim was first to review published Principal Components Analyses (PCA) of the PANSS and extract the items most frequently included in the negative domain, and second to examine the quality of items using Item Response Theory (IRT) to select items that best represent a measurable dimension (or dimensions) of negative symptoms. First, 22 factor analyses and PCAs were included. Second, using a large dataset (n=7187) of participants in clinical trials with chronic schizophrenia, we extracted items loading on one or more PCA. Third, items not loading with a value of ≥ 0.5, or loading on more than one component with values of ≥ 0.5, were discarded. Fourth, resulting items were included in a non-parametric IRT and retained based on Option Characteristic Curves (OCCs) and Item Characteristic Curves (ICCs). 15 items loaded on a negative domain in at least one study, with Emotional Withdrawal loading on all studies. Non-parametric IRT retained nine items as an Integrated Negative Factor: Emotional Withdrawal, Blunted Affect, Passive/Apathetic Social Withdrawal, Poor Rapport, Lack of Spontaneity/Conversation Flow, Active Social Avoidance, Disturbance of Volition, Stereotyped Thinking and Difficulty in Abstract Thinking. This is the first study to use a psychometric IRT process to arrive at a set of negative symptom items. Future steps will include further examination of these nine items in terms of their stability, sensitivity to change, and correlations with functional and cognitive outcomes. © 2013 Elsevier B.V. All rights reserved.
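The loading-based discard rule in the third step can be sketched as a simple filter over a loading matrix. The function name, the item labels in the usage line, and the use of absolute loadings are assumptions made for illustration; the abstract gives only the 0.5 cutoff.

```python
import numpy as np

def retain_items(loadings, names, cutoff=0.5):
    """Keep items that load at or above the cutoff on exactly one component.

    loadings: (items x components) matrix; names: item labels (hypothetical).
    Assumption: loadings are compared in absolute value.
    """
    salient = np.abs(np.asarray(loadings, dtype=float)) >= cutoff
    n_salient = salient.sum(axis=1)
    # Zero salient loadings: item fails to load; two or more: cross-loading
    return [name for name, n in zip(names, n_salient) if n == 1]

kept = retain_items([[0.7, 0.1], [0.6, 0.55], [0.3, 0.2]], ["A", "B", "C"])
```

In this toy input, item "B" is dropped for cross-loading and item "C" for failing to reach the cutoff on any component.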
47 CFR 0.21 - Functions of the Office.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 1 2014-10-01 2014-10-01 false Functions of the Office. 0.21 Section 0.21... Planning and Policy Analysis § 0.21 Functions of the Office. The Office of Strategic Planning and Policy..., position papers, proposed Commission actions, or other agenda items as appropriate; (g) To manage the...
47 CFR 0.21 - Functions of the Office.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 47 Telecommunication 1 2012-10-01 2012-10-01 false Functions of the Office. 0.21 Section 0.21... Planning and Policy Analysis § 0.21 Functions of the Office. The Office of Strategic Planning and Policy..., position papers, proposed Commission actions, or other agenda items as appropriate; (g) To manage the...
47 CFR 0.21 - Functions of the Office.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 47 Telecommunication 1 2013-10-01 2013-10-01 false Functions of the Office. 0.21 Section 0.21... Planning and Policy Analysis § 0.21 Functions of the Office. The Office of Strategic Planning and Policy..., position papers, proposed Commission actions, or other agenda items as appropriate; (g) To manage the...
Using Loss Functions for DIF Detection: An Empirical Bayes Approach.
ERIC Educational Resources Information Center
Zwick, Rebecca; Thayer, Dorothy; Lewis, Charles
2000-01-01
Studied a method for flagging differential item functioning (DIF) based on loss functions. Builds on earlier research that led to the development of an empirical Bayes enhancement to the Mantel-Haenszel DIF analysis. Tested the method through simulation and found its performance better than some commonly used DIF classification systems. (SLD)
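The Mantel-Haenszel DIF statistic that this line of work builds on can be sketched as follows. The code computes only the standard common odds ratio and its transformation to the ETS delta scale; the empirical Bayes enhancement and the loss-function flagging rule studied in the paper are not reproduced. The function name and input layout are illustrative assumptions.

```python
import math

def mh_d_dif(strata):
    """Mantel-Haenszel common odds ratio and MH D-DIF for one item.

    strata: iterable of (a, b, c, d) counts per matched score level, where
    a, b = reference group correct / incorrect and
    c, d = focal group correct / incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    # Negative values indicate the item is harder for the focal group
    return alpha, -2.35 * math.log(alpha)
```

An odds ratio of 1 (D-DIF of 0) means that, among examinees matched on total score, the two groups have the same odds of answering the item correctly.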
Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A
2018-04-01
A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in the Spanish language. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Using an item response theory approach, this study examined item observed distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
An Experimental Analysis of Memory Processing
Wright, Anthony A
2007-01-01
Rhesus monkeys were trained and tested in visual and auditory list-memory tasks with sequences of four travel pictures or four natural/environmental sounds followed by single test items. Acquisitions of the visual list-memory task are presented. Visual recency (last item) memory diminished with retention delay, and primacy (first item) memory strengthened. Capuchin monkeys, pigeons, and humans showed similar visual-memory changes. Rhesus learned an auditory memory task and showed octave generalization for some lists of notes—tonal, but not atonal, musical passages. In contrast with visual list memory, auditory primacy memory diminished with delay and auditory recency memory strengthened. Manipulations of interitem intervals, list length, and item presentation frequency revealed proactive and retroactive inhibition among items of individual auditory lists. Repeating visual items from prior lists produced interference (on nonmatching tests) revealing how far back memory extended. The possibility of using the interference function to separate familiarity vs. recollective memory processing is discussed. PMID:18047230
Llamas-Ramos, Inés; Llamas-Ramos, Rocío; Buz, José; Cortés-Rodríguez, María; Martín-Nogueras, Ana María
2018-06-01
The Memorial Symptom Assessment Scale (MSAS) is a self-rating instrument for the assessment of symptom distress in cancer patients. The Spanish version of the MSAS has recently been validated. However, we lack evidence of the internal construct validity of the shorter versions (short form [MSAS-SF] and condensed form [CMSAS]). In addition, rigorous testing of these scales with modern psychometric methods is needed. The aim of this study was to evaluate the internal construct validity and reliability of the Spanish versions of the MSAS-SF and CMSAS in oncology outpatients using Rasch analysis. Data from a convenience sample of oncology outpatients receiving chemotherapy (n = 306; mean age 60 years; 63% women) at a university hospital were analyzed. The Rasch unidimensional measurement model was used to examine response category functioning, item hierarchy, targeting, unidimensionality, reliability, and differential item functioning by age, gender, and marital status. The response category structure of the symptom distress items was improved by collapsing two categories. The scales were adequately targeted to the study patients, showed overall Rasch model fit (mean Infit MnSq ranged from 0.98 to 1.05), met criteria for unidimensionality, and the reliability of scores was good (person reliability > 0.80), except for the CMSAS prevalence scale. Only four items showed differential item functioning. The present study demonstrated that the Spanish versions of the MSAS-SF and CMSAS have adequate psychometric properties to evaluate symptom distress in oncology outpatients. Additional studies of the CMSAS are recommended. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Abdulelah, Juman; Sulaiman, Syed Azhar Syed; Hassali, Mohamed A; Blebil, Ali Q; Awaisu, Ahmed; Bredle, Jason M
2015-05-01
Various generic instruments exist to assess health-related quality of life (HRQOL) in patients with tuberculosis (TB), but a psychometrically sound disease-specific instrument is lacking. The present study aimed to develop and psychometrically validate a multidimensional TB-specific HRQOL instrument relevant to the value of patients with pulmonary TB in Iraq with an eye toward cross-cultural application. The core general HRQOL questionnaire is composed of the Functional Assessment of Cancer Therapy-General items. A modular approach was followed for the development of the Functional Assessment of Chronic Illness Therapy-Tuberculosis (FACIT-TB) questionnaire in which a set of items assessing quality-of-life (QOL) issues not sufficiently covered by the core Functional Assessment of Cancer Therapy-General items, but considered to be relevant to the target population, was added. Moreover, principal-component analysis was used to determine the new subscale structure of the questionnaire. In addition to the 27 items of the core questionnaire, a set of 20 items referring to disease symptoms related to the site of infection, adverse effects, and additional QOL dimensions such as fatigue, social stigma, and economic burden of the illness was included. Factor analysis demonstrated that the FACIT-TB construct comprised five domains. A rigorous method was applied in the development of the FACIT-TB measure to fully understand the impact of TB on patients' QOL. The instrument is psychometrically sound and portrays multiple important dimensions of HRQOL. FACIT-TB is relatively brief, is easy to administer and score, and is appropriate for use in clinical trials and practice. Copyright © 2015 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Rasch analysis of the hospital anxiety and depression scale among Chinese cataract patients.
Lin, Xianchai; Chen, Ziyan; Jin, Ling; Gao, Wuyou; Qu, Bo; Zuo, Yajing; Liu, Rongjiao; Yu, Minbin
2017-01-01
To analyze the validity of the Hospital Anxiety and Depression Scale (HADS) among the Chinese cataract population. A total of 275 participants with unilateral or bilateral cataract were recruited to complete the Chinese version of HADS. The patients' demographic and ophthalmic characteristics were documented. Rasch analysis was conducted to examine the model fit statistics, the threshold ordering of the polytomous items, targeting, person separation index and reliability, local dependency, unidimensionality, differential item functioning (DIF) and construct validity of the HADS individual and summary measures. Rasch analysis was performed on the anxiety and depression subscales as well as the HADS-Total score. The items of the original HADS-Anxiety, HADS-Depression and HADS-Total demonstrated evidence of misfit to the Rasch model. Removing item A7 from the anxiety subscale and rescoring item D14 in the depression subscale significantly improved Rasch model fit. A 12-item higher order total scale with further removal of D12 was found to fit the Rasch model. The modified items had ordered response thresholds. No uniform DIF was detected, whereas notable non-uniform DIF in the high-ability group was found. Revised cut-off points were given for the modified anxiety and depression subscales. The modified version of HADS with HADS-A and HADS-D as subscales and HADS-T as a higher-order measure is a reliable and valid instrument that may be useful for assessing anxiety and depression states in the Chinese cataract population.
Küçükdeveci, Ayse A; Sahin, Hülya; Ataman, Sebnem; Griffiths, Bridget; Tennant, Alan
2004-02-15
Guidelines have been established for cross-cultural adaptation of outcome measures. However, invariance across cultures must also be demonstrated through analysis of Differential Item Functioning (DIF). This is tested in the context of a Turkish adaptation of the Health Assessment Questionnaire (HAQ). Internal construct validity of the adapted HAQ is assessed by Rasch analysis; reliability, by internal consistency and the intraclass correlation coefficient; external construct validity, by association with impairments and American College of Rheumatology functional stages. Cross-cultural validity is tested through DIF by comparison with data from the UK version of the HAQ. The adapted version of the HAQ demonstrated good internal construct validity through fit of the data to the Rasch model (mean item fit 0.205; SD 0.998). Reliability was excellent (alpha = 0.97) and external construct validity was confirmed by expected associations. DIF for culture was found in only 1 item. Cross-cultural validity was found to be sufficient for use in international studies between the UK and Turkey. Future adaptation of instruments should include analysis of DIF at the field testing stage in the adaptation process.
A Rasch analysis of the Manchester foot pain and disability index
Muller, Sara; Roddy, Edward
2009-01-01
Background There is currently no interval-level measure of foot-related disability and this has hampered research in this area. The Manchester Foot Pain and Disability Index (FPDI) could potentially fill this gap. Objective To assess the fit of the three subscales (function, pain, appearance) of the FPDI to the Rasch unidimensional measurement model in order to form interval-level scores. Methods A two-stage postal survey at a general practice in the UK collected data from 149 adults aged 50 years and over with foot pain. The 17 FPDI items, in three subscales, were assessed for their fit to the Rasch model. Checks were carried out for differential item functioning by age and gender. Results The function and pain items fit the Rasch model and interval-level scores can be constructed. There were too few people without extreme scores on the appearance subscale to allow fit to the Rasch model to be tested. Conclusion The items from the FPDI function and pain subscales can be used to obtain interval level scores for these factors for use in future research studies in older adults. Further work is needed to establish the interval nature of these subscale scores in more diverse populations and to establish the measurement properties of these interval-level scores. PMID:19878536
Jäger, B; Schmid-Ott, G; Ernst, G; Dölle-Lange, E; Sack, M
2012-06-01
The aim of this study was to construct and validate a short self-rating questionnaire for the assessment of ego functions and ability of self regulation. An item pool of 120 items covering 6 postulated dimensions was reduced by two steps in independent samples (n = 136 + 470) via factor and item analyses to the final version consisting of 35 items. The 5 resulting questionnaire scales "interpersonal disturbances", "frustration tolerance and impulse control", "identity disturbances", "affect differentiation and affect tolerance" and "self-esteem" were well interpretable and showed in confirmatory factor analysis the best fit to the data (CHI²/df = 3.48; RMSEA = 0.73). Total scores were found to differentiate well between diagnostic groups of patients with more or less ego pathology (FANOVA = 9.8; df = 11; p < 0.001), thus proving good concurrent validity. Reliability was shown by testing internal consistency and test-retest correlations. The "Hannover self-regulation questionnaire" (HSRQ) evidently is an appropriate and reliable screening instrument in order to assess ego functions and capacities of self regulation in an economic and user-friendly means. The scale structure allows differentiated diagnostics of weak vs. stable ego functions and may be used for detailed therapy planning. © Georg Thieme Verlag KG Stuttgart · New York.
Scientific literacy: Factor structure and gender differences
NASA Astrophysics Data System (ADS)
Manhart, James Joseph
The purpose of this study was to investigate the factor structure of scientific literacy and to document any gender differences with respect to each factor. Participants included 1139 students (574 females, 565 males) in grades 9 through 12 who were taking a science class at one of four Midwestern high schools. Based on National Science Education Standards, a 100 item multiple-choice test was constructed to assess scientific literacy. Confirmatory factor analysis of item parcels suggested a three factor model was the best way to explain the data resulting from the administration of this test. The factors were labeled constructs of science, abilities necessary to do scientific inquiry, and social aspects of science. Gender differences with respect to these factors were examined using analysis of variance procedures. Because differential enrollment in science classes could cause gender differences in grades 11 and 12, parallel analyses were conducted on the grades 9 and 10 subsample and the grades 11 and 12 subsample. However, the results of the two analyses were similar. The most consistent gender difference observed was that females performed better than males on the social aspects of science factor. Males tended to perform better than females on the constructs of science factor, although no consistent gender difference was noted for items dealing with life science. With respect to the abilities necessary to do scientific inquiry factor, females tended to perform better than males in grades 9 and 10, while no consistent gender difference was observed in grades 11 and 12. Gender differences were also examined using the Mantel-Haenszel procedure to flag individual items that functioned differently for females and males of the same ability. Twelve items were flagged for grades 9 and 10 (8 in favor of females, 4 in favor of males). Fourteen items were flagged for grades 11 and 12 (7 in favor of females, 7 in favor of males). 
All of the flagged items exhibited only small to moderate differential item functioning (DIF). Only three items were similarly flagged in both subsamples, one item from each factor.
Huang, Wenhao; Chapman-Novakofski, Karen M
2017-01-01
Background The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. Objective The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps’ educational quality and technical functionality. Methods Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Results Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis was good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. 
App purpose split half-reliability was .65. Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Conclusions Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps’ qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. PMID:29079554
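The internal consistency and split-half figures quoted above rest on two standard formulas, Cronbach's alpha and the Spearman-Brown correction. A minimal sketch of both follows, assuming complete data (no missing responses) and using made-up scores; the function names are chosen for this example and are not taken from the study.

```python
def sample_variance(values):
    """Unbiased sample variance (denominator n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`, a list of respondents,
    each a list of k item scores (k >= 2, at least 2 respondents)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_vars = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(r_half):
    """Step up a half-test correlation to full-length reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical responses: 4 respondents, 3 items.
data = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5]]
print(cronbach_alpha(data))
print(spearman_brown(0.5))
```

Alpha approaches 1 when items rise and fall together across respondents; the Spearman-Brown step is needed because a correlation between two half-tests understates the reliability of the full-length test.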
2017-01-01
Background The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered as a CAT to an American sample. Methods We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. CES-D trait score estimates from the simulated CAT algorithm were highly correlated with the trait scores derived from the original scale. The second CAT simulation conducted using real participant data demonstrated higher precision at the higher levels of depression spectrum. Conclusions Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
Caronni, Antonio; Zaina, Fabio; Negrini, Stefano
2014-04-01
Scoliosis Research Society-22 (SRS-22) questionnaire was developed to evaluate health-related quality of life (HRQL) in adolescent idiopathic scoliosis (AIS) patients. Rasch analysis (RA) is a statistical procedure which turns questionnaire ordinal scores into interval measures. Measures from Rasch-compatible questionnaires can be used, similar to body temperature or blood pressure, to quantify disease severity progression and treatment efficacy. The purpose of the current work is to present Rasch analysis (RA) of the SRS-22 questionnaire and to develop an SRS-22 Rasch-approved short form. 300 SRS-22 were randomly collected from 2447 consecutive IS adolescents at their first evaluation (229 females; 13.9 ± 1.9 years; 26.9 ± 14.7 Cobb°) in a scoliosis outpatient clinic. RA showed both disordered thresholds and overall misfit of the SRS-22. Sixteen items were re-scored and two misfitting items (6 and 14) removed to obtain a Rasch-compatible questionnaire. Participants' HRQL measured too high with the rearranged questionnaire, indicating a severe SRS-22 ceiling effect. RA also highlighted SRS-22 multidimensionality, with pain/function not merging with self-image/mental health items. Item 3 showed differential item functioning (DIF) for both curve and hump amplitude. A 7-item questionnaire (SRS-7) was prepared by selecting single items from the original SRS-22. SRS-7 showed fit to the model, unidimensionality and no DIF. Compared with the SRS-22, the short form scale shows better targeting of the participants' population. RA shows that SRS-22 has poor clinimetric properties; moreover, when used with AIS at first evaluation, SRS-22 is affected by a severe ceiling effect. SRS-7, an SRS-22 7-item short form questionnaire, provides an HRQL interval measure better tailored to these participants. Copyright © 2014 Elsevier Ltd. All rights reserved.
Ohashi, Y; Tashiro, K; Itoyama, Y; Nakano, I; Sobue, G; Nakamura, S; Sumino, S; Yanagisawa, N
2001-04-01
Amyotrophic lateral sclerosis (ALS) is a progressive, degenerative, fatal disease of the motor neuron. No efficacious therapy is available to slow the progressive loss of function, but several new approaches, including neurotrophic factors, antioxidants and glutamate antagonists, are currently being evaluated as potential therapies. Mortality and/or time to tracheostomy, muscle strength and pulmonary function are used as primary endpoints in clinical trials for treatment of ALS. The effects of new therapies on the quality of patients' lives are also important, so we sought to develop a rating scale to measure them. The revised ALS Functional Rating Scale (ALSFRS-R), which adds items to the ALSFRS to enhance the ability to assess respiratory symptoms, is an assessment determining the degree of impairment in ALS patients' abilities to function independently in activities of daily living. It consists of 12 items to evaluate bulbar function, motor function and respiratory function, and each item is scored from 0 (unable) to 4 (normal). We translated the English version into Japanese with minor modifications to account for intercultural differences, and we examined the reliability of the translated scale. As a measure of reliability, the intraclass correlation coefficient (ICC) was evaluated for the total score and the Kappa coefficient proposed by Cohen and Kraemer was calculated for each item. Moreover, we examined sensitivity to clinical change over time and carried out a factor analysis to analyze the factorial structure. The subjects were 27 ALS patients and each was scored twice for reliability or three times for sensitivity by 2 to 5 neurologists and, if possible, nurses. The ICC for the total score was 0.97 (95% C.I.: 0.94-0.98). The extended Kappa coefficients were 0.48 to 1.00 for inter-rater reliability and the averaged Kappa coefficients were 0.63 to 1.00 for intra-rater reliability, respectively. 
Concerning the factorial structure, the contribution of the first factor (the first principal component) was 53.5% in the principal factor solution. The factor loadings of the items were 0.52-0.91 except for "salivation", and this factor, almost equal to the simple sum of all items, was interpreted as the general degree of deterioration. The promax rotation revealed the originally supposed factor structure with 3 factors (groups of items): neuromuscular function, respiratory function and bulbar function. The rating scale correlated with the Global clinical impression of change (GCIC) scored by neurologists and declined with time, indicating its sensitivity to change. On the basis of these results, the ALSFRS-R (Japanese version) is considered sufficiently reliable for clinical use.
Rasch Analysis of the Edmonton Symptom Assessment System.
Sprague, Emma; Siegert, Richard J; Medvedev, Oleg; Roberts, Margaret H
2018-05-01
The Edmonton Symptom Assessment System (ESAS) is a widely used multisymptom assessment tool in cancer and palliative care settings, but its psychometric properties have not been widely tested using modern psychometric methods such as Rasch analysis. To apply Rasch analysis to the ESAS in a community palliative care setting and determine its suitability for assessing symptom burden in this group. ESAS data collected from 229 patients enrolled in a community hospice service were evaluated using a partial credit Rasch model with RUMM2030 software (RUMM Laboratory Pty, Ltd., Duncraig, WA). Where disordered thresholds were discovered, item rescoring was undertaken. Rasch model fit and differential item functioning were evaluated after each iterative phase. Uniform rescoring was necessary for all 12 items to display ordered thresholds. The best model fit was achieved after item rescoring and combining three pairs of locally dependent items into three superitems (χ² = 29.56 [27]; P = 0.33) that permitted ordinal-to-interval conversion. The ESAS satisfied unidimensional Rasch model expectations in a 12-item format after minor modifications. This included uniform rescoring of the disordered response categories and creating superitems to improve model fit and clinical utility. The accuracy of the ESAS scores can be improved by using ordinal-to-interval conversion tables published in the article. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara
2014-06-01
Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50 %. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
Fajrianthi; Zein, Rizqy Amelia
2017-01-01
This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 (ability level = -2) for subset 2, and 2.398 (ability level = -2) for subset 3. It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and dimensionality test of each TKEA subset.
Doğanay Erdoğan, Beyza; Elhan, Atilla Halİl; Kaskatı, Osman Tolga; Öztuna, Derya; Küçükdeveci, Ayşe Adile; Kutlay, Şehim; Tennant, Alan
2017-10-01
This study aimed to explore the potential of an inclusive and fully integrated measurement system for the Activities component of the International Classification of Functioning, Disability and Health (ICF), incorporating four classical scales, including the Health Assessment Questionnaire (HAQ), and a Computerized Adaptive Testing (CAT). Three hundred patients with rheumatoid arthritis (RA) answered relevant questions from four questionnaires. Rasch analysis was performed to create an item bank using this item pool. A further 100 RA patients were recruited for a CAT application. Both real and simulated CATs were applied and the agreement between these CAT-based scores and 'paper-pencil' scores was evaluated with the intraclass correlation coefficient (ICC). Anchoring strategies were used to obtain a direct translation from the item bank common metric to the HAQ score. The mean age of the 300 patients was 52.3 ± 11.7 years; disease duration was 11.3 ± 8.0 years; 74.7% were women. After testing for the assumptions of Rasch analysis, a 28-item Activities item bank was created. The agreement between CAT-based scores and paper-pencil scores was high (ICC = 0.993). Using those HAQ items in the item bank as anchoring items, another Rasch analysis was performed with HAQ-8 scores as separate items together with anchoring items. Finally, a conversion table of the item bank common metric to the HAQ scores was created. A fully integrated and inclusive health assessment system, illustrating the Activities component of the ICF, was built to assess RA patients. Raw score to metric conversions and vice versa were available, giving access to the metric by a simple look-up table. © 2015 Asia Pacific League of Associations for Rheumatology and Wiley Publishing Asia Pty Ltd.
ERIC Educational Resources Information Center
Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.
2015-01-01
The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…
Sauers, Eric L; Bay, R Curtis; Snyder Valier, Alison R; Ellery, Traci; Huxel Bliven, Kellie C
2017-03-01
Upper extremity (UE) region-specific, patient-reported outcome (PRO) scales assess injuries to the UE but do not account for the demands of overhead throwing athletes or measure patient-oriented domains of health-related quality of life (HRQOL). To develop the Functional Arm Scale for Throwers (FAST), a UE region-specific and population-specific PRO scale that assesses multiple domains of disablement in throwing athletes with UE injuries. In stage I, a beta version of the scale was developed for subsequent factor identification, final item reduction, and construct validity analysis during stage II. Descriptive laboratory study. Three-stage scale development was utilized: Stage I (item generation and initial item reduction) and stage II (factor analysis, final item reduction, and construct validity) are reported herein, and stage III (establishment of measurement properties [reliability and validity]) will be reported in a companion paper. In stage I, a beta version was developed, incorporating National Center for Medical Rehabilitation Research disablement domains and ensuring a blend of sport-related and non-sport-related items. An expert panel and focus group assessed importance and interpretability of each item. During stage II, the FAST was reduced, preserving variance characteristics and factor structure of the beta version and construct validity of the final FAST scale. During stage I, a 54-item beta version and a separate 9-item pitcher module were developed. During stage II, a 22-item FAST and 9-item pitcher module were finalized. The factor solution for FAST scale items included pain (n = 6), throwing (n = 10), activities of daily living (n = 5), psychological impact (n = 4), and advancement (n = 3). The 6-item pain subscale crossed factors. The remaining subscales and pitcher module are distinctive, correlated, and internally consistent and may be interpreted individually or combined. 
This article describes the development of the FAST, which assesses clinical outcomes and HRQOL of throwing athletes after UE injury. The FAST encompasses multiple domains of disability and demonstrates excellent construct validity. The FAST provides a single UE region-specific and population-specific PRO scale for high-demand throwers to facilitate measurement of impact of UE injuries on HRQOL and clinical outcomes while quantifying recovery for comparative effectiveness studies.
The Manual Work Instability Scale: development and validation.
Gilworth, G; Smyth, M G; Smith, J; Tennant, A
2016-06-01
Increasing awareness of the burden of absenteeism and reduced performance at work highlights the importance of early identification of individuals experiencing work instability (WI), a mismatch between functional and cognitive abilities and job demands. To develop and validate a screening questionnaire to measure WI in manual workers. Questionnaire items were generated via qualitative interviews with manual workers and a draft survey instrument was completed by workers in a variety of fields. Rasch analysis was used interactively to assess the psychometric aspects of the emerging scale, including unidimensionality and absence of item bias (differential item functioning). A total of 17 qualitative interviews generated 110 potential items for the questionnaire. The item set resolved to a 25-item scale, which satisfied model expectations (item residual mean = -0.13, SD = 1.04; person residual mean = -0.29, SD = 0.75), had good reliability (alpha = 0.86) and strict unidimensionality (t-test 7.5% confidence interval 3.8-11.2). The Manual Work Instability Scale is a short psychometrically robust questionnaire based on the concept of WI, which incorporates both musculoskeletal symptoms and relevant psychosocial factors. It may prove effective in screening and identifying WI in workers in predominantly physical occupations. © The Author 2016. Published by Oxford University Press on behalf of the Society of Occupational Medicine. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Pilatti, Angelina; Lozano, Oscar M; Cyders, Melissa A
2015-12-01
The present study was aimed at determining the psychometric properties of the Spanish version of the UPPS-P Impulsive Behavior Scale in a sample of college students. Participants were 318 college students (36.2% men; mean age = 20.9 years, SD = 6.4 years). The psychometric properties of this Spanish version were analyzed using the Rasch model, and the factor structure was examined using confirmatory factor analysis. The verification of the global fit of the data showed adequate indexes for persons and items. The reliability estimates were high for both items and persons. Differential item functioning across gender was found for 23 items, which likely reflects known differences in impulsivity levels between men and women. The factor structure of the Spanish version of the UPPS-P replicates previous work with the original UPPS-P Scale. Overall, results suggest that test scores from the Spanish version of the UPPS-P show adequate psychometric properties to accurately assess the multidimensional model of impulsivity, which represents the most exhaustive measure of this construct. (c) 2015 APA, all rights reserved.
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Sheehan, David V; Mancini, Michele; Wang, Jianing; Berggren, Lovisa; Cao, Haijun; Dueñas, Héctor José; Yue, Li
2016-01-01
We compared functional impairment outcomes assessed with Sheehan Disability Scale (SDS) after treatment with duloxetine versus selective serotonin reuptake inhibitors (SSRIs) in patients with major depressive disorder. Data were pooled from four randomized studies comparing treatment with duloxetine and SSRIs (three double blind and one open label). Analysis of covariance, with last-observation-carried-forward approach for missing data, explored treatment differences between duloxetine and SSRIs on SDS changes during 8 to 12 weeks of acute treatment for the intent-to-treat population. Logistic regression analysis examined the predictive capacity of baseline patient characteristics for remission in functional impairment (SDS total score ≤ 6 and SDS item scores ≤ 2) at endpoint. Included were 2193 patients (duloxetine n = 1029; SSRIs n = 835; placebo n = 329). Treatment with duloxetine and SSRIs resulted in significantly (p < 0.01) greater improvements in the SDS total score versus treatment with placebo. Higher SDS (p < 0.0001) or 17-item Hamilton Depression Rating Scale baseline scores (p < 0.01) predicted lower probability of functional improvement after treatment with duloxetine or SSRIs. Female gender (p ≤ 0.05) predicted higher probability of functional improvement after treatment with duloxetine or SSRIs. Treatment with SSRIs and duloxetine improved functional impairment in patients with major depressive disorder. Higher SDS or 17-item Hamilton Depression Rating Scale baseline scores predicted less probability of SDS improvement; female gender predicted better improvement in functional impairment at endpoint. © 2015 The Authors. Human Psychopharmacology: Clinical and Experimental published by John Wiley & Sons, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chow, Edward, E-mail: Edward.Chow@sunnybrook.c; James, Jennifer; Barsevick, Andrea
Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for the palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters between the functional items at baseline and the follow-up. Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.
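The Cronbach's alpha values reported for each cluster can be illustrated with a minimal sketch. The function name `cronbach_alpha` and the toy scores are hypothetical and not taken from the study; the code assumes complete data (no missing responses) and uses sample variances.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Assumes at least two items, at least two respondents, and no missing values.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: two items that move together perfectly.
toy = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(toy)
```

Alpha approaches 1 when the items in a cluster covary strongly relative to their individual variances, which is the sense in which values such as 0.83 and 0.80 indicate internal consistency.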
Better assessment of physical function: item improvement is neglected but essential.
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
2009-01-01
Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options.
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Doostfatemeh, Marziyeh; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2015-08-01
In child-parent agreement studies in the field of paediatric health-related quality of life (HRQoL), little attention has been paid to the effect of gender in parental proxy rating of children's HRQoL. This study aims to test the potential interchangeability of parent dyads in reporting children's HRQoL on both item and scale levels of the PedsQL™ 4.0 instrument, using the approach of differential item functioning (DIF). The PedsQL™ 4.0 Generic Core Scales were completed by 576 father-and-mother dyads. A polytomous item response theory model, graded response model, was used to detect DIF across fathers and mothers. Assessment at item level showed that fathers and mothers perceived the meaning of items of the PedsQL™ 4.0 consistently. Regarding the scale level, a moderate to high level of agreement was observed between mothers' and fathers' reports on all similar subscales. Although the significant mean score differences in total, physical and emotional functioning indicated that fathers gave higher scores to their children, the small effect size implied that this difference may not be practically meaningful. Our findings revealed that discrepancy in parent dyads in rating children's HRQoL is a "real" difference and not an artefact due to measurement non-invariance. Fathers were seen to have slightly different insights into their children, especially for emotional functioning, but overall the results were not all that different. This suggests that paternal proxy-reports can be included in studies along with maternal proxy-reports, and the two may be combined when looking at parent-child agreement. Parent-child agreement studies in Iran are not affected by parents' gender, and therefore, researchers may rely on the assumption of the interchangeability of fathers and mothers in these studies.
Luebbe, Aaron M; Mancini, Kathryn J; Kiel, Elizabeth J; Spangler, Brooke R; Semlak, Julie L; Fussner, Lauren M
2016-08-24
The current study tests the underlying structure of a multidimensional construct of helicopter parenting (HP), assesses reliability of the construct, replicates past relations of HP to poor emotional functioning, and expands the literature to investigate links of HP to emerging adults' decision-making and academic functioning. A sample of 377 emerging adults (66% female; ages 17-30; 88% European American) were administered several items assessing HP as well as measures of other parenting behaviors, depression, anxiety, decision-making style, grade point average, and academic functioning. Exploratory factor analysis results suggested a four-factor, 23-item measure that encompassed varying levels of parental involvement in the personal and professional lives of their children. A bifactor model was also fit to the data and suggested the presence of a reliable overarching HP factor in addition to three reliable subfactors. The fourth subfactor was not reliable and item variances were subsumed by the general HP factor. HP was found to be distinct from, but correlated in expected ways with, other reports of parenting behavior. HP was also associated with poorer emotional functioning, decision making, and academic functioning. Parents' information-seeking behaviors, when performed in the absence of other HP behaviors, were associated with better decision making and academic functioning. © The Author(s) 2016.
Gerrard, Paul
2013-01-01
Nursing facility patients are a population that has not been well studied with regard to functional status and independence previously. As such, the manner in which activities of daily living (ADL) relate to one another is not well understood in this population. An understanding of ADL difficulty ordering has helped to devise systems of functional independence grading in other populations, which have value in understanding patients' global levels of independence and providing expectations regarding changes in function. This study seeks to examine the hierarchy of ADL in the nursing facility population. Data were analyzed from the 2004 National Nursing Home Survey, a cross-sectional data set of 13 507 skilled nursing facility subjects with functional independence items. The ADL difficulty hierarchy was determined using Rasch analysis. Item fit values for the Rasch model using Mean-Square infit statistics were also determined. The robustness of the hierarchy was tested for each ADL. Two grading systems were devised from the results of the item difficulty ordering. One assigned a grade based on the most difficult item that a subject could perform, and the other assigned a grade based on the least difficult item that a subject could not perform. A total of 13 113 patients were included in this analysis, the majority of whom were female and white. They had an average age of 81 years. An ordered hierarchy of ADL was found with eating being the easiest and bathing the most difficult. All items in the Katz index fit the Rasch model adequately well. The majority of patients able to perform any particular ADL were also able to perform all easier ADL. Cohen's κ for the 2 grading systems was 0.73. This study is the first to show the expected hierarchy of difficulty of the 6 activities of daily living proposed in the Katz index in the nursing facility population. The hierarchy found in this population matches the original hierarchy found in older adults in the community and acute care settings.
It is also similar to the hierarchy found in the inpatient rehabilitation setting. Patients would be expected to lose or gain function based on the order of difficulty, but this remains to be confirmed. Among the 6 activities of daily living tested here, their order from easiest to most difficult is eating, maintaining continence, transferring, toileting, dressing, and bathing. In addition, the index formed by these 6 items has construct validity in the nursing facility population.
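The agreement statistic reported between the two grading systems, Cohen's κ, can be sketched as follows. The function name `cohens_kappa` and the example grades are hypothetical; the sketch computes unweighted kappa, and whether the study used a weighted variant is not stated in the abstract.

```python
from collections import Counter


def cohens_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two gradings of the same subjects.

    grades_a, grades_b: equal-length sequences of category labels.
    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(grades_a)
    if n == 0 or n != len(grades_b):
        raise ValueError("need two non-empty sequences of equal length")
    # Observed proportion of subjects given the same grade by both systems.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance from the marginal grade frequencies.
    count_a, count_b = Counter(grades_a), Counter(grades_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical grades for four subjects under two grading systems.
kappa_same = cohens_kappa([0, 0, 1, 1], [0, 0, 1, 1])    # identical gradings
kappa_chance = cohens_kappa([0, 0, 1, 1], [0, 1, 0, 1])  # agreement at chance level
```

Kappa is 1 for identical gradings and 0 when agreement is no better than chance, so a value of 0.73 indicates that the two grading systems mostly, but not always, assign the same grade.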
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J
2017-11-01
Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis.
ERIC Educational Resources Information Center
Zwick, Rebecca; Thayer, Dorothy T.; Lewis, Charles
1999-01-01
Developed an empirical Bayes enhancement to Mantel-Haenszel (MH) analysis of differential item functioning (DIF) in which it is assumed that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. (Author/SLD)
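With a normal likelihood and a normal prior, as the abstract describes, the posterior for each item's DIF parameter is a precision-weighted compromise between the observed MH statistic and the prior mean. The sketch below illustrates that shrinkage. The function name `eb_shrink`, the toy inputs, and the method-of-moments estimates of the prior mean and variance are assumptions made for illustration; the abstract does not say how the prior parameters were estimated.

```python
from statistics import mean, variance


def eb_shrink(mh_stats, squared_ses):
    """Normal-normal empirical Bayes shrinkage of MH DIF statistics.

    mh_stats: observed MH statistic for each item.
    squared_ses: squared standard error of each statistic (assumed > 0).
    Returns a list of (posterior_mean, posterior_variance) pairs.
    """
    # Assumed method-of-moments prior: mean of the observed statistics, and
    # their variance minus the average sampling variance, floored at zero.
    prior_mean = mean(mh_stats)
    prior_var = max(variance(mh_stats) - mean(squared_ses), 0.0)
    posteriors = []
    for d, s2 in zip(mh_stats, squared_ses):
        # Weight on the observed statistic; noisier items shrink more.
        w = prior_var / (prior_var + s2)
        posteriors.append((w * d + (1 - w) * prior_mean, w * s2))
    return posteriors


# Hypothetical statistics for three items with equal sampling variance.
result = eb_shrink([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
```

In this toy case the estimated prior variance equals the sampling variance, so each statistic is pulled halfway toward the prior mean and its variance is halved.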
Sharafi, Zahra; Mousavi, Amin; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2017-01-01
Background The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed. PMID:29312463
ERIC Educational Resources Information Center
Ahmadi, Alireza; Bazvand, Ali Darabi
2016-01-01
Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance in a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response…
Reynolds, Nicholas A; Ski, Chantal F; McEvedy, Samantha M; Thompson, David R; Cameron, Jan
2018-02-14
The aim of this study was to psychometrically evaluate the Heart Failure Screening Tool (Heart-FaST) via: (1) examination of internal construct validity; (2) testing of scale function in accordance with design; and (3) recommendation for change/s, if items are not well adjusted, to improve psychometric credential. Self-care is vital to the management of heart failure. The Heart-FaST may provide a prospective assessment of risk, regarding the likelihood that patients with heart failure will engage in self-care. Psychometric validation of the Heart-FaST using Rasch analysis. The Heart-FaST was administered to 135 patients (median age = 68, IQR = 59-78 years; 105 males) enrolled in a multidisciplinary heart failure management program. The Heart-FaST is a nurse-administered tool for screening patients with HF at risk of poor self-care. A Rasch analysis of responses was conducted which tested data against Rasch model expectations, including whether items serve as unbiased, non-redundant indicators of risk and measure a single construct and that rating scales operate as intended. The results showed that data met Rasch model expectations after rescoring or deleting items due to poor discrimination, disordered thresholds, differential item functioning, or response dependence. There was no evidence of multidimensionality which supports the use of total scores from Heart-FaST as indicators of risk. Aggregate scores from this modified screening tool rank heart failure patients according to their "risk of poor self-care" demonstrating that the Heart-FaST items constitute a meaningful scale to identify heart failure patients at risk of poor engagement in heart failure self-care. © 2018 John Wiley & Sons Ltd.
Gopinath, Bamini; Russell, Joanna; Flood, Victoria M; Burlutsky, George; Mitchell, Paul
2014-02-01
Nutritional parameters could influence self-perceived health and functional status of older adults. We prospectively determined the association between diet quality and quality of life and activities of daily living. This was an observational cohort study in which total diet scores, reflecting adherence to dietary guidelines, were determined. Dietary intakes were assessed using a food frequency questionnaire at baseline. Total diet scores were allocated for intake of selected food groups and nutrients for each participant as described in the Australian Guide to Healthy Eating. Higher scores indicated closer adherence to dietary guidelines. In Sydney, Australia, 1,305 and 895 participants (aged ≥ 55 years) with complete data were examined over 5 and 10 years, respectively. The 36-Item Short-Form Survey assesses quality of life and has eight subscales representing dimensions of health and well-being; higher scores reflect better quality of life. Functional status was determined once at the 10-year follow-up by the Older Americans Resources and Services activities of daily living scale. This scale has 14 items: seven items assess basic activities of daily living (eg, eating and walking) and seven items assess instrumental activities of daily living (eg, shopping or housework). Normalized 36-Item Short-Form Survey component scores were used in analysis of covariance to calculate multivariable adjusted mean scores. Logistic regression analysis was used to calculate adjusted odds ratios and 95% CIs to demonstrate the association between total diet score with the 5-year incidence of impaired activities of daily living. Participants in the highest vs lowest quartile of baseline total diet scores had adjusted mean scores 5.6, 4.0, 5.3, and 2.6 units higher in these 36-Item Short-Form Survey domains 5 years later: physical function (P trend=0.003), general health (P trend=0.02), vitality (P trend=0.001), and physical composite score (P trend=0.003), respectively. 
Participants in the highest vs lowest quartile of baseline total diet scores had 50% reduced risk of impaired instrumental activities of daily living at follow-up (multivariable-adjusted P trend=0.03). Higher diet quality was prospectively associated with better quality of life and functional ability. Copyright © 2014 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
Rasch analysis of the Trypophobia Questionnaire.
Imaizumi, Shu; Tanno, Yoshihiko
2018-02-14
This study aimed to assess Rasch-based psychometric properties of the Trypophobia Questionnaire measuring proneness to trypophobia, which refers to disgust and unpleasantness induced by the observation of clusters of objects (e.g., lotus seed pods). Rasch analysis was performed on data from 582 healthy Japanese adults. The results suggested that Trypophobia Questionnaire has a unidimensional structure with ordered response categories and sufficient person and item reliabilities, and that it does not have differential item functioning across sexes and age groups, whereas the targeting of the scale leaves room for improvements. When items that did not fit the Rasch model were removed, the shortened version showed slightly improved psychometric properties. However, results were not conclusive in determining whether the full or shortened version is better for practical use. Further assessment and validation are needed.
Negative Symptom Dimensions of the Positive and Negative Syndrome Scale Across Geographical Regions
Liharska, Lora; Harvey, Philip D.; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S.E.
2017-01-01
Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms. PMID:29410935
Optimising mobility outcome measures in Huntington's disease.
Busse, Monica; Quinn, Lori; Khalil, Hanan; McEwan, Kirsten
2014-01-01
Many of the performance-based mobility measures that are currently used in Huntington's disease (HD) were developed for assessment in other neurological conditions such as stroke. We aimed to assess the individual item-response of commonly used performance-based mobility measures, with a view to optimizing the scales for specific application in Huntington's Disease (HD). Data from a larger multicentre, observational study were used. Seventy-five people with HD (11 pre-manifest & 64 manifest) were assessed on the Six-Minute Walk Test, 10-Meter Walk Test, Timed "Up & Go" Test (TUG), Berg Balance Scale (BBS), Physical Performance Test (PPT), Four Square Step Test, and Tinetti Mobility Test (TMT). The Unified Huntington's Disease Rating Scale (UHDRS) Total Motor Score, Functional Assessment Scale and Total Functional Capacity scores were recorded, alongside cognitive measures. Standard regression analysis was used to assess predictive validity. Individual item responses were investigated using a sequence of approaches to allow for gradual removal of items and the subsequent creation of shortened versions. Psychometric properties (reliability and discriminant ability) of the shortened scales were assessed. TUG (β 0.46, CI 0.20-3.47), BBS (β -0.35, CI -2.10-0.14), and TMT (β -0.45, CI -3.14-0.64) were good disease-specific mobility measures. PPT was the best measure of functional performance (β 0.42, CI 0.00-0.43 for TFC & β 0.57 CI 0.15-0.81 for FAS). Shortened versions of BBS and TMT were developed based on item analysis. The resultant BBS and TMT shortened scales were reliable for use in manifest HD. ROC analysis showed that shortened scales were able to discriminate between manifest and pre-manifest disease states. Our data suggests that the PPT is appropriate as a general measure of function in individuals with HD, and we have identified shortened versions of the BBS and TMT that measure the unique gait and balance impairments in HD. 
These scales, alongside the TUG, may therefore be important measures to consider in future clinical trials.
Logistics Reduction and Repurposing Beyond Low Earth Orbit
NASA Technical Reports Server (NTRS)
Ewert, Michael K.; Broyan, James L., Jr.
2012-01-01
All human space missions, regardless of destination, require significant logistical mass and volume that is strongly proportional to mission duration. Anything that can be done to reduce initial mass and volume of supplies or reuse items that have been launched will be very valuable. Often, the logistical items require disposal and represent a trash burden. Logistics contributions to total mission architecture mass can be minimized by considering potential reuse using systems engineering analysis. In NASA's Advanced Exploration Systems "Logistics Reduction and Repurposing Project," various tasks will reduce the intrinsic mass of logistical packaging, enable reuse and repurposing of logistical packaging and carriers for other habitation, life support, crew health, and propulsion functions, and reduce or eliminate the nuisance aspects of trash at the same time. Repurposing reduces the trash burden and eliminates the need for hardware whose function can be provided by use of spent logistical items. However, these reuse functions need to be identified and built into future logistics systems to enable them to effectively have a secondary function. These technologies and innovations will help future logistics systems to support multiple exploration missions much more efficiently.
Detecting Differential Person Functioning in Emotional Intelligence
ERIC Educational Resources Information Center
Alsmadi, Yahia M.; Alsmadi, Abdalla A.
2009-01-01
Differential Item Functioning (DIF) is a widely used term in test development literature. It is very important to analyze test data for DIF because DIF is a serious threat to validity. If the same data matrix is transposed, a similar analysis can be carried out for Differential Person Functioning (DPF). The purpose of this paper is to introduce and…
Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel
2016-01-01
The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
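The 2-PL, 3-PL, and 4-PL models named in the abstract above are nested logistic response functions. The following is a minimal sketch of that relationship, not the study's estimation code; the function name `irf_4pl` and all parameter values are illustrative assumptions. With the upper asymptote `d` fixed at 1 the 4-PL form reduces to the 3-PL, and with the lower asymptote `c` also fixed at 0 it reduces to the 2-PL.

```python
import math

def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Probability of a correct response under a 4-PL logistic model.

    theta: latent ability
    a:     discrimination (slope)
    b:     difficulty (location)
    c:     lower asymptote (0.0 gives the 2-PL lower bound)
    d:     upper asymptote (1.0 gives the 2-PL/3-PL upper bound)
    All names and defaults are hypothetical, chosen for illustration only.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# 2-PL: defaults for c and d
p_2pl = irf_4pl(theta=0.0, a=1.0, b=0.0)
# 3-PL: lower asymptote only
p_3pl = irf_4pl(theta=0.0, a=1.0, b=0.0, c=0.2)
# 4-PL: both asymptotes
p_4pl = irf_4pl(theta=0.0, a=1.0, b=0.0, c=0.2, d=0.9)
```

At `theta == b` the probability sits midway between the two asymptotes, which is why the lower and upper asymptotes can be read as floor and ceiling on the response curve.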
Validation and reliability of the VF-14 questionnaire in a German population.
Chiang, Peggy Pei-Chia; Fenwick, Eva; Marella, Manjula; Finger, Robert; Lamoureux, Ecosse
2011-11-21
To evaluate the validity, reliability, and measurement characteristics of the Visual Function 14 (VF-14) in a German sample using Rasch analysis. This was a clinic-based, cross-sectional study with 184 patients with low vision recruited from an outpatient clinic at a German eye hospital. Participants underwent a clinical examination and completed the German VF-14 scale. The validity of the VF-14 scale was assessed using Rasch analysis. The main outcome measure was the overall functional score provided by the VF-14. After collapsing two response categories for items 13 and 14, the VF-14 scale satisfied fundamental criteria to achieve fit to the Rasch model, namely, ordered thresholds, the ability to distinguish between different strata of participant ability, absence of misfitting items, no evidence of multidimensionality, and no significant differential item functioning for key sociodemographic covariates. The VF-14 is able to discriminate between participants with different levels of vision impairment and across different cultural groups. The VF-14 is a valid, reliable, and unidimensional questionnaire for use in a German population. These findings contribute to the growing evidence base for second generation patient reported outcome measures in ophthalmology, and support the use of the German VF-14 in tertiary eye clinics in Germany to capture the impact of visual impairment on visual function from the patient's perspective and to inform low vision rehabilitation and interventions.
Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L
2015-07-01
The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
Development and validation of a vision-specific quality-of-life questionnaire for Timor-Leste.
du Toit, Rènée; Palagyi, Anna; Ramke, Jacqueline; Brian, Garry; Lamoureux, Ecosse L
2008-10-01
To develop and determine the reliability and validity of a vision-specific quality-of-life instrument (TL-VSQOL) designed to assess the impact of distance and near vision impairment in adults living in Timor-Leste. A vision-specific quality-of-life questionnaire was developed, piloted, and administered to 704 Timorese aged ≥40 years during a population-based eye health rapid assessment. Rasch analysis was performed on the data of 457 participants with presenting near vision worse than N8 (78.5%) and/or distance vision worse than 6/18 (69.8%). Unidimensionality, item fit to the model, response category performance, differential item functioning, and targeting of items to participants were assessed. Initially, the questionnaire lacked fit to the Rasch model. Removal of two items concerning emotional well-being resulted in a fit of the data (overall item-trait interaction: χ² (df) = 81 (51); mean (SD) person and item fit residual values: -0.30 (1.02) and -0.32 (1.46)), and good targeting of person ability and item difficulty was evident. Poorer distance and near visual acuities were significantly associated with worse quality-of-life scores (P < 0.001). Person separation reliability was substantial (0.93), indicating that the instrument can discriminate between groups with normal and impaired vision. All 17 items were free of differential item functioning, and there was no evidence of multidimensionality. This 17-item TL-VSQOL has high reliability, construct, and criterion validity and effective targeting. It can effectively assess the impact on quality of life of adult Timorese with distance and near vision impairment. The TL-VSQOL could be adapted for use in other low-resource settings.
Answering the call: a tool that measures functional breast cancer literacy.
Williams, Karen Patricia; Templin, Thomas N; Hines, Resche D
2013-01-01
There is a need for health care providers and health care educators to ensure that the messages they communicate are understood. The purpose of this research was to test the reliability and validity, in a culturally diverse sample of women, of a revised Breast Cancer Literacy Assessment Tool (Breast-CLAT) designed to measure functional understanding of breast cancer in English, Spanish, and Arabic. Community health workers verbally administered the 35-item Breast-CLAT to 543 Black, Latina, and Arab American women. A confirmatory factor analysis using a 2-parameter item response theory model was used to test the proposed 3-factor Breast-CLAT (awareness, screening and knowledge, and prevention and control). The confirmatory factor analysis using a 2-parameter item response theory model had a good fit (TLI = .91, RMSEA = .04) to the proposed 3-factor structure. The total scale reliability ranged from .80 for Black participants to .73 for total culturally diverse sample. The three subscales were differentially predictive of family history of cancer. The revised Breast-CLAT scales demonstrated internal consistency reliability and validity in this multiethnic, community-based sample.
Hagman, Brett T; Kuerbis, Alexis N; Morgenstern, Jon; Bux, Donald A; Parsons, Jeffrey T; Heidinger, Bram E
2009-11-01
The Short Inventory of Problems-Alcohol and Drugs (SIP-AD) is a 15-item measure that concurrently assesses negative consequences associated with alcohol and illicit drug use. Current psychometric evaluation has been limited to classical test theory (CTT) statistics, and it has not been validated among non-treatment seeking men-who-have-sex-with-men (MSM). Methods from Item Response Theory (IRT) can improve upon CTT by providing an in-depth analysis of how each item performs across the underlying latent trait that it is purported to measure. The present study examined the psychometric properties of the SIP-AD using methods from both IRT and CTT among a non-treatment seeking MSM sample (N=469). Participants were recruited from the New York City area and were asked to participate in a series of studies examining club drug use. Results indicated that five items on the SIP-AD demonstrated item misfit or significant differential item functioning (DIF) across race/ethnicity and HIV status. These five items were dropped and two-parameter IRT analyses were conducted on the remaining 10 items, which indicated a restricted range of item location parameters (-.15 to -.99) plotted at the lower end of the latent negative consequences severity continuum, and reasonably high discrimination parameters (1.30 to 2.22). Additional CTT statistics were compared between the original 15-item SIP-AD and the refined 10-item SIP-AD and suggest that the differences were negligible with the refined 10-item SIP-AD indicating a high degree of reliability and validity. Findings suggest the SIP-AD can be shortened to 10 items and appears to be a non-biased reliable and valid measure among non-treatment seeking MSM.
NASA Technical Reports Server (NTRS)
Brown, K. L.; Bertsch, P. J.
1986-01-01
Results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Electrical Power Generation (EPG)/Fuel Cell Powerplant (FCP) hardware. The EPG/FCP hardware is required for performing functions of electrical power generation and product water distribution in the Orbiter. Specifically, the EPG/FCP hardware consists of the following divisions: (1) Power Section Assembly (PSA); (2) Reactant Control Subsystem (RCS); (3) Thermal Control Subsystem (TCS); and (4) Water Removal Subsystem (WRS). The IOA analysis process utilized available EPG/FCP hardware drawings and schematics for defining hardware assemblies, components, and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode.
Ayala, Alba; Bilbao, Amaia; Garcia-Perez, Sonia; Escobar, Antonio; Forjaz, Maria João
2018-03-01
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) measures the quality of life of patients with osteoarthritis (OA), and there is a specific scale for the physical functioning dimension, the short version with seven items WOMAC-pf. This study describes the application of the Rasch model to explore scale invariance and response stability of the WOMAC-pf short version across affected joint and over time. A sample of 884 patients with OA, from 15 hospitals in Spain, completed the WOMAC-pf before surgery (baseline) and at 3, 6 and 12 months post-surgery of hip or knee. The invariance by joint was explored through the differential item functioning (DIF) analysis of the Rasch model using baseline data, and time stability (DIF by time) was evaluated in stack data (each participant is represented four times, once per time point). Mean age of the patients was 69.13 years (SD 10.01), 59.3% of them were women (n = 524), 59.2% had knee OA (n = 523) and 40.8% hip OA (n = 361). Item "putting on socks" showed DIF by joint and time. Fit to the Rasch model using stack data improved when this item was removed. Good reliability for individual use, local independency and unidimensionality of the models were confirmed. WOMAC-pf 7-item short version was invariant over time and joint when item "putting on socks" was removed. Researchers should carefully evaluate this item as it presents problems in scale invariance and stability, which could affect results when comparing data by joint or when computing change scores.
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd; Gerritz, Kalle
1990-01-01
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
ERIC Educational Resources Information Center
Tay, Louis; Vermunt, Jeroen K.; Wang, Chun
2013-01-01
We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
Rasch analysis of the Rosenberg Self-Esteem Scale with African Americans.
Chao, Ruth Chu-Lien; Vidacovich, Courtney; Green, Kathy E
2017-03-01
Effectively diagnosing African Americans' self-esteem has posed an unresolved challenge. To address this assessment issue, we conducted exploratory factor analysis and Rasch analysis to assess the psychometric characteristics of the Rosenberg Self-Esteem Scale (RSES, Rosenberg, 1965) for African American college students. The dimensional structure of the RSES was first identified with the first subsample (i.e., calibration subsample) and then held up under cross-validation with a second subsample (i.e., validation subsample). Exploratory factor analysis and Rasch analysis both supported unidimensionality of the measure, with that finding replicated for a random split of the sample. Response scale use was generally appropriate, items were endorsed at a high level reflecting high levels of self-esteem, and person separation and reliability of person separation were adequate, and reflected results similar to those found in prior research. However, as some categories were infrequently used, we also collapsed scale points and found a slight improvement in scale and item indices. No differential item functioning was found by sex or having received professional assistance versus not; there were no mean score differences by age group, marital status, or year in college. Two items were seen as problematic. Implications for theory and research on multicultural mental health are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Chang, Chih-Cheng; Su, Jian-An; Tsai, Ching-Shu; Yen, Cheng-Fang; Liu, Jiun-Horng; Lin, Chung-Ying
2015-06-01
To examine the psychometrics of the Affiliate Stigma Scale using rigorous psychometric analysis: classical test theory (CTT) (traditional) and Rasch analysis (modern). Differential item functioning (DIF) items were also tested using Rasch analysis. Caregivers of relatives with mental illness (n = 453; mean age: 53.29 ± 13.50 years) were recruited from southern Taiwan. Each participant filled out four questionnaires: Affiliate Stigma Scale, Rosenberg Self-Esteem Scale, Beck Anxiety Inventory, and one background information sheet. CTT analyses showed that the Affiliate Stigma Scale had satisfactory internal consistency (α = 0.85-0.94) and concurrent validity (Rosenberg Self-Esteem Scale: r = -0.52 to -0.46; Beck Anxiety Inventory: r = 0.27-0.34). Rasch analyses supported the unidimensionality of three domains in the Affiliate Stigma Scale and indicated four DIF items (affect domain: 1; cognitive domain: 3) across gender. Our findings, based on rigorous statistical analysis, verified the psychometrics of the Affiliate Stigma Scale and reported its DIF items. We conclude that the three domains of the Affiliate Stigma Scale can be separately used and are suitable for measuring the affiliate stigma of caregivers of relatives with mental illness. Copyright © 2015 Elsevier Inc. All rights reserved.
Schultz-Larsen, Kirsten; Kreiner, Svend; Lomholt, Rikke Kirstine
2007-03-01
This study published in two companion papers assesses properties of the Mini-Mental State Examination (MMSE) with the purpose of improving the efficiencies of the methods of screening for cognitive impairment and dementia. An item analysis by conventional and mixed Rasch models was used to explore empirically derived cognitive dimensions of the MMSE, to assess item bias, and to construct diagnostic cut-points. The scores of 1,189 elderly residents were analyzed. Two dimensions of cognitive function, which are statistically and conceptually different from those obtained in previous studies, were derived. The corresponding sum scales were (1) age-correlated MMSE scale (A-MMSE scale: orientation to time, attention/calculation, naming, repetition, and three-stage command) and (2) non-age-correlated MMSE scale (B-MMSE scale: orientation to place, registration, recall, reading, and copying). The "writing" item was not included due to differential effects of age and sex. The analysis also showed that the study sample consisted of two cognitively different groups of elderly. The findings indicate that a two-scale solution is a stable and statistically supported framework for interpreting data obtained by means of the MMSE. Supplementary analyses are presented in the companion paper to explore the performance of this item response theory calibration as a screening test for dementia.
Initial constructs for patient-centered outcome measures to evaluate brain-computer interfaces
Andresen, Elena M.; Fried-Oken, Melanie; Peters, Betts; Patrick, Donald L.
2016-01-01
Purpose The authors describe preliminary work toward the creation of patient-centered outcome (PCO) measures to evaluate brain-computer interface (BCI) as an assistive technology for individuals with severe speech and physical impairments (SSPI). Method In Phase 1, 591 items from 15 existing measures were mapped to the International Classification of Functioning, Disability and Health (ICF). In Phase 2, qualitative interviews were conducted with eight people with SSPI and seven caregivers. Resulting text data were coded in an iterative analysis. Results Most items (79%) mapped to the ICF environmental domain; over half (53%) mapped to more than one domain. The ICF framework was well suited for mapping items related to body functions and structures, but less so for items in other areas, including personal factors. Two constructs emerged from qualitative data: Quality of Life (QOL) and Assistive Technology. Component domains and themes were identified for each. Conclusions Preliminary constructs, domains, and themes were generated for future PCO measures relevant to BCI. Existing instruments are sufficient for initial items but do not adequately match the values of people with SSPI and their caregivers. Field methods for interviewing people with SSPI were successful, and support the inclusion of these individuals in PCO research. PMID:25806719
Snowden, Austyn; Watson, Roger; Stenhouse, Rosie; Hale, Claire
2015-12-01
To examine the construct validity of the Trait Emotional Intelligence Questionnaire Short form. Emotional intelligence involves the identification and regulation of our own emotions and the emotions of others. It is therefore a potentially useful construct in the investigation of recruitment and retention in nursing and many questionnaires have been constructed to measure it. Secondary analysis of existing dataset of responses to Trait Emotional Intelligence Questionnaire Short form using concurrent application of Rasch analysis and confirmatory factor analysis. First year undergraduate nursing and computing students completed Trait Emotional Intelligence Questionnaire-Short Form in September 2013. Responses were analysed by synthesising results of Rasch analysis and confirmatory factor analysis. Participants (N = 938) completed Trait Emotional Intelligence Questionnaire Short form. Rasch analysis showed the majority of the Trait Emotional Intelligence Questionnaire-Short Form items made a unique contribution to the latent trait of emotional intelligence. Five items did not fit the model and differential item functioning (gender) accounted for this misfit. Confirmatory factor analysis revealed a four-factor structure consisting of: self-confidence, empathy, uncertainty and social connection. All five misfitting items from the Rasch analysis belonged to the 'social connection' factor. The concurrent use of Rasch and factor analysis allowed for novel interpretation of Trait Emotional Intelligence Questionnaire Short form. Much of the response variation in Trait Emotional Intelligence Questionnaire Short form can be accounted for by the social connection factor. Implications for practice are discussed. © 2015 John Wiley & Sons Ltd.
Rasch Analysis of the Student Refractive Error and Eyeglass Questionnaire
Crescioni, Mabel; Messer, Dawn H.; Warholak, Terri L.; Miller, Joseph M.; Twelker, J. Daniel; Harvey, Erin M.
2014-01-01
Purpose To evaluate and refine a newly developed instrument, the Student Refractive Error and Eyeglasses Questionnaire (SREEQ), designed to measure the impact of uncorrected and corrected refractive error on vision-related quality of life (VRQoL) in school-aged children. Methods A 38-statement instrument consisting of two parts was developed: Part A relates to perceptions regarding uncorrected vision and Part B relates to perceptions regarding corrected vision and includes other statements regarding VRQoL with spectacle correction. The SREEQ was administered to 200 Native American 6th through 12th grade students known to have previously worn and who currently require eyeglasses. Rasch analysis was conducted to evaluate the functioning of the SREEQ. Statements on Part A and Part B were analyzed to examine the dimensionality and constructs of the questionnaire, how well the items functioned, and the appropriateness of the response scale used. Results Rasch analysis suggested two items be eliminated and the measurement scale for matching items be reduced from a 4-point response scale to a 3-point response scale. With these modifications, categorical data were converted to interval level data, to conduct an item and person analysis. A shortened version of the SREEQ was constructed with these modifications, the SREEQ-R, which included the statements that were able to capture changes in VRQoL associated with spectacle wear for those with significant refractive error in our study population. Conclusions While the SREEQ Part B appears to have less than optimal reliability to assess the impact of spectacle correction on VRQoL in our student population, it is also able to detect statistically significant differences from pretest to posttest on both the group and individual levels to show that the instrument can assess the impact that glasses have on VRQoL. 
Further modifications to the questionnaire, such as those included in the SREEQ-R, could enhance its functionality. PMID:24811844
Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.
ERIC Educational Resources Information Center
Wang, Ning; Lane, Suzanne
This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…
Brod, Meryl; Højbjerre, Lise; Adalsteinsson, Johan Erpur; Rasmussen, Michael Højby
2014-04-01
Approximately 50 000 adults in the United States are diagnosed with GH deficiency, which has negative impacts on cognitive functioning, psychological well-being, and quality of life. This paper presents development and validation of a patient-reported outcome measure (PRO), the Treatment-Related Impact Measure-Adult Growth Hormone Deficiency (TRIM-AGHD). The TRIM-AGHD was developed to measure the impact of GH deficiency and its treatment. The development and validation of the TRIM-AGHD was conducted according to the Food and Drug Administration guidance on the development of PROs. Concept elicitation, conducted in three countries included interviews with patients, clinical experts, and literature review. Qualitative data were analyzed based on grounded theory principles, and draft items were cognitively debriefed. The measure underwent psychometric validation in a US clinic-based population. An a priori statistical analysis plan included assessment of the measurement model, reliability, and validity. Item functioning was reviewed using item response theory analyses. Forty-eight patients and six clinical experts participated in concept elicitation and 169 patients completed the validation study. TRIM-AGHD was measured. Factor analysis resulted in four domains: energy level, physical health, emotional health, and cognitive ability. The item response theory confirmed adequate item fit and placement within their domain. Internal consistency ranged from 0.82 to 0.95 and test-retest ranged from 0.80 to 0.92. All prespecified hypotheses for convergent validity and all but two for discriminant validity were met. The final 26-item TRIM-AGHD can be considered a reliable and valid PRO of the impact of disease and treatment for adult GH deficiency.
Vision and Quality of Life Index: validation of the Indian version using Rasch analysis.
Gothwal, Vijaya K; Bagga, Deepak K
2013-07-18
A multi-attribute utility instrument (MAUI) consists of a descriptive system in which the items and responses seek information about a concept of the universe of health-related quality of life (QoL), and responses to these items then are weighted and combined to produce the index. To our knowledge, the 6-item Vision and Quality of Life Index (VisQoL) is the only available vision-related MAUI, developed and validated in Australia, specifically for visually impaired (VI) populations. To our knowledge, the psychometric properties of the VisQoL have not yet been investigated in an Indian VI sample; this was the aim of our study. The Indian VisQoL was administered to 349 VI adults face-to-face by a trained interviewer at the Vision Rehabilitation Centres of a tertiary eye care facility, South India. Rasch analysis was used to assess the psychometric properties. Rescoring was necessary for all except one item before ordered thresholds were obtained. All items fit the Rasch model and unidimensionality was confirmed. Person separation was acceptable (2.01), indicating that the instrument can discriminate among three strata of participants' vision-related QoL (VRQoL). The VisQoL items were targeted substantially to the participants' VRQoL (-0.69 logits). One item ("ability to have friendships") demonstrated large differential item functioning by work status; working participants reported the item to be more difficult (-1.13 logits) relative to other items when compared to the nonworking participants. The 6-item Indian VisQoL satisfies unidimensional Rasch model expectations in VI patients. Disordering of response categories was evident; replication is required before a common rescoring option should be considered.
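The link between a person separation of 2.01 and "three strata" in the abstract above can be illustrated with the conventional Rasch strata formula H = (4G + 1) / 3, where G is the person separation index. The abstract does not state which formula the authors used, so this is a sketch under that assumption; the function names are hypothetical.

```python
def person_strata(separation):
    """Number of statistically distinct person strata, H = (4G + 1) / 3.

    Assumes the conventional Rasch strata formula; the abstract does not
    say how its strata count was derived.
    """
    return (4.0 * separation + 1.0) / 3.0

def separation_reliability(separation):
    """Reliability implied by a separation index, R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)

strata = person_strata(2.01)            # about 3.01
reliability = separation_reliability(2.01)  # about 0.80
```

Under this assumption, a separation of 2.01 gives roughly 3.01 strata, consistent with the three strata reported, and corresponds to a person reliability of about 0.80.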
Lúcio, Patrícia Silva; Cogo-Moreira, Hugo; Puglisi, Marina; Polanczyk, Guilherme Vanoni; Little, Todd D
2017-11-01
The present study investigated the psychometric properties of the Raven's Colored Progressive Matrices (CPM) test in a sample of preschoolers from Brazil (n = 582; age: mean = 57 months, SD = 7 months; 46% female). We investigated the plausibility of unidimensionality of the items (confirmatory factor analysis) and differential item functioning (DIF) for sex and age (multiple indicators multiple causes method). We tested four unidimensional models and the one with the best-fit index was a reduced form of the Raven's CPM. The DIF analysis was carried out with the reduced form of the test. A few items presented DIF (two for sex and one for age), confirming that the Raven's CPM items are mostly measurement invariant. There was no effect of sex on the general factor, but increasing age was associated with higher values of the g factor. Future research should indicate if the reduced form is suitable for evaluating the general ability of preschoolers.
LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G
2015-04-01
Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
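As a rough illustration of how a Mantel-Haenszel test can grade the severity of item-level DIF, the sketch below computes the common odds ratio across matched score strata and converts it to the ETS delta scale. It assumes a dichotomized item and uses made-up counts; the size-only A/B/C labels ignore the significance tests that the full ETS rules also require, and none of this is taken from the study itself.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    Each stratum is (a, b, c, d):
      a = reference group endorsing, b = reference group not endorsing,
      c = focal group endorsing,     d = focal group not endorsing.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def ets_delta(odds_ratio: float) -> float:
    """MH D-DIF on the ETS delta scale: -2.35 * ln(alpha_MH)."""
    return -2.35 * math.log(odds_ratio)


def size_category(delta: float) -> str:
    """Size-only label: A (negligible), B (moderate), C (large)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


if __name__ == "__main__":
    # Hypothetical counts for two total-score strata (not study data).
    example = [(40, 10, 30, 20), (30, 20, 20, 30)]
    alpha = mantel_haenszel_odds_ratio(example)
    delta = ets_delta(alpha)
    print(round(alpha, 3), round(delta, 3), size_category(delta))
```

An odds ratio of 1 (delta of 0) means the two groups endorse the item equally often once they are matched on the total score.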
Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar
2013-11-01
To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. Not applicable. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, compare separate to combined calibrations, and produce subdomain scores. After eliminating items because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). Upper-extremity and mobility subdomains shared about 35% of the variance in common, and produced comparable scores whether calibrated separately or together. The identification of the subset of items tapping these 2 aspects of physical functioning and scored using the existing PROMIS parameters provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
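The "about 35%" shared variance follows from squaring the reported correlations. A minimal check, with an illustrative function name:

```python
def shared_variance(correlation: float) -> float:
    """Proportion of variance two scores share: the squared correlation."""
    return correlation ** 2


if __name__ == "__main__":
    for r in (0.59, 0.60):  # correlations reported in the abstract
        print(r, round(100 * shared_variance(r), 1))
```

The two reported correlations give roughly 34.8% and 36% shared variance, consistent with the abstract's figure.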
Discriminant Analysis of Gross and Fine Motor Proficiency Data.
ERIC Educational Resources Information Center
Broadhead, Geoffrey D.; Church, Gabie E.
1982-01-01
Handicapped and nonhandicapped students were administered the Bruininks-Oseretsky Test of Motor Proficiency to determine regular or specially designed physical education placement. Two of the three functions on the test were significant, indicating usefulness in placement. Fewer than half the test items for each function contributed discriminatory…
Improving measures of work-related physical functioning.
McDonough, Christine M; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E; Marino, Molly E; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M; Chan, Leighton
2017-03-01
To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Newly developed questions were administered to 3532 recent SSA applicants for work disability benefits and 2025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Factor and IRT analyses supported integration of 44 new items into three existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain consisting of: Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items) demonstrated acceptable psychometric properties. The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability.
Improving Measures of Work-Related Physical Functioning
McDonough, Christine M.; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E.; Marino, Molly E.; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E.; Jette, Alan M; Chan, Leighton
2016-01-01
Purpose To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration’s (SSA) disability determination process. Methods Newly developed questions were administered to 3,532 recent SSA applicants for work disability benefits and 2,025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to existing WD-FAB, and computer-adaptive test simulations were conducted. Results Factor and IRT analyses supported integration of 44 new items into 3 existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain consisting of: Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items) demonstrated acceptable psychometric properties. Conclusions The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants, identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability. PMID:28005243
KWICgrouper--Designing a Tool for Corpus-Driven Concordance Analysis
ERIC Educational Resources Information Center
O'Donnell, Matthew Brook
2008-01-01
The corpus-driven analysis of concordance data often results in the identification of groups of lines in which repeated patterns around the node item establish membership in a particular function meaning group (Mahlberg, 2005). This paper explains the KWICgrouper, a concept designed to support this kind of concordance analysis. Groups are defined…
Gomes, Áurea K V; Diniz, Leandro F M; Lage, Guilherme M; de Miranda, Débora M; de Paula, Jonas J; Costa, Danielle; Albuquerque, Maicon R
2017-01-01
Impulsivity has mainly been described as a negative or dysfunctional characteristic associated with several disorders. However, impulsivity is not only related to dysfunctional outcomes and may explain individual differences in optimal human functioning as well. The Dickman Impulsivity Inventory (DII) is a self-report instrument measuring both the dysfunctional and the functional aspects of impulsivity. In this study, we performed the translation and cultural adaptation of the DII to the Brazilian context and analyzed its psychometric properties. Translation and cultural adaptation followed a rigorous process, which relied on an expert panel in the cross-cultural adaptation of psychological instruments. Data from 405 undergraduate students were obtained for the Brazilian version of the DII (Br-DII). The 23-item Br-DII was considered unsuitable according to the model fit indices of the Confirmatory Factor Analysis (for both Oblique and Orthogonal models). Exploratory Factor Analysis showed an 18-item version of the Br-DII to be suitable (CFI = 0.92; TLI = 0.90, and RMSEA = 0.057). The 18-item version also showed adequate Cronbach's alpha, intraclass correlation coefficient, and convergent and discriminant validity with the BIS-11. Therefore, the Br-DII demonstrated reliability and validity in the measurement of functional and dysfunctional impulsivity.
Rasch analysis of the patient-rated wrist evaluation questionnaire.
Esakki, Saravanan; MacDermid, Joy C; Vincent, Joshua I; Packham, Tara L; Walton, David; Grewal, Ruby
2018-01-01
The Patient-Rated Wrist Evaluation (PRWE) was developed as a wrist-joint-specific measure of pain and disability, and evidence of sound validity has been accumulated through classical psychometric methods. Rasch analysis (RA) has been endorsed as a newer method for analyzing the clinical measurement properties of self-report outcome measures. The purpose of this study was to evaluate the PRWE using Rasch modeling. We employed the Rasch model to assess overall fit, response scaling, individual item fit, differential item functioning (DIF), local dependency, unidimensionality and person separation index (PSI). A convenience sample of 382 patients with distal radius fracture was recruited from the hand and upper limb clinic at a large academic healthcare organization in London, Ontario, Canada; 6-month post-injury PRWE scores were used. RA was conducted on the 3 subscales (pain, specific activities, and usual activities) of the PRWE separately. The pain subscale adequately fit the Rasch model when item 4 "Pain - When it is at its worst" was deleted to eliminate non-uniform DIF by age group, and item 5 "How often do you have pain" was rescored by collapsing into 8 intervals to eliminate disordered thresholds. Uniform DIF for "Use my affected hand to push up from the chair" (by work status) and "Use bathroom tissue with my affected hand" (by injured hand) was addressed by splitting the items for analysis. After background rescoring of 2 items in the pain subscale, 2 items in specific activities and 3 items in usual activities, all three subscales of the PRWE were well targeted and had high reliability (PSI = 0.86). These changes provided a unidimensional, interval-level scaled measure. Like a previous analysis of the Patient-Rated Wrist and Hand Evaluation, this study found the PRWE could be fit to the Rasch model with rescoring of multiple items.
However, the modifications required to achieve fit were not the same across studies; our fit statistics also suggested that one of the pain items should be deleted. This study adds to the pool of evidence supporting the PRWE, but cannot confidently provide a Rasch-based scoring algorithm.
Hockenberry, S L; Billingham, R E
1987-12-01
Two hundred twenty-five [corrected] respondents (109 [corrected] heterosexuals and 116 [corrected] homosexuals) completed a survey containing a 20-item Boyhood Gender Conformity Scale (BGCS). This scale was largely composed of edited and abridged gender items from Part A of Freund et al.'s Feminine Gender Identity Scale (FGIS-A) and Whitam's "childhood indicators." The combined scale was developed in an attempt to obtain a reliable, valid, and potent discriminating instrument for accurately classifying adult male respondents for sexual orientation on the basis of their reported boyhood gender conformity or nonconforming behavior and identity. In addition, 33% of these respondents were administered the original FGIS-A and Whitam inventory during a 2-week test-retest analysis conducted to determine the validity and reliability of the new instrument. All the original items significantly discriminated between heterosexual and homosexual respondents. From these a 13-item function and a 5-item function proved to be the most powerful discriminators between the two groups. Significant correlations between each of the three scales and a very high test-retest correlation coefficient supported the reliability and validity assumption for the BGCS. The conclusion was made that the five-item function (playing with boys, preferring [corrected] boys' games, imagining self as sports figure, reading adventure and sports stories, considered a "sissy") was the most potent and parsimonious discriminator among adult males for sexual orientation. It was similarly noted that the absence of masculine behaviors and traits appeared to be a more powerful predictor of later homosexual orientation than the traditionally feminine or cross-sexed traits and behaviors.
Vegetable parenting practices scale. Item response modeling analyses
Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom
2015-01-01
Objective To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution.
Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
Development of autonomous grasping and navigating robot
NASA Astrophysics Data System (ADS)
Kudoh, Hiroyuki; Fujimoto, Keisuke; Nakayama, Yasuichi
2015-01-01
The ability to find and grasp target items in an unknown environment is important for working robots. We developed an autonomous navigating and grasping robot. The operations are locating a requested item, moving to where the item is placed, finding the item on a shelf or table, and picking the item up from the shelf or the table. To achieve these operations, we designed the robot with three functions: an autonomous navigating function that generates a map and a route in an unknown environment, an item position recognizing function, and a grasping function. We tested this robot in an unknown environment. It achieved a series of operations: moving to a destination, recognizing the positions of items on a shelf, picking up an item, placing it on a cart with its hand, and returning to the starting location. The results of this experiment show the applicability of reducing the workforce with robots.
Vivat, B; Young, T E; Winstanley, J; Arraras, J I; Black, K; Boyle, F; Bredart, A; Costantini, A; Guo, J; Irarrazaval, M E; Kobayashi, K; Kruizinga, R; Navarro, M; Omidvari, S; Rohde, G E; Serpentini, S; Spry, N; Van Laarhoven, H W M; Yang, G M
2017-11-01
The EORTC Quality of Life Group has just completed the final phase (field-testing and validation) of an international project to develop a stand-alone measure of spiritual well-being (SWB) for palliative cancer patients. Participants (n = 451; from 14 countries on four continents; 54% female; 188 Christian; 50 Muslim; 156 with no religion) completed a provisional 36-item measure of SWB plus the EORTC QLQ-C15-PAL (PAL), then took part in a structured debriefing interview. All items showed good score distribution across response categories. We assessed scale structure using principal component analysis and Rasch analysis, and explored construct validity, and convergent/divergent validity with the PAL. Twenty-two items in four scoring scales (Relationship with Self, Relationships with Others, Relationship with Someone or Something Greater, and Existential) explained 53% of the variance. The measure also includes a global SWB item and nine other items. Scores on the PAL global quality-of-life item and Emotional Functioning scale correlated weakly to moderately with scores on the global SWB item and two of the four SWB scales. This new validated 32-item SWB measure addresses a distinct aspect of quality-of-life, and is now available for use in research and clinical practice, with a role as both a measurement and an intervention tool. © 2017 John Wiley & Sons Ltd.
Calibration of the Spanish PROMIS Smoking Item Banks.
Huang, Wenjing; Stucky, Brian D; Edelen, Maria O; Tucker, Joan S; Shadel, William G; Hansen, Mark; Cai, Li
2016-07-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) Smoking Initiative has developed item banks for assessing six smoking behaviors and biopsychosocial correlates of smoking among adult cigarette smokers. The goal of this study is to evaluate the performance of the Spanish version of the PROMIS smoking item banks as compared to the original banks developed in English. The six PROMIS banks for daily smokers were translated into Spanish and administered to a sample of Spanish-speaking adult daily smokers in the United States (N = 302). We first evaluated the unidimensionality of each bank using confirmatory factor analysis. We then conducted a two-group item response theory calibration, including an item response theory-based Differential Item Functioning (DIF) analysis by language of administration (Spanish vs. English). Finally, we generated full bank and short form scores for the translated banks and evaluated their psychometric performance. Unidimensionality of the Spanish smoking item banks was supported by confirmatory factor analysis results. Out of a total of 109 items that were evaluated for language DIF, seven items in three of the six banks were identified as having levels of DIF that exceeded an established criterion. The psychometric performance of the Spanish daily smoker banks is largely comparable to that of the English versions. The Spanish PROMIS smoking item banks are highly similar, but not entirely equivalent, to the original English versions. The parameters from these two-group calibrations can be used to generate comparable bank scores across the two language versions. In this study, we developed a Spanish version of the PROMIS smoking toolkit, which was originally designed and developed for English speakers. With the growing Spanish-speaking population, it is important to make the toolkit more accessible by translating the items and calibrating the Spanish version to be comparable with English-language scores. 
This study provided the translated item banks and short forms, comparable unbiased scores for Spanish speakers and evaluations of the psychometric properties of the new Spanish toolkit. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of an item bank and computer adaptive test for role functioning.
Anatchkova, Milena D; Rose, Matthias; Ware, John E; Bjorner, Jakob B
2012-11-01
Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning. A set of different instruments including 75 newly developed items asking about the impact of health on role functioning was completed by 2,500 participants. Established item response theory methods were used to develop an item bank based on the generalized partial credit model. Comparison of group mean bank scores of participants with different self-reported general health status and chronic conditions was used to test the external validity of the bank. After excluding items that did not meet established requirements, the final item bank consisted of a total of 64 items covering three areas of role functioning (family, social, and occupational). Slopes in the bank ranged between .93 and 4.37; the mean threshold range was -1.09 to -2.25. Item bank-based scores were significantly different for participants with and without chronic conditions and with different levels of self-reported general health. An item bank assessing health impact on RF across three content areas has been successfully developed. The bank can be used for development of short forms or computerized adaptive tests to be applied in the assessment of role functioning as one of the common denominators across applications of generic health assessment.
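For readers unfamiliar with the generalized partial credit model named above, the following sketch computes its category probabilities for a single item. The slopes are the extremes reported in the abstract; the threshold values and function names are hypothetical and chosen only for illustration.

```python
import math


def gpcm_probabilities(theta: float, slope: float, thresholds: list) -> list:
    """Category probabilities for one item under the generalized partial credit model.

    Category k (k = 0..m) has logit sum_{v=1..k} slope * (theta - thresholds[v-1]),
    with an empty sum (zero) for category 0.
    """
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + slope * (theta - b))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, slope: float, thresholds: list) -> float:
    """Expected item score: sum of category index times category probability."""
    probs = gpcm_probabilities(theta, slope, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    hypothetical_thresholds = [-2.0, -1.5, -1.0]  # illustrative, not from the study
    for slope in (0.93, 4.37):  # slope range reported in the abstract
        probs = gpcm_probabilities(-1.5, slope, hypothetical_thresholds)
        print(slope, [round(p, 3) for p in probs])
```

A larger slope makes the category probabilities change more sharply around the thresholds, which is why high-slope items contribute more information in a computerized adaptive test.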
Irrational Delay Revisited: Examining Five Procrastination Scales in a Global Sample
Svartdal, Frode; Steel, Piers
2017-01-01
Scales attempting to measure procrastination focus on different facets of the phenomenon, yet they share a common understanding of procrastination as an unnecessary, unwanted, and disadvantageous delay. The present paper examines in a global sample (N = 4,169) five different procrastination scales – Decisional Procrastination Scale (DPS), Irrational Procrastination Scale (IPS), Pure Procrastination Scale (PPS), Adult Inventory of Procrastination Scale (AIP), and General Procrastination Scale (GPS), focusing on factor structures and item functioning using Confirmatory Factor Analysis and Item Response Theory. The results indicated that the PPS (12 items selected from DPS, AIP, and GPS) measures different facets of procrastination even better than the three scales it is based on. An even shorter version of the PPS (5 items focusing on irrational delay) corresponds well to the nine-item IPS. Both scales demonstrate good psychometric properties and appear to be better measures of core procrastination attributes than alternative procrastination scales. PMID:29163302
Irrational Delay Revisited: Examining Five Procrastination Scales in a Global Sample.
Svartdal, Frode; Steel, Piers
2017-01-01
Scales attempting to measure procrastination focus on different facets of the phenomenon, yet they share a common understanding of procrastination as an unnecessary, unwanted, and disadvantageous delay. The present paper examines in a global sample (N = 4,169) five different procrastination scales - Decisional Procrastination Scale (DPS), Irrational Procrastination Scale (IPS), Pure Procrastination Scale (PPS), Adult Inventory of Procrastination Scale (AIP), and General Procrastination Scale (GPS), focusing on factor structures and item functioning using Confirmatory Factor Analysis and Item Response Theory. The results indicated that the PPS (12 items selected from DPS, AIP, and GPS) measures different facets of procrastination even better than the three scales it is based on. An even shorter version of the PPS (5 items focusing on irrational delay) corresponds well to the nine-item IPS. Both scales demonstrate good psychometric properties and appear to be better measures of core procrastination attributes than alternative procrastination scales.
Real and Artificial Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2015-01-01
Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
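The definition of DIF given above can be made concrete with a small sketch. It assumes the simplest case, a dichotomous Rasch item whose difficulty differs between two groups (uniform DIF); the paper itself concerns polytomous items, and the difficulty values and function names here are hypothetical.

```python
import math


def rasch_expected_score(theta: float, difficulty: float) -> float:
    """Expected score (success probability) of a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def dif_gap(theta: float, difficulty_reference: float, difficulty_focal: float) -> float:
    """Difference in expected score between two groups at the same person location."""
    return (rasch_expected_score(theta, difficulty_reference)
            - rasch_expected_score(theta, difficulty_focal))


if __name__ == "__main__":
    # Hypothetical item that is 0.5 logits harder for the focal group.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(dif_gap(theta, 0.0, 0.5), 3))
```

A gap of zero at every person location means no DIF; a gap that keeps the same sign across locations, as in this example, is the uniform case.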
Bjorner, Jakob Bue; Pejtersen, Jan Hyld
2010-02-01
To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register based follow up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have negative impact on construct validity. These results can be used to develop better shortform measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
Thomson, W M; Foster Page, L A; Robinson, P G; Do, L G; Traebert, J; Mohamed, A R; Turton, B J; McGrath, C; Bekes, K; Hirsch, C; Del Carmen Aguilar-Diaz, F; Marshman, Z; Benson, P E; Baker, S R
2016-12-01
To examine the factor structure and other psychometric characteristics of the most commonly used child oral-health-related quality-of-life (OHRQoL) measure (the 16-item short-form CPQ 11-14) in a large number of children (N = 5804) from different settings and who had a range of caries experience and associated impacts. Secondary data analyses used subnational epidemiological samples of 11- to 14-year-olds in Australia (N = 372), New Zealand (three samples: 352, 202, 429), Brunei (423), Cambodia (244), Hong Kong (542), Malaysia (439), Thailand (220, 325), England (88, 374), Germany (1055), Mexico (335) and Brazil (404). Confirmatory factor analysis (CFA) was used to examine the factor structure of the CPQ 11-14 across the combined sample and within four regions (Australia/NZ, Asia, UK/Europe and Latin America). Item impact and internal reliability analysis were also conducted. Caries experience varied, with mean DMFT scores ranging from 0.5 in the Malaysian sample to 3.4 in one New Zealand sample. Even more variation was noted in the proportion reporting only fair or poor oral health; this was highest in the Cambodian and Mexican samples and lowest in the German sample and one New Zealand sample. One in 10 reported that their oral health had a marked impact on their life overall. The CFA across all samples revealed two factors with eigenvalues greater than 1. The first involved all items in the oral symptoms and functional limitations subscales; the second involved all emotional well-being and social well-being items. The first was designated the 'symptoms/function' subscale, and the second was designated the 'well-being' subscale. Cronbach's alpha scores were 0.72 and 0.84, respectively. The symptoms/function subscale contained more of the items with greater impact, with the item 'Food stuck in between your teeth' having greatest impact; in the well-being subscale, the 'Felt shy or embarrassed' item had the greatest impact.
Repeating the analyses by world region gave similar findings. The CPQ 11-14 performed well cross-sectionally in the largest analysis of the scale in the literature to date, with robust and mostly consistent psychometric characteristics, albeit with two underlying factors (rather than the originally hypothesized four-factor structure). It appears to be a sound, robust measure which should be useful for research, practice and policy. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
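The internal reliability figures quoted above are Cronbach's alpha values. A minimal sketch of the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), is given below; the response matrix is invented for illustration and is not CPQ data.

```python
from statistics import variance


def cronbach_alpha(responses: list) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (list of equal-length rows)."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) and of the total score (row sums).
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1.0 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical 0-4 ratings from five respondents on four items.
    ratings = [
        [0, 1, 0, 1],
        [1, 1, 2, 1],
        [2, 3, 2, 2],
        [3, 3, 4, 3],
        [4, 4, 3, 4],
    ]
    print(round(cronbach_alpha(ratings), 3))
```

Alpha approaches 1 when the items rise and fall together across respondents and drops toward 0 when they are unrelated.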
Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen
2017-07-01
The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.
Falk, Carl F; Cai, Li
2016-06-01
We present a semi-parametric approach to estimating item response functions (IRF) useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model at the lowest order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and other non-parametric and semi-parametric alternatives.
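One way to see how a polynomial can be kept monotonic is to make its derivative positive everywhere. The sketch below uses a cubic whose derivative is lambda * ((1 - alpha * theta)^2 + beta * theta^2), which is strictly positive when lambda > 0 and beta > 0. This parameterization is an assumption made for illustration and may differ from the one the authors actually use; all names and parameter values are hypothetical.

```python
def monotonic_cubic(theta: float, intercept: float, lam: float,
                    alpha: float, beta: float) -> float:
    """Cubic predictor with derivative lam * (1 - 2*alpha*theta + (alpha**2 + beta)*theta**2).

    Integrating that derivative from 0 to theta gives the closed form below.
    """
    if lam <= 0.0 or beta <= 0.0:
        raise ValueError("lam and beta must be positive to guarantee monotonicity")
    return intercept + lam * (theta
                              - alpha * theta ** 2
                              + (alpha ** 2 + beta) * theta ** 3 / 3.0)


def linear_predictor(theta: float, intercept: float, lam: float) -> float:
    """Lowest-order case: the ordinary linear predictor of a logistic IRT model."""
    return intercept + lam * theta


if __name__ == "__main__":
    grid = [x / 10.0 for x in range(-40, 41)]
    values = [monotonic_cubic(t, 0.0, 1.2, 0.8, 0.3) for t in grid]
    print(all(b > a for a, b in zip(values, values[1:])))  # True
```

Replacing the linear predictor with such a function lets the item response curve bend asymmetrically while still increasing in the latent trait.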
Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John
2015-09-01
To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classic test and item response theory analysis to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.
Assessing the mechanism of response in the retrosplenial cortex of good and poor navigators
Auger, Stephen D.; Maguire, Eleanor A.
2013-01-01
The retrosplenial cortex (RSC) is consistently engaged by a range of tasks that examine episodic memory, imagining the future, spatial navigation, and scene processing. Despite this, an account of its exact contribution to these cognitive functions remains elusive. Here, using functional MRI (fMRI) and multi-voxel pattern analysis (MVPA) we found that the RSC coded for the specific number of permanent outdoor items that were in view, that is, items which are fixed and never change their location. Moreover, this effect was selective, and was not apparent for other item features such as size and visual salience. This detailed detection of the number of permanent items in view was echoed in the parahippocampal cortex (PHC), although the two brain structures diverged when participants were divided into good and poor navigators. There was no difference in the responsivity of the PHC between the two groups, while significantly better decoding of the number of permanent items in view was possible from patterns of activity in the RSC of good compared to poor navigators. Within good navigators, the RSC also facilitated significantly better prediction of item permanence than the PHC. Overall, these findings suggest that the RSC in particular is concerned with coding the presence of every permanent item that is in view. This mechanism may represent a key building block for spatial and scene representations that are central to episodic memories and imagining the future, and could also be a prerequisite for successful navigation. PMID:24012136
Stroke Self-efficacy Questionnaire: a Rasch-refined measure of confidence post stroke.
Riazi, Afsane; Aspden, Trefor; Jones, Fiona
2014-05-01
Measuring self-efficacy during rehabilitation provides an important insight into understanding recovery post stroke. A Rasch analysis of the Stroke Self-efficacy Questionnaire (SSEQ) was undertaken to establish its use as a clinically meaningful and scientifically rigorous measure. One hundred and eighteen stroke patients completed the SSEQ with the help of an interviewer. Participants were recruited from local acute stroke units and community stroke rehabilitation teams. Data were analysed with confirmatory factor analysis conducted using AMOS and Rasch analysis conducted using RUMM2030 software. Confirmatory factor analysis and Rasch analyses demonstrated the presence of two separate scales that measure stroke survivors' self-efficacy with: i) self-management and ii) functional activities. Guided by Rasch analyses, the response categories of these two scales were collapsed from an 11-point to a 4-point scale. Modified scales met the expectations of the Rasch model. Items satisfied the Rasch requirements (overall and individual item fit, local response independence, differential item functioning, unidimensionality). Furthermore, the two subscales showed evidence of good construct validity. The new SSEQ has good psychometric properties and is a clinically useful assessment of self-efficacy after stroke. The scale measures stroke survivors' self-efficacy with self-management and activities as two unidimensional constructs. It is recommended for use in clinical and research interventions, and in evaluating stroke self-management interventions.
Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H
2018-03-01
Background and aims Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results Each factor was unidimensional according to the Rasch findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.
Supporting management of medical equipment for inpatient service in public hospitals: a case study.
Figueroa, Rosa L; Vallejos, Guido E
2013-01-01
This work presents a study of medical equipment availability in the short and long term. The work is divided into two parts. The first part is an analysis of the medical equipment inventory for the institution of study. We consider the replacement, maintenance, and reinforcement of the available medical equipment by considering local guidelines and surveying clinical personnel appreciation. The resulting recommendation is to upgrade the current equipment inventory if necessary. The second part considered a demand analysis in the short and medium term. We predicted the future demand with a 5-year horizon using Holt-Winters models. Inventory analysis showed that 27% of the medical equipment in stock was not functional. Due to this poor performance result we suggested that the hospital gradually address this situation by replacing 29 non-functional equipment items, reinforcing stock with 40 new items, and adding 11 items not available in the inventory but suggested by the national guidelines. The results suggest that general medicine inpatient demand tends to increase over time; for example, within the general medicine inpatient service the largest increases were predicted for respiratory (12%, RMSE=8%) and genitourinary diseases (20%, RMSE=9%). This increment did not involve any further upgrading of the proposed inventory.
Sabari, Joyce S.; Woodbury, Michelle; Velozo, Craig A.
2014-01-01
Objectives. (1) To develop two independent measurement scales for use as items assessing hand movements and hand activities within the Motor Assessment Scale (MAS), an existing instrument used for clinical assessment of motor performance in stroke survivors; (2) To examine the psychometric properties of these new measurement scales. Design. Scale development, followed by a multicenter observational study. Setting. Inpatient and outpatient occupational therapy programs in eight hospital and rehabilitation facilities in the United States and Canada. Participants. Patients (N = 332) receiving stroke rehabilitation following left (52%) or right (48%) cerebrovascular accident; mean age 64.2 years (sd 15); median 1 month since stroke onset. Intervention. Not applicable. Main Outcome Measures. Data were tested for unidimensionality and reliability, and behavioral criteria were ordered according to difficulty level with Rasch analysis. Results. The new scales assessing hand movements and hand activities met Rasch expectations of unidimensionality and reliability. Conclusion. Following a multistep process of test development, analysis, and refinement, we have redesigned the two scales that comprise the hand function items on the MAS. The hand movement scale contains an empirically validated 10-behavior hierarchy and the hand activities item contains an empirically validated 8-behavior hierarchy. PMID:25177513
2017-01-01
Purpose This study evaluated the changes in nutritional status based on quality of life (QoL) item-level analysis to determine whether individual QoL responses might facilitate personal clinical impact. Materials and Methods This study retrospectively evaluated QoL data obtained by the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire-Core 30 (QLQ-C30) and Quality of Life Questionnaire-Stomach (QLQ-STO22) as well as metabolic-nutritional data obtained by bioelectrical impedance analysis and blood tests. Patients were assessed preoperatively and at the 5-year follow-up. QoL was analyzed at the level of the constituent items. The patients were categorized into vulnerable and non-vulnerable QoL groups for each scale based on their responses to the QoL items and changes in the metabolic-nutritional indices were compared. Results Multiple shortcomings in the metabolic-nutritional indices were observed in the vulnerable groups for nausea/vomiting (waist-hip ratio, degree of obesity), dyspnea (hemoglobin, iron), constipation (body fat mass, percent body fat), dysphagia (body fat mass, percent body fat), reflux (body weight, hemoglobin), dry mouth (percent body fat, waist-hip ratio), and taste (body weight, total body water, soft lean mass, body fat mass). The shortcomings in a single index were observed in the vulnerable groups for emotional functioning and pain (EORTC QLQ-C30) and for eating restrictions (EORTC QLQ-STO22). Conclusions Long-term postoperative QoL deterioration in emotional functioning, nausea/vomiting, pain, dyspnea, constipation, dysphagia, reflux, eating restrictions, dry mouth, and taste were associated with nutritional shortcomings. QoL item-level analysis, instead of scale-level analysis, may help to facilitate personalized treatment for individual QoL respondents. PMID:29302374
ERIC Educational Resources Information Center
Penfield, Randall D.; Alvarez, Karina; Lee, Okhee
2009-01-01
The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…
Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E
2008-01-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
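The link between the standard error and the reliability quoted in the abstract above can be checked with a short calculation. This sketch assumes the common relation reliability = 1 − SE²/SD² and the T-score metric stated in the abstract (SD = 10); the abstract does not say which formula the authors used, and the function name is hypothetical.

```python
def reliability_from_se(se, sd):
    """Reliability implied by a standard error of measurement, assuming
    reliability = 1 - SE^2 / SD^2 (an assumed, commonly used relation)."""
    return 1.0 - (se / sd) ** 2

# SE bound and population SD as reported in the abstract.
r = reliability_from_se(se=2.2, sd=10.0)
```

Under that assumption a standard error of 2.2 on a scale with SD = 10 gives a reliability of about 0.952, which agrees with the "> 0.95" figure reported.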
Marckel, Julie M; Neef, Nancy A; Ferreri, Summer J
2006-01-01
Two young boys with autism who used the picture exchange communication system were taught to solve problems (improvise) by using descriptors (functions, colors, and shapes) to request desired items for which specific pictures were unavailable. The results of a multiple baseline across descriptors showed that training increased the number of improvised requests, and that these skills generalized to novel items, and across settings and listeners in the natural environment. PMID:16602390
ERIC Educational Resources Information Center
Marie, S. Maria Josephine Arokia; Edannur, Sreekala
2015-01-01
This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
Steigen, Anne Mari; Bergh, Daniel
2018-02-05
This article analyses the psychometric properties of the Social Provisions Scale 10-item version. The Social Provisions Scale was analysed by means of the polytomous Rasch model, applied to data on 93 young adults (16-30 years) out of school or work, participating in different nature-based services, due to mental or drug-related problems. The psychometric analysis concludes that the original scale has difficulties related to targeting and construct validity. In order to improve the psychometric properties, the scale was modified to include eight items measuring functional support. The modification was based on theoretical and statistical considerations. After modifications the scale showed not only satisfying psychometric properties, but it also clarified uncertainties regarding construct validity of the measure. However, further analyses on larger samples are required. Implications for Rehabilitation Social support is important for a variety of rehabilitation outcomes and for different patient groups in the rehabilitation context, including people with mental health or drug-related problems. Social Provisions Scale may be used as a screening tool to assess social support of participants in rehabilitation, and the scale may also be an important instrument in rehabilitation research. There might be issues measuring structural support using a 10-item version of the Social Provisions Scale but it seemed to work well as an 8-item scale measuring functional support.
DiFilippo, Kristen Nicole; Huang, Wenhao; Chapman-Novakofski, Karen M
2017-10-27
The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps' educational quality and technical functionality. Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis was good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. App purpose split half-reliability was .65. 
Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps' qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. ©Kristen Nicole DiFilippo, Wenhao Huang, Karen M. Chapman-Novakofski. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 27.10.2017.
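The two internal-consistency statistics named in the abstract above, Cronbach's alpha and the Spearman-Brown coefficient, can be sketched with their textbook formulas. The function names and the toy scores below are illustrative assumptions and are not data from the AQEL study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns
    (one inner list per item, one entry per respondent)."""
    k = len(items)
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1.0 - item_var_sum / variance(totals))

def spearman_brown(r_half):
    """Spearman-Brown step-up of a split-half correlation to full length."""
    return 2.0 * r_half / (1.0 + r_half)

# Made-up scores: three items answered by four respondents.
toy_items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4]]
alpha = cronbach_alpha(toy_items)
stepped_up = spearman_brown(0.5)
```

The toy items are perfectly correlated with equal variances, so alpha comes out at its upper bound; real item sets give lower values, which is why thresholds such as the > .80 quoted above are informative.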
Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael
2015-06-01
The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.
A model for incomplete longitudinal multivariate ordinal data.
Liu, Li C
2008-12-30
In studies where multiple outcome items are repeatedly measured over time, missing data often occur. A longitudinal item response theory model is proposed for analysis of multivariate ordinal outcomes that are repeatedly measured. Under the MAR assumption, this model accommodates missing data at any level (missing item at any time point and/or missing time point). It allows for multiple random subject effects and the estimation of item discrimination parameters for the multiple outcome items. The covariates in the model can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is described utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher-scoring solution, which provides standard errors for all model parameters, is used. A data set from a longitudinal prevention study is used to motivate the application of the proposed model. In this study, multiple ordinal items of health behavior are repeatedly measured over time. Because of a planned missing design, subjects answered only two-thirds of all items at a given point. Copyright 2008 John Wiley & Sons, Ltd.
The stroke impairment assessment set: its internal consistency and predictive validity.
Tsuji, T; Liu, M; Sonoda, S; Domen, K; Chino, N
2000-07-01
To study the scale quality and predictive validity of the Stroke Impairment Assessment Set (SIAS) developed for stroke outcome research. Rasch analysis of the SIAS; stepwise multiple regression analysis to predict discharge functional independence measure (FIM) raw scores from demographic data, the SIAS scores, and the admission FIM scores; cross-validation of the prediction rule. Tertiary rehabilitation center in Japan. One hundred ninety stroke inpatients for the study of the scale quality and the predictive validity; a second sample of 116 stroke inpatients for the cross-validation study. Mean square fit statistics to study the degree of fit to the unidimensional model; logits to express item difficulties; discharge FIM scores for the study of predictive validity. The degree of misfit was acceptable except for the shoulder range of motion (ROM), pain, visuospatial function, and speech items; and the SIAS items could be arranged on a common unidimensional scale. The difficulty patterns were identical at admission and at discharge except for the deep tendon reflexes, ROM, and pain items. They were also similar for the right- and left-sided brain lesion groups except for the speech and visuospatial items. For the prediction of the discharge FIM scores, the independent variables selected were age, the SIAS total scores, and the admission FIM scores; and the adjusted R2 was .64 (p < .0001). Stability of the predictive equation was confirmed in the cross-validation sample (R2 = .68, p < .001). The unidimensionality of the SIAS was confirmed, and the SIAS total scores proved useful for stroke outcome prediction.
Fajrianthi; Zein, Rizqy Amelia
2017-01-01
This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = −2), and 2.398 for subset 3 (level of ability = −2). It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA’s item analysis and dimensionality test of each TKEA subset. PMID:29238234
Paap, Muirne C S; Lenferink, Lonneke I M; Herzog, Nadine; Kroeze, Karel A; van der Palen, Job
2016-06-27
Health-related quality of life (HRQoL) is widely used as an outcome measure in the evaluation of treatment interventions in patients with chronic obstructive pulmonary disease (COPD). In order to address challenges associated with existing fixed-length measures (e.g., too long to be used routinely, too short to ensure both content validity and reliability), a COPD-specific item bank (COPD-SIB) was developed. Items were selected based on literature review and interviews with Dutch COPD patients, with a strong focus on both content validity and item comprehension. The psychometric quality of the item bank was evaluated using Mokken Scale Analysis and parametric Item Response Theory, using data of 666 COPD patients. The final item bank contains 46 items that form a strong scale, tapping into eight important themes that were identified based on literature review and patient interviews: Coping with disease/symptoms, adaptability; Autonomy; Anxiety about the course/end-state of the disease, hopelessness; Positive psychological functioning; Situations triggering or enhancing breathing problems; Symptoms; Activity; Impact. The 46-item COPD-SIB has good psychometric properties and content validity. Items are available in Dutch and English. The COPD-SIB can be used as a stand-alone instrument, or to inform computerised adaptive testing.
Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus
2009-05-01
The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD = 14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
Li, Chih-Ying; Waid-Ebbs, Julia; Velozo, Craig A.; Heaton, Shelley C.
2016-01-01
Primary Objective Social problem solving deficits characterize individuals with traumatic brain injury (TBI). Poor social problem solving interferes with daily functioning and productive lifestyles. Therefore, it is of vital importance to use the appropriate instrument to identify deficits in social problem solving for individuals with TBI. This study investigates factor structure and item-level psychometrics of the Social Problem Solving Inventory-Revised Short Form (SPSI-R:S), for adults with moderate and severe TBI. Research Design Secondary analysis of 90 adults with moderate and severe TBI who completed the SPSI-R:S. Methods and Procedures An exploratory factor analysis (EFA), principal components analysis (PCA) and Rasch analysis examined the factor structure and item-level psychometrics of the SPSI-R:S. Main Outcomes and Results The EFA showed three dominant factors, with positively worded items represented as the most definite factor. The other two factors are negative problem solving orientation and skills; and negative problem solving emotion. Rasch analyses confirmed the three factors are each unidimensional constructs. Conclusions The total score interpretability of the SPSI-R:S may be challenging due to the multidimensional structure of the total measure. Instead, we propose using three separate SPSI-R:S subscores to measure social problem solving for the TBI population. PMID:26052731
Li, Chih-Ying; Waid-Ebbs, Julia; Velozo, Craig A; Heaton, Shelley C
2016-01-01
Social problem-solving deficits characterise individuals with traumatic brain injury (TBI), and poor social problem solving interferes with daily functioning and productive lifestyles. Therefore, it is of vital importance to use the appropriate instrument to identify deficits in social problem solving for individuals with TBI. This study investigates factor structure and item-level psychometrics of the Social Problem Solving Inventory-Revised: Short Form (SPSI-R:S), for adults with moderate and severe TBI. Secondary analysis of 90 adults with moderate and severe TBI who completed the SPSI-R:S was performed. An exploratory factor analysis (EFA), principal components analysis (PCA) and Rasch analysis examined the factor structure and item-level psychometrics of the SPSI-R:S. The EFA showed three dominant factors, with positively worded items represented as the most definite factor. The other two factors are negative problem-solving orientation and skills; and negative problem-solving emotion. Rasch analyses confirmed the three factors are each unidimensional constructs. It was concluded that the total score interpretability of the SPSI-R:S may be challenging due to the multidimensional structure of the total measure. Instead, we propose using three separate SPSI-R:S subscores to measure social problem solving for the TBI population.
Exploring the Functions of Reading: A Cross-Cultural Perspective.
ERIC Educational Resources Information Center
Greaney, Vincent; Neuman, Susan B.
To determine if purposes in reading differ with sex, grade level, and nationality, a 16-item "Functions of Reading Scale" (developed from content analysis of student essays on why they like to read) was administered to 459 Irish (Dublin, Ireland) and American (Windham, Connecticut) students in grades three, five, and eight. Data…
Development of a refractive error quality of life scale for Thai adults (the REQ-Thai).
Sukhawarn, Roongthip; Wiratchai, Nonglak; Tatsanavivat, Pyatat; Pitiyanuwat, Somwung; Kanato, Manop; Srivannaboon, Sabong; Guyatt, Gordon H
2011-08-01
To develop a scale for measuring refractive error quality of life (QOL) for Thai adults. The full survey comprised 424 respondents from 5 medical centers in Bangkok and from 3 medical centers in Chiangmai, Songkla and KhonKaen provinces. Participants were emmetropes and persons with refractive correction with visual acuity of 20/30 or better. An item reduction process was employed by combining 3 methods: expert opinion, the impact method and item-total correlation. Classical reliability testing and validity testing, including convergent, discriminative and construct validity, were performed. The developed questionnaire comprised 87 items in 6 dimensions: 1) quality of vision, 2) visual function, 3) social function, 4) psychological function, 5) symptoms and 6) refractive correction problems. It uses a 5-level Likert scale. The Cronbach's alpha coefficients of its dimensions ranged from 0.756 to 0.979. All validity tests gave supportive results. Construct validity was confirmed by confirmatory factor analysis. A short version of the questionnaire, comprising 48 items with good reliability and validity, was also developed. This is the first validated instrument for measuring refractive error quality of life for Thai adults that was developed with strong research methodology and large sample size.
Rasch model analysis of the Depression, Anxiety and Stress Scales (DASS)
Shea, Tracey L; Tennant, Alan; Pallant, Julie F
2009-01-01
Background There is a growing awareness of the need for easily administered, psychometrically sound screening tools to identify individuals with elevated levels of psychological distress. Although support has been found for the psychometric properties of the Depression, Anxiety and Stress Scales (DASS) using classical test theory approaches it has not been subjected to Rasch analysis. The aim of this study was to use Rasch analysis to assess the psychometric properties of the DASS-21 scales, using two different administration modes. Methods The DASS-21 was administered to 420 participants with half the sample responding to a web-based version and the other half completing a traditional pencil-and-paper version. Conformity of DASS-21 scales to a Rasch partial credit model was assessed using the RUMM2020 software. Results To achieve adequate model fit it was necessary to remove one item from each of the DASS-21 subscales. The reduced scales showed adequate internal consistency reliability, unidimensionality and freedom from differential item functioning for sex, age and mode of administration. Analysis of all DASS-21 items combined did not support its use as a measure of general psychological distress. A scale combining the anxiety and stress items showed satisfactory fit to the Rasch model after removal of three items. Conclusion The results provide support for the measurement properties, internal consistency reliability, and unidimensionality of three slightly modified DASS-21 scales, across two different administration methods. The further use of Rasch analysis on the DASS-21 in larger and broader samples is recommended to confirm the findings of the current study. PMID:19426512
Rasch model analysis of the Depression, Anxiety and Stress Scales (DASS).
Shea, Tracey L; Tennant, Alan; Pallant, Julie F
2009-05-09
There is a growing awareness of the need for easily administered, psychometrically sound screening tools to identify individuals with elevated levels of psychological distress. Although support has been found for the psychometric properties of the Depression, Anxiety and Stress Scales (DASS) using classical test theory approaches it has not been subjected to Rasch analysis. The aim of this study was to use Rasch analysis to assess the psychometric properties of the DASS-21 scales, using two different administration modes. The DASS-21 was administered to 420 participants with half the sample responding to a web-based version and the other half completing a traditional pencil-and-paper version. Conformity of DASS-21 scales to a Rasch partial credit model was assessed using the RUMM2020 software. To achieve adequate model fit it was necessary to remove one item from each of the DASS-21 subscales. The reduced scales showed adequate internal consistency reliability, unidimensionality and freedom from differential item functioning for sex, age and mode of administration. Analysis of all DASS-21 items combined did not support its use as a measure of general psychological distress. A scale combining the anxiety and stress items showed satisfactory fit to the Rasch model after removal of three items. The results provide support for the measurement properties, internal consistency reliability, and unidimensionality of three slightly modified DASS-21 scales, across two different administration methods. The further use of Rasch analysis on the DASS-21 in larger and broader samples is recommended to confirm the findings of the current study.
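As a rough illustration of the Rasch partial credit model named in the abstract above, the sketch below computes category probabilities for one polytomous item. It is not the RUMM2020 procedure; the function name `pcm_category_probabilities` and the threshold values are assumptions chosen only for illustration.

```python
import math

def pcm_category_probabilities(theta, thresholds):
    """Return P(X = k), k = 0..m, under the Rasch partial credit model.

    theta      -- person location on the latent trait (logits)
    thresholds -- hypothetical item thresholds delta_1..delta_m (logits)
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative four-category item with made-up thresholds.
probs = pcm_category_probabilities(theta=0.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single threshold the function reduces to the dichotomous Rasch model, which is a convenient sanity check on any implementation.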
Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.
2011-01-01
Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Eland, Nicolaas D; Kvåle, Alice; Ostelo, Raymond W J G; Strand, Liv Inger
2016-10-01
There is evidence that clinicians' pain attitudes and beliefs are associated with the pain beliefs and illness perceptions of their patients and furthermore influence their recommendations for activity and work to patients with back pain. The Pain Attitudes and Beliefs Scale (PABS) is a questionnaire designed to differentiate between biomedical and biopsychosocial pain attitudes among health care providers regarding common low back pain. The original version had 36 items, and several shorter versions have been developed. Concern has been raised over the PABS' internal construct validity because of low internal consistency and low explained variance. The aim of this study was to examine and improve the scale's measurement properties and item performance. A convenience sample of 667 Norwegian physiotherapists provided data for Rasch analysis. The biomedical and biopsychosocial subscales of the PABS were examined for unidimensionality, local response independency, invariance, response category function and targeting of persons and items. Reliability was measured with the person separation index (PSI). Items originally excluded by the developers of the scale because of skewness were re-introduced in a second analysis. Our analysis suggested that both subscales required removal of several psychometrically redundant and misfitting items to satisfy the requirements of the Rasch measurement model. Most biopsychosocial items needed revision of their scoring structure. Furthermore, we identified two items originally excluded because of skewness that improved the reliability of the subscales after re-introduction. The ultimate result was two strictly unidimensional subscales, each consisting of seven items, with invariant item ordering and free from any form of misfit. The unidimensionality implies that summation of items to valid total scores is justified. Transformation tables are provided to convert raw ordinal scores to unbiased interval-level scores. 
Both subscales were adequately targeted at the ability level of our physiotherapist population. Reliability of the biomedical subscale as measured with the PSI was 0.69. A low PSI of 0.64 for the biopsychosocial subscale indicated limitations with regard to its discriminative ability. Rasch analysis produced an improved Norwegian version of the PABS which represents true (fundamental) measurement of clinicians' biomedical and biopsychosocial treatment orientation. However, researchers should be aware of the low discriminative ability of the biopsychosocial subscale when analyzing differences and effect changes. The study presents a revised PABS that provides interval-level measurement of clinicians' pain beliefs. The revision allows for confident use of parametric statistical analysis. Further examination of discriminative validity is required. Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Lerdal, Anners; Kottorp, Anders; Gay, Caryl; Aouizerat, Bradley E; Portillo, Carmen J; Lee, Kathryn A
2011-11-01
To examine the psychometric properties of the 9-item Fatigue Severity Scale (FSS) using a Rasch model application. A convenience sample of HIV-infected adults was recruited, and a subset of the sample was assessed at 6-month intervals for 2 years. Socio-demographic, clinical, and symptom data were collected by self-report questionnaires. CD4 T-cell count and viral load measures were obtained from medical records. The Rasch analysis included 316 participants with 698 valid questionnaires. FSS item 2 did not advance monotonically, and items 1 and 2 did not show acceptable goodness-of-fit to the Rasch model. A reduced FSS 7-item version demonstrated acceptable goodness-of-fit and explained 61.2% of the total variance in the scale. In the FSS-7 item version, no uniform Differential Item Functioning was found in relation to time of evaluation or to any of the socio-demographic or clinical variables. This study demonstrated that the FSS-7 has better psychometric properties than the FSS-9 in this HIV sample and that responses to the different items are comparable over time and unrelated to socio-demographic and clinical variables.
Billis, Evdokia; McCarthy, Christopher J; Roberts, Chris; Gliatis, John; Papandreou, Maria; Gioftsos, George; Oldham, Jacqueline A
2013-02-01
To identify potential subgroups amongst patients with non-specific low back pain based on a consensus list of potentially discriminatory examination items. Exploratory study. A convenience sample of 106 patients with non-specific low back pain (43 males, 63 females, mean age 36 years, standard deviation 15.9 years) and 7 physiotherapists. Based on 3 focus groups and a two-round Delphi involving 23 health professionals and a random stratified sample of 150 physiotherapists, respectively, a comprehensive examination list comprising the most "discriminatory" items was compiled. Following reliability analysis, the most reliable clinical items were assessed with a sample of patients with non-specific low back pain. K-means cluster analysis was conducted for 2-, 3- and 4-cluster options to explore for meaningful homogenous subgroups. The most clinically meaningful cluster was a two-subgroup option, comprising a small group (n = 24) with more severe clinical presentation (i.e. more widespread pain, functional and sleeping problems, other symptoms, increased investigations undertaken, more severe clinical signs, etc.) and a larger less dysfunctional group (n = 80). A number of potentially discriminatory clinical items were identified by health professionals and sub-classified, based on a sample of patients with non-specific low back pain, into two subgroups. However, further work is needed to validate this classification process.
Khan, Anzalee; Liharska, Lora; Harvey, Philip D; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S E
2017-12-01
Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms.
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
La Porta, F; Giordano, A; Caselli, S; Foti, C; Franchignoni, F
2015-12-01
It is unclear whether the BBS is an effective tool for the measurement of early postural control impairments in patients with Parkinson's disease (PD). The aim of this paper was to evaluate BBS' content validity, internal construct validity, reliability and targeting in patients with PD within the Rasch analysis framework. Observational, cross-sectional study. Outpatient Rehabilitation Unit. A sample of 285 outpatients with PD. The content validity of the BBS was assessed using standard linking techniques. The BBS was administered by trained physiotherapists. The data collected then underwent Rasch analysis. Content validity analysis showed a lack of items assessing postural responses to tripping and slips and stability during walking. On Rasch analysis, the BBS failed the requirements of monotonicity, local independence, unidimensionality and invariance. After rescoring 7 items, grouping of locally dependent items into testlets, and deletion of the static sitting balance item because it was mistargeted and underdiscriminating, the Rasch-modified BBS for PD (BBS-PD) showed adequate internal construct validity (χ²(24) = 39.693; P = 0.023), including absence of differential item functioning (DIF) across gender and age, and was, as a whole, sufficiently precise for individual person measurement (PSI = 0.894). However, the scale was not well targeted to the sample in view of the prevalence of higher scores. This study demonstrated the internal construct validity and reliability of the BBS-PD as a measurement tool for patients with PD within the Rasch analysis framework. However, the lack of items critical to the assessment of postural control impairments typical of PD negatively affected the targeting, so that a significant percentage of patients was located in the higher ability range of the measurement continuum, where precision of measurement is reduced. 
These findings suggest that the BBS, even if modified, may not be an effective tool for the measurement of early postural control in patients with PD.
NASA Technical Reports Server (NTRS)
Patton, Jeff A.
1986-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Electrical Power Distribution and Control (EPD and C)/Electrical Power Generation (EPG) hardware. The EPD and C/EPG hardware is required for performing critical functions of cryogenic reactant storage, electrical power generation and product water distribution in the Orbiter. Specifically, the EPD and C/EPG hardware consists of the following components: Power Section Assembly (PSA); Reactant Control Subsystem (RCS); Thermal Control Subsystem (TCS); Water Removal Subsystem (WRS); and Power Reactant Storage and Distribution System (PRSDS). The IOA analysis process utilized available EPD and C/EPG hardware drawings and schematics for defining hardware assemblies, components, and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode.
Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj
2016-12-01
The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest to change the content of the MSWS-12. © The Author(s), 2016.
Holden, Libby; Lee, Christina; Hockey, Richard; Ware, Robert S; Dobson, Annette J
2014-12-01
This study aimed to validate a 6-item 1-factor global measure of social support developed from the Medical Outcomes Study Social Support Survey (MOS-SSS) for use in large epidemiological studies. Data were obtained from two large population-based samples of participants in the Australian Longitudinal Study on Women's Health. The two cohorts were aged 53-58 and 28-33 years at data collection (N = 10,616 and 8,977, respectively). Items selected for the 6-item 1-factor measure were derived from the factor structure obtained from unpublished work using an earlier wave of data from one of these cohorts. Descriptive statistics, including polychoric correlations, were used to describe the abbreviated scale. Cronbach's alpha was used to assess internal consistency and confirmatory factor analysis to assess scale validity. Concurrent validity was assessed using correlations between the new 6-item version and established 19-item version, and other concurrent variables. In both cohorts, the new 6-item 1-factor measure showed strong internal consistency and scale reliability. It had excellent goodness-of-fit indices, similar to those of the established 19-item measure. Both versions correlated similarly with concurrent measures. The 6-item 1-factor MOS-SSS measures global functional social support with fewer items than the established 19-item measure.
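The abstract above reports internal consistency via Cronbach's alpha. As a minimal sketch of how that coefficient is usually computed from an item-by-respondent score matrix, the function below uses only the standard library. The name `cronbach_alpha` and the toy data are assumptions for illustration and are unrelated to the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up responses from five respondents to a three-item scale.
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
print(round(cronbach_alpha(toy), 3))
```

When all items rank respondents identically and have equal variance, the coefficient equals 1; weaker agreement between items pulls it down.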
Liegl, Gregor; Rose, Matthias; Correia, Helena; Fischer, H Felix; Kanlidere, Sibel; Mierke, Annett; Obbarius, Alexander; Nolte, Sandra
2018-01-01
To translate the PROMIS Physical Function (PF) item bank version 1.2 into German and to investigate psychometric properties of resulting full bank and seven derived short forms. Cross-sectional psychometric study. Inpatient and outpatient clinics of the Department of Psychosomatic Medicine at Charité-Universitätsmedizin Berlin, Germany. A total of 10 adult patients with various chronic diseases participated in cognitive debriefing interviews. The final item bank was administered to n = 266 adult patients with a broad range of medical conditions. Patient-reported outcome assessment as part of routine care. PROMIS v1.2 PF bank; MOS SF-36 PF scale (PF-10). Cross-cultural adaptation of the item bank followed established guidelines. For the final German translation, the corrected item-total correlations ranged from 0.44 to 0.84. Cronbach's alpha was high for each PROMIS PF short form (α = 0.88-0.96). The full PROMIS PF bank and most short forms correlated highly with the SF-36 PF-10 (r = 0.85-0.90), with the exception of PROMIS Upper Extremity (r = 0.64). PROMIS Upper Extremity showed ceiling effects and lower agreement with the full bank than other short forms. Unidimensionality was supported for all PROMIS PF measures using traditional factor analysis and nonparametric item response theory. The German PROMIS PF bank was found to be conceptually equivalent to the English version and fulfilled the psychometric requirements for use of short forms in clinical practice. Future studies should pay particular attention to samples with upper extremity functional limitations to further investigate the dimensional structure of PF as conceptualized according to PROMIS.
Improving measurement of injection drug risk behavior using item response theory.
Janulis, Patrick
2014-03-01
Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
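To make the two-parameter model mentioned in the abstract above concrete, here is a minimal sketch of the 2PL item response function, together with a toy view of uniform differential item functioning as a difficulty shift between groups. The helper names and all parameter values are hypothetical. The study itself tested DIF with logistic regression, which this sketch does not reproduce.

```python
import math

def two_pl_probability(theta, a, b):
    """Probability of endorsing an item under the 2PL model.

    theta -- latent trait level
    a     -- item discrimination
    b     -- item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def uniform_dif_gap(theta, a, b_reference, b_focal):
    """Hypothetical helper: endorsement gap between two groups at equal theta
    when the item differs only in difficulty (uniform DIF)."""
    return two_pl_probability(theta, a, b_reference) - two_pl_probability(theta, a, b_focal)

# Made-up parameters: same discrimination, item harder for the focal group.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(uniform_dif_gap(theta, a=1.5, b_reference=0.0, b_focal=0.5), 3))
```

A nonzero gap at matched trait levels is what makes raw composite scores hard to compare across groups, which is the concern the abstract raises.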
Prado, Jérôme; Noveck, Ira A
2007-04-01
Participants experience difficulty detecting that an item depicting an H-in-a-square confirms the logical rule, "If there is not a T then there is not a circle." Indeed, there is a perceptual conflict between the items mentioned in the rule (T and circle) and in the test item (H and square). Much evidence supports the claim that correct responding depends on detecting and resolving such conflicts. One aim of this study is to find more precise neurological evidence in support of this claim by using a parametric event-related functional magnetic resonance imaging (fMRI) paradigm. We scanned 20 participants while they were required to judge whether or not a conditional rule was verified (or falsified) by a corresponding target item. We found that the right middorsolateral prefrontal cortex (mid-DLPFC) was specifically engaged, together with the medial frontal (anterior cingulate and presupplementary motor area [pre-SMA]) and parietal cortices, when mismatching was present. Activity in these regions was also linearly correlated with the level of mismatch between the rule and the test item. Furthermore, a psychophysiological interaction analysis revealed that activation of the mid-DLPFC, which increases as mismatching does, was accompanied by a decrease in functional integration with the bilateral primary visual cortex and an increase in functional integration with the right parietal cortex. This indicates a need to break away from perceptual cues in order to select an appropriate logical response. These findings strongly indicate that the regions involved in inhibitory control (including the right mid-DLPFC and the medial frontal cortex) are engaged when participants have to overcome perceptual mismatches in order to provide a logical response. These findings are also consistent with neuroimaging studies investigating the belief bias, where prior beliefs similarly interfere with logical reasoning.
Kwakkenbos, Linda; Willems, Linda M; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H M; Thombs, Brett D
2014-01-01
The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics.
Kwakkenbos, Linda; Willems, Linda M.; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H. M.; Thombs, Brett D.
2014-01-01
Objective The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. Methods The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. Results A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. Conclusions There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics. PMID:24638101
Grassi, Mario; Nucera, Andrea
2010-01-01
The objective of this study was twofold: 1) to confirm the hypothetical eight scales and two-component summaries of the questionnaire Short Form 36 Health Survey (SF-36), and 2) to evaluate the performance of two alternative measures to the original physical component summary (PCS) and mental component summary (MCS). We performed principal component analysis (PCA) based on 35 items, after optimal scaling via multiple correspondence analysis (MCA), and subsequently on eight scales, after standard summative scoring. Item-based summary measures were planned. Data from the European Community Respiratory Health Survey II follow-up of 8854 subjects from 25 centers were analyzed to cross-validate the original and the novel PCS and MCS. Overall, the scale- and item-based comparison indicated that the SF-36 scales and summaries meet the supposed dimensionality. However, vitality, social functioning, and general health items did not fit data optimally. The novel measures, derived a posteriori by unit-rule from an oblique (correlated) MCA/PCA solution, are simple item sums or weighted scale sums where the weights are the raw scale ranges. These item-based scores yielded consistent scale-summary results for outliers profiles, with an expected known-group differences validity. We were able to confirm the hypothesized dimensionality of eight scales and two summaries of the SF-36. The alternative scoring reaches at least the same required standards of the original scoring. In addition, it can reduce the item-scale inconsistencies without loss of predictive validity.
Self-reported walking ability predicts functional mobility performance in frail older adults.
Alexander, N B; Guire, K E; Thelen, D G; Ashton-Miller, J A; Schultz, A B; Grunawalt, J C; Giordani, B
2000-11-01
To determine how self-reported physical function relates to performance in each of three mobility domains: walking, stance maintenance, and rising from chairs. Cross-sectional analysis of older adults. University-based laboratory and community-based congregate housing facilities. Two hundred twenty-one older adults (mean age, 79.9 years; range, 60-102 years) without clinical evidence of dementia (mean Folstein Mini-Mental State score, 28; range, 24-30). We compared the responses of these older adults on a questionnaire battery used by the Established Populations for the Epidemiologic Study of the Elderly (EPESE) project, to performance on mobility tasks of graded difficulty. Responses to the EPESE battery included: (1) whether assistance was required to perform seven Katz activities of daily living (ADL) items, specifically with walking and transferring; (2) three Rosow-Breslau items, including the ability to walk up stairs and walk a half mile; and (3) five Nagi items, including difficulty stooping, reaching, and lifting objects. The performance measures included the ability to perform, and time taken to perform, tasks in three summary score domains: (1) walking ("Walking," seven tasks, including walking with an assistive device, turning, stair climbing, tandem walking); (2) stance maintenance ("Stance," six tasks, including unipedal, bipedal, tandem, and maximum lean); and (3) chair rise ("Chair Rise," six tasks, including rising from a variety of seat heights with and without the use of hands for assistance). A total score combines the scores in the Walking, Stance, and Chair Rise domains. We also analyzed how cognitive/behavioral factors such as depression and self-efficacy related to the residuals from the self-report and performance-based ANOVA models. Rosow-Breslau items have the strongest relationship with the three performance domains, Walking, Stance, and Chair Rise (eta-squared ranging from 0.21 to 0.44).
These three performance domains are as strongly related to one Katz ADL item, walking (eta-squared ranging from 0.15 to 0.33) as all of the Katz ADL items combined (eta-squared ranging from 0.21 to 0.35). Tests of problem solving and psychomotor speed, the Trails A and Trails B tests, are significantly correlated with the residuals from the self-report and performance-based ANOVA models. Compared with the rest of the EPESE self-report items, self-report items related to walking (such as Katz walking and Rosow-Breslau items) are better predictors of functional mobility performance on tasks involving walking, stance maintenance, and rising from chairs. Compared with other self-report items, self-reported walking ability may be the best predictor of overall functional mobility.
Ingvarsson, Einar T; Kahng, Sungwoo; Hausman, Nicole L
2008-01-01
Functional analysis suggested that the problem behavior of an 8-year-old girl with autism was maintained by escape from demands and access to edible items. Noncontingent delivery of an edible item was sufficient to increase compliance and reduce the rate of problem behavior without the use of escape extinction in a demand context. Leaner and richer schedules of noncontingent reinforcement were equally effective, and there were minimal differences between noncontingent reinforcement and differential reinforcement of compliance.
Measuring Attitudes About Intimate Partner Violence Against Women: The ATT-IPV Scale
Yount, Kathryn M.; VanderEnde, Kristin; Zureick-Brown, Sarah; Anh, Hoang Tu; Schuler, Sidney Ruth; Minh, Tran Hung
2014-01-01
In lower-income settings, women more often than men justify intimate partner violence (IPV). Yet, the role of measurement invariance across gender is unstudied. We developed the ATT-IPV scale to measure attitudes about physical violence against wives in 1,055 married men and women ages 18–50 in My Hao district, Vietnam. Across 10 items about transgressions of the wife, women more often than men agreed that a man had good reason to hit his wife (3% to 92%; 0% to 67%). In random split-half samples, one-factor exploratory factor analysis (EFA) (N1 = 527) and confirmatory factor analysis (CFA) (N2 = 528) models for nine items with sufficient variability had significant loadings (0.575–0.883; 0.502–0.897) and good fit (RMSEA = 0.068, 0.048; CFI = 0.951, 0.978; TLI = 0.935, 0.970). Three items had significant uniform differential item functioning (DIF) by gender, and adjustment for DIF revealed that measurement noninvariance was partially masking men’s lower propensity than women to justify IPV. A CFA model for the six items without DIF had excellent fit (RMSEA = 0.019, CFI = 0.994, TLI = 0.991) and an attitudinal gender gap similar to the DIF-adjusted nine-item model, suggesting that the six-item scale reliably measures attitudes about IPV across gender. Researchers should validate the scale in urban Vietnam and elsewhere and decompose DIF-adjusted gender attitudinal gaps. PMID:24846070
Applying a Mixed Methods Framework to Differential Item Function Analyses
ERIC Educational Resources Information Center
Hitchcock, John H.; Johanson, George A.
2015-01-01
Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Ramsay-Curve Differential Item Functioning
ERIC Educational Resources Information Center
Woods, Carol M.
2011-01-01
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
Prisciandaro, James J; Tolliver, Bryan K
2016-11-15
The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limited to outpatients; DIF analysis only sensitive to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.
Forkmann, Thomas; Kroehne, Ulf; Wirtz, Markus; Norra, Christine; Baumeister, Harald; Gauggel, Siegfried; Elhan, Atilla Halil; Tennant, Alan; Boecker, Maren
2013-11-01
This study simulated computer-adaptive testing (CAT) based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to computer-adaptive test simulation, the ADIB was newly calibrated. Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD=14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD=10.5; 24.8% women). Unidimensionality of the item bank was checked and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. CAT simulation was conducted with the total sample and additional simulated data. Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (item fit residuals<|2.5|) and no DIF or LD. CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of -2 to +2 logits when terminating at SE≤0.32 and 4 items if using SE≤0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve≥.78 for all cut-off criteria). The recalibration of the ADIB succeeded and the simulation studies conducted suggest that it has good screening performance in the samples investigated and that it may reasonably add to the improvement of depression assessment. © 2013.
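The two stopping rules quoted above (SE≤0.32 and SE≤0.50) can be related to a conventional reliability scale. The following is a minimal sketch, not taken from the study: it assumes the common approximation reliability ≈ 1 − SE²/variance, with the latent trait variance set to 1 by default. The function name and the default variance are illustrative assumptions; the abstract does not report the trait variance in logits.

```python
def reliability_from_se(se, trait_variance=1.0):
    """Approximate measurement reliability implied by a standard error.

    Assumption (not from the abstract): reliability = 1 - SE^2 / variance,
    where `trait_variance` is the variance of the latent trait.
    """
    if trait_variance <= 0:
        raise ValueError("trait_variance must be positive")
    return 1.0 - (se ** 2) / trait_variance


# The two CAT stopping rules mentioned in the abstract.
for stop_se in (0.32, 0.50):
    print(stop_se, round(reliability_from_se(stop_se), 2))
```

Under the unit-variance assumption, the stricter rule corresponds to a reliability of about 0.90 and the looser rule to 0.75, which is one way to read the trade-off between test length and precision.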
Farzandipour, Mehrdad; Riazi, Hossein; Jabali, Monireh Sadeqi
2018-01-01
Introduction: System usability assessment is among the important aspects in assessing the quality of clinical information technology, especially when the end users of the system are concerned. This study aims to provide a comprehensive list of system usability requirements. Methods: This descriptive cross-sectional study was conducted in 2013 using the Delphi technique in three phases. After expert opinions were gathered, the final version of the questionnaire, including 163 items in three phases, was presented to 40 users of information systems in hospitals. Ratings ranged from 0 to 4. Data analysis was conducted using SPSS software. Those requirements with a mean score of three or higher were finally confirmed. Results: The list of system usability requirements for electronic health record was designed and confirmed in nine areas including suitability for the task (24 items), self-descriptiveness (22 items), controllability (19 items), conformity with user expectations (25 items), error tolerance (21 items), suitability for individualization (7 items), suitability for learning (19 items), visual clarity (18 items) and auditory presentation (8 items). Conclusion: A relatively comprehensive model including useful requirements for using EHR was presented which can increase functionality, effectiveness and users’ satisfaction. Thus, it is suggested that the present model be adopted by system designers and healthcare system institutions to assess those systems. PMID:29719310
Baylor, Carolyn R.; Yorkston, Kathryn M.; Eadie, Tanya L.; Miller, Robert M.; Amtmann, Dagmar
2011-01-01
Purpose The purpose of this study was to conduct the initial psychometric analyses of the Communicative Participation Item Bank—a new self-report instrument designed to measure the extent to which communication disorders interfere with communicative participation. This item bank is intended for community-dwelling adults across a range of communication disorders. Method A set of 141 candidate items was administered to 208 adults with spasmodic dysphonia. Participants rated the extent to which their condition interfered with participation in various speaking communication situations. Questionnaires were administered online or in a paper version per participant preference. Participants also completed the Voice Handicap Index (B. H. Jacobson et al., 1997) and a demographic questionnaire. Rasch analyses were conducted using Winsteps software (J. M. Linacre, 1991). Results The results show that items functioned better when the 5-category response format was recoded to a 4-category format. After removing 8 items that did not fit the Rasch model, the remaining 133 items demonstrated strong evidence of sufficient unidimensionality, with the model accounting for 89.3% of variance. Item location values ranged from −2.73 to 2.20 logits. Conclusions Preliminary Rasch analyses of the Communicative Participation Item Bank show strong psychometric properties. Further testing in populations with other communication disorders is needed. PMID:19717652
Validation of a condition-specific measure for women having an abnormal screening mammography.
Brodersen, John; Thorsen, Hanne; Kreiner, Svend
2007-01-01
The aim of this study is to assess the validity of a new condition-specific instrument measuring psychosocial consequences of abnormal screening mammography (PCQ-DK33). The draft version of the PCQ-DK33 was completed on two occasions by 184 women who had received an abnormal screening mammography and on one occasion by 240 women who had received a normal screening result. Item Response Theories and Classical Test Theories were used to analyze data. Construct validity, concurrent validity, known group validity, objectivity and reliability were established by item analysis examining the fit between item responses and Rasch models. Six dimensions covering anxiety, behavioral impact, sense of dejection, impact on sleep, breast examination, and sexuality were identified. One item belonging to the dejection dimension had uniform differential item functioning. Two items not fitting the Rasch models were retained because of high face validity. A sick leave item added useful information when measuring side effects and socioeconomic consequences of breast cancer screening. Five "poor items" were identified and should be deleted from the final instrument. Preliminary evidence for a valid and reliable condition-specific measure for women having an abnormal screening mammography was established. The measure includes 27 "good" items measuring different attributes of the same overall latent structure: the psychosocial consequences of abnormal screening mammography.
Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.
Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J
2018-02-01
Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items; these items were removed in stages, creating 8- and 3-item Inner EAR scales for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
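To illustrate why a larger "information parameter" marks a more useful item, here is a hedged sketch of an item information function. It assumes a dichotomous two-parameter logistic (2PL) model in which the reported parameter plays the role of the discrimination `a`; the abstract does not state which IRT model was fitted, so the model choice, the function names, and the location value `b = 0` are all illustrative assumptions.

```python
import math


def prob_2pl(theta, a, b):
    """Probability of endorsing an item under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P); it peaks at theta == b."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Compare the lowest and highest parameter values quoted in the abstract,
# treating them as discriminations at a hypothetical location b = 0.
for a in (1.63, 4.52):
    print(a, item_information(0.0, a, 0.0))
```

Under this reading, peak information grows with the square of the discrimination, so items at the low end of the reported range contribute far less precision than items at the high end.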
Spanish adaptation of the revised Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS-R).
Campos, Teresa Salas; Rodríguez-Santos, Francisco; Esteban, Jesus; Vázquez, Pilar Cordero; Mora Pardina, Jesus S; Carmona, Alejandra Cano
2010-10-01
ALSFRS-R is a tool designed to measure disease progress in ALS patients. It consists of 12 items grouped into four functions designed to assess disabilities according to the Activities of daily living (ADL). Our objective was to validate the Spanish version of ALSFRS-R based on the original version. Four examiners assessed 73 ALS patients, applying the ALSFRS-R, ALSAQ-40 and the respiratory function variable assessed by the SRI scale, which measures respiratory insufficiency. Internal consistency and test-retest correlations were measured using Cronbach's alpha and Spearman's Rho tests. Factor analysis was performed by applying Varimax rotation and Kaiser standardization. Validity was analysed based on correlations between items in the ALSFRS-R scales and equivalents in the ALSAQ-40 and SRI questionnaires. The results showed high internal consistency (0.77-0.95) and a good test-retest correlation (0.80-0.95). Factor analysis showed a 73.3% principal component contribution; the weight of each item regarding their corresponding factors was 0.7-0.9. High correlations were observed (rs >0.60) between corresponding factors of ALSFRS-R/ALSAQ-40 and ALSFRS-R/SRI. We conclude that the version obtained from the ALSFRS-R maintains the internal consistency and validity of the construct of the original scale. The Spanish version of ALSFRS-R is available for readers at http://www.fundela.es/verOtras.php.
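Internal consistency of the kind reported above is usually summarized with Cronbach's alpha. The sketch below shows the standard formula on made-up scores; the data and the function name are hypothetical and are not taken from the ALSFRS-R validation sample.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-item score lists.

    Each inner list holds one item's scores, with respondents in the
    same order across items. Uses sample variances throughout.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical scores for two items and four respondents.
example = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(cronbach_alpha(example))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as a measure of internal consistency.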
Pricing policy for declining demand using item preservation technology.
Khedlekar, Uttam Kumar; Shukla, Diwakar; Namdeo, Anubhav
2016-01-01
We have designed an inventory model for seasonal products in which deterioration can be controlled by item preservation technology investment. Demand for the product is considered price sensitive and decreases linearly. This study has shown that the profit is a concave function of optimal selling price, replenishment time and preservation cost parameter. We simultaneously determined the optimal selling price of the product, the replenishment cycle and the cost of item preservation technology. Additionally, this study has shown that there exists an optimal selling price and optimal preservation investment to maximize the profit for every business set-up. Finally, the model is illustrated by numerical examples and sensitivity analysis of the optimal solution with respect to major parameters.
ERIC Educational Resources Information Center
O'Reilly, Mark; Fragale, Christina; Gainey, Summer; Kang, Soyeon; Koch, Heather; Shubert, Jennifer; El Zein, Farah; Longino, Deanna; Chung, Moon; Xu, Ziwei; White, Pamela; Lang, Russell; Davis, Tonya; Rispoli, Mandy; Lancioni, Giulio; Didden, Robert; Healy, Olive; Kagohara, Deborah; van der Meer, Larah; Sigafoos, Jeff
2012-01-01
We examined the influence of an antecedent communication intervention on challenging behavior for three students with developmental disorders. Students were taught to request tangible items that were identified as reinforcers for challenging behavior in a prior functional analysis. Individual participant multielement and reversal designs were used…
Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A
2006-11-01
To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish sample using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
Landfeldt, Erik; Mayhew, Anna; Straub, Volker; Bushby, Katharine; Lochmüller, Hanns; Lindgren, Peter
2017-12-18
To explore the psychometric properties of the full 22-item English (UK and US) version of the Zarit Caregiver Burden Interview administered to caregivers to patients with Duchenne muscular dystrophy. Caregivers to patients with Duchenne muscular dystrophy from the United Kingdom and the United States, recruited through the TREAT-NMD network, completed the Zarit Caregiver Burden Interview online. The psychometric properties of the Zarit Caregiver Burden Interview were examined using Rasch analysis. A total of 475 caregivers completed the Zarit Caregiver Burden Interview. Model misfit was identified for 9 of 22 items (mean item fit residual 0.061, SD: 2.736) and 13 of 22 items displayed disordered thresholds. The overall item-trait interaction chi-square value was 499 (198 degrees of freedom, p < 0.001). The mean person fit residual was estimated at -0.213 (SD: 1.235). The Person Separation Index and Cronbach's α were estimated at 0.902 and 0.914, respectively. Item dependency was low and we found no significant differential item functioning by country or sex. Our Rasch analysis shows that the Zarit Caregiver Burden Interview fails to fully operationalize a quantitative conceptualization of caregiver burden among caregivers to patients with Duchenne muscular dystrophy from the United Kingdom and the United States. Further research is needed to understand the psychometric properties of the Zarit Caregiver Burden Interview in other populations and settings. Implications for Rehabilitation Duchenne muscular dystrophy is a terminal disease characterized by progressive muscle degeneration resulting in substantial disability and a significant burden on family caregivers. The Zarit Caregiver Burden Interview is one of the most widely applied measures of caregiver burden. Our Rasch analysis suggests that the Zarit Caregiver Burden Interview is not fit for purpose to measure burden in UK and US caregivers to patients with Duchenne muscular dystrophy. 
Clinicians and decision-makers should interpret Zarit Caregiver Burden Interview data from these populations with caution.
Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F
2012-11-01
Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. 
This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-Americans and Caucasian breast cancer survivors. Further testing in diverse populations is warranted.
Measuring Workplace Climate in Community Clinics and Health Centers.
Friedberg, Mark W; Rodriguez, Hector P; Martsolf, Grant R; Edelen, Maria O; Vargas Bustamante, Arturo
2016-10-01
The effectiveness of community clinics and health centers' efforts to improve the quality of care might be modified by clinics' workplace climates. Several surveys to measure workplace climate exist, but their relationships to each other and to distinguishable dimensions of workplace climate are unknown. To assess the psychometric properties of a survey instrument combining items from several existing surveys of workplace climate and to generate a shorter instrument for future use. We fielded a 106-item survey, which included items from 9 existing instruments, to all clinicians and staff members (n=781) working in 30 California community clinics and health centers, receiving 628 responses (80% response rate). We performed exploratory factor analysis of survey responses, followed by confirmatory factor analysis of 200 reserved survey responses. We generated a new, shorter survey instrument of items with strong factor loadings. Six factors, including 44 survey items, emerged from the exploratory analysis. Two factors (Clinic Workload and Teamwork) were independent from the others. The remaining 4 factors (staff relationships, quality improvement orientation, managerial readiness for change, and staff readiness for change) were highly correlated, indicating that these represented dimensions of a higher-order factor we called "Clinic Functionality." This 2-level, 6-factor model fit the data well in the exploratory and confirmatory samples. For all but 1 factor, fewer than 20 survey responses were needed to achieve clinic-level reliability >0.7. Survey instruments designed to measure workplace climate have substantial overlap. The relatively parsimonious item set we identified might help target and tailor clinics' quality improvement efforts.
Measuring Workplace Climate in Community Clinics and Health Centers
Friedberg, Mark W.; Rodriguez, Hector P.; Martsolf, Grant; Edelen, Maria Orlando; Vargas-Bustamante, Arturo
2018-01-01
Background The effectiveness of community clinics and health centers’ efforts to improve the quality of care might be modified by clinics’ workplace climates. Several surveys to measure workplace climate exist, but their relationships to each other and to distinguishable dimensions of workplace climate are unknown. Objective To assess the psychometric properties of a survey instrument combining items from several existing surveys of workplace climate and to generate a shorter instrument for future use. Methods We fielded a 106-item survey, which included items from 9 existing instruments, to all clinicians and staff members (n=781) working in 30 California community clinics and health centers, receiving 628 responses (80% response rate). We performed exploratory factor analysis of survey responses, followed by confirmatory factor analysis of 200 reserved survey responses. We generated a new, shorter survey instrument of items with strong factor loadings. Results Six factors, including 44 survey items, emerged from the exploratory analysis. Two factors (Clinic Workload and Teamwork) were independent from the others. The remaining 4 factors (Staff Relationships, Quality Improvement Orientation, Managerial Readiness for Change, and Staff Readiness for Change) were highly correlated, indicating that these represented dimensions of a higher-order factor we called “Clinic Functionality.” This two-level, six-factor model fit the data well in the exploratory and confirmatory samples. For all but one factor, fewer than 20 survey responses were needed to achieve clinic-level reliability >0.7. Conclusion Survey instruments designed to measure workplace climate have substantial overlap. The relatively parsimonious item set we identified might help target and tailor clinics’ quality improvement efforts. PMID:27326549
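The statement that fewer than 20 responses suffice for clinic-level reliability above 0.7 can be understood through the Spearman-Brown relation between the reliability of a single response and the reliability of a clinic mean. The sketch below is an illustration under that assumption; the intraclass correlation (ICC) values are hypothetical, since the abstract does not report them, and the function names are ours.

```python
import math


def reliability_of_mean(icc, n):
    """Spearman-Brown reliability of the mean of n responses per clinic."""
    return n * icc / (1.0 + (n - 1) * icc)


def responses_needed(icc, target):
    """Smallest n whose clinic-mean reliability reaches `target`."""
    if not (0.0 < icc < 1.0 and 0.0 < target < 1.0):
        raise ValueError("icc and target must lie strictly between 0 and 1")
    return math.ceil(target * (1.0 - icc) / (icc * (1.0 - target)))


# Hypothetical single-response ICCs; not values from the study.
for icc in (0.11, 0.25, 0.50):
    print(icc, responses_needed(icc, 0.7))
```

A factor with a low ICC needs many respondents per clinic before the clinic mean becomes dependable, which is consistent with one factor in the abstract falling outside the "fewer than 20" rule.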
Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon
2012-12-01
Objectives (1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Design Retrospective psychometric analysis of rating scale data. Setting Home- and community-based brain injury rehabilitation program. Participants Individuals post-CVA (n=861) and individuals with TBI (n=603). Interventions Not applicable. Main Outcome Measure MPAI-4. Results Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients, whereas self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R=.92; R²=.85) and, at most, a 3.7 point difference in raw scores. Conclusions The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
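The reliability/separation pairs reported above can be cross-checked with the standard Rasch relation between a separation index G and reliability R, namely R = G² / (1 + G²). The following is a minimal sketch of that check; the function name is hypothetical and nothing here reproduces the authors' software output.

```python
def reliability_from_separation(g: float) -> float:
    """Reliability implied by a Rasch separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


# Separation indices quoted in the abstract (persons 3.16, items 23.64).
person_reliability = reliability_from_separation(3.16)
item_reliability = reliability_from_separation(23.64)

# Rounded to two decimals these match the reported .91 and 1.00.
print(round(person_reliability, 2), round(item_reliability, 2))
```

Both reported pairs are internally consistent under this relation, which is a quick sanity check worth running whenever separation and reliability are quoted together.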
Functional Analysis of Self-Injurious Behavior and Its Relation to Self-Restraint
ERIC Educational Resources Information Center
Rooker, Griffin W.; Roscoe, Eileen M.
2005-01-01
Some individuals who engage in self-injurious behavior (SIB) also exhibit self-restraint. In the present study, a series of three functional analyses were conducted to determine the variables that maintained a participant's SIB, one without restraint items available, one with a preferred and effective form of self-restraint (an airplane pillow)…
Planning an Information System for a Small College. AIR Forum Paper 1978.
ERIC Educational Resources Information Center
Toombs, William; Sagaria, Mary Ann
Data collection and analyses of college records and interviewing provided a cross-sectional view of data flow and information transmission in a small college. The micro-analysis of interview data, forms, and reports yielded a picture of functional relationships, clarified loci of decision making, and stipulated functions served by data items.…
Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.
2014-01-01
Objectives To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design Cross-sectional survey followed by item response theory (IRT) calibration data simulations Setting Community Participants A sample of individuals applying for SSA disability benefits, claimants (N=1015), and a normative comparative sample of US adults (N=1000) Interventions None. Main Outcome Measure Social Security Administration Behavioral Health Function (SSA-BH) measurement instrument Results Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10- item CATs with the full item bank indicated robust ability of the CAT approach to comprehensively characterize behavioral health function along four distinct dimensions. Conclusions Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared to age and sex matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. Utilizing the CAT based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the SSA’s work disability programs. PMID:23542404
Assessing depression outcome in patients with moderate dementia: sensitivity of the HoNOS65+ scale.
Canuto, Alessandra; Rudhard-Thomazic, Valérie; Herrmann, François R; Delaloye, Christophe; Giannakopoulos, Panteleimon; Weber, Kerstin
2009-08-15
To date, there is no widely accepted clinical scale to monitor the evolution of depressive symptoms in demented patients. We assessed the sensitivity to treatment of a validated French version of the Health of the Nation Outcome Scale (HoNOS) 65+ compared to five routinely used scales. Thirty elderly inpatients with ICD-10 diagnosis of dementia and depression were evaluated at admission and discharge using paired t-test. Using the Brief Psychiatric Rating Scale (BPRS) "depressive mood" item as gold standard, a receiver operating characteristic curve (ROC) analysis assessed the validity of HoNOS65+F "depressive symptoms" item score changes. Unlike Geriatric Depression Scale, Mini Mental State Examination and Activities of Daily Living scores, BPRS scores decreased and Global Assessment Functioning Scale score increased significantly from admission to discharge. Amongst HoNOS65+F items, "behavioural disturbance", "depressive symptoms", "activities of daily life" and "drug management" items showed highly significant changes between the first and last day of hospitalization. The ROC analysis revealed that changes in the HoNOS65+F "depressive symptoms" item correctly classified 93% of the cases with good sensitivity (0.95) and specificity (0.88) values. These data suggest that the HoNOS65+F "depressive symptoms" item may provide a valid assessment of the evolution of depressive symptoms in demented patients.
Mojtabai, Ramin; Corey-Lisle, Patricia K; Ip, Edward Hak-Sing; Kopeykina, Irina; Haeri, Sophia; Cohen, Lisa Janet; Shumaker, Sally
2012-12-30
Investigation of patients' subjective perspective regarding the effectiveness - as opposed to efficacy - of antipsychotic medication has been hampered by a relative shortage of self-report measures of global clinical outcome. This paper presents data supporting the feasibility, inter-item consistency, and construct validity of the Patient Assessment Questionnaire (PAQ)-a self-report measure of psychiatric symptoms, medication side effects and general wellbeing, ultimately intended to assess effectiveness of interventions for schizophrenia-spectrum patients. The original 53-item instrument was developed by a multidisciplinary team which utilized brainstorming sessions for item generation and content analysis, patient focus groups, and expert panel reviews. This instrument and additional validation measures were administered, via Audio Computer-Assisted Self-Interviewing (ACASI), to 300 stable, medicated outpatients diagnosed with schizophrenia or schizoaffective disorder. Item elimination was based on psychometric properties and Item-Response Theory information functions and characteristic curves. Exploratory factor analysis of the resulting 40-item scale yielded a five factor solution. The five subscales (General Distress, Side Effects, Psychotic Symptoms, Cognitive Symptoms, Sleep) showed robust convergent (β's=0.34-0.75, average β=0.49) and discriminant validity. The PAQ demonstrates feasibility, reliability, and construct validity as a self-report measure of multiple domains pertinent to effectiveness. Future research needs to establish the PAQ's sensitivity to change. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Translation and validation of the Rhinosinusitis Disability Index for use in Nigeria.
Asoegwu, C N; Nwawolo, C C; Okubadejo, N U
2017-07-01
The Rhinosinusitis Disability Index (RSDI) is a validated and reliable measure of severity of chronic rhinosinusitis. The objective of this study was to translate and validate the instrument for use in Nigeria. This is a methodological study. Participants were 71 patients with chronic rhinosinusitis attending two Otolaryngology clinics in Lagos, Nigeria. Using standardized methods and trained translators, the RSDI was translated to vernacular (Yoruba language) and back-translated to culturally appropriate English. Data analysis comprised assessment of the item quality, content validity and internal consistency of the back-translated Rhinosinusitis Disability Index (bRSDI), and correlation to the original RSDI. Content validity (floor and ceiling effects) showed 0% floor and ceiling effects for the total scores, 0% ceiling effects for all domains and floor effect for physical domain, and 9.9 and 8.5% floor effects for functional and emotional domains, respectively. The mean item-own correlation for physical domain was 0.54 ± 0.08, 0.72 ± 0.08 for functional domain and 0.74 ± 0.07 for emotional domain. All domain item-own correlations were higher than item-other domain correlations. The total Cronbach's alpha was 0.936 and was higher than 0.70 for all the domains, representing good internal consistency. Pearson correlation analysis showed strong correlation of RSDI to bRSDI (total score 0.881; p < 0.001, and domain subscores-physical: 0.788; p < 0.001, functional: 0.830; p < 0.001, and emotional: 0.888; p < 0.001). The back-translated Rhinosinusitis Disability Index shows good face and content validity with good internal consistency while correlating linearly and significantly with the original Rhinosinusitis Disability Index and is recommended for use in Nigeria.
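For readers unfamiliar with the internal-consistency statistic quoted above, Cronbach's alpha for k items is k/(k−1) × (1 − Σ item variances / variance of the total score). A minimal sketch follows; the function name and the small response matrix are hypothetical illustrations, not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed (total) score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items on a 0-4 scale.
toy_scores = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 3],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

Values above roughly 0.70 are conventionally read as acceptable, which is the threshold the abstract applies to each domain.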
The MIMIC Model as a Tool for Differential Bundle Functioning Detection
ERIC Educational Resources Information Center
Finch, W. Holmes
2012-01-01
Increasingly, researchers interested in identifying potentially biased test items are encouraged to use a confirmatory, rather than exploratory, approach. One such method for confirmatory testing is rooted in differential bundle functioning (DBF), where hypotheses regarding potential differential item functioning (DIF) for sets of items (bundles)…
Paz, Sylvia H.; Jones, Loretta; Calderón, José L.; Hays, Ron D.
2016-01-01
Background Depression and physical function are especially important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) Physical Function Item Bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. Objective To estimate the readability of the GDS and PROMIS® Physical Function items and to assess their comprehensibility by a sample of African American and Latino elderly. Methods Readability was estimated using the Flesch-Kincaid (F-K) and Flesch-Reading-Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS items by minority elderly was evaluated with 30 cognitive interviews. Results Readability estimates of a number of items in English and Spanish of the GDS and PROMIS physical functioning items exceed the recommended 5th grade level, or were rated as fairly difficult, difficult, or very difficult to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS items was considered confusing and responses potentially uninterpretable because they were based on physical aids. Conclusions Problems with item wording and response options of the GDS and PROMIS Physical Function items may negatively affect reliability and validity of measurement when used with minority elderly. PMID:27599978
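The two English readability formulae named above have standard published forms, sketched below. Word, sentence, and syllable counts are taken as inputs because syllable counting is itself heuristic; the function names and the example counts are hypothetical, and the Spanish adaptation is not shown because the abstract does not say which variant was used.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a single survey item: 10 words, 1 sentence, 15 syllables.
fre = flesch_reading_ease(10, 1, 15)
grade = flesch_kincaid_grade(10, 1, 15)
print(round(fre, 1), round(grade, 1))
```

Under these assumed counts the item would sit slightly above the 5th grade level that the abstract cites as the recommended ceiling.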
Darzins, Susan; Imms, Christine; Di Stefano, Marilyn; Taylor, Nicholas F; Pallant, Julie F
2014-11-05
The Personal Care Participation Assessment and Resource Tool (PC-PART) is a 43-item, clinician-administered assessment, designed to identify patients' unmet needs (participation restrictions) in activities of daily living (ADL) required for community life. This information is important for identifying problems that need addressing to enable, for example, discharge from inpatient settings to community living. The objective of this study was to evaluate internal construct validity of the PC-PART using Rasch methods. Fit to the Rasch model was evaluated for 41 PC-PART items, assessing threshold ordering, overall model fit, individual item fit, person fit, internal consistency, Differential Item Functioning (DIF), targeting of items and dimensionality. Data used in this research were taken from admission data from a randomised controlled trial conducted at two publicly funded inpatient rehabilitation units in Melbourne, Australia, with 996 participants (63% women; mean age 74 years) and with various impairment types. PC-PART items assessed as one scale, and original PC-PART domains evaluated as separate scales, demonstrated poor fit to the Rasch model. Adequate fit to the Rasch model was achieved in two newly formed PC-PART scales: Self-Care (16 items) and Domestic Life (14 items). Both scales were unidimensional, had acceptable internal consistency (PSI = 0.85, 0.76, respectively) and well-targeted items. Rasch analysis did not support conventional summation of all PC-PART item scores to create a total score. However, internal construct validity of the newly formed PC-PART scales, Self-Care and Domestic Life, was supported. Their Rasch-derived scores provided interval-level measurement enabling summation of scores to form a total score on each scale. These scales may assist clinicians, managers and researchers in rehabilitation settings to assess and measure changes in ADL participation restrictions relevant to community living.
Data used in this research were gathered during a registered randomised controlled trial: Australian and New Zealand Clinical Trials Registry ACTRN12609000973213. Ethics committee approval was gained for secondary analysis of data for this study.
Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.
2015-01-01
Objective To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design A literature review guided framework development of grief/loss. New items were created from focus groups. Items were revised based on expert review and patient feedback and were then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI answered the grief and loss questions. Results The final calibrated item bank resulted in 17 retained items. A unidimensional model was observed (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range from −1.48 to 2.48). Ten items were flagged for DIF; however, examination of effect sizes showed the DIF to be negligible, with little practical impact on score estimates. Conclusions This study indicates that the SCI-QOL Grief and Loss item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010969
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011
NASA Astrophysics Data System (ADS)
Liou, Pey-Yan; Bulut, Okan
2017-12-01
The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
ERIC Educational Resources Information Center
Grover, Raman K.; Ercikan, Kadriye
2017-01-01
In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However DIF items do not necessarily disadvantage every member of a gender group to the same degree,…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Real and Artificial Differential Item Functioning
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2012-01-01
The literature in modern test theory on procedures for identifying items with differential item functioning (DIF) among two groups of persons includes the Mantel-Haenszel (MH) procedure. Generally, it is not recognized explicitly that if there is real DIF in some items which favor one group, then as an artifact of this procedure, artificial DIF…
Validation of a mobility item bank for older patients in primary care.
Cabrero-García, Julio; Ramos-Pichardo, Juan Diego; Muñoz-Mendoza, Carmen Luz; Cabañero-Martínez, María José; González-Llopis, Lorena; Reig-Ferrer, Abilio
2012-12-05
To develop and validate an item bank to measure mobility in older people in primary care and to analyse differential item functioning (DIF) and differential bundle functioning (DBF) by sex. A pool of 48 mobility items was administered by interview to 593 older people attending primary health care practices. The pool contained four domains based on the International Classification of Functioning: changing and maintaining body position, carrying, lifting and pushing, walking and going up and down stairs. The Late Life Mobility item bank consisted of 35 items, and measured with a reliability of 0.90 or more across the full spectrum of mobility, except at the higher end of better functioning. No evidence was found of non-uniform DIF but uniform DIF was observed, mainly for items in the changing and maintaining body position and carrying, lifting and pushing domains. The walking domain did not display DBF, but the other three domains did, principally the carrying, lifting and pushing items. During the design and validation of an item bank to measure mobility in older people, we found that strength (carrying, lifting and pushing) items formed a secondary dimension that produced DBF. More research is needed to determine how best to include strength items in a mobility measure, or whether it would be more appropriate to design separate measures for each construct.
Terluin, Berend; Smits, Niels; Miedema, Baukje
2014-12-01
Translations of questionnaires need to be carefully validated to assure that the translation measures the same construct(s) as the original questionnaire. The four-dimensional symptom questionnaire (4DSQ) is a Dutch self-report questionnaire measuring distress, depression, anxiety and somatization. This study aimed to evaluate the equivalence of the English version of the 4DSQ. 4DSQ data of English and Dutch speaking general practice attendees were analysed and compared. The English speaking group consisted of 205 attendees, aged 18-64 years, in general practice in Canada, whereas the Dutch group consisted of 302 general practice attendees in the Netherlands. Differential item functioning (DIF) analysis was conducted using the Mantel-Haenszel method and ordinal logistic regression. Differential test functioning (DTF; i.e., the scale impact of DIF) was evaluated using linear regression analysis. DIF was detected in 2/16 distress items, 2/6 depression items, 2/12 anxiety items, and 1/16 somatization items. With respect to mean scale scores, the impact of DIF on the scale level was negligible for all scales. On the anxiety scale DIF caused the English speaking patients with moderate to severe anxiety to score about one point lower than Dutch patients with the same anxiety level. The English 4DSQ measures the same constructs as the original Dutch 4DSQ. The distress, depression and somatization scales can employ the same cut-off points as the corresponding Dutch scales. However, cut-off points of the English 4DSQ anxiety scale should be lowered by one point to retain the same meaning as the Dutch anxiety cut-off points.
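As an illustration of the Mantel-Haenszel method mentioned above, the sketch below computes the common odds ratio across score strata for a dichotomous item, together with the ETS delta transformation (MH D-DIF = −2.35 ln α). It is a simplified, assumed setup: the 4DSQ items may be polytomous, the function names are hypothetical, and the counts are invented for illustration.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio over strata of matched total score.

    Each stratum is (a, b, c, d):
      a = reference group endorsing, b = reference group not endorsing,
      c = focal group endorsing,     d = focal group not endorsing.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def mh_d_dif(alpha_mh: float) -> float:
    """ETS delta-scale DIF statistic; 0 indicates no uniform DIF."""
    return -2.35 * math.log(alpha_mh)


# Hypothetical counts in three score strata (not data from the study).
toy_strata = [(30, 10, 20, 20), (25, 15, 20, 20), (35, 5, 30, 10)]
alpha_mh = mantel_haenszel_odds_ratio(toy_strata)
print(round(alpha_mh, 2), round(mh_d_dif(alpha_mh), 2))
```

An odds ratio near 1 (delta near 0) means respondents matched on total score endorse the item at similar rates in both language groups.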
Huang, Chun-Jen; Chen, Cheng-Chung
2018-01-01
Abstract Background The burden of major depressive disorder includes suffering due to symptom severity, functional impairment, and quality of life deficits. The aim of this study was to compare the differences between electroconvulsive therapy and pharmacotherapy in reducing such burdens. Methods This was a pooled analysis study including 2 open-label trials for major depressive disorder inpatients receiving either standard bitemporal and modified electroconvulsive therapy with a maximum of 12 sessions or 20 mg/d of fluoxetine for 6 weeks. Symptom severity, functioning, and quality of life were assessed using the 17-item Hamilton Rating Scale for Depression, the Modified Work and Social Adjustment Scale, and SF-36. Side effects following treatment, including subjective memory impairment, nausea/vomiting, and headache, were recorded. The differences between these 2 groups in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, quality of life, side effects, and time to response (at least a 50% reduction of 17-item Hamilton Rating Scale for Depression) and remission (17-item Hamilton Rating Scale for Depression ≤7) following treatment were analyzed. Results Electroconvulsive therapy (n=116) showed a significantly greater reduction in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, and quality of life deficits and had significantly shorter time to response/remission than fluoxetine (n=126). However, the electroconvulsive therapy group was more likely to experience subjective memory impairment and headache. Conclusions Compared with fluoxetine, electroconvulsive therapy was more effective in alleviating the burden of major depressive disorder and had a substantially increased speed of response/remission in the acute phase. Increased education and information about electroconvulsive therapy for clinicians, patients, and their families and the general public is warranted. PMID:29228200
Simpelaere, Ingeborg S; Van Nuffelen, Gwen; De Bodt, Marc; Vanderwegen, Jan; Hansen, Tina
2017-04-07
The Swallowing Quality-of-Life Questionnaire (SWAL-QoL) is considered the gold standard for assessing health-related QoL in oropharyngeal dysphagia. The Dutch translation (DSWAL-QoL) and its adjusted version (aDSWAL-QoL) have been validated using classical test theory (CTT). However, these scales have not been tested against the Rasch measurement model, which is required to establish the structural validity and objectivity of the total scale and subscale scores. Thus, the purpose of this study was to examine the psychometric properties of these scales using item analysis according to the Rasch model. Item analysis with the Rasch model was performed using RUMM2030 software with previously collected data from a validation study of 108 patients. The assessment included evaluations of overall model fit, reliability, unidimensionality, threshold ordering, individual item and person fits, differential item functioning (DIF), local item dependency (LID) and targeting. The analysis could not establish the psychometric properties of either of the scales or their subscales because they did not fit the Rasch model, and multidimensionality, disordered thresholds, DIF, and/or LID were found. The reliability and power of fit were high for the total scales (PSI = 0.93) but low for most of the subscales (PSI < 0.70). The targeting of persons and items was suboptimal. The main source of misfit was disordered thresholds for both the total scales and subscales. Based on the results of the analysis, adjustments to improve the scales were implemented as follows: disordered thresholds were rescaled, misfit items were removed and items were split for DIF. However, the multidimensionality and LID could not be resolved. The reliability and power of fit remained low for most of the subscales. This study represents the first analyses of the DSWAL-QoL and aDSWAL-QoL with the Rasch model. 
Relying on the DSWAL-QoL and aDSWAL-QoL total and subscale scores to make conclusions regarding dysphagia-related HRQoL should be treated with caution before the structural validity and objectivity of both scales have been established. A larger and well-targeted sample is recommended to derive definitive conclusions about the items and scales. Solutions for the psychometric weaknesses suggested by the model and practical implications are discussed.
Fujita, Takaaki; Sato, Atsushi; Tsuchiya, Kenji; Ohashi, Takuro; Yamane, Kazuhiro; Yamamoto, Yuichi; Iokawa, Kazuaki; Ohira, Yoko; Otsuki, Koji; Tozato, Fusae
2017-12-01
This study aimed to elucidate the relationship between grooming performance of stroke patients and various motor and cognitive functions and to examine the cognitive and physical functional standards required for grooming independence. We retrospectively analyzed the data of 96 hospitalized patients with first stroke in a rehabilitation hospital ward. Logistic regression analysis and receiver operating characteristic curves were used to investigate the related cognitive and motor functions with grooming performance and to calculate the cutoff values for independence and supervision levels in grooming. For analysis between the independent and supervision-dependent groups, the only item with an area under the curve (AUC) of .9 or higher was the Berg Balance Scale, and the calculated cutoff value was 41/40 (sensitivity, 83.6%; specificity, 87.8%). For analysis between the independent-supervision and dependent groups, the items with an AUC of .9 or higher were the Simple Test for Evaluating Hand Function (STEF) on the nonaffected side, Vitality Index (VI), and FIM ® cognition. The cutoff values were 68/67 for the STEF (sensitivity, 100%; specificity, 72.2%), 9/8 points for the VI (sensitivity, 92.3%; specificity, 88.9%), and 23/22 points for FIM ® cognition (sensitivity, 91.0%; specificity, 88.9%). Our results suggest that upper-extremity functions on the nonaffected side, motivation, and cognitive functions are particularly important to achieve the supervision level and that balance is important to reach the independence level. The effective improvement of grooming performance is possible by performing therapeutic or compensatory intervention on functions that have not achieved these cutoff values. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
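A cutoff written as 41/40 is read here as "41 or higher is classified as independent, 40 or lower is not". Under that assumption, sensitivity and specificity at a given cutoff can be computed as in the sketch below; the function name and the scores are hypothetical and are not the study's data.

```python
def sensitivity_specificity(scores, is_positive, cutoff):
    """Sensitivity and specificity when score >= cutoff predicts the positive class."""
    pairs = list(zip(scores, is_positive))
    tp = sum(1 for s, p in pairs if p and s >= cutoff)
    fn = sum(1 for s, p in pairs if p and s < cutoff)
    tn = sum(1 for s, p in pairs if not p and s < cutoff)
    fp = sum(1 for s, p in pairs if not p and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical balance scores and grooming independence labels.
toy_scores = [45, 50, 38, 42, 30, 41, 35, 44]
toy_independent = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(toy_scores, toy_independent, cutoff=41)
print(sens, spec)
```

Sweeping the cutoff over all observed scores and plotting sensitivity against 1 − specificity gives the receiver operating characteristic curve whose area is reported in the abstract.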
Measurement properties of the WOMAC LK 3.1 pain scale.
Stratford, P W; Kennedy, D M; Woodhouse, L J; Spadoni, G F
2007-03-01
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is applied extensively to patients with osteoarthritis of the hip or knee. Previous work has challenged the validity of its physical function scale; however, an extensive evaluation of its pain scale has not been reported. Our purpose was to estimate internal consistency, factorial validity, test-retest reliability, and the standard error of measurement (SEM) of the WOMAC LK 3.1 pain scale. Four hundred and seventy-four patients with osteoarthritis of the hip or knee awaiting arthroplasty were administered the WOMAC. Estimates of internal consistency (coefficient alpha), factorial validity (confirmatory factor analysis), and the SEM based on internal consistency (SEM(IC)) were obtained. Test-retest reliability [Type 2,1 intraclass correlation coefficients (ICC)] and a corresponding SEM(TRT) were estimated on a subsample of 36 patients. Our estimates were: internal consistency alpha=0.84; SEM(IC)=1.48; Type 2,1 ICC=0.77; SEM(TRT)=1.69. Confirmatory factor analysis failed to support a single factor structure of the pain scale with uncorrelated error terms. Two comparable models provided excellent fit: (1) a model with correlated error terms between the walking and stairs items, and between night and sit items (χ²=0.18, P=0.98); (2) a two factor model with walking and stairs items loading on one factor, night and sit items loading on a second factor, and the standing item loading on both factors (χ²=0.18, P=0.98). Our examination of the factorial structure of the WOMAC pain scale failed to support a single factor and internal consistency analysis yielded a coefficient less than optimal for individual patient use. An alternate strategy to summing the five-item responses when considering individual patient application would be to interpret item responses separately or to sum only those items which display homogeneity.
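The two SEM values above follow the usual relation SEM = SD × √(1 − reliability). The abstract does not report the score standard deviations, so the sketch below only back-solves the SD that each reported pair implies; these implied values are derived assumptions, not figures from the study, and the function names are hypothetical.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)


def implied_sd(sem_value: float, reliability: float) -> float:
    """Score SD implied by a reported SEM and reliability (inverse of sem)."""
    return sem_value / math.sqrt(1.0 - reliability)


# Reported pairs: alpha = 0.84 with SEM(IC) = 1.48; ICC = 0.77 with SEM(TRT) = 1.69.
sd_ic = implied_sd(1.48, 0.84)
sd_trt = implied_sd(1.69, 0.77)
print(round(sd_ic, 2), round(sd_trt, 2))
```

The implied SDs come out at about 3.7 and about 3.5 points. A difference of that size is plausible given that the test-retest estimate rests on a 36-patient subsample, though the abstract does not confirm this.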
NASA Astrophysics Data System (ADS)
Slater, Stephanie
2009-05-01
The Test Of Astronomy STandards (TOAST) assessment instrument is a multiple-choice survey tightly aligned to the consensus learning goals stated by the American Astronomical Society - Chair's Conference on ASTRO 101, the American Association for the Advancement of Science's Project 2061 Benchmarks, and the National Research Council's National Science Education Standards. Researchers from the Cognition in Astronomy, Physics and Earth sciences Research (CAPER) Team at the University of Wyoming's Science and Math Teaching Center (UWYO SMTC) have been conducting a question-by-question distractor analysis procedure to determine the sensitivity and effectiveness of each item. In brief, the frequency of each possible answer choice, known as a foil or distractor on a multiple-choice test, is determined and compared to the existing literature on the teaching and learning of astronomy. In addition to having statistical difficulty and discrimination values, a well functioning assessment item will show students selecting distractors in the relative proportions to how we expect them to respond based on known misconceptions and reasoning difficulties. In all cases, our distractor analysis suggests that all items are functioning as expected. These results add weight to the validity of the Test Of Astronomy STandards (TOAST) assessment instrument, which is designed to help instructors and researchers measure the impact of course-length duration instructional strategies for undergraduate science survey courses with learning goals tightly aligned to the consensus goals of the astronomy education community.
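The first step of a distractor analysis as described above is a tally of how often each answer choice is selected for an item. A minimal sketch of that tally follows; the function name, the four-option format, and the response string are hypothetical and do not come from TOAST data.

```python
from collections import Counter


def distractor_proportions(responses, options="ABCD"):
    """Proportion of respondents choosing each answer option for one item."""
    counts = Counter(responses)
    n = len(responses)
    return {opt: counts.get(opt, 0) / n for opt in options}


# Hypothetical responses from ten students to one item whose key is "A".
toy_responses = list("ABACADABCA")
proportions = distractor_proportions(toy_responses)
print(proportions)
```

The proportion on the keyed option is the item's difficulty (proportion correct), and the spread across the remaining options is what gets compared against misconceptions documented in the literature.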
Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M
2014-07-01
The goal of this article was to investigate the optimal functioning of frequency vs. agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. A psychometric study comparing rating scale performance was performed, embedded in a cross-sectional survey used for developing a new instrument to measure behavioral health functioning among adults applying for disability benefits in the United States. Within the sample of 1,017 respondents, the range of response category endorsement was similar for both frequency and agreement item types for both scales. There were fewer missing values in the frequency items than the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items demonstrated optimal effectiveness around the mean ± 1-2 standard deviation score range; the agreement items performed better at the extreme score ranges. Findings suggest an optimal response format requires a mix of both agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for those whose scores are more extreme and capture subjective content related to general attitudes, behaviors, or feelings of work-related behavioral health functioning.
Oude Voshaar, Martijn Ah; Ten Klooster, Peter M; Taal, Erik; Krishnan, Eswar; van de Laar, Mart Afj
2012-03-05
Patient-reported physical function is an established outcome domain in clinical studies in rheumatology. To overcome the limitations of the current generation of questionnaires, the Patient-Reported Outcomes Measurement Information System (PROMIS®) project in the USA has developed calibrated item banks for measuring several domains of health status in people with a wide range of chronic diseases. The aim of this study was to translate and cross-culturally adapt the PROMIS physical function item bank to the Dutch language and to pretest it in a sample of patients with arthritis. The items of the PROMIS physical function item bank were translated using rigorous forward-backward protocols and the translated version was subsequently cognitively pretested in a sample of Dutch patients with rheumatoid arthritis. Few issues were encountered in the forward-backward translation. Only 5 of the 124 items to be translated had to be rewritten because of culturally inappropriate content. Subsequent pretesting showed that, overall, questions of the Dutch version were understood as they were intended, and only one item required rewriting. Results suggest that the translated version of the PROMIS physical function item bank is semantically and conceptually equivalent to the original. Future work will be directed at creating a Dutch-Flemish final version of the item bank to be used in research with Dutch-speaking populations.
Development of an Easy-to-Use Tool for the Assessment of Emergency Department Physical Design.
Majidi, Alireza; Tabatabaey, Ali; Motamed, Hassan; Motamedi, Maryam; Forouzanfar, Mohammad Mehdi
2014-01-01
Physical design of the emergency department (ED) has an important effect on its role and function. To date, no guidelines have been introduced to set the standards for the construction of EDs in Iran. In this study, we aim to devise an easy-to-use tool based on the available literature and expert opinion for the quick and effective assessment of EDs with regard to their physical design. For this purpose, based on current literature on emergency design, a comprehensive checklist was developed. Then, this checklist was analyzed by a panel consisting of heads of three major EDs, and contradicting items were resolved. 178 crude items were derived from available literature. The items were categorized into three major domains of Physical space, Equipment, and Accessibility. The final checklist approved by the panel consisted of 163 items categorized into six domains. Each item was phrased as a "Yes or No" question for ease of analysis, meaning that the criterion is either met or not.
Ashley, Laura; Smith, Adam B; Keding, Ada; Jones, Helen; Velikova, Galina; Wright, Penny
2013-12-01
To provide new insights into the psychometrics of the revised Illness Perception Questionnaire (IPQ-R) in cancer patients. To undertake, for the first time using data from breast, colorectal and prostate cancer patients, a confirmatory factor analysis (CFA) to assess the validity of the IPQ-R's core seven-factor structure. Also, for the first time in any illness group, to undertake Rasch analysis to explore the extent to which the IPQ-R factors form unidimensional scales, with linear measurement properties and no Differential Item Functioning (DIF). Patients with potentially curable breast, colorectal or prostate cancer, within 6 months post-diagnosis, completed the IPQ-R online (N=531). CFA was conducted, including multi-sample analysis, and for each IPQ-R factor fit to the Rasch model was assessed by examining, amongst other things, item fit, DIF and unidimensionality. The CFA showed a moderate fit of the data to the IPQ-R model, and stability across diagnosis, although fit was significantly improved following the removal of selected items. All seven factors achieved fit to the Rasch model, and exhibited unidimensionality and minimal DIF, although in most cases this was after some item rescoring and/or deletion. In both analyses, IPQ-R items 12, 18 and 24 were indicated as misfitting and removed. Given the rigorous standard of Rasch measurement, and the generic nature of the IPQ-R, it stood up well to the demands of the Rasch model in this study. Importantly, the results show that with some relatively minor, pragmatic modifications the IPQ-R could possess Rasch-standard measurement in cancer patients.
Rasch analysis of the Patient Rated Elbow Evaluation questionnaire.
Vincent, Joshua I; MacDermid, Joy C; King, Graham J W; Grewal, Ruby
2015-06-20
The Patient Rated Elbow Evaluation (PREE) was developed as an elbow joint specific measure of pain and disability and validated with classical psychometric methods. More recently, Rasch analysis has contributed new methods for analyzing the clinical measurement properties of self-report outcome measures. The objective of the study was to determine aspects of validity of the PREE using the Rasch model to assess the overall fit of the PREE data, the response scaling, individual item fit, differential item functioning (DIF), local dependency, unidimensionality and person separation index (PSI). A convenience sample of 236 patients (age range 21-79 years; M:F = 97:139) with elbow disorders was recruited from the Roth│McFarlane Hand and Upper Limb Centre, London, Ontario, Canada. The baseline scores of the PREE were used. Rasch analysis was conducted using RUMM 2030 software on the 3 subscales of the PREE separately. The 3 subscales initially showed misfit, with disordered thresholds (on 17 out of 20 items), uniform DIF for two items ("Carrying a 10lbs object" from the specific activities subscale for age group; and "household work" from the usual activities subscale for gender), multidimensionality, and local dependency. The Pain subscale satisfied Rasch expectations when item 2 "Pain - At rest" was split for age group, while the usual activities subscale readily stood up to Rasch requirements when item 2 "household work" was split for gender. The specific activities subscale demonstrated fit to the Rasch model when subtest analysis accounted for local dependency. All three subscales of the PREE were well targeted and had high reliability (PSI >0.80). The three subscales of the PREE appear to be robust when tested against the Rasch model, subject to a few alterations. The value of changing the 0-10 format is questionable given its widespread use; further Rasch-based analysis of whether these findings are stable in other samples is warranted.
Independent Orbiter Assessment (IOA): Analysis of the guidance, navigation, and control subsystem
NASA Technical Reports Server (NTRS)
Trahan, W. H.; Odonnell, R. A.; Pietz, K. C.; Hiott, J. M.
1986-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. The independent analysis results corresponding to the Orbiter Guidance, Navigation, and Control (GNC) Subsystem hardware are documented. The function of the GNC hardware is to respond to guidance, navigation, and control software commands to effect vehicle control and to provide sensor and controller data to GNC software. The GNC hardware for which failure modes analysis was performed includes: hand controllers; Rudder Pedal Transducer Assembly (RPTA); Speed Brake Thrust Controller (SBTC); Inertial Measurement Unit (IMU); Star Tracker (ST); Crew Optical Alignment Sight (COAS); Air Data Transducer Assembly (ADTA); Rate Gyro Assemblies; Accelerometer Assembly (AA); Aerosurface Servo Amplifier (ASA); and Ascent Thrust Vector Control (ATVC). The IOA analysis process utilized available GNC hardware drawings, workbooks, specifications, schematics, and systems briefs for defining hardware assemblies, components, and circuits. Each hardware item was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode.
Independent Orbiter Assessment (IOA): Analysis of the manned maneuvering unit
NASA Technical Reports Server (NTRS)
Bailey, P. S.
1986-01-01
Results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items (PCIs). To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Manned Maneuvering Unit (MMU) hardware. The MMU is a propulsive backpack, operated through separate hand controllers that input the pilot's translational and rotational maneuvering commands to the control electronics and then to the thrusters. The IOA analysis process utilized available MMU hardware drawings and schematics for defining hardware subsystems, assemblies, components, and hardware items. Final levels of detail were evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the worst case severity of the effect for each identified failure mode. The IOA analysis of the MMU found that the majority of the PCIs identified result from the loss of either the propulsion or control functions, or from inability to perform an immediate or future mission. The five most severe criticalities identified all result from failures imposed on the MMU hand controllers, which have no redundancy within the MMU.
Decisions that Make a Difference in Detecting Differential Item Functioning
ERIC Educational Resources Information Center
Sireci, Stephen G.; Rios, Joseph A.
2013-01-01
There are numerous statistical procedures for detecting items that function differently across subgroups of examinees that take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions…
Examining Differential Math Performance by Gender and Opportunity to Learn
ERIC Educational Resources Information Center
Albano, Anthony D.; Rodriguez, Michael C.
2013-01-01
Although a substantial amount of research has been conducted on differential item functioning in testing, studies have focused on detecting differential item functioning rather than on explaining how or why it may occur. Some recent work has explored sources of differential functioning using explanatory and multilevel item response models. This…
Gao, Yong; Zhu, Weimo
2011-05-01
The purpose of this study was to identify subgroup-sensitive physical activities (PA) using differential item functioning (DIF) analysis. A sub-unweighted sample of 1857 (men=923 and women=934) from the 2003-2004 National Health and Nutrition Examination Survey PA questionnaire data was used for the analyses. Using the Mantel-Haenszel, the simultaneous item bias test, and the ANOVA DIF methods, 33 specific leisure-time moderate and/or vigorous PA (MVPA) items were analyzed for DIF across race/ethnicity, gender, education, income, and age groups. Many leisure-time MVPA items were identified as large DIF items. When participating in the same amount of leisure-time MVPA, non-Hispanic blacks were more likely to participate in basketball and dance activities than non-Hispanic whites (NHW); NHW were more likely to participate in golf and hiking than non-Hispanic blacks; Hispanics were more likely to participate in dancing, hiking, and soccer than NHW, whereas NHW were more likely to engage in bicycling, golf, swimming, and walking than Hispanics; women were more likely to participate in aerobics, dancing, stretching, and walking than men, whereas men were more likely to engage in basketball, fishing, golf, running, soccer, weightlifting, and hunting than women; more educated persons were more likely to participate in jogging and treadmill exercise than less educated persons; persons with higher incomes were more likely to engage in golf than those with lower incomes; and adults (20-59 yr) were more likely to participate in basketball, dancing, jogging, running, and weightlifting than older adults (60+ yr), whereas older adults were more likely to participate in walking and golf than younger adults. DIF methods are able to identify subgroup-sensitive PA and thus provide useful information to help design group-sensitive, targeted interventions for disadvantaged PA subgroups. © 2011 by the American College of Sports Medicine
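Of the three DIF methods named above, the Mantel-Haenszel procedure is the simplest to illustrate. Respondents are stratified on a matching variable (here, presumably total leisure-time MVPA), a 2×2 table of group by item endorsement is formed in each stratum, and the tables are pooled into a common odds ratio. The Python sketch below shows that standard computation together with the ETS delta transformation, −2.35 ln(α). The function names and the counts are hypothetical and are not taken from the study.

```python
import math
from typing import Iterable, Tuple

# One matched stratum: (reference yes, reference no, focal yes, focal no).
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Common odds ratio pooled over matched strata (reference vs. focal group)."""
    numerator = 0.0
    denominator = 0.0
    for ref_yes, ref_no, foc_yes, foc_no in strata:
        n = ref_yes + ref_no + foc_yes + foc_no
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_yes * foc_no / n
        denominator += ref_no * foc_yes / n
    if denominator == 0.0:
        raise ValueError("odds ratio undefined: no discordant focal/reference cells")
    return numerator / denominator


def mh_d_dif(alpha: float) -> float:
    """ETS delta metric; negative values mean the item favors the reference group."""
    return -2.35 * math.log(alpha)


# Made-up counts for two strata of one activity item, for illustration only.
example = [(30, 20, 20, 30), (40, 10, 30, 20)]
alpha = mantel_haenszel_odds_ratio(example)
print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {mh_d_dif(alpha):.3f}")
```

An odds ratio near 1 (delta near 0) indicates no DIF after matching. A significance test and the effect-size classification rules are omitted from this sketch.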
Wickert, Natasha M; Wong Riff, Karen W Y; Mansour, Mark; Forrest, Christopher R; Goodacre, Timothy E E; Pusic, Andrea L; Klassen, Anne F
2018-01-01
Objective The aim of this systematic review was to identify patient-reported outcome (PRO) instruments used in research with children/youth with conditions associated with facial differences to identify the health concepts measured. Design MEDLINE, EMBASE, CINAHL, and PsycINFO were searched from 2004 to 2016 to identify PRO instruments used in acne vulgaris, birthmarks, burns, ear anomalies, facial asymmetries, and facial paralysis patients. We performed a content analysis whereby the items were coded to identify concepts and categorized as positive or negative content or phrasing. Results A total of 7,835 articles were screened; 6 generic and 11 condition-specific PRO instruments were used in 96 publications. Condition-specific instruments were for acne (four), oral health (two), dermatology (one), facial asymmetries (two), microtia (one), and burns (one). The PRO instruments provided 554 items (295 generic; 259 condition specific) that were sorted into 4 domains, 11 subdomains, and 91 health concepts. The most common domain was psychological (n = 224 items). Of the identified items, 76% had negative content or phrasing (e.g., "Because of the way my face looks I wish I had never been born"). Given the small number of items measuring facial appearance (n = 19) and function (n = 22), the PRO instruments reviewed lacked content validity for patients whose condition impacted facial function and/or appearance. Conclusions Treatments can change facial appearance and function. This review draws attention to a problem with content validity in existing PRO instruments. Our team is now developing a new PRO instrument called FACE-Q Kids to address this problem.
Identifying Country-Specific Cultures of Physics Education: A differential item functioning approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes
2012-11-01
In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide specific diagnostic information which is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina proved to be mainly associated with the topic area 'Electricity and magnetism'. Classes of items which required the knowledge of experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.
Nguyen, Kim-Huong; Mulhern, Brendan; Kularatna, Sanjeewa; Byrnes, Joshua; Moyle, Wendy; Comans, Tracy
2017-01-25
With an ageing population, the number of people with dementia is rising. The economic impact on the health care system is considerable and new treatment methods and approaches to dementia care must be cost effective. Economic evaluation requires valid patient reported outcome measures, and this study aims to develop a dementia-specific health state classification system based on the Quality of Life for Alzheimer's disease (QOL-AD) instrument (nursing home version). This classification system will subsequently be valued to generate a preference-based measure for use in the economic evaluation of interventions for people with dementia. We assessed the dimensionality of the QOL-AD to develop a new classification system. This was done using exploratory and confirmatory factor analysis and further assessment of the structure of the measure to ensure coverage of the key areas of quality of life. Secondly, we used Rasch analysis to test the psychometric performance of the items, and select item(s) to describe each dimension. This was done on 13 items of the QOL-AD (excluding two general health items) using a sample of 284 residents living in long-term care facilities in Australia who had a diagnosis of dementia. A five dimension classification system is proposed resulting from the three factor structure (defined as 'interpersonal environment', 'physical health' and 'self-functioning') derived from the factor analysis and two factors ('memory' and 'mood') from the accompanying review. For the first three dimensions, Rasch analysis selected three questions of the QOL-AD ('living situation', 'physical health', and 'do fun things') with memory and mood questions representing their own dimensions. The resulting classification system (AD-5D) includes many of the health-related quality of life dimensions considered important to people with dementia, including mood, global function and skill in daily living. 
The development of the AD-5D classification system is an important step in the future application of the widely used QOL-AD in economic evaluations. Future valuation studies will enable this tool to be used to calculate quality adjusted life years to evaluate treatments and interventions for people diagnosed with mild to moderate dementia.
ERIC Educational Resources Information Center
Holweger, Nancy; Taylor, Grace
The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…
Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model
ERIC Educational Resources Information Center
Wang, Wen-Chung; Wilson, Mark
2005-01-01
This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…
The Effects of Testlets on Reliability and Differential Item Functioning
ERIC Educational Resources Information Center
Teker, Gulsen Tasdelen; Dogan, Nuri
2015-01-01
Reliability and differential item functioning (DIF) analyses were conducted on testlets displaying local item dependence in this study. The data set employed in the research was obtained from the answers given by 1,500 students to the 20 items included in six testlets given in English Proficiency Exam by the School of Foreign Languages of a state…
MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Shih, Ching-Lin
2010-01-01
Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…
ERIC Educational Resources Information Center
Myers, Nicholas D.; Wolfe, Edward W.; Feltz, Deborah L.; Penfield, Randall D.
2006-01-01
This study (a) provided a conceptual introduction to differential item functioning (DIF), (b) introduced the multifaceted Rasch rating scale model (MRSM) and an associated statistical procedure for identifying DIF in rating scale items, and (c) applied this procedure to previously collected data from American coaches who responded to the coaching…
Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Li, Johnson
2013-01-01
The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…
ERIC Educational Resources Information Center
Bilir, Mustafa Kuzey
2009-01-01
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
A Comparison of Two Area Measures for Detecting Differential Item Functioning.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
1991-01-01
The exact and closed-interval area measures for detecting differential item functioning are compared for actual data from 1,000 African-American and 1,000 white college students taking a vocabulary test with items intentionally constructed to favor 1 set of examinees. No real differences in detection of biased items were found. (SLD)
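Area measures of DIF quantify the gap between the item characteristic curves (ICCs) estimated for the two groups. The Python sketch below illustrates the general idea under a two-parameter logistic model. It integrates the unsigned gap numerically over a bounded ability interval, and it compares that with the area over the whole scale, which equals the difference in difficulties when the two curves share a slope. This is an illustration of the concept, not the estimators used in the study. The interval [-4, 4], the parameter values, and the function names are assumptions.

```python
import math


def icc(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def closed_interval_area(a_ref: float, b_ref: float,
                         a_foc: float, b_foc: float,
                         lo: float = -4.0, hi: float = 4.0,
                         steps: int = 800) -> float:
    """Unsigned area between two ICCs on [lo, hi] using the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        gap = abs(icc(theta, a_ref, b_ref) - icc(theta, a_foc, b_foc))
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * gap
    return total * h


def whole_scale_area_equal_slopes(b_ref: float, b_foc: float) -> float:
    """Area over the entire ability scale when both ICCs have the same slope."""
    return abs(b_foc - b_ref)


# Hypothetical item: same slope in both groups, half a logit harder for the focal group.
print(closed_interval_area(1.0, 0.0, 1.0, 0.5))
print(whole_scale_area_equal_slopes(0.0, 0.5))
```

The bounded-interval value is slightly smaller than the whole-scale value because the tails outside the interval are excluded.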
Impressions of functional food consumers.
Saher, Marieke; Arvola, Anne; Lindeman, Marjaana; Lähteenmäki, Liisa
2004-02-01
Functional foods provide a new way of expressing healthiness in food choices. The objective of this study was to apply an indirect measure to explore what kind of impressions people form of users of functional foods. Respondents (n=350) received one of eight versions of a shopping list and rated the buyer of the foods on 66 bipolar attributes on 7-point scales. The shopping lists had either healthy or neutral background items and conventional or functional target items, and the buyer was described either as a 40-year-old woman or man. The attribute ratings revealed three factors: disciplined, innovative and gentle. Buyers with healthy background items were perceived as more disciplined than those having neutral items on the list; users of functional foods were rated as more disciplined than users of conventional target items only when the background list consisted of neutral items. Buyers of functional foods were regarded as more innovative and less gentle, but gender affected the ratings on the gentle dimension. The impressions of functional food users clearly differ from those formed of users of conventional foods with a healthy image. The shopping list method performed well as an indirect method, but further studies are required to test its feasibility in measuring other food-related impressions.
Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian
2018-01-01
Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.
Plasma Interactions With Spacecraft
2009-04-01
software core; Table 2. N2kDB classes; Table 3. N2kDB Application Programmer Interface; Table 4. How to get number of items from N2kDB; Table 5...grid, timesteps, and pages of particles. Table 4 specifies how these functions are used to get useful quantities. The Getcount function gets the...number of items with data item names that start with the specified string. Table 4. How to get number of items from N2kDB. Function Specifics
ERIC Educational Resources Information Center
Brekke, Beverly W.; And Others
A 40-item behavior analysis task, the Menstrual Care Scale, was developed and tested with 75 randomly selected institutionalized severely retarded women (13-59 years old). The need for developing personal care skills in menstruation habits had been identified as a priority area for sexuality instruction by staff and confirmed by analysis of…
Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J
2004-01-01
Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang
2018-07-01
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of the depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty levels were compared through the CTT and IRT approaches. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representations of the BDI-II from both methods were largely equivalent in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students.
Paz, Sylvia H; Jones, Loretta; Calderón, José L; Hays, Ron D
2017-02-01
Depression and physical function are particularly important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) physical function item bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. The aim of this study was to estimate the readability of the GDS and PROMIS® physical function items and to assess their comprehensibility using a sample of African American and Latino elderly. Readability was estimated using the Flesch-Kincaid and Flesch Reading Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS® items by minority elderly was evaluated with 30 cognitive interviews. Readability estimates of a number of items in English and Spanish of the GDS and PROMIS® physical functioning items exceeded the U.S. recommended 5th-grade threshold for vulnerable populations, or were rated as 'fairly difficult', 'difficult', or 'very difficult' to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS® items was considered confusing, and interpreting responses was problematic because they were based on using physical aids. Problems with item wording and response options of the GDS and PROMIS® physical function items may reduce reliability and validity of measurement when used with minority elderly.
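The two English readability formulae named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the word, sentence, and syllable counts are assumed to be supplied by the caller (syllable counting is itself a heuristic step), and the Spanish adaptation of the FRE formula is not shown.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def exceeds_grade_threshold(words, sentences, syllables, threshold=5.0):
    """True if the estimated grade level is above the threshold.

    The default of 5.0 mirrors the 5th-grade threshold mentioned in the
    abstract; treating it as a strict cut-off is an assumption made here.
    """
    return flesch_kincaid_grade(words, sentences, syllables) > threshold
```

For a hypothetical one-sentence item with 10 words and 11 syllables, `flesch_kincaid_grade(10, 1, 11)` falls well below grade 5, whereas a long sentence with many polysyllabic words would be flagged by `exceeds_grade_threshold`.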
ERIC Educational Resources Information Center
Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.
2006-01-01
In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
ERIC Educational Resources Information Center
Choi, Jinnie
2017-01-01
This article reviews PROC IRT, which was added to Statistical Analysis Software in 2014. We provide an introductory overview of a free version of SAS, describe what PROC IRT offers for item response theory (IRT) analysis and how one can use PROC IRT, and discuss how other SAS macros and procedures may compensate for gaps in the IRT functionalities of PROC IRT.
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
ERIC Educational Resources Information Center
Penfield, Randall D.; Algina, James
2006-01-01
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
ERIC Educational Resources Information Center
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
2010-01-01
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus
2018-04-01
The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory-factor analysis was completed to examine dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about the response rates on the items on the ESS and provide differential item functioning (DIF) analyses to examine whether the items were interpreted differently between the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and to a lesser extent Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses between German and English were very similar indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Cubaka, Vincent Kalumire; Schriver, Michael; Vedsted, Peter; Makoul, Gregory; Kallestrup, Per
2018-04-23
To identify, adapt and validate a measure for providers' communication and interpersonal skills in Rwanda. After selection, translation and piloting of the measure, structural validity, test-retest reliability, and differential item functioning were assessed. Identification and adaptation: The 14-item Communication Assessment Tool (CAT) was selected and adapted. Content validation found all items highly relevant in the local context except two, which were retained upon understanding the reasoning applied by patients. Eleven providers and 291 patients were involved in the field-testing. Confirmatory factor analysis showed a good fit for the original one factor model. Test-retest reliability assessment revealed a mean quadratic weighted Kappa = 0.81 (range: 0.69-0.89, N = 57). The average proportion of excellent scores was 15.7% (SD: 24.7, range: 9.9-21.8%, N = 180). Differential item functioning was not observed except for item 1, which focuses on greetings, for age groups (p = 0.02, N = 180). The Kinyarwanda version of CAT (K-CAT) is a reliable and valid patient-reported measure of providers' communication and interpersonal skills. K-CAT was validated on nurses and its use on other types of providers may require further validation. K-CAT is expected to be a valuable feedback tool for providers in practice and in training. Copyright © 2018 Elsevier B.V. All rights reserved.
Measuring Psychobiosocial States in Sport: Initial Validation of a Trait Measure
Bertollo, Maurizio; Ruiz, Montse C.; Bortoli, Laura
2016-01-01
We examined the item characteristics, the factor structure, and the concurrent validity of a trait measure of psychobiosocial states. In Study 1, Italian athletes (N = 342, 228 men, 114 women, Mage = 23.93, SD = 6.64) rated the intensity, the frequency, and the perceived impact dimensions of a psychobiosocial states scale, trait version (PBS-ST), which is composed of 20 items (10 functional and 10 dysfunctional) referring to how they usually felt before an important competition. In Study 2, the scale was cross validated in an independent sample (N = 251, 181 men, 70 women, Mage = 24.35, SD = 7.25). The concurrent validity of the PBS-ST scale scores were also examined in comparison with two sport-specific emotion-related measures and a general measure of affect. Exploratory structural equation modeling and confirmatory factor analysis of the data of Study 1 showed that a 2-factor, 15-item solution of the PBS-ST scale (8 functional items and 7 dysfunctional items) reached satisfactory fit indices for the three dimensions (i.e., intensity, frequency, and perceived impact). Results of Study 2 provided evidence of substantial measurement and structural invariance of all dimensions across samples. The low association of the PBS-ST scale with other measures suggests that the scale taps unique constructs. Findings of the two studies offer initial validity evidence for a sport-specific tool to measure psychobiosocial states. PMID:27907111
Gambling-Related Cognition Scale (GRCS): Are skills-based games at a disadvantage?
Lévesque, David; Sévigny, Serge; Giroux, Isabelle; Jacques, Christian
2017-09-01
The Gambling-Related Cognition Scale (GRCS; Raylu & Oei, 2004) was developed to evaluate gambling-related cognitive distortions for all types of gamblers, regardless of their gambling activities (poker, slot machine, etc.). It is therefore imperative to ascertain the validity of its interpretation across different types of gamblers; however, some skills-related items endorsed by players could be interpreted as a cognitive distortion despite the fact that they play skills-related games. Using an intergroup (168 poker players and 73 video lottery terminal [VLT] players) differential item functioning (DIF) analysis, this study examined the possible manifestation of item biases associated with the GRCS. DIF was analyzed with ordinal logistic regressions (OLRs) and Ramsay's (1991) nonparametric kernel smoothing approach with TestGraf. Results show that half of the items display at least moderate DIF between groups and, depending on the type of analysis used, 3 to 7 items displayed large DIF. The 5 items with the most DIF were more significantly endorsed by poker players (uniform DIF) and were all related to skills, knowledge, learning, or probabilities. Poker players' interpretations of some skills-related items may lead to an overestimation of their cognitive distortions due to their total score increased by measurement artifact. Findings indicate that the current structure of the GRCS contains potential biases to be considered when poker players are surveyed. The present study conveys new and important information on bias issues to ponder carefully before using and interpreting the GRCS and other similar wide-range instruments with poker players. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Statistical power as a function of Cronbach alpha of instrument questionnaire items.
Heo, Moonseong; Kim, Namhee; Faith, Myles S
2015-10-14
In countless clinical trials, outcome measurement relies on instrument questionnaire items, which often suffer from measurement error that in turn affects the statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. Scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume a fixed true score variance, as opposed to the usual fixed total variance assumption. That assumption is critical and practically relevant to show that measurement error is inversely associated with inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing significance of C(α). Closed-form power functions and sample size determination formulas are derived in terms of C(α), for all of the aforementioned comparisons. Power functions are shown to be an increasing function of C(α), regardless of comparison of interest. The derived power functions are well validated by simulation studies that show that the magnitudes of theoretical power are virtually identical to those of the empirical power.
Regardless of research designs or settings, in order to increase statistical power, development and use of instruments with greater C(α), or equivalently with greater inter-item correlations, is crucial for trials that intend to use questionnaire items for measuring research outcomes. Further development of the power functions for binary or ordinal item scores and under more general item correlation structures reflecting more real world situations would be a valuable future study.
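The link between C(α) and inter-item correlation described in this abstract can be illustrated with a short sketch. It is not the authors' power derivation: it only computes coefficient alpha from raw item scores using the standard variance formula, and shows the closed form for parallel items (equal variances, common correlation r), under which alpha increases with r. The function names and the data layout (one list of item scores per respondent) are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha from raw scores.

    scores: list of respondents, each a list of k item scores (k >= 2).
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_parallel_items(k, r):
    """Alpha for k parallel items sharing a common inter-item correlation r."""
    return k * r / (1 + (k - 1) * r)
```

With k held fixed, `alpha_parallel_items` is increasing in r, which is the sense in which greater C(α) and greater inter-item correlation are interchangeable in the abstract's conclusion.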
EXTENDING THE FLOOR AND THE CEILING FOR ASSESSMENT OF PHYSICAL FUNCTION
Fries, James F.; Lingala, Bharathi; Siemons, Liseth; Glas, Cees A. W.; Cella, David; Hussain, Yusra N; Bruce, Bonnie; Krishnan, Eswar
2014-01-01
Objective The objective of the current study was to improve the assessment of physical function by improving the precision of assessment at the floor (extremely poor function) and at the ceiling (extremely good health) of the health continuum. Methods Under the NIH PROMIS program, we developed new physical function floor and ceiling items to supplement the existing item bank. Using item response theory (IRT) and the standard PROMIS methodology, we developed 30 floor items and 26 ceiling items and administered them during a 12-month prospective observational study of 737 individuals at the extremes of health status. Change over time was compared across anchor instruments and across items by means of effect sizes. Using the observed changes in scores, we back-calculated sample size requirements for the new and comparison measures. Results We studied 444 subjects with chronic illness and/or extreme age, and 293 generally fit subjects including athletes in training. IRT analyses confirmed that the new floor and ceiling items outperformed reference items (p<0.001). The estimated post-hoc sample size requirements were reduced by a factor of two to four at the floor and a factor of two at the ceiling. Conclusion Extending the range of physical function measurement can substantially improve measurement quality, can reduce sample size requirements and improve research efficiency. The paradigm shift from Disability to Physical Function includes the entire spectrum of physical function, signals improvement in the conceptual base of outcome assessment, and may be transformative as medical goals more closely approach societal goals for health. PMID:24782194
Barile, John P.; Reeve, Bryce B.; Smith, Ashley Wilder; Zack, Matthew M.; Mitchell, Sandra A.; Kobau, Rosemarie; Cella, David F.; Luncheon, Cecily; Thompson, William W.
2015-01-01
Purpose Healthy People 2020 identified health-related quality of life and well-being (WB) as indicators of population health for the next decade. This study examined the measurement properties of the NIH PROMIS® Global Health Scale, the CDC Healthy Days items, and associations with the Satisfaction with Life Scale. Methods A total of 4,184 adults completed the Porter Novelli's HealthStyles mailed survey. Physical and mental health (9 items from PROMIS Global Scale and 3 items from CDC Healthy days measure), and 4 WB factor items were tested for measurement equivalence using multiple-group confirmatory factor analysis. Results The CDC items accounted for similar variance as the PROMIS items on physical and mental health factors; both factors were moderately correlated with WB. Measurement invariance was supported across gender and age; the magnitude of some factor loadings differed between those with and without a chronic medical condition. Conclusions The PROMIS, CDC, and WB items all performed well. The PROMIS items captured a broad range of functioning across the entire continuum of physical and mental health, while the CDC items appear appropriate for assessing burden of disease for chronic conditions and are brief and easily interpretable. All three measures under study appear to be appropriate measures for monitoring several aspects of the Healthy People 2020 goals and objectives. PMID:23404737
Barile, John P; Reeve, Bryce B; Smith, Ashley Wilder; Zack, Matthew M; Mitchell, Sandra A; Kobau, Rosemarie; Cella, David F; Luncheon, Cecily; Thompson, William W
2013-08-01
Healthy People 2020 identified health-related quality of life and well-being (WB) as indicators of population health for the next decade. This study examined the measurement properties of the NIH PROMIS® Global Health Scale, the CDC Healthy Days items, and associations with the Satisfaction with Life Scale. A total of 4,184 adults completed the Porter Novelli's HealthStyles mailed survey. Physical and mental health (9 items from PROMIS Global Scale and 3 items from CDC Healthy days measure), and 4 WB factor items were tested for measurement equivalence using multiple-group confirmatory factor analysis. The CDC items accounted for similar variance as the PROMIS items on physical and mental health factors; both factors were moderately correlated with WB. Measurement invariance was supported across gender and age; the magnitude of some factor loadings differed between those with and without a chronic medical condition. The PROMIS, CDC, and WB items all performed well. The PROMIS items captured a broad range of functioning across the entire continuum of physical and mental health, while the CDC items appear appropriate for assessing burden of disease for chronic conditions and are brief and easily interpretable. All three measures under study appear to be appropriate measures for monitoring several aspects of the Healthy People 2020 goals and objectives.
ERIC Educational Resources Information Center
Drabinová, Adéla; Martinková, Patrícia
2017-01-01
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of the method based on logistic regression. As a non-IRT approach, NLR can…
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2011-01-01
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
ERIC Educational Resources Information Center
Ajeigbe, Taiwo Oluwafemi; Afolabi, Eyitayo Rufus Ifedayo
2017-01-01
This study assessed unidimensionality and occurrence of Differential Item Functioning (DIF) in Mathematics and English Language items of Osun State Qualifying Examination. The study made use of secondary data. The results showed that OSQ Mathematics (-0.094 ≤ r ≤ 0.236) and English Language items (-0.095 ≤ r ≤ 0.228) were unidimensional. Also,…
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.
ERIC Educational Resources Information Center
Kaskowitz, Gary S.; De Ayala, R. J.
2001-01-01
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Doong, Shing H.
2009-01-01
The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…
ERIC Educational Resources Information Center
Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.
2012-01-01
Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…
Bode, Rita K.; Heinemann, Allen W.; Butt, Zeeshan; Stallings, Jena; Taylor, Caitlin; Rowe, Morgan; Roth, Elliot J.
2013-01-01
Bode RK, Heinemann AW, Butt Z, Stallings J, Taylor C, Rowe M, Roth EJ. Development and validation of participation and positive psychologic function measures for stroke survivors. Objective To evaluate the reliability and validity of Neurologic Quality of Life (NeuroQOL) item banks that assess quality-of-life (QOL) domains not typically included in poststroke measures. Design Secondary analysis of item responses to selected NeuroQOL domains. Setting Community. Participants Community-dwelling stroke survivors (n=111) who were at least 12 months poststroke. Interventions Not applicable. Main Outcome Measures Five measures developed for 3 NeuroQoL domains: ability to participate in social activities, satisfaction with participation in social activities, and positive psychologic function. Results A single bank was developed for the positive psychologic function domain, but 2 banks each were developed for the ability-to-participate and satisfaction-with-participation domains. The resulting item banks showed good psychometric properties and external construct validity with correlations with the legacy instruments, ranging from .53 to .71. Using these measures, stroke survivors in this sample reported an overall high level of QOL. Conclusions The NeuroQoL-derived measures are promising and valid methods for assessing aspects of QOL not typically measured in this population. PMID:20801251
Bridges, Susan M; Parthasarathy, Divya S; Au, Terry K F; Wong, Hai Ming; Yiu, Cynthia K Y; McGrath, Colman P
2014-01-01
This paper describes the development of a new literacy assessment instrument, the Hong Kong Oral Health Literacy Assessment Task for Paediatric Dentistry (HKOHLAT-P). Its relationship to literacy theory is analyzed to establish content and face validity. Implications for construct validity are examined by analyzing cognitive demand to determine how "comprehension" is measured. Key influences from literacy assessment were identified to analyze item development. Cognitive demand was analyzed using an established taxonomy. The HKOHLAT-P focuses on the functional domain of health literacy assessment. Items had strong content and face validity reflecting established principles from modern literacy theory. Inclusion of new text types signified relevant developments in the area of new literacies. Analysis of cognitive demand indicated that this instrument assesses the "comprehension" domain, specifically the areas of factual and procedural knowledge, with some assessment of conceptual knowledge. Metacognitive knowledge was not assessed. Comprehension tasks assessing patient health literacy predominantly examine functional health literacy at the lower levels of comprehension. Item development is influenced by the fields of situated and authentic literacy. Inclusion of content regarding multiliteracies is suggested for further research. Development of functional health literacy assessment instruments requires careful consideration of the clinical context in determining construct validity. © 2013 American Association of Public Health Dentistry.
NASA Technical Reports Server (NTRS)
Robinson, W. W.
1987-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the Electrical Power Distribution and Control (EPD and C)/Remote Manipulator System (RMS) hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained in the NASA FMEA/CIL documentation. This report documents the results of the independent analysis of the EPD and C/RMS (both port and starboard) hardware. The EPD and C/RMS subsystem hardware provides the electrical power and power control circuitry required to safely deploy, operate, control, and stow or guillotine and jettison two (one port and one starboard) RMSs. The EPD and C/RMS subsystem is subdivided into the four following functional divisions: Remote Manipulator Arm; Manipulator Deploy Control; Manipulator Latch Control; Manipulator Arm Shoulder Jettison; and Retention Arm Jettison. The IOA analysis process utilized available EPD and C/RMS hardware drawings and schematics for defining hardware assemblies, components, and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based on the severity of the effect for each failure mode.
Independent Orbiter Assessment (IOA): Analysis of the displays and controls subsystem
NASA Technical Reports Server (NTRS)
Trahan, W. H.; Prust, E. E.
1987-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Displays and Controls (D and C) subsystem hardware. The function of the D and C hardware is to provide the crew with the monitor, command, and control capabilities required for management of all normal and contingency mission and flight operations. The D and C hardware for which failure modes analysis was performed consists of the following: Acceleration Indicator (G-METER); Head Up Display (HUD); Display Driver Unit (DDU); Alpha/Mach Indicator (AMI); Horizontal Situation Indicator (HSI); Attitude Director Indicator (ADI); Propellant Quantity Indicator (PQI); Surface Position Indicator (SPI); Altitude/Vertical Velocity Indicator (AVVI); Caution and Warning Assembly (CWA); Annunciator Control Assembly (ACA); Event Timer (ET); Mission Timer (MT); Interior Lighting; and Exterior Lighting. Each hardware item was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode.
Varying the valuating function and the presentable bank in computerized adaptive testing.
Barrada, Juan Ramón; Abad, Francisco José; Olea, Julio
2011-05-01
In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, valuating the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that the manipulation of the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. It is possible to greatly improve item bank security with much fewer losses in accuracy by selecting several items with the matching criterion. In general, it seems more appropriate not to stratify the bank.
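The contrast between the two valuating functions can be sketched as below. This is a simplified illustration rather than the study's implementation: it assumes a two-parameter logistic (2PL) model, under which an item's information peaks at its difficulty b, so the matching criterion reduces to the distance |θ − b|. The function names and the bank layout (a list of `(a, b)` tuples) are hypothetical, and bank stratification is not shown.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Item information at theta under the 2PL model: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_by_information(theta, bank, administered):
    """Pick the unused item with maximum Fisher information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta, *bank[i]))


def select_by_matching(theta, bank, administered):
    """Pick the unused item whose information peak (b) is closest to theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return min(candidates, key=lambda i: abs(theta - bank[i][1]))
```

With a bank such as `[(2.0, 0.5), (0.8, 0.0), (1.2, -1.0)]` and an estimate of θ = 0, the information rule favours the highly discriminating first item while the matching rule picks the second. Because matching ignores discrimination, it does not concentrate exposure on high-a items, which is consistent with the security advantage the abstract attributes to it.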
Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George; Mulcahey, M J
2011-02-01
This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function.
Hand function evaluation: a factor analysis study.
Jarus, T; Poremba, R
1993-05-01
The purpose of this study was to investigate hand function evaluations. Factor analysis with varimax rotation was used to assess the fundamental characteristics of the items included in the Jebsen Hand Function Test and the Smith Hand Function Evaluation. The study sample consisted of 144 subjects without disabilities and 22 subjects with Colles fracture. Results suggest a four factor solution: Factor I--pinch movement; Factor II--grasp; Factor III--target accuracy; and Factor IV--activities of daily living. These categories differentiated the subjects without Colles fracture from the subjects with Colles fracture. A hand function evaluation consisting of these four factors would be useful. Such an evaluation that can be used for current clinical purposes is provided.
Validation of the Modified Fatigue Impact Scale in mild to moderate traumatic brain injury.
Schiehser, Dawn M; Delano-Wood, Lisa; Jak, Amy J; Matthews, Scott C; Simmons, Alan N; Jacobson, Mark W; Filoteo, J Vincent; Bondi, Mark W; Orff, Henry J; Liu, Lin
2015-01-01
To evaluate the validity of the Modified Fatigue Impact Scale (MFIS) in veterans with a history of mild to moderate traumatic brain injury (TBI). Veterans (N = 106) with mild (92%) or moderate (8%) TBI. Veterans Administration Health System. Factor structure, internal consistency, convergent validity, sensitivity, and specificity of the MFIS were examined. Principal component analysis identified 2 viable MFIS factors: a Cognitive subscale and a Physical/Activities subscale. Item analysis revealed high internal consistency of the MFIS Total scale and subscale items. Strong convergent validity of the MFIS scales was established with 2 Beck Depression Inventory II fatigue items. Receiver operating characteristic curve analysis revealed good to excellent accuracy of the MFIS in classifying fatigued versus nonfatigued individuals. The MFIS is a valid multidimensional measure that can be used to evaluate the impact of fatigue on cognitive and physical functioning in individuals with mild to moderate TBI. The psychometric properties of the MFIS make it useful for evaluating fatigue and provide the potential for improving research on fatigue in this population.
Peipert, John D; Bentler, Peter; Klicko, Kristi; Hays, Ron D
2018-05-14
Black dialysis patients report better health-related quality of life (HRQOL) than White patients, which may be explained if Black and White patients respond systematically differently to HRQOL survey items. We examined differential item functioning (DIF) of the Kidney Disease Quality of Life 36-item (KDQOL™-36) Burden of Kidney Disease, Symptoms and Problems with Kidney Disease, and Effects of Kidney Disease scales between Black (n = 18,404) and White (n = 21,439) dialysis patients. We fit multiple group confirmatory factor analysis models with increasing invariance: a Configural model (invariant factor structure), a Metric model (invariant factor loadings), and a Scalar model (invariant intercepts). Criteria for invariance included non-significant χ² tests, > 0.002 difference in the models' CFI, and > 0.015 difference in RMSEA and SRMR. Next, starting with a fully invariant model, we freed loadings and intercepts item-by-item to determine if DIF impacted estimated KDQOL™-36 scale means. ΔCFI was 0.006 between the metric and scalar models but was reduced to 0.001 when we freed intercepts for the burdens and symptoms and problems of kidney disease scales. In comparison to standardized means of 0 in the White group, those for the Black group on the Burdens, Symptoms and Problems, and Effects of Kidney Disease scales were 0.218, 0.061, and 0.161, respectively. When loadings and thresholds were released sequentially, differences in means between models ranged between 0.001 and 0.048. Despite some DIF, impacts on KDQOL™-36 responses appear to be minimal. We conclude that the KDQOL™-36 is appropriate to make substantive comparisons of HRQOL between Black and White dialysis patients.