item response analysis: Topics by Science.gov

Sample records for item response analysis

An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

ERIC Educational Resources Information Center

Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol

2016-01-01

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

PubMed

Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

2013-07-01

Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
A Comparison of Measurement Equivalence Methods Based on Confirmatory Factor Analysis and Item Response Theory.

ERIC Educational Resources Information Center

Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.

Current interest in the assessment of measurement equivalence emphasizes two methods of analysis, linear, and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…
1999 Survey of Active Duty Personnel: Administration, Datasets, and Codebook. Appendix G: Frequency and Percentage Distributions for Variables in the Survey Analysis Files.

DTIC Science & Technology

2000-12-01

A SKIP FLAG INDICATING THE RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP...RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5, NOTE 2, IN APPENDIX D. G-52...RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5
A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

ERIC Educational Resources Information Center

Fukuhara, Hirotaka; Kamata, Akihito

2011-01-01

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

NASA Astrophysics Data System (ADS)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-12-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
A Quasi-Parametric Method for Fitting Flexible Item Response Functions

ERIC Educational Resources Information Center

Liang, Longjuan; Browne, Michael W.

2015-01-01

If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
A Polytomous Item Response Theory Analysis of Social Physique Anxiety Scale

ERIC Educational Resources Information Center

Fletcher, Richard B.; Crocker, Peter

2014-01-01

The present study investigated the social physique anxiety scale's factor structure and item properties using confirmatory factor analysis and item response theory. An additional aim was to identify differences in response patterns between groups (gender). A large sample of high school students aged 11-15 years (N = 1,529) consisting of n =…
Item Response Data Analysis Using Stata Item Response Theory Package

ERIC Educational Resources Information Center

Yang, Ji Seung; Zheng, Xiaying

2018-01-01

The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…
Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the ‘Claim Evaluation Tools’ database using Rasch modelling

PubMed Central

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-01-01

Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administrated four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
Generalized Full-Information Item Bifactor Analysis

ERIC Educational Resources Information Center

Cai, Li; Yang, Ji Seung; Hansen, Mark

2011-01-01

Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of…
An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).

PubMed

Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul

2018-03-01

Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

ERIC Educational Resources Information Center

Livingston, Samuel A.; Dorans, Neil J.

2004-01-01

This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

PubMed

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-05-25

The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administrated four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory.

PubMed

Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd

2017-01-01

The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52,2; standard deviation 19.2) The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory

PubMed Central

Shedden-Mora, Meike C.; Löwe, Bernd

2017-01-01

Objective The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. Results The sample included N = 3404 primary care patients (60% female; mean age, 52,2; standard deviation 19.2) The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
Item response theory analysis of Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis.

PubMed

Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C

2016-03-12

Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using an unidimensional IRT model. The graded response model was used for IRT analyses and CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis but the existing The CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.
Item Response Theory Using Hierarchical Generalized Linear Models

ERIC Educational Resources Information Center

Ravand, Hamdollah

2015-01-01

Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Item response theory - A first approach

NASA Astrophysics Data System (ADS)

Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

2017-07-01

The Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Responsible Models available to measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related with developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Lindem & Hambleton, 1997). As stated before the Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
Item response theory analysis of the Pain Self-Efficacy Questionnaire.

PubMed

Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

2017-01-01

The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain. Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.

Global, Local, and Graphical Person-Fit Analysis Using Person-Response Functions

ERIC Educational Resources Information Center

Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R.

2005-01-01

Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the…
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.

PubMed

Eichenbaum, Alexander E; Marcus, David K; French, Brian F

2017-06-01

This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Environmental management performance for Brazilian industrials: measuring with the item response theory.

PubMed

Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Bornia, Antonio Cezar; de Andrade, Dalton Francisco; Campos, Lucila Maria de Souza

2012-01-01

Growing challenges with respect to preserving the environment have forced changes in company operational structures. Thus, the objective of this article is to measure the evidence of Environmental Management using the Item Response Theory, based on website analysis from Brazilian industrial companies from sectors defined through the scope of the research. This is a qualitative, exploratory, and descriptive study related to an information collection and analysis instrument. The general view of the research problem with respect to the phenomenon under study in based on multi-case studies, with the methodological outline based on the theoretical reference used. Primary data was gathered from 270 company websites from 7 different Brazilian sectors and led to the creation of 26 items approved by environmental specialists. The results were attained with the measuring of Environmental Management evidence via the Item Response Theory, providing a clear order of the items involved based on each item's level of difficulty, quality, and propriety. This permitted the measurement of each item's quality and propriety, as well as that of the respondents, placing them on the same analysis scale. Increasing the number of items and companies involved is suggested fEor future research in order to permit broader sector analysis.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

PubMed

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

PubMed

Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

2018-02-01

The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

PubMed

Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

2013-06-01

To analyze, by means of "Item Response Theory", an instrument to measure adherence to t treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, 2011 using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the gradual response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and factor analysis of complete information. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. The instrument analyzed is limited in measuring adherence to hypertension treatment, by analyzing the "Item Response Theory" of the item, and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
Measuring the quality of life in hypertension according to Item Response Theory

PubMed Central

Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

2017-01-01

ABSTRACT OBJECTIVE To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. METHODS This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used the Gradual Response Model of Samejima. The analyses were conducted using the free software R with the aid of psych and mirt. RESULTS The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. CONCLUSIONS We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies. PMID:28492764
Affective Outcomes of Schooling: Full-Information Item Factor Analysis of a Student Questionnaire.

ERIC Educational Resources Information Center

Muraki, Eiji; Engelhard, George, Jr.

Recent developments in dichotomous factor analysis based on multidimensional item response models (Bock and Aitkin, 1981; Muthen, 1978) provide an effective method for exploring the dimensionality of questionnaire items. Implemented in the TESTFACT program, this "full information" item factor analysis accounts not only for the pairwise joint…
Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

ERIC Educational Resources Information Center

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-01-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
Psychometric properties of the Chinese version of resilience scale specific to cancer: an item response theory analysis.

PubMed

Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong

2018-06-01

Classic theory test has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was performed to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined and a short-form of RS-SC maybe feasible in clinical settings in order to reduce burden on patients. However, the generalizability of these findings warrants further investigations.
Careful with Those Priors: A Note on Bayesian Estimation in Two-Parameter Logistic Item Response Theory Models

ERIC Educational Resources Information Center

Marcoulides, Katerina M.

2018-01-01

This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods were examined. Overall results showed that…
Item response theory analysis of the mechanics baseline test

NASA Astrophysics Data System (ADS)

Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

2012-02-01

Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis

ERIC Educational Resources Information Center

Edwards, Michael C.

2010-01-01

Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of various Markov chain Monte Carlo (MCMC) estimation routines to estimate parameters of a wide variety of confirmatory item factor analysis models. Further, I show…
Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)

ERIC Educational Resources Information Center

Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn

2018-01-01

The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Update on the Child's Challenging Behaviour Scale following evaluation using Rasch analysis.

PubMed

Bourke-Taylor, H M; Pallant, J F; Law, M

2014-03-01

The Child's Challenging Behaviour Scale (CCBS) was designed to measure a mother's rating of her child's challenging behaviours. The CCBS was initially developed for mothers of school-aged children with developmental disability and has previously been shown to have good psychometric properties using classical test theory techniques. The aim of this study was to use Rasch analysis to fully evaluate all aspects of the scale, including response format, item fit, dimensionality and targeting. The sample consisted of 152 mothers of a school-aged child (aged 5-18 years) with a disability. Mothers were recruited via websites and mail-out newsletters through not-for-profit organizations that supported families with disabilities. Respondents completed a survey which included the 11 items of the CCBS. Rasch analysis was conducted on these responses using the RUMM2030 package. Rasch analysis of the CCBS revealed serious threshold disordering for nine of the 11 items, suggesting problems with the 5-point response format used for the scale. The neutral midpoint of the response format was subsequently removed to create a 4-point scale. High levels of local dependency were detected among two pairs of items, resulting in the removal of two items (item 7 and item 1). The final nine-item version of the scale (CCBS Version 2) was unidimensional, well targeted, showed good fit to the Rasch model, and strong internal consistency. To achieve fit to the Rasch model it was necessary to make two modifications to the CCBS scale. The resulting nine-item scale with a 4-point response format showed excellent psychometric properties, supporting its internal validity. © 2013 John Wiley & Sons Ltd.
Introduction to Multilevel Item Response Theory Analysis: Descriptive and Explanatory Models

ERIC Educational Resources Information Center

Sulis, Isabella; Toland, Michael D.

2017-01-01

Item response theory (IRT) models are the main psychometric approach for the development, evaluation, and refinement of multi-item instruments and scaling of latent traits, whereas multilevel models are the primary statistical method when considering the dependence between person responses when primary units (e.g., students) are nested within…
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Differential item functioning analysis of the Vanderbilt Expertise Test for cars

PubMed Central

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
Evaluation of adding item-response theory analysis for evaluation of the European Board of Ophthalmology Diploma examination.

PubMed

Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José

2013-11-01

To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011:0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis. Therefore, IRT analysis has proved beneficial for the statistical analysis of EBOD. Introduction of negative marking has led to a significant increase in the reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations. © 2013 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
Least Squares Distance Method of Cognitive Validation and Analysis for Binary Items Using Their Item Response Theory Parameters

ERIC Educational Resources Information Center

Dimitrov, Dimiter M.

2007-01-01

The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…

A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses.

PubMed

Massof, Robert W

2014-10-01

A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equals the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing. © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
Evaluation of Internal Construct Validity and Unidimensionality of the Brachial Assessment Tool, A Patient-Reported Outcome Measure for Brachial Plexus Injury.

PubMed

Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea

2016-12-01

To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, no unidimensionality or differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright Â© 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.

PubMed

Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A

2017-09-01

The aim of this study was to confirm the psychometric properties of Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were used for this analysis using survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
The Application of Strength of Association Statistics to the Item Analysis of an In-Training Examination in Diagnostic Radiology.

ERIC Educational Resources Information Center

Diamond, James J.; McCormick, Janet

1986-01-01

Using item responses from an in-training examination in diagnostic radiology, the application of a strength of association statistic to the general problem of item analysis is illustrated. Criteria for item selection, general issues of reliability, and error of measurement are discussed. (Author/LMO)
Generalized Full-Information Item Bifactor Analysis

PubMed Central

Cai, Li; Yang, Ji Seung; Hansen, Mark

2011-01-01

Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682
Bayesian Analysis of Multidimensional Item Response Theory Models: A Discussion and Illustration of Three Response Style Models

ERIC Educational Resources Information Center

Leventhal, Brian C.; Stone, Clement A.

2018-01-01

Interest in Bayesian analysis of item response theory (IRT) models has grown tremendously due to the appeal of the paradigm among psychometricians, advantages of these methods when analyzing complex models, and availability of general-purpose software. Possible models include models which reflect multidimensionality due to designed test structure,…
An Analysis of Variance Approach for the Estimation of Response Time Distributions in Tests

ERIC Educational Resources Information Center

Attali, Yigal

2010-01-01

Generalizability theory and analysis of variance methods are employed, together with the concept of objective time pressure, to estimate response time distributions and the degree of time pressure in timed tests. By estimating response time variance components due to person, item, and their interaction, and fixed effects due to item types and…
Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

PubMed

Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

2018-02-01

The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigations. The present study examined the psychometrics properties of the FTCD and the HSI via the Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Grade Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bayesian Analysis of Item Response Curves. Research Report 84-1. Mathematical Sciences Technical Report No. 132.

ERIC Educational Resources Information Center

Tsutakawa, Robert K.; Lin, Hsin Ying

Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…
Detection of Differential Item Functioning Using the Lasso Approach

ERIC Educational Resources Information Center

Magis, David; Tuerlinckx, Francis; De Boeck, Paul

2015-01-01

This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Practical Guide to Conducting an Item Response Theory Analysis

ERIC Educational Resources Information Center

Toland, Michael D.

2014-01-01

Item response theory (IRT) is a psychometric technique used in the development, evaluation, improvement, and scoring of multi-item scales. This pedagogical article provides the necessary information needed to understand how to conduct, interpret, and report results from two commonly used ordered polytomous IRT models (Samejima's graded…
Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

PubMed Central

2012-01-01

Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware. PMID:22686586
Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

PubMed

Stochl, Jan; Jones, Peter B; Croudace, Tim J

2012-06-11

Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension.Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware.
Item Parameter Estimation for the MIRT Model: Bias and Precision of Confirmatory Factor Analysis-Based Models

ERIC Educational Resources Information Center

Finch, Holmes

2010-01-01

The accuracy of item parameter estimates in the multidimensional item response theory (MIRT) model context is one that has not been researched in great detail. This study examines the ability of two confirmatory factor analysis models specifically for dichotomous data to properly estimate item parameters using common formulae for converting factor…
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

PubMed Central

Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

2014-01-01

We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
[Development and evaluation of the reliability and validity of an empowerment scale for health promotion volunteers].

PubMed

Koyama, Utako; Murayama, Nobuko

2011-08-01

This qualitative and quantitative research was conducted to develop an empowerment scale for health promotion volunteers (hereinafter referred to as the ESFHPV), key persons responsible for creating healthy communities. A focus group interview was conducted with four groups of health promotion volunteers from two cities in S Public Health Center of N Prefecture. A qualitative analysis was employed and a 32-item draft scale was created. The reliability and validity of this scale were then evaluated using quantitative methods. A questionnaire survey was conducted in 2009 for all 660 health promotion volunteers across the 2 cities. Of 401 respondents (response rate, 60.8%), 356 (53.9%) provided valid responses and were thus included in the analysis. 1) Internal consistency was confirmed by item-total correlation analysis (I-T analysis), assessment of Cronbach's coefficient alpha for all except one item and good-poor analysis (G-P analysis). Four items were excluded from the 32-item draft scale because of correlation coefficients more than 0.7, leaving 28 items for analysis. 2) Based on the results obtained from the factor analysis performed on the 28 provisional empowerment questions, 28 items were chosen for inclusion in the ESFHPV. These items consisted of four sub-scales, namely 'activity for healthy community' (10 items), 'intention for solving health problems of the community' (10 items), 'democratic organization activity' (four items) and 'growth as individual health promotion volunteers' (four items). 3) The Cronbach's coefficient alpha for the ESFHPV and its four sub-scales were 0.93, 0.88, 0.89, 0.84 and 0.79 respectively. The coefficients of I-T analysis were between 0.33 and 0.69. 4) The health promotion volunteers who attended other community activities demonstrated significantly high scores for the ESFHPV and the four sub-scales. Persons who were above 60 years, had a longer duration of activity as a health promotion volunteer and were housewives showed significantly high scores on the first sub-scale, 'growth as individual health promotion volunteers' To measure the empowerment levels of health promotion volunteers, a 28-item scale was developed and its reliability and validity were confirmed. Health promotion volunteers as well as the public health nurses who assist them can use this scale to assess the empowerment levels of other health promotion volunteers.
Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches

ERIC Educational Resources Information Center

Kopf, Julia; Zeileis, Achim; Strobl, Carolin

2015-01-01

Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…
Applying mixed methods to pretest the Pressure Ulcer Quality of Life (PU-QOL) instrument.

PubMed

Gorecki, C; Lamping, D L; Nixon, J; Brown, J M; Cano, S

2012-04-01

Pretesting is key in the development of patient-reported outcome (PRO) instruments. We describe a mixed-methods approach based on interviews and Rasch measurement methods in the pretesting of the Pressure Ulcer Quality of Life (PU-QOL) instrument. We used cognitive interviews to pretest the PU-QOL in 35 patients with pressure ulcers with the view to identifying problematic items, followed by Rasch analysis to examine response options, appropriateness of the item series and biases due to question ordering (item fit). We then compared findings in an interactive and iterative process to identify potential strengths and weaknesses of PU-QOL items, and guide decision-making about further revisions to items and design/layout. Although cognitive interviews largely supported items, they highlighted problems with layout, response options and comprehension. Findings from the Rasch analysis identified problems with response options through reversed thresholds. The use of a mixed-methods approach in pretesting the PU-QOL instrument proved beneficial for identifying problems with scale layout, response options and framing/wording of items. Rasch measurement methods are a useful addition to standard qualitative pretesting for evaluating strengths and weaknesses of early stage PRO instruments.
Data Visualization of Item-Total Correlation by Median Smoothing

ERIC Educational Resources Information Center

Yu, Chong Ho; Douglas, Samantha; Lee, Anna; An, Min

2016-01-01

This paper aims to illustrate how data visualization could be utilized to identify errors prior to modeling, using an example with multi-dimensional item response theory (MIRT). MIRT combines item response theory and factor analysis to identify a psychometric model that investigates two or more latent traits. While it may seem convenient to…
Nonparametric Item Response Curve Estimation with Correction for Measurement Error

ERIC Educational Resources Information Center

Guo, Hongwen; Sinharay, Sandip

2011-01-01

Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…

Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

ERIC Educational Resources Information Center

Parish, Jane A.; Karisch, Brandi B.

2013-01-01

Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
The construct validity of the Major Depression Inventory: A Rasch analysis of a self-rating scale in primary care.

PubMed

Nielsen, Marie Germund; Ørnbøl, Eva; Vestergaard, Mogens; Bech, Per; Christensen, Kaj Sparle

2017-06-01

We aimed to assess the measurement properties of the ten-item Major Depression Inventory when used on clinical suspicion in general practice by performing a Rasch analysis. General practitioners asked consecutive persons to respond to the web-based Major Depression Inventory on clinical suspicion of depression. We included 22 practices and 245 persons. Rasch analysis was performed using RUMM2030 software. The Rasch model fit suggests that all items contribute to a single underlying trait (defined as internal construct validity). Mokken analysis was used to test dimensionality and scalability. Our Rasch analysis showed misfit concerning the sleep and appetite items (items 9 and 10). The response categories were disordered for eight items. After modifying the original six-point to a four-point scoring system for all items, we achieved ordered response categories for all ten items. The person separation reliability was acceptable (0.82) for the initial model. Dimensionality testing did not support combining the ten items to create a total score. The scale appeared to be well targeted to this clinical sample. No significant differential item functioning was observed for gender, age, work status and education. The Rasch and Mokken analyses revealed two dimensions, but the Major Depression Inventory showed fit to one scale if items 9 and 10 were excluded. Our study indicated scalability problems in the current version of the Major Depression Inventory. The conducted analysis revealed better statistical fit when items 9 and 10 were excluded. Copyright © 2017 Elsevier Inc. All rights reserved.
A Generalized Partial Credit Model: Application of an EM Algorithm.

ERIC Educational Resources Information Center

Muraki, Eiji

1992-01-01

The partial credit model with a varying slope parameter is developed and called the generalized partial credit model (GPCM). Analysis results for simulated data by this and other polytomous item-response models demonstrate that the rating formulation of the GPCM is adaptable to the analysis of polytomous item responses. (SLD)
Using a Multivariate Multilevel Polytomous Item Response Theory Model to Study Parallel Processes of Change: The Dynamic Association between Adolescents' Social Isolation and Engagement with Delinquent Peers in the National Youth Survey

ERIC Educational Resources Information Center

Hsieh, Chueh-An; von Eye, Alexander A.; Maier, Kimberly S.

2010-01-01

The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the…
Studies of Horst's Procedure for Binary Data Analysis.

ERIC Educational Resources Information Center

Gray, William M.; Hofmann, Richard J.

Most responses to educational and psychological test items may be represented in binary form. However, such dichotomously scored items present special problems when an analysis of correlational interrelationships among the items is attempted. Two general methods of analyzing binary data are proposed by Horst to partial out the effects of…
Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

ERIC Educational Resources Information Center

Wang, Wen-Chung

2004-01-01

Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

PubMed Central

Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

2004-01-01

Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
Analyzing Multiple-Choice Questions by Model Analysis and Item Response Curves

NASA Astrophysics Data System (ADS)

Wattanakasiwich, P.; Ananta, S.

2010-07-01

In physics education research, the main goal is to improve physics teaching so that most students understand physics conceptually and be able to apply concepts in solving problems. Therefore many multiple-choice instruments were developed to probe students' conceptual understanding in various topics. Two techniques including model analysis and item response curves were used to analyze students' responses from Force and Motion Conceptual Evaluation (FMCE). For this study FMCE data from more than 1000 students at Chiang Mai University were collected over the past three years. With model analysis, we can obtain students' alternative knowledge and the probabilities for students to use such knowledge in a range of equivalent contexts. The model analysis consists of two algorithms—concentration factor and model estimation. This paper only presents results from using the model estimation algorithm to obtain a model plot. The plot helps to identify a class model state whether it is in the misconception region or not. Item response curve (IRC) derived from item response theory is a plot between percentages of students selecting a particular choice versus their total score. Pros and cons of both techniques are compared and discussed.
A Note on Item-Restscore Association in Rasch Models

ERIC Educational Resources Information Center

Kreiner, Svend

2011-01-01

To rule out the need for a two-parameter item response theory (IRT) model during item analysis by Rasch models, it is important to check the Rasch model's assumption that all items have the same item discrimination. Biserial and polyserial correlation coefficients measuring the association between items and restscores are often used in an informal…
Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

PubMed Central

Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

2011-01-01

We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
Validation of a clinical critical thinking skills test in nursing.

PubMed

Shin, Sujin; Jung, Dukyoo; Kim, Sungeun

2015-01-27

The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameter. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From above results, evidence of the response process validity was demonstrated, indicating that subjects respondeds as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing

PubMed Central

2015-01-01

Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameter. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From above results, evidence of the response process validity was demonstrated, indicating that subjects respondeds as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
Translation Fidelity of Psychological Scales: An Item Response Theory Analysis of an Individualism-Collectivism Scale.

ERIC Educational Resources Information Center

Bontempo, Robert

1993-01-01

Describes a method for assessing the quality of translations based on item response theory (IRT). Results from the IRT technique with French and Chinese versions of a scale measuring individualism-collectivism for samples of 250 U.S., 357 French, and 290 Chinese undergraduates show how several biased items are detected. (SLD)
Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28

ERIC Educational Resources Information Center

Guo, Hongwen; Sinharay, Sandip

2011-01-01

Nonparametric, or kernel, estimation of item response curve (IRC) is a concern theoretically and operationally. Accuracy of this estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…
Item response theory analysis of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised in the Pooled Resource Open-Access ALS Clinical Trials Database.

PubMed

Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M

2016-01-01

Our objective was to examine dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and Item Response Theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) Pooled Resources Open-Access ALS Clinical Trials (PRO-ACT) database with complete ALSFRS-R data (n = 888) at three time-points (Time 0, Time 1 (6-months), Time 2 (1-year)). Results demonstrated that in this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a 4* individual-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time-points, particularly in the gross motor and respiratory domain items. Results indicate that many of the items of the ALSFRS-R may sub-optimally distinguish among varying levels of disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved across time as patient disability severity increased. In conclusion, modifications to select ALSFRS-R items may improve the instrument's specificity to disability level and sensitivity to treatment effects.
An item response theory evaluation of the young mania rating scale and the montgomery-asberg depression rating scale in the systematic treatment enhancement program for bipolar disorder (STEP-BD).

PubMed

Prisciandaro, James J; Tolliver, Bryan K

2016-11-15

The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limited to outpatients; DIF analysis only sensitive to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.
Rasch analysis of the Italian Lower Extremity Functional Scale: insights on dimensionality and suggestions for an improved 15-item version.

PubMed

Bravini, Elisabetta; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano

2017-04-01

To investigate dimensionality and the measurement properties of the Italian Lower Extremity Functional Scale using both classical test theory and Rasch analysis methods, and to provide insights for an improved version of the questionnaire. Rasch analysis of individual patient data. Rehabilitation centre. A total of 135 patients with musculoskeletal diseases of the lower limb. Patients were assessed with the Lower Extremity Functional Scale before and after the rehabilitation. Rasch analysis showed some problems related to rating scale category functioning, items fit, and items redundancy. After an iterative process, which resulted in the reduction of rating scale categories from 5 to 4, and in the deletion of 5 items, the psychometric properties of the Italian Lower Extremity Functional Scale improved. The retained 15 items with a 4-level response format fitted the Rasch model (internal construct validity), and demonstrated unidimensionality and good reliability indices (person-separation reliability 0.92; Cronbach's alpha 0.94). Then, the analysis showed differential item functioning for six of the retained items. The sensitivity to change of the Italian 15-item Lower Extremity Functional Scale was nearly equal to the one of the original version (effect size: 0.93 and 0.98; standardized response mean: 1.20 and 1.28, respectively for the 15-item and 20-item versions). The Italian Lower Extremity Functional Scale had unsatisfactory measurement properties. However, removing five items and simplifying the scoring from 5 to 4 levels resulted in a more valid measure with good reliability and sensitivity to change.
Checking Dimensionality in Item Response Models with Principal Component Analysis on Standardized Residuals

ERIC Educational Resources Information Center

Chou, Yeh-Tai; Wang, Wen-Chung

2010-01-01

Dimensionality is an important assumption in item response theory (IRT). Principal component analysis on standardized residuals has been used to check dimensionality, especially under the family of Rasch models. It has been suggested that an eigenvalue greater than 1.5 for the first eigenvalue signifies a violation of unidimensionality when there…
Measurement properties of painDETECT: Rasch analysis of responses from community-dwelling adults with neuropathic pain.

PubMed

Packham, Tara L; Cappelleri, Joseph C; Sadosky, Alesia; MacDermid, Joy C; Brunner, Florian

2017-03-04

painDETECT (PD-Q) is a self-reported assessment of pain qualities developed as a screening tool for pain of neuropathic origin. Rasch analysis is a strategy for examining the measurement characteristics of a scale using a form of item response theory. We conducted a Rasch analysis to consider if the scoring and measurement properties of PD-Q would support its use as an outcome measure. Rasch analysis was conducted on PD-Q scores drawn from a cross-sectional study of the burden and costs of NeP. The analysis followed an iterative process based on recommendations in the literature, including examination of sequential scoring categories, unidimensionality, reliability and differential item function. Data from 624 persons with a diagnosis of painful diabetic polyneuropathy, small fibre neuropathy, and neuropathic pain associated with chronic low back pain, spinal cord injury, HIV-related pain, or chronic post-surgical pain was used for this analysis. PD-Q demonstrated fit to the Rasch model after adjustments of scoring categories for four items, and omission of the time course and radiating questions. The resulting seven-item scale of pain qualities demonstrated good reliability with a person-separation index of 0.79. No scoring bias (differential item functioning) was found for this version. Rasch modelling suggests the seven pain-qualities items from PD-Q may be used as an outcome measure. Further research is required to confirm validity and responsiveness in a clinical setting.
Developing an African youth psychosocial assessment: an application of item response theory.

PubMed

Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise

2014-06-01

This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.

Developing an African youth psychosocial assessment: an application of item response theory

PubMed Central

BETANCOURT, THERESA S.; YANG, FRANCES; BOLTON, PAUL; NORMAND, SHARON-LISE

2014-01-01

This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. PMID:24478113
Pattern analysis of total item score and item response of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative sample of US adults

PubMed Central

Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Yutaka, Ono; Furukawa, Toshiaki A.

2017-01-01

Background Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D). To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative study. Methods Data were drawn from the National Survey of Midlife Development in the United States (MIDUS), which comprises four subsamples: (1) a national random digit dialing (RDD) sample, (2) oversamples from five metropolitan areas, (3) siblings of individuals from the RDD sample, and (4) a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: “none of the time,” “a little of the time,” “some of the time,” “most of the time,” and “all of the time.” The pattern of total score distribution and item responses were analyzed using graphical analysis and exponential regression model. Results The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from “a little of the time” to “all of the time” on log-normal scales, while “none of the time” response was not related to this exponential pattern. Discussion The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales. PMID:28289560
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

PubMed

Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

2018-02-02

In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?

ERIC Educational Resources Information Center

Royal, Kenneth D.; Stockdale, Myrah R.

2015-01-01

The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…
Rasch validation of the Arabic version of the lower extremity functional scale.

PubMed

Alnahdi, Ali H

2018-02-01

The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing item scoring structure, removing misfitting items, addressing bias caused by response dependency between items and differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected with the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirement of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirement of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.
The Use of Multiple Imputation for Missing Data in Uniform DIF Analysis: Power and Type I Error Rates

ERIC Educational Resources Information Center

Finch, Holmes

2011-01-01

Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…
Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

ERIC Educational Resources Information Center

Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

2017-01-01

Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
Efficiently Assessing Negative Cognition in Depression: An Item Response Theory Analysis of the Dysfunctional Attitude Scale

ERIC Educational Resources Information Center

Beevers, Christopher G.; Strong, David R.; Meyer, Bjorn; Pilkonis, Paul A.; Miller, Ivan R.

2007-01-01

Despite a central role for dysfunctional attitudes in cognitive theories of depression and the widespread use of the Dysfunctional Attitude Scale, form A (DAS-A; A. Weissman, 1979), the psychometric development of the DAS-A has been relatively limited. The authors used nonparametric item response theory methods to examine the DAS-A items and…
An item response curves analysis of the Force Concept Inventory

NASA Astrophysics Data System (ADS)

Morris, Gary A.; Harshman, Nathan; Branum-Martin, Lee; Mazur, Eric; Mzoughi, Taha; Baker, Stephen D.

2012-09-01

Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields similar conclusions in the analysis of the performance of items on the FCI to the more sophisticated and complex IRT analyses but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers to improve their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels.
A General Program for Item-Response Analysis That Employs the Stabilized Newton-Raphson Algorithm. Research Report. ETS RR-13-32

ERIC Educational Resources Information Center

Haberman, Shelby J.

2013-01-01

A general program for item-response analysis is described that uses the stabilized Newton-Raphson algorithm. This program is written to be compliant with Fortran 2003 standards and is sufficiently general to handle independent variables, multidimensional ability parameters, and matrix sampling. The ability variables may be either polytomous or…
Maximizing the Information and Validity of a Linear Composite in the Factor Analysis Model for Continuous Item Responses

ERIC Educational Resources Information Center

Ferrando, Pere J.

2008-01-01

This paper develops results and procedures for obtaining linear composites of factor scores that maximize: (a) test information, and (b) validity with respect to external variables in the multiple factor analysis (FA) model. I treat FA as a multidimensional item response theory model, and use Ackerman's multidimensional information approach based…
Examining Differential Item Functioning: IRT-Based Detection in the Framework of Confirmatory Factor Analysis

ERIC Educational Resources Information Center

Dimitrov, Dimiter M.

2017-01-01

This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.
Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.

ERIC Educational Resources Information Center

Muraki, Eiji

1999-01-01

Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory.

PubMed

Pasca, Laura; Aragonés, Juan I; Coello, María T

2017-01-01

The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature.
Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index*

PubMed Central

Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

2011-01-01

Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
Model-Based Collaborative Filtering Analysis of Student Response Data: Machine-Learning Item Response Theory

ERIC Educational Resources Information Center

Bergner, Yoav; Droschler, Stefan; Kortemeyer, Gerd; Rayyan, Saif; Seaton, Daniel; Pritchard, David E.

2012-01-01

We apply collaborative filtering (CF) to dichotomously scored student response data (right, wrong, or no interaction), finding optimal parameters for each student and item based on cross-validated prediction accuracy. The approach is naturally suited to comparing different models, both unidimensional and multidimensional in ability, including a…
Acquiescent Responding in Balanced Multidimensional Scales and Exploratory Factor Analysis

ERIC Educational Resources Information Center

Lorenzo-Seva, Urbano; Rodriguez-Fornells, Antoni

2006-01-01

Personality tests often consist of a set of dichotomous or Likert items. These response formats are known to be susceptible to an agreeing-response bias called acquiescence. The common assumption in balanced scales is that the sum of appropriately reversed responses should be reasonably free of acquiescence. However, inter-item correlation (or…
Applying the Nominal Response Model within a Longitudinal Framework to Construct the Positive Family Relationships Scale

ERIC Educational Resources Information Center

Preston, Kathleen Suzanne Johnson; Parral, Skye N.; Gottfried, Allen W.; Oliver, Pamella H.; Gottfried, Adele Eskeles; Ibrahim, Sirena M.; Delany, Danielle

2015-01-01

A psychometric analysis was conducted using the nominal response model under the item response theory framework to construct the Positive Family Relationships scale. Using data from the Fullerton Longitudinal Study, this scale was constructed within a long-term longitudinal framework spanning middle childhood through adolescence. Items tapping…
An Evaluation of the Brief Symptom Inventory-18 Using Item Response Theory: Which Items Are Most Strongly Related to Psychological Distress?

ERIC Educational Resources Information Center

Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent

2011-01-01

The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a strong Mokken scale for outpatients and…
Analysis of Item Response Patterns: Consistency Indices and Their Application to Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Harnisch, Delwyn L.

The major emphasis of this paper is in the examination of test item response patterns. Tatsuoka and Tatsuoka (1980) have developed two indices of response consistency: the norm-conformity index (NCI) and the individual consistency index (ICI). The NCI provides a measure of the degree of consistency between the response pattern of an individual and…

Measuring Workplace Climate in Community Clinics and Health Centers.

PubMed

Friedberg, Mark W; Rodriguez, Hector P; Martsolf, Grant R; Edelen, Maria O; Vargas Bustamante, Arturo

2016-10-01

The effectiveness of community clinics and health centers' efforts to improve the quality of care might be modified by clinics' workplace climates. Several surveys to measure workplace climate exist, but their relationships to each other and to distinguishable dimensions of workplace climate are unknown. To assess the psychometric properties of a survey instrument combining items from several existing surveys of workplace climate and to generate a shorter instrument for future use. We fielded a 106-item survey, which included items from 9 existing instruments, to all clinicians and staff members (n=781) working in 30 California community clinics and health centers, receiving 628 responses (80% response rate). We performed exploratory factor analysis of survey responses, followed by confirmatory factor analysis of 200 reserved survey responses. We generated a new, shorter survey instrument of items with strong factor loadings. Six factors, including 44 survey items, emerged from the exploratory analysis. Two factors (Clinic Workload and Teamwork) were independent from the others. The remaining 4 factors (staff relationships, quality improvement orientation, managerial readiness for change, and staff readiness for change) were highly correlated, indicating that these represented dimensions of a higher-order factor we called "Clinic Functionality." This 2-level, 6-factor model fit the data well in the exploratory and confirmatory samples. For all but 1 factor, fewer than 20 survey responses were needed to achieve clinic-level reliability >0.7. Survey instruments designed to measure workplace climate have substantial overlap. The relatively parsimonious item set we identified might help target and tailor clinics' quality improvement efforts.
Measuring Workplace Climate in Community Clinics and Health Centers

PubMed Central

Friedberg, Mark W.; Rodriguez, Hector P.; Martsolf, Grant; Edelen, Maria Orlando; Vargas-Bustamante, Arturo

2018-01-01

Background The effectiveness of community clinics and health centers’ efforts to improve the quality of care might be modified by clinics’ workplace climates. Several surveys to measure workplace climate exist, but their relationships to each other and to distinguishable dimensions of workplace climate are unknown. Objective To assess the psychometric properties of a survey instrument combining items from several existing surveys of workplace climate and to generate a shorter instrument for future use. Methods We fielded a 106-item survey, which included items from 9 existing instruments, to all clinicians and staff members (n=781) working in 30 California community clinics and health centers, receiving 628 responses (80% response rate). We performed exploratory factor analysis of survey responses, followed by confirmatory factor analysis of 200 reserved survey responses. We generated a new, shorter survey instrument of items with strong factor loadings. Results Six factors, including 44 survey items, emerged from the exploratory analysis. Two factors (Clinic Workload and Teamwork) were independent from the others. The remaining 4 factors (Staff Relationships, Quality Improvement Orientation, Managerial Readiness for Change, and Staff Readiness for Change) were highly correlated, indicating that these represented dimensions of a higher-order factor we called “Clinic Functionality.” This two-level, six-factor model fit the data well in the exploratory and confirmatory samples. For all but one factor, fewer than 20 survey responses were needed to achieve clinic-level reliability >0.7. Conclusion Survey instruments designed to measure workplace climate have substantial overlap. The relatively parsimonious item set we identified might help target and tailor clinics’ quality improvement efforts. PMID:27326549
eHealth literacy in chronic disease patients: An item response theory analysis of the eHealth literacy scale (eHEALS).

PubMed

Paige, Samantha R; Krieger, Janice L; Stellefson, Michael; Alber, Julia M

2017-02-01

Chronic disease patients are affected by low computer and health literacy, which negatively affects their ability to benefit from access to online health information. To estimate reliability and confirm model specifications for eHealth Literacy Scale (eHEALS) scores among chronic disease patients using Classical Test (CTT) and Item Response Theory techniques. A stratified sample of Black/African American (N=341) and Caucasian (N=343) adults with chronic disease completed an online survey including the eHEALS. Item discrimination was explored using bi-variate correlations and Cronbach's alpha for internal consistency. A categorical confirmatory factor analysis tested a one-factor structure of eHEALS scores. Item characteristic curves, in-fit/outfit statistics, omega coefficient, and item reliability and separation estimates were computed. A 1-factor structure of eHEALS was confirmed by statistically significant standardized item loadings, acceptable model fit indices (CFI/TLI>0.90), and 70% variance explained by the model. Item response categories increased with higher theta levels, and there was evidence of acceptable reliability (ω=0.94; item reliability=89; item separation=8.54). eHEALS scores are a valid and reliable measure of self-reported eHealth literacy among Internet-using chronic disease patients. Providers can use eHEALS to help identify patients' eHealth literacy skills. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Can Item Analysis of MCQs Accomplish the Need of a Proper Assessment Strategy for Curriculum Improvement in Medical Education?

ERIC Educational Resources Information Center

Pawade, Yogesh R.; Diwase, Dipti S.

2016-01-01

Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
Comparing the Fit of Item Response Theory and Factor Analysis Models

ERIC Educational Resources Information Center

Maydeu-Olivares, Alberto; Cai, Li; Hernandez, Adolfo

2011-01-01

Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be…
An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

ERIC Educational Resources Information Center

Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie

2013-01-01

Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…
Evaluation of measurement equivalence of the Family Satisfaction with the End-of-Life Care in an ethnically diverse cohort: Tests of differential item functioning

PubMed Central

Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert

2016-01-01

Background The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education. PMID:25160692
Item response theory analysis of the Utrecht Work Engagement Scale for Students (UWES-S) using a sample of Japanese university and college students majoring medical science, nursing, and natural science.

PubMed

Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo

2017-10-30

The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students translated by authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two parameter model, and indicated that three items should be removed because there were outlier parameters. Final parameter estimation was conducted using the survived 11 items, and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale better assesses higher engagement than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scaling should be modified to fewer graded scaling structure.
Do large-scale assessments measure students' ability to integrate scientific knowledge?

NASA Astrophysics Data System (ADS)

Lee, Hee-Sun

2010-03-01

Large-scale assessments are used as means to diagnose the current status of student achievement in science and compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as Trends in International Math and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of student open-ended responses, open-ended items become a valid and reliable tool to assess students' knowledge integration ability.
Anatomy of a physics test: Validation of the physics items on the Texas Assessment of Knowledge and Skills

NASA Astrophysics Data System (ADS)

Marshall, Jill A.; Hagedorn, Eric A.; O'Connor, Jerry

2009-06-01

We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS) designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
Model Choice and Sample Size in Item Response Theory Analysis of Aphasia Tests

ERIC Educational Resources Information Center

Hula, William D.; Fergadiotis, Gerasimos; Martin, Nadine

2012-01-01

Purpose: The purpose of this study was to identify the most appropriate item response theory (IRT) measurement model for aphasia tests requiring 2-choice responses and to determine whether small samples are adequate for estimating such models. Method: Pyramids and Palm Trees (Howard & Patterson, 1992) test data that had been collected from…
An alternative to Rasch analysis using triadic comparisons and multi-dimensional scaling

NASA Astrophysics Data System (ADS)

Bradley, C.; Massof, R. W.

2016-11-01

Rasch analysis is a principled approach for estimating the magnitude of some shared property of a set of items when a group of people assign ordinal ratings to them. In the general case, Rasch analysis not only estimates person and item measures on the same invariant scale, but also estimates the average thresholds used by the population to define rating categories. However, Rasch analysis fails when there is insufficient variance in the observed responses because it assumes a probabilistic relationship between person measures, item measures and the rating assigned by a person to an item. When only a single person is rating all items, there may be cases where the person assigns the same rating to many items no matter how many times he rates them. We introduce an alternative to Rasch analysis for precisely these situations. Our approach leverages multi-dimensional scaling (MDS) and requires only rank orderings of items and rank orderings of pairs of distances between items to work. Simulations show one variant of this approach - triadic comparisons with non-metric MDS - provides highly accurate estimates of item measures in realistic situations.
Assessing Dimensionality of Noncompensatory Multidimensional Item Response Theory with Complex Structures

ERIC Educational Resources Information Center

Svetina, Dubravka

2013-01-01

The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in noncompensatory multidimensional item response models using dimensionality assessment procedures based on DETECT (dimensionality evaluation to enumerate contributing traits) and NOHARM (normal ogive harmonic analysis robust method). Five…
Slower is not always better: Response-time evidence clarifies the limited role of miserly information processing in the Cognitive Reflection Test

PubMed Central

Pitchford, Melanie; Ball, Linden J.; Hunt, Thomas E.; Steel, Richard

2017-01-01

We report a study examining the role of ‘cognitive miserliness’ as a determinant of poor performance on the standard three-item Cognitive Reflection Test (CRT). The cognitive miserliness hypothesis proposes that people often respond incorrectly on CRT items because of an unwillingness to go beyond default, heuristic processing and invest time and effort in analytic, reflective processing. Our analysis (N = 391) focused on people’s response times to CRT items to determine whether predicted associations are evident between miserly thinking and the generation of incorrect, intuitive answers. Evidence indicated only a weak correlation between CRT response times and accuracy. Item-level analyses also failed to demonstrate predicted response-time differences between correct analytic and incorrect intuitive answers for two of the three CRT items. We question whether participants who give incorrect intuitive answers on the CRT can legitimately be termed cognitive misers and whether the three CRT items measure the same general construct. PMID:29099840
Looking Closer at the Effects of Framing on Risky Choice: An Item Response Theory Analysis.

PubMed

Sickar; Highhouse

1998-07-01

Item response theory (IRT) methodology allowed an in-depth examination of several issues that would be difficult to explore using traditional methodology. IRT models were estimated for 4 risky-choice items, answered by students under either a gain or loss frame. Results supported the typical framing finding of risk-aversion for gains and risk-seeking for losses but also suggested that a latent construct we label preference for risk was influential in predicting risky choice. Also, the Asian Disease item, most often used in framing research, was found to have anomalous statistical properties when compared to other framing items. Copyright 1998 Academic Press.
Quantifying traditional Chinese medicine patterns using modern test theory: an example of functional constipation.

PubMed

Shen, Minxue; Cui, Yuanwu; Hu, Ming; Xu, Linyong

2017-01-13

The study aimed to validate a scale to assess the severity of "Yin deficiency, intestine heat" pattern of functional constipation based on the modern test theory. Pooled longitudinal data of 237 patients with "Yin deficiency, intestine heat" pattern of constipation from a prospective cohort study were used to validate the scale. Exploratory factor analysis was used to examine the common factors of items. A multidimensional item response model was used to assess the scale with the presence of multidimensionality. The Cronbach's alpha ranged from 0.79 to 0.89, and the split-half reliability ranged from 0.67 to 0.79 at different measurements. Exploratory factor analysis identified two common factors, and all items had cross factor loadings. Bidimensional model had better goodness of fit than the unidimensional model. Multidimensional item response model showed that the all items had moderate to high discrimination parameters. Parameters indicated that the first latent trait signified intestine heat, while the second trait characterized Yin deficiency. Information function showed that items demonstrated highest discrimination power among patients with moderate to high level of disease severity. Multidimensional item response theory provides a useful and rational approach in validating scales for assessing the severity of patterns in traditional Chinese medicine.
Conditional Covariance Theory and DETECT for Polytomous Items. Research Report. ETS RR-04-50

ERIC Educational Resources Information Center

Zhang, Jinming

2004-01-01

This paper extends the theory of conditional covariances to polytomous items. It has been mathematically proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, is positive if the two items are dimensionally homogeneous and negative…
PROC IRT: A SAS Procedure for Item Response Theory

PubMed Central

Matlock Cole, Ki; Paek, Insu

2017-01-01

This article reviews the procedure for item response theory (PROC IRT) procedure in SAS/STAT 14.1 to conduct item response theory (IRT) analyses of dichotomous and polytomous datasets that are unidimensional or multidimensional. The review provides an overview of available features, including models, estimation procedures, interfacing, input, and output files. A small-scale simulation study evaluates the IRT model parameter recovery of the PROC IRT procedure. The use of the IRT procedure in Statistical Analysis Software (SAS) may be useful for researchers who frequently utilize SAS for analyses, research, and teaching.
Development and psychometric properties of the Suicidality of Adolescent Screening Scale (SASS) using Multidimensional Item Response Theory.

PubMed

Sukhawaha, Supattra; Arunpongpaisal, Suwanna; Hurst, Cameron

2016-09-30

Suicide prevention in adolescents by early detection using screening tools to identify high suicidal risk is a priority. Our objective was to build a multidimensional scale namely "Suicidality of Adolescent Screening Scale (SASS)" to identify adolescents at risk of suicide. An initial pool of items was developed by using in-depth interview, focus groups and a literature review. Initially, 77 items were administered to 307 adolescents and analyzed using the exploratory Multidimensional Item Response Theory (MIRT) to remove unnecessary items. A subsequent exploratory factor analysis revealed 35 items that collected into 4 factors: Stressors, Pessimism, Suicidality and Depression. To confirm this structure, a new sample of 450 adolescents were collected and confirmatory MIRT factor analysis was performed. The resulting scale was shown to be both construct valid and able to discriminate well between adolescents that had, and hadn't previous attempted suicide. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory

PubMed Central

Pasca, Laura; Aragonés, Juan I.; Coello, María T.

2017-01-01

The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature. PMID:28824509

Analysis of Individual "Test Of Astronomy STandards" (TOAST) Item Responses

ERIC Educational Resources Information Center

Slater, Stephanie J.; Schleigh, Sharon Price; Stork, Debra J.

2015-01-01

The development of valid and reliable strategies to efficiently determine the knowledge landscape of introductory astronomy college students is an effort of great interest to the astronomy education community. This study examines individual item response rates from a widely used conceptual understanding survey, the Test Of Astronomy Standards…
Psychological distress in cancer survivors: the further development of an item bank.

PubMed

Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P

2013-02-01

Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.
The Spanish version of the Self-Determination Inventory Student Report: application of item response theory to self-determination measurement.

PubMed

Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A

2018-04-01

A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in Spanish language. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Through item response theory approach, this study examined item observed distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Development and validation of a socioculturally competent trust in physician scale for a developing country setting.

PubMed

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-05-03

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. To develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Semajima's graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12 item trust in physician scale has a good construct validity and internal consistency. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Development and validation of a socioculturally competent trust in physician scale for a developing country setting

PubMed Central

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-01-01

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives To develop and validate a new trust in physician scale for a developing country setting. Methods Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Semajima's graded response model was fit and item characteristics assessed. Results Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions The final 12 item trust in physician scale has a good construct validity and internal consistency. PMID:25941182
A longitudinal evaluation of the Center for Epidemiologic Studies-Depression scale (CES-D) in a Rheumatoid Arthritis Population using Rasch Analysis

PubMed Central

Covic, Tanya; Pallant, Julie F; Conaghan, Philip G; Tennant, Alan

2007-01-01

Background The aim of this study was to test the internal validity of the total Center for Epidemiologic Studies-Depression (CES-D) scale using Rasch analysis in a rheumatoid arthritis (RA) population. Methods CES-D was administered to 157 patients with RA over three time points within a 12 month period. Rasch analysis was applied using RUMM2020 software to assess the overall fit of the model, the response scale used, individual item fit, differential item functioning (DIF) and person separation. Results Pooled data across three time points was shown to fit the Rasch model with removal of seven items from the original 20-item CES-D scale. It was necessary to rescore the response format from four to three categories in order to improve the scale's fit. Two items demonstrated some DIF for age and gender but were retained within the 13-item CES-D scale. A new cut point for depression score of 9 was found to correspond to the original cut point score of 16 in the full CES-D scale. Conclusion This Rasch analysis of the CES-D in a longstanding RA cohort resulted in the construction of a modified 13-item scale with good internal validity. Further validation of the modified scale is recommended particularly in relation to the new cut point for depression. PMID:17629902
A Rasch analysis of the Brief Pain Inventory Interference subscale reveals three dimensions and an age bias.

PubMed

Walton, David M; Beattie, Tyler; Putos, Joseph; MacDermid, Joy C

2016-06-01

The Brief Pain Inventory is composed of two quantifiable scales: pain severity and pain interference. The reported factor structure of the interference subscale is not consistent in the extant literature, with no clear choice between a single- or two-factor structure. Here, we report on the results of Rasch-based analysis of the interference subscale using a large population-based ambulatory patient database (the Quebec Pain Registry). Observational cohort. A total of 1,000 responses were randomly drawn from a total database of 5,654 for this analysis. Both the original 7-item and an expanded 10-item version (Tyler 2002) of the interference subscale were evaluated. Rasch analysis revealed significant misfit of both versions of the scale, with the original 7-item version outperforming the expanded 10-item version. Analysis of dimensionality revealed that both versions showed improved model fit when considered two subscales (affective and physical interference) with the item on sleep interference removed or considered separately. Additionally, significant uniform differential item functioning was identified for 6 of the 7 original items when the sample was stratified by age above or below 55 years. The interference subscale achieved adequate model fit when considered as two separate subscales with age as a mediator of response, while interpreting the sleep interference item separately. A transformation matrix revealed that in all cases, ordinal-level change at the extreme ends of the scale appears to be more meaningful than does a similar change at the midpoints. The Interference subscale of the BPI should be interpreted as two separate subscales (Affective Interference, Physical Interference) with the sleep item removed or interpreted separately for optimal fit to the Rasch model. Implications for research and clinical use are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

PubMed

Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

2015-06-01

This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Item response theory analysis of Working Alliance Inventory, revised response format, and new Brief Alliance Inventory.

PubMed

Mallinckrodt, Brent; Tekie, Yacob T

2016-11-01

The Working Alliance Inventory (WAI) has made great contributions to psychotherapy research. However, studies suggest the 7-point response format and 3-factor structure of the client version may have psychometric problems. This study used Rasch item response theory (IRT) to (a) improve WAI response format, (b) compare two brief 12-item versions (WAI-sr; WAI-s), and (c) develop a new 16-item Brief Alliance Inventory (BAI). Archival data from 1786 counseling center and community clients were analyzed. IRT findings suggested problems with crossed category thresholds. A rescoring scheme that combines neighboring responses to create 5- and 4-point scales sharply reduced these problems. Although subscale variance was reduced by 11-26%, rescoring yielded improved reliability and generally higher correlations with therapy process (session depth and smoothness) and outcome measures (residual gain symptom improvement). The 16-item BAI was designed to maximize "bandwidth" of item difficulty and preserve a broader range of WAI sensitivity than WAI-s or WAI-sr. Comparisons suggest the BAI performed better in several respects than the WAI-s or WAI-sr and equivalent to the full WAI on several performance indicators.
The emotion regulation questionnaire in women with cancer: A psychometric evaluation and an item response theory analysis.

PubMed

Brandão, Tânia; Schulz, Marc S; Gross, James J; Matos, Paula Mena

2017-10-01

Emotion regulation is thought to play an important role in adaptation to cancer. However, the emotion regulation questionnaire (ERQ), a widely used instrument to assess emotion regulation, has not yet been validated in this context. This study addresses this gap by examining the psychometric properties of the ERQ in a sample of Portuguese women with cancer. The ERQ was administered to 204 women with cancer (mean age = 48.89 years, SD = 7.55). Confirmatory factor analysis and item response theory analysis were used to examine psychometric properties of the ERQ. Confirmatory factor analysis confirmed the 2-factor solution proposed by the original authors (expressive suppression and cognitive reappraisal). This solution was invariant across age and type of cancer. Item response theory analyses showed that all items were moderately to highly discriminant and that items are better suited for identifying moderate levels of expressive suppression and cognitive reappraisal. Support was found for the internal consistency and test-retest reliability of the ERQ. The pattern of relationships with emotional control, alexithymia, emotional self-efficacy, attachment, and quality of life provided evidence of the convergent and concurrent validity for both dimensions of the ERQ. Overall, the ERQ is a psychometrically sound approach for assessing emotion regulation strategies in the oncological context. Clinical implications are discussed. Copyright © 2016 John Wiley & Sons, Ltd.
[Item function analysis on the Quality of Life-Alzheimer's Disease(QOL-AD)Chinese version, based on the Item Response Theory(IRT)].

PubMed

Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei

2013-07-01

To introduce the Item Function Analysis(IFA) of Quality of Life- Alzheimer's disease(QOL-AD)Chinese version and to explore the feasibility of its application on Chinese patients with AD. Two hundred AD patients were interviewed and assessed by QOL-AD, through the stratified cluster sampling method. Multilog 7.03. was used for Item Function Analysis. Difference scale(a), difficulty scale(b)and Item Characteristic Curve(ICC) of each item of QOL-AD were provided. Different scales of the item 1, 7 were below 0.6, while all the others were above 0.6. As for ICC. The first and last lines for the other items were monotonic in which the two in between were in inverted V-shape, with very steep slopes, except for the item 1 and 7. Results form the IFA showed that QOL-AD was applicable to be used in the Chinese patients with AD.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

ERIC Educational Resources Information Center

Schweizer, Karl; Troche, Stefan

2018-01-01

In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of…
Development of the Computer-Adaptive Version of the Late-Life Function and Disability Instrument

PubMed Central

Tian, Feng; Kopits, Ilona M.; Moed, Richard; Pardasaney, Poonam K.; Jette, Alan M.

2012-01-01

Background. Having psychometrically strong disability measures that minimize response burden is important in assessing of older adults. Methods. Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive testings (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects. Results. Confirmatory factor analysis revealed a bifactor structure, and multi-dimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.05). Correlation of 10-item CATs with full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer. Conclusions. The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision. The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults. PMID:22546960
1999 Survey of Active Duty Personnel: Administration, Datasets, and Codebook

DTIC Science & Technology

2000-12-01

but left key items blank. These surveys were treated as nonrespondents when at least one item in each of the Questions 39, 50, and 52 were not...be useable, a questionnaire had to have at least one item in each of the questions 39, 50 and 52 answered. 27 These two subgroups of records are...1988). Mail survey response rate: A meta-analysis of selected techniques for inducing response. Public Opinion Quarterly, 52 , 467-491. Francisco, C
A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

PubMed

Park, Jong Cook; Kim, Kwang Sig

2012-03-01

The reliability of test is determined by each items' characteristics. Item analysis is achieved by classical test theory and item response theory. The purpose of the study was to compare the discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. Point biserial correlation coefficient (C(pbs)) was compared to method of extreme group (D), biserial correlation coefficient (C(bs)), item-total correlation coefficient (C(it)), and corrected item-total correlation coeffcient (C(cit)). Rasch model was applied to estimate item difficulty and examinee's ability and to calculate item fit statistics using joint maximum likelihood. Explanatory power (r2) of Cpbs is decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Item 9 and 23 have outfit > or =1.3. Student 1, 5, 7, 18, 26, 30, and 32 have fit > or =1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. Rasch model can estimate item difficulty parameter and examinee's ability parameter with standard error. The fit statistics can identify bad items and unpredictable examinee's responses.
Emotional Intelligence and Nurse Recruitment: Rasch and confirmatory factor analysis of the trait emotional intelligence questionnaire short form.

PubMed

Snowden, Austyn; Watson, Roger; Stenhouse, Rosie; Hale, Claire

2015-12-01

To examine the construct validity of the Trait Emotional Intelligence Questionnaire Short form. Emotional intelligence involves the identification and regulation of our own emotions and the emotions of others. It is therefore a potentially useful construct in the investigation of recruitment and retention in nursing and many questionnaires have been constructed to measure it. Secondary analysis of existing dataset of responses to Trait Emotional Intelligence Questionnaire Short form using concurrent application of Rasch analysis and confirmatory factor analysis. First year undergraduate nursing and computing students completed Trait Emotional Intelligence Questionnaire-Short Form in September 2013. Responses were analysed by synthesising results of Rasch analysis and confirmatory factor analysis. Participants (N = 938) completed Trait Emotional Intelligence Questionnaire Short form. Rasch analysis showed the majority of the Trait Emotional Intelligence Questionnaire-Short Form items made a unique contribution to the latent trait of emotional intelligence. Five items did not fit the model and differential item functioning (gender) accounted for this misfit. Confirmatory factor analysis revealed a four-factor structure consisting of: self-confidence, empathy, uncertainty and social connection. All five misfitting items from the Rasch analysis belonged to the 'social connection' factor. The concurrent use of Rasch and factor analysis allowed for novel interpretation of Trait Emotional Intelligence Questionnaire Short form. Much of the response variation in Trait Emotional Intelligence Questionnaire Short form can be accounted for by the social connection factor. Implications for practice are discussed. © 2015 John Wiley & Sons Ltd.
Vision and Quality of Life Index: validation of the Indian version using Rasch analysis.

PubMed

Gothwal, Vijaya K; Bagga, Deepak K

2013-07-18

A multi-attribute utility instrument (MAUI) consists of a descriptive system in which the items and responses seek information about a concept of the universe of health-related quality of life (QoL), and responses to these items then are weighted and combined to produce the index. To our knowledge, the 6-item Vision and Quality of Life Index (VisQoL) is the only available vision-related MAUI, developed and validated in Australia, specifically for visually impaired (VI) populations. To our knowledge, the psychometric properties of the VisQoL have not yet been investigated in an Indian VI sample; this was the aim of our study. The Indian VisQoL was administered to 349 VI adults face-to-face by a trained interviewer at the Vision Rehabilitation Centres of a tertiary eye care facility, South India. Rasch analysis was used to assess the psychometric properties. Rescoring was necessary for all except one item before ordered thresholds were obtained. All items fit the Rasch model and unidimensionality was confirmed. Person separation was acceptable (2.01), indicating that the instrument can discriminate among three strata of participants" vision-related QoL (VRQoL). The VisQoL items were targeted substantially to the participants" VRQoL (-0.69 logits). One item ("ability to have friendships") demonstrated large differential item functioning by work status; working participants reported the item to be more difficult (-1.13 logits) relative to other items when compared to the nonworking participants. The 6-item Indian VisQoL satisfies unidimensional Rasch model expectations in VI patients. Disordering of response categories was evident; replication is required before a common rescoring option should be considered.
Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

ERIC Educational Resources Information Center

Alsadaawi, Abdullah Saleh

2017-01-01

The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…
Comparison of Reliability Measures under Factor Analysis and Item Response Theory

ERIC Educational Resources Information Center

Cheng, Ying; Yuan, Ke-Hai; Liu, Cheng

2012-01-01

Reliability of test scores is one of the most pervasive psychometric concepts in measurement. Reliability coefficients based on a unifactor model for continuous indicators include maximal reliability rho and an unweighted sum score-based omega, among many others. With increasing popularity of item response theory, a parallel reliability measure pi…
Unidimensional and Multidimensional Models for Item Response Theory.

ERIC Educational Resources Information Center

McDonald, Roderick P.

This paper provides an up-to-date review of the relationship between item response theory (IRT) and (nonlinear) common factor theory and draws out of this relationship some implications for current and future research in IRT. Nonlinear common factor analysis yields a natural embodiment of the weak principle of local independence in appropriate…

Item Banks for Measuring Emotional Distress From the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, Anxiety, and Anger

PubMed Central

Pilkonis, Paul A.; Choi, Seung W.; Reise, Steven P.; Stover, Angela M.; Riley, William T.; Cella, David

2011-01-01

The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately −1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items. PMID:21697139
Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger.

PubMed

Pilkonis, Paul A; Choi, Seung W; Reise, Steven P; Stover, Angela M; Riley, William T; Cella, David

2011-09-01

The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately -1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items.
COVD-QOL questionnaire: An adaptation for school vision screening using Rasch analysis

PubMed Central

Abu Bakar, Nurul Farhana; Ai Hong, Chen; Pik Pin, Goh

2012-01-01

Purpose To adapt the College of Optometrist in Vision Development (COVD-QOL) questionnaire as a vision screening tool for primary school children. Methods An interview session was conducted with children, teachers or guardians regarding visual symptoms of 88 children (45 from special education classes and 43 from mainstream classes) in government primary schools. Data was assessed for response categories, fit items (infit/outfit: 0.6–1.4) and separation reliability (item/person: 0.80). The COVD-QOL questionnaire results were compared with vision assessment in identifying three categories of vision disorders: reduce visual acuity, accommodative response anomaly and convergence insufficiency. Analysis on the screening performance using the simplified version of the questionnaire was evaluated based on receiver-operating characteristic analysis for detection of any type of target conditions for both types of classes. Predictive validity analysis was used a Spearman rank correlation (>0.3). Results Two of the response categories were underutilized and therefore collapsed to the adjacent category and items were reduced to 14. Item separation reliability for the simplified version of the questionnaire was acceptable (0.86) but the person separation reliability was inadequate for special education classes (0.79) similar to mainstream classes (0.78). The discriminant cut-off score of 9 (mainstream classes) and 3 (special education classes) from the 14 items provided sensitivity and specificity of (65% and 54%) and (78% and 80%) with Spearman rank correlation of 0.16 and 0.40 respectively. Conclusion The simplified version of COVD-QOL questionnaire (14-items) performs adequately among children in special education classes suggesting its suitability as a vision screening tool.
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form.

PubMed

Kisala, Pamela A; Tulsky, David S; Choi, Seung W; Kirshblum, Steven C

2015-05-01

To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Pressure Ulcers scale. 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provide flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item "short form" and is available for both research and clinical practice.
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.

PubMed

McCabe, Erin; Gross, Douglas P; Bulut, Okan

2018-06-07

The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum test length of 8 items, or 20 items is possible without a significant loss of information (95, 99% correlation with legacy measure scores). We demonstrated feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward, and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
Relationship of the Basic Attributes Test to Tactical Reconnaissance Pilot Performance

DTIC Science & Technology

1987-01-01

ulysk 36 4.. Pscoa- Test: Pefcrman Regression Analysis 66 5. Decsison Making Speed: Ped~cnance Regrssiop Analysis 68 6. Item Recognitio . Pefixmanc...agreement between 12 TRS and 91 TRS supcrvisors. This indicated that those most likely to be faced with the task of determining the performance capabilities...those UPT check flights requiring quick, consistent, and accurate responses. Item Recognitio Test Ihe item recognition test reduced to seven scors. Thet
Development of the Contact Lens User Experience: CLUE Scales

PubMed Central

Wirth, R. J.; Edwards, Michael C.; Henderson, Michael; Henderson, Terri; Olivares, Giovanna; Houts, Carrie R.

2016-01-01

ABSTRACT Purpose The field of optometry has become increasingly interested in patient-reported outcomes, reflecting a common trend occurring across the spectrum of healthcare. This article reviews the development of the Contact Lens User Experience: CLUE system designed to assess patient evaluations of contact lenses. CLUE was built using modern psychometric methods such as factor analysis and item response theory. Methods The qualitative process through which relevant domains were identified is outlined as well as the process of creating initial item banks. Psychometric analyses were conducted on the initial item banks and refinements were made to the domains and items. Following this data-driven refinement phase, a second round of data was collected to further refine the items and obtain final item response theory item parameters estimates. Results Extensive qualitative work identified three key areas patients consider important when describing their experience with contact lenses. Based on item content and psychometric dimensionality assessments, the developing CLUE instruments were ultimately focused around four domains: comfort, vision, handling, and packaging. Item response theory parameters were estimated for the CLUE item banks (377 items), and the resulting scales were found to provide precise and reliable assignment of scores detailing users’ subjective experiences with contact lenses. Conclusions The CLUE family of instruments, as it currently exists, exhibits excellent psychometric properties. PMID:27383257
A Rasch Analysis of the Junior Metacognitive Awareness Inventory with Singapore Students

ERIC Educational Resources Information Center

Ning, Hoi Kwan

2018-01-01

The psychometric properties of the 2 versions of the Junior Metacognitive Awareness Inventory were examined with Singapore student samples. Other than 2 misfitting items and an underutilized response scale, Rasch analysis demonstrated that the instruments have good measurement precision, and no differential item functioning was detected across…
A Study of Item Bias for Attitudinal Measurement Using Maximum Likelihood Factor Analysis.

ERIC Educational Resources Information Center

Mayberry, Paul W.

A technique for detecting item bias that is responsive to attitudinal measurement considerations is a maximum likelihood factor analysis procedure comparing multivariate factor structures across various subpopulations, often referred to as SIFASP. The SIFASP technique allows for factorial model comparisons in the testing of various hypotheses…
Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

PubMed

Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

2015-08-19

Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principle component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principle component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistic <10 indicated the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
An item-response theory approach to safety climate measurement: The Liberty Mutual Safety Climate Short Scales.

PubMed

Huang, Yueng-Hsiang; Lee, Jin; Chen, Zhuo; Perry, MacKenna; Cheung, Janelle H; Wang, Mo

2017-06-01

Zohar and Luria's (2005) safety climate (SC) scale, measuring organization- and group- level SC each with 16 items, is widely used in research and practice. To improve the utility of the SC scale, we shortened the original full-length SC scales. Item response theory (IRT) analysis was conducted using a sample of 29,179 frontline workers from various industries. Based on graded response models, we shortened the original scales in two ways: (1) selecting items with above-average discriminating ability (i.e. offering more than 6.25% of the original total scale information), resulting in 8-item organization-level and 11-item group-level SC scales; and (2) selecting the most informative items that together retain at least 30% of original scale information, resulting in 4-item organization-level and 4-item group-level SC scales. All four shortened scales had acceptable reliability (≥0.89) and high correlations (≥0.95) with the original scale scores. The shortened scales will be valuable for academic research and practical survey implementation in improving occupational safety. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.
An item response theory analysis of the narcissistic personality inventory.

PubMed

Ackerman, Robert A; Donnellan, M Brent; Robins, Richard W

2012-01-01

This research uses item response theory methods to evaluate the Narcissistic Personality Inventory (NPI; Raskin & Terry, 1988). Analyses using the 2-parameter logistic model were conducted on the total score and the Corry, Merritt, Mrug, and Pamp (2008) and Ackerman et al. (2011) subscales for the NPI. In addition to offering precise information about the psychometric properties of the NPI item pool, these analyses generated insights that can be used to develop new measures of the personality constructs embedded within this frequently used inventory.
Psychometric Properties of the Children's Depression Inventory: An Item Response Theory Analysis across Age in a Nonclinical, Longitudinal, Adolescent Sample

ERIC Educational Resources Information Center

Lee, Young-Sun; Krishnan, Anita; Park, Yoon Soo

2012-01-01

The purpose of this study was to investigate psychometric properties of the Children's Depression Inventory within a nonclinical and longitudinal sample (8th and 12th grades). Using the Rasch rating scale, most items represented one dimension. There was adequate separation among items and no overlap between ranges of item difficulties with latent…
Developing a scale to measure "attachment to the local community" in late middle aged individuals.

PubMed

Sakai, Taichi; Omori, Junko; Takahashi, Kazuko; Mitsumori, Yasuko; Kobayashi, Maasa; Ono, Wakanako; Miyazaki, Toshie; Anzai, Hitomi; Saito, Mika

2016-01-01

Objectives This study was conducted to develop a scale for measuring "attachment to the local community" for its use in health services. The scale is also intended to nurture new social relationships in late middle-aged individuals.Methods Thirty items were initially planned to be included in the scale to measure "attachment to the local community", according to a previous study that identified the concept. The study subjects were late middle-aged residents of City B in Prefecture A, located in Tokyo suburbs. From the basic resident register data, 1,000 individuals (local residents in the 50-69 year age group) were selected by a multi-stage random sampling technique, on the basis of their residential area, age, and sex (while maintaining the male to female ratio). An unsigned self-administered questionnaire was distributed to the subjects, and the responses were collected by postal mail. The collected data was analyzed using psychometric study of scale.Results Valid responses were obtained from 583 subjects, and the response rate was 58.3%. In an item analysis, none of the items were rejected. In a subsequent factor analysis, 7 items were eliminated. These items included 2 items with a factor loading of <0.40, 3 items loading on multiple factors and showing a factor loading of ≥0.40, and 2 items with a low factor correlation (0.04-0.16). These items included factors that related to only these 2 items. Consequently, 23 items in the following 4-factor structure were selected as the scale items: "Source of vitality to live life," "Intention to cherish ties with people," "Place where one can be oneself," and "Pride of being a resident." Cronbach's coefficient α for the entire scale of "attachment to the local community" was 0.95, demonstrating internal consistency. We then examined the correlation with an existing scale to measure social support; the results revealed a statistically significant correlation and confirmed criterion-related validity (P<0.001). In addition, the fit indices in a covariance structure analysis showed adequate values.Conclusions The developed scale was considered reliable and appropriate for measuring "attachment to the local community."
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data.

PubMed

Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L J; Maris, Gunter

2016-01-01

We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two 'one-process' models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a 'two-process' model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses.
Psychometric properties of the Epworth Sleepiness Scale: A factor analysis and item-response theory approach.

PubMed

Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus

2018-04-01

The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory-factor analysis was completed to examine dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about the response rates on the items on the ESS and provide differential item functioning (DIF) analyses to examine whether the items were interpreted differently between the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and to a lesser extent Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses between German and English were very similar indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Cluster Analysis for Cognitive Diagnosis: Theory and Applications

ERIC Educational Resources Information Center

Chiu, Chia-Yi; Douglas, Jeffrey A.; Li, Xiaodong

2009-01-01

Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and…
A Multilevel Testlet Model for Dual Local Dependence

ERIC Educational Resources Information Center

Jiao, Hong; Kamata, Akihito; Wang, Shudong; Jin, Ying

2012-01-01

The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced.…
Conditional Covariance Theory and Detect for Polytomous Items

ERIC Educational Resources Information Center

Zhang, Jinming

2007-01-01

This paper extends the theory of conditional covariances to polytomous items. It has been proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, given an appropriately chosen composite is positive if, and only if, the two items…
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning

ERIC Educational Resources Information Center

Finch, W. Holmes

2011-01-01

Missing information is a ubiquitous aspect of data analysis, including responses to items on cognitive and affective instruments. Although the broader statistical literature describes missing data methods, relatively little work has focused on this issue in the context of differential item functioning (DIF) detection. Such prior research has…

Item Response Theory Evaluation of the Light and Spectroscopy Concept Inventory National Data Set

ERIC Educational Resources Information Center

Wallace, Colin S.; Chambers, Timothy G.; Prather, Edward E.

2018-01-01

[This paper is part of the Focused Collection on Astronomy Education Research.] This paper presents the first item response theory (IRT) analysis of the national data set on introductory, general education, college-level astronomy teaching using the Light and Spectroscopy Concept Inventory (LSCI). We used the difference between students' pre- and…
A Multidimensional Item Response Model: Constrained Latent Class Analysis Using the Gibbs Sampler and Posterior Predictive Checks.

ERIC Educational Resources Information Center

Hoijtink, Herbert; Molenaar, Ivo W.

1997-01-01

This paper shows that a certain class of constrained latent class models may be interpreted as a special case of nonparametric multidimensional item response models. Parameters of this latent class model are estimated using an application of the Gibbs sampler, and model fit is investigated using posterior predictive checks. (SLD)
ITEM RESPONSE ANALYSES OF THE EDUCATIONAL OPPORTUNITIES SURVEY 9TH GRADE STUDENT QUESTIONNAIRE.

ERIC Educational Resources Information Center

WEINFELD, FREDERIC D.; AND OTHERS

THIS REPORT PRESENTS THE ANALYSIS OF QUESTIONNAIRE ITEM RESPONSES FROM THE NINTH-GRADE STUDENT QUESTIONNAIRE ADMINISTERED AS PART OF THE EDUCATIONAL OPPORTUNITIES SURVEY. THE ANALYSES WERE PERFORMED TO DOCUMENT SOME OF THE BASIC DATA FROM THE SURVEY, TO MAKE THEM AVAILABLE TO INTERESTED EDUCATIONAL RESEARCHERS, AND TO REWORK THE BASIC DATA FOR…
Computing Maximum Likelihood Estimates of Loglinear Models from Marginal Sums with Special Attention to Loglinear Item Response Theory.

ERIC Educational Resources Information Center

Kelderman, Henk

1992-01-01

Describes algorithms used in the computer program LOGIMO for obtaining maximum likelihood estimates of the parameters in loglinear models. These algorithms are also useful for the analysis of loglinear item-response theory models. Presents modified versions of the iterative proportional fitting and Newton-Raphson algorithms. Simulated data…
Development and Preliminary Validation of Chinese Preschoolers’ Eating Behavior Questionnaire

PubMed Central

Zhang, Yuhai; Wang, Baoxi; Sun, Lijun; Shang, Lei

2014-01-01

Background The objective of this study was to develop a questionnaire for caregivers to assess the eating behavior of Chinese preschoolers. Methods To assess children’s eating behaviors, 152 items were derived from a broad review of the literature related to epidemiology surveys and the assessment of children’s eating behaviors. All of these items were reviewed by 50 caregivers of preschoolers and 10 experienced pediatricians. Seventy-seven items were selected for use in a primary questionnaire. After conducting an exploratory factor analysis and a variability analysis on the data from 313 preschoolers used to evaluate this primary questionnaire, we deleted 39 of these 77 items. A Chinese Preschoolers’ Eating Behavior Questionnaire (CPEBQ) was finally established from the remaining 38 items. The structure of this questionnaire was explored by factor analysis, and its reliability, validity and discriminative ability were evaluated with data collected from caregivers of 603 preschoolers. Results The CPEBQ consisted of 7 dimensions and 38 items. The 7 dimensions were food fussiness, food responsiveness, eating habit, satiety responsiveness, exogenous eating, emotional eating and initiative eating. The Cronbach’s α coefficient for the questionnaire was 0.92, and the test-retest reliability was 0.72. There were significant differences between the scores of normal-weight, overweight and obese preschoolers when it was referred to food fussiness, food responsiveness, eating habits, satiety responsiveness and emotional eating (p<0.05). Differences in caregiver’s education levels also had significant effects on scores for food fussiness, eating habits and exogenous eating (p<0.05). Conclusions The CPEBQ satisfies the conditions of reliability and validity, in accordance with psychometric demands. The questionnaire can be employed to evaluate the characteristics of Chinese preschoolers’ eating behaviors; therefore, it can be used in child health care practice and research. PMID:24520359
Evaluation of the Psychometric Properties of the Asian Adolescent Depression Scale and Construction of a Short Form: An Item Response Theory Analysis.

PubMed

Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen

2017-07-01

The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.
Differential item functional analysis on pedagogic and content knowledge (PCK) questionnaire for Indonesian teachers using RASCH model

NASA Astrophysics Data System (ADS)

Rahmani, B. D.

2018-01-01

The purpose of this paper is to evaluate Indonesian senior high school teacher’s pedagogical content knowledge also their perception toward curriculum changing in West Java Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. The Differential Item Functioning (DIFF) was used to analyze the data whether the item had a difference or not toward gender, education background also on school location. However, the result showed that there was no any significant difference on gender and school location toward the item response but educational background. As a conclusion, the teacher’s educational background influence on giving the response to the questionnaire. Therefore, it is suggested in the future to construct the items on the questionnaire which is coped the differences of the participant particularly the educational background.
Item response analysis of the Positive and Negative Syndrome Scale

PubMed Central

Santor, Darcy A; Ascher-Svanum, Haya; Lindenmayer, Jean-Pierre; Obenchain, Robert L

2007-01-01

Background Statistical models based on item response theory were used to examine (a) the performance of individual Positive and Negative Syndrome Scale (PANSS) items and their options, (b) the effectiveness of various subscales to discriminate among individual differences in symptom severity, and (c) the appropriateness of cutoff scores recently recommended by Andreasen and her colleagues (2005) to establish symptom remission. Methods Option characteristic curves were estimated using a nonparametric item response model to examine the probability of endorsing each of 7 options within each of 30 PANSS items as a function of standardized, overall symptom severity. Our data were baseline PANSS scores from 9205 patients with schizophrenia or schizoaffective disorder who were enrolled between 1995 and 2003 in either a large, naturalistic, observational study or else in 1 of 12 randomized, double-blind, clinical trials comparing olanzapine to other antipsychotic drugs. Results Our analyses show that the majority of items forming the Positive and Negative subscales of the PANSS perform very well. We also identified key areas for improvement or revision in items and options within the General Psychopathology subscale. The Positive and Negative subscale scores are not only more discriminating of individual differences in symptom severity than the General Psychopathology subscale score, but are also more efficient on average than the 30-item total score. Of the 8 items recently recommended to establish symptom remission, 1 performed markedly different from the 7 others and should either be deleted or rescored requiring that patients achieve a lower score of 2 (rather than 3) to signal remission. Conclusion This first item response analysis of the PANSS supports its sound psychometric properties; most PANSS items were either very good or good at assessing overall severity of illness. These analyses did identify some items which might be further improved for measuring individual severity differences or for defining remission thresholds. Findings also suggest that the Positive and Negative subscales are more sensitive to change than the PANSS total score and, thus, may constitute a "mini PANSS" that may be more reliable, require shorter administration and training time, and possibly reduce sample sizes needed for future research. PMID:18005449
An item response theory analysis of Harter's Self-Perception Profile for children or why strong clinical scales should be distrusted.

PubMed

Egberink, Iris J L; Meijer, Rob R

2011-06-01

The authors investigated the psychometric properties of the subscales of the Self-Perception Profile for Children with item response theory (IRT) models using a sample of 611 children. Results from a nonparametric Mokken analysis and a parametric IRT approach for boys (n = 268) and girls (n = 343) were compared. The authors found that most scales formed weak scales and that measurement precision was relatively low and only present for latent trait values indicating low self-perception. The subscales Physical Appearance and Global Self-Worth formed one strong scale. Children seem to interpret Global Self-Worth items as if they measure Physical Appearance. Furthermore, the authors found that strong Mokken scales (such as Global Self-Worth) consisted mostly of items that repeat the same item content. They conclude that researchers should be very careful in interpreting the total scores on the different Self-Perception Profile for Children scales. Finally, implications for further research are discussed.
Item response theory detects differential item functioning between healthy and ill children in QoL measures

PubMed Central

Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.

2008-01-01

Objective To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
Tracking functional status across the spinal cord injury lifespan: linking pediatric and adult patient-reported outcome scores.

PubMed

Tian, Feng; Ni, Pengsheng; Mulcahey, M J; Hambleton, Ronald K; Tulsky, David; Haley, Stephen M; Jette, Alan M

2014-11-01

To use item response theory (IRT) methods to link scores from 2 recently developed contemporary functional outcome measures, the adult Spinal Cord Injury-Functional Index (SCI-FI) and the Pedi SCI (both the parent version and the child version). Secondary data analysis of the physical functioning items of the adult SCI-FI and the Pedi SCI instruments. We used a nonequivalent group design with items common to both instruments and the Stocking-Lord method for the linking. Linking was conducted so that the adult SCI-FI and Pedi SCI scaled scores could be compared. Community. This study included a total sample of 1558 participants. Pedi SCI items were administered to a sample of children (n=381) with SCI aged 8 to 21 years, and of parents/caregivers (n=322) of children with SCI aged 4 to 21 years. Adult SCI-FI items were administered to a sample of adults (n=855) with SCI aged 18 to 92 years. Not applicable. Five scales common to both instruments were included in the analysis: Wheelchair, Daily Routine/Self-care, Daily Routine/Fine Motor, Ambulation, and General Mobility functioning. Confirmatory factor analysis and exploratory factor analysis results indicated that the 5 scales are unidimensional. A graded response model was used to calibrate the items. Misfitting items were identified and removed from the item banks. Items that function differently between the adult and child samples (ie, exhibit differential item functioning) were identified and removed from the common items used for linking. Domain scores from the Pedi SCI instruments were transformed onto the adult SCI-FI metric. This IRT linking allowed estimation of adult SCI-FI scale scores based on Pedi SCI scale scores and vice versa; therefore, it provides clinicians with a means of tracking long-term functional data for children with an SCI across their entire lifespan. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Rasch analysis of the Chedoke-McMaster Attitudes towards Children with Handicaps scale.

PubMed

Armstrong, Megan; Morris, Christopher; Tarrant, Mark; Abraham, Charles; Horton, Mike C

2017-02-01

Aim To assess whether the Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) 36-item total scale and subscales fit the unidimensional Rasch model. Method The CATCH was administered to 1881 children, aged 7-16 years in a cross-sectional survey. Data were used from a random sample of 416 for the initial Rasch analysis. The analysis was performed on the 36-item scale and then separately for each subscale. The analysis explored fit to the Rasch model in terms of overall scale fit, individual item fit, item response categories, and unidimensionality. Item bias for gender and school level was also assessed. Revised scales were then tested on an independent second random sample of 415 children. Results Analyses indicated that the 36-item overall scale was not unidimensional and did not fit the Rasch model. Two scales of affective attitudes and behavioural intention were retained after four items were removed from each due to misfit to the Rasch model. Additionally, the scaling was improved when the two most negative response categories were aggregated. There was no item bias by gender or school level on the revised scales. Items assessing cognitive attitudes did not fit the Rasch model and had low internal consistency as a scale. Conclusion Affective attitudes and behavioural intention CATCH sub-scales should be treated separately. Caution should be exercised when using the cognitive subscale. Implications for Rehabilitation The 36-item Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) scale as a whole did not fit the Rasch model; thus indicating a multi-dimensional scale. Researchers should use two revised eight-item subscales of affective attitudes and behavioural intentions when exploring interventions aiming to improve children's attitudes towards disabled people or factors associated with those attitudes. Researchers should use the cognitive subscale with caution, as it did not create a unidimensional and internally consistent scale. Therefore, conclusions drawn from this scale may not accurately reflect children's attitudes.
Educational Leadership Effectiveness: A Rasch Analysis

ERIC Educational Resources Information Center

Sinnema, Claire; Ludlow, Larry; Robinson, Viviane

2016-01-01

Purpose: The purposes of this paper are, first, to establish the psychometric properties of the ELP tool, and, second, to test, using a Rasch item response theory analysis, the hypothesized progression of challenge presented by the items included in the tool. Design/ Methodology/ Approach: Data were collected at two time points through a survey of…
Rasch Analysis of the Adult Strabismus Quality of Life Questionnaire (AS-20) among Chinese Adult Patients with Strabismus.

PubMed

Wang, Zonghua; Zhou, Juan; Luo, Xingli; Xu, Yan; She, Xi; Chen, Ling; Yin, Honghua; Wang, Xianyuan

2015-01-01

The impact of strabismus on visual function, self-image, self-esteem, and social interactions decrease health-related quality of life (HRQoL).The purpose of this study was to evaluate and refine the adult strabismus quality of life questionnaire (AS-20) by using Rasch analysis among Chinese adult patients with strabismus. We evaluated the fitness of the AS-20 with Rasch model in Chinese population by assessing unidimensionality, infit and outfit, person and item separation index and reliability, response ordering, targeting and differential item functioning (DIF). The overall AS-20 did not demonstrate unidimensional; however, it was achieved separately in the two Rasch-revised subscales: the psychosocial subscale (11 items) and the function subscale (9 items). The features of good targeting, optimal item infit and outfit, and no notable local dependence were found for each of the subscales. The rating scale was appropriate for the psychosocial subscale but a reduction to four response categories was required for the function subscale. No significant DIF were revealed for any demographic and clinical factors (e.g., age, gender, and strabismus types). The AS-20 was demonstrated by Rasch analysis to be a rigorous instrument for measuring health-related quality of life in Chinese strabismus patents if some revisions were made regarding the subscale construct and response options.
Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.

PubMed

Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj

2016-12-01

The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest to change the content of the MSWS-12. © The Author(s), 2016.
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form

PubMed Central

Kisala, Pamela A.; Tulsky, David S.; Choi, Seung W.; Kirshblum, Steven C.

2015-01-01

Objective To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Design Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Pressure Ulcers scale. Results 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. Conclusions The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provide flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item “short form” and is available for both research and clinical practice. PMID:26010965
Item Banks for Substance Use from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Severity of Use and Positive Appeal of Use*

PubMed Central

Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis

2015-01-01

Background Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364
The Mindful Attention Awareness Scale: Further Examination of Dimensionality, Reliability, and Concurrent Validity Estimates.

PubMed

Osman, Augustine; Lamis, Dorian A; Bagge, Courtney L; Freedenthal, Stacey; Barnes, Sean M

2016-01-01

We examined the factor structure and psychometric properties of the Mindful Attention Awareness Scale (MAAS) in a sample of 810 undergraduate students. Using common exploratory factor analysis (EFA), we obtained evidence for a 1-factor solution (41.84% common variance). To confirm unidimensionality of the 15-item MAAS, we conducted a 1-factor confirmatory factor analysis (CFA). Results of the EFA and CFA, respectively, provided support for a unidimensional model. Using differential item functioning analysis methods within item response theory modeling (IRT-based DIF), we found that individuals with high and low levels of nonattachment responded similarly to the MAAS items. Following a detailed item analysis, we proposed a 5-item short version of the instrument and present descriptive statistics and composite score reliability for the short and full versions of the MAAS. Finally, correlation analyses showed that scores on the full and short versions of the MAAS were associated with measures assessing related constructs. The 5-item MAAS is as useful as the original MAAS in enhancing our understanding of the mindfulness construct.
A mixed-effects regression model for longitudinal multivariate ordinal data.

PubMed

Liu, Li C; Hedeker, Donald

2006-03-01

A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.
Measuring the ICF components of impairment, activity limitation and participation restriction: an item analysis using classical test theory and item response theory

PubMed Central

Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie

2009-01-01

Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677

Parent Ratings of ADHD Symptoms: Generalized Partial Credit Model Analysis of Differential Item Functioning across Gender

ERIC Educational Resources Information Center

Gomez, Rapson

2012-01-01

Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…
An Analysis of the Individual Effects of Sex Bias.

ERIC Educational Resources Information Center

Smith, Richard M.

Most attempts to correct for the presence of biased test items in a measurement instrument have been either to remove the items or to adjust the scores to correct for the bias. Using the Rasch Dichotomous Response Model and the independent ability estimates derived from three sets of items, those which favor females, those which favor males, and…
Item Analysis and Differential Item Functioning of a Brief Conduct Problem Screen

ERIC Educational Resources Information Center

Wu, Johnny; King, Kevin M.; Witkiewitz, Katie; Racz, Sarah Jensen; McMahon, Robert J.

2012-01-01

Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item…
Psychometric properties of the SDM-Q-9 questionnaire for shared decision-making in multiple sclerosis: item response theory modelling and confirmatory factor analysis.

PubMed

Ballesteros, Javier; Moral, Ester; Brieva, Luis; Ruiz-Beato, Elena; Prefasi, Daniel; Maurino, Jorge

2017-04-22

Shared decision-making is a cornerstone of patient-centred care. The 9-item Shared Decision-Making Questionnaire (SDM-Q-9) is a brief self-assessment tool for measuring patients' perceived level of involvement in decision-making related to their own treatment and care. Information related to the psychometric properties of the SDM-Q-9 for multiple sclerosis (MS) patients is limited. The objective of this study was to assess the performance of the items composing the SDM-Q-9 and its dimensional structure in patients with relapsing-remitting MS. A non-interventional, cross-sectional study in adult patients with relapsing-remitting MS was conducted in 17 MS units throughout Spain. A nonparametric item response theory (IRT) analysis was used to assess the latent construct and dimensional structure underlying the observed responses. A parametric IRT model, General Partial Credit Model, was fitted to obtain estimates of the relationship between the latent construct and item characteristics. The unidimensionality of the SDM-Q-9 instrument was assessed by confirmatory factor analysis. A total of 221 patients were studied (mean age = 42.1 ± 9.9 years, 68.3% female). Median Expanded Disability Status Scale score was 2.5 ± 1.5. Most patients reported taking part in each step of the decision-making process. Internal reliability of the instrument was high (Cronbach's α = 0.91) and the overall scale scalability score was 0.57, indicative of a strong scale. All items, except for the item 1, showed scalability indices higher than 0.30. Four items (items 6 through to 9) conveyed more than half of the SDM-Q-9 overall information (67.3%). The SDM-Q-9 was a good fit for a unidimensional latent structure (comparative fit index = 0.98, root-mean-square error of approximation = 0.07). All freely estimated parameters were statistically significant (P < 0.001). All items presented standardized parameter estimates with salient loadings (>0.40) with the exception of item 1 which presented the lowest loading (0.26). Items 6 through to 8 were the most relevant items for shared decision-making. The SDM-Q-9 presents appropriate psychometric properties and is therefore useful for assessing different aspects of shared decision-making in patients with multiple sclerosis.
An Investigation of the Accuracy of Alternative Methods of True Score Estimation in High-Stakes Mixed-Format Examinations.

ERIC Educational Resources Information Center

Klinger, Don A.; Rogers, W. Todd

2003-01-01

The estimation accuracy of procedures based on classical test score theory and item response theory (generalized partial credit model) were compared for examinations consisting of multiple-choice and extended-response items. Analysis of British Columbia Scholarship Examination results found an error rate of about 10 percent for both methods, with…
A Bayesian Approach to Person Fit Analysis in Item Response Theory Models. Research Report.

ERIC Educational Resources Information Center

Glas, Cees A. W.; Meijer, Rob R.

A Bayesian approach to the evaluation of person fit in item response theory (IRT) models is presented. In a posterior predictive check, the observed value on a discrepancy variable is positioned in its posterior distribution. In a Bayesian framework, a Markov Chain Monte Carlo procedure can be used to generate samples of the posterior distribution…
Comparison of Fixed-Item and Response-Sensitive Versions of an Online Tutorial

ERIC Educational Resources Information Center

Grant, Lyle K.; Courtoreille, Marni

2007-01-01

This study is a comparison of 2 versions of an Internet-based tutorial that teaches the behavior-analysis concept of positive reinforcement. A fixed-item group of students studied a version of the tutorial that included 14 interactive examples and nonexamples of the concept. A response-sensitive group of students studied a different version of the…
Increasing the Number of Replications in Item Response Theory Simulations: Automation through SAS and Disk Operating System

ERIC Educational Resources Information Center

Gagne, Phill; Furlow, Carolyn; Ross, Terris

2009-01-01

In item response theory (IRT) simulation research, it is often necessary to use one software package for data generation and a second software package to conduct the IRT analysis. Because this can substantially slow down the simulation process, it is sometimes offered as a justification for using very few replications. This article provides…
An Item Response Theory Analysis of the Community of Inquiry Scale

ERIC Educational Resources Information Center

Horzum, Mehmet Baris; Uyanik, Gülden Kaya

2015-01-01

The aim of this study is to examine validity and reliability of Community of Inquiry Scale commonly used in online learning by the means of Item Response Theory. For this purpose, Community of Inquiry Scale version 14 is applied on 1,499 students of a distance education center's online learning programs at a Turkish state university via internet.…
DIF Analysis across Genders for Reading Comprehension Part of English Language Achievement Exam as a Foreign Language

ERIC Educational Resources Information Center

Ögretmen, Tuncay

2015-01-01

The purpose of this study is to carry out differential item functioning (DIF) analysis for content areas of a reading comprehension subtest using four area indices within Item Response Theory (IRT) framework. The differences in the magnitudes of the area indices were compared based on the subject areas. The DIF analysis was carried out across…
A Mixed Effects Randomized Item Response Model

ERIC Educational Resources Information Center

Fox, J.-P.; Wyrick, Cheryl

2008-01-01

The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
Evaluation of the Hospital Anxiety and Depression Scale (HADS) in screening stroke patients for symptoms: Item Response Theory (IRT) analysis.

PubMed

Ayis, Salma A; Ayerbe, Luis; Ashworth, Mark; DA Wolfe, Charles

2018-03-01

Variations have been reported in the number of underlying constructs and choice of thresholds that determine caseness of anxiety and /or depression using the Hospital Anxiety and Depression scale (HADS). This study examined the properties of each item of HADS as perceived by stroke patients, and assessed the information these items convey about anxiety and depression between 3 months to 5 years after stroke. The study included 1443 stroke patients from the South London Stroke Register (SLSR). The dimensionality of HADS was examined using factor analysis methods, and items' properties up to 5 years after stroke were tested using Item Response Theory (IRT) methods, including graded response models (GRMs). The presence of two dimensions of HADS (anxiety and depression) for stroke patients was confirmed. Items that accurately inferred about the severity of anxiety and depression, and offered good discrimination of caseness were identified as "I can laugh and see the funny side of things" (Q4) and "I get sudden feelings of panic" (Q13), discrimination 2.44 (se = 0.26), and 3.34 (se = 0.35), respectively. Items that shared properties, hence replicate inference were: "I get a sort of frightened feeling as if something awful is about to happen" (Q3), "I get a sort of frightened feeling like butterflies in my stomach" (Q6), and "Worrying thoughts go through my mind" (Q9). Item properties were maintained over time. Approximately 20% of patients were lost to follow up. A more concise selection of items based on their properties, would provide a precise approach for screening patients and for an optimal allocation of patients into clinical trials. Copyright © 2017 Elsevier B.V. All rights reserved.
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data

PubMed Central

Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L. J.; Maris, Gunter

2016-01-01

We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two ‘one-process’ models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a ‘two-process’ model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses. PMID:27167518
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed

Feinberg, Richard A; Clauser, Amanda L

2016-10-01

In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.
Item response theory and structural equation modelling for ordinal data: Describing the relationship between KIDSCREEN and Life-H.

PubMed

Titman, Andrew C; Lancaster, Gillian A; Colver, Allan F

2016-10-01

Both item response theory and structural equation models are useful in the analysis of ordered categorical responses from health assessment questionnaires. We highlight the advantages and disadvantages of the item response theory and structural equation modelling approaches to modelling ordinal data, from within a community health setting. Using data from the SPARCLE project focussing on children with cerebral palsy, this paper investigates the relationship between two ordinal rating scales, the KIDSCREEN, which measures quality-of-life, and Life-H, which measures participation. Practical issues relating to fitting models, such as non-positive definite observed or fitted correlation matrices, and approaches to assessing model fit are discussed. item response theory models allow properties such as the conditional independence of particular domains of a measurement instrument to be assessed. When, as with the SPARCLE data, the latent traits are multidimensional, structural equation models generally provide a much more convenient modelling framework. © The Author(s) 2013.
The Responsive Environmental Assessment for Classroom Teaching (REACT): the dimensionality of student perceptions of the instructional environment.

PubMed

Nelson, Peter M; Demers, Joseph A; Christ, Theodore J

2014-06-01

This study details the initial development of the Responsive Environmental Assessment for Classroom Teachers (REACT). REACT was developed as a questionnaire to evaluate student perceptions of the classroom teaching environment. Researchers engaged in an iterative process to develop, field test, and analyze student responses on 100 rating-scale items. Participants included 1,465 middle school students across 48 classrooms in the Midwest. Item analysis, including exploratory and confirmatory factor analysis, was used to refine a 27-item scale with a second-order factor structure. Results support the interpretation of a single general dimension of the Classroom Teaching Environment with 6 subscale dimensions: Positive Reinforcement, Instructional Presentation, Goal Setting, Differentiated Instruction, Formative Feedback, and Instructional Enjoyment. Applications of REACT in research and practice are discussed along with implications for future research and the development of classroom environment measures. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Computer adaptive test approach to the assessment of children and youth with brachial plexus birth palsy.

PubMed

Mulcahey, M J; Merenda, Lisa; Tian, Feng; Kozin, Scott; James, Michelle; Gogola, Gloria; Ni, Pengsheng

2013-01-01

This study examined the psychometric properties of item pools relevant to upper-extremity function and activity performance and evaluated simulated 5-, 10-, and 15-item computer adaptive tests (CATs). In a multicenter, cross-sectional study of 200 children and youth with brachial plexus birth palsy (BPBP), parents responded to upper-extremity (n = 52) and activity (n = 34) items using a 5-point response scale. We used confirmatory and exploratory factor analysis, ordinal logistic regression, item maps, and standard errors to evaluate the psychometric properties of the item banks. Validity was evaluated using analysis of variance and Pearson correlation coefficients. Results show that the two item pools have acceptable model fit, scaled well for children and youth with BPBP, and had good validity, content range, and precision. Simulated CATs performed comparably to the full item banks, suggesting that a reduced number of items provide similar information to the entire set of items. Copyright © 2013 by the American Occupational Therapy Association, Inc.
Surveying Turkish high school and university students' attitudes and approaches to physics problem solving

NASA Astrophysics Data System (ADS)

Balta, Nuri; Mason, Andrew J.; Singh, Chandralekha

2016-06-01

Students' attitudes and approaches to physics problem solving can impact how well they learn physics and how successful they are in solving physics problems. Prior research in the U.S. using a validated Attitude and Approaches to Problem Solving (AAPS) survey suggests that there are major differences between students in introductory physics and astronomy courses and physics experts in terms of their attitudes and approaches to physics problem solving. Here we discuss the validation, administration, and analysis of data for the Turkish version of the AAPS survey for high school and university students in Turkey. After the validation and administration of the Turkish version of the survey, the analysis of the data was conducted by grouping the data by grade level, school type, and gender. While there are no statistically significant differences between the averages of various groups on the survey, overall, the university students in Turkey were more expertlike than vocational high school students. On an item by item basis, there are statistically differences between the averages of the groups on many items. For example, on average, the university students demonstrated less expertlike attitudes about the role of equations and formulas in problem solving, in solving difficult problems, and in knowing when the solution is not correct, whereas they displayed more expertlike attitudes and approaches on items related to metacognition in physics problem solving. A principal component analysis on the data yields item clusters into which the student responses on various survey items can be grouped. A comparison of the responses of the Turkish and American university students enrolled in algebra-based introductory physics courses shows that on more than half of the items, the responses of these two groups were statistically significantly different, with the U.S. students on average responding to the items in a more expertlike manner.
A new item response theory model to adjust data allowing examinee choice

PubMed Central

Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

2018-01-01

In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996
Measuring euthymia within the Neuroticism Scale from the NEO Personality Inventory: A Mokken analysis of the Norwegian general population study for scalability.

PubMed

Bech, P; Carrozzino, D; Austin, S F; Møller, S B; Vassend, O

2016-03-15

Whereas the Eysenck Neuroticism Scale only contains items covering negative mental health to measure dysthymia, the NEO Personality Inventory (NEO-PI) contains neuroticism items covering both negative mental health and positive mental health (or euthymia). The consequence of wording items both positively and negatively within the NEO-PI has never been psychometrically investigated. The aim of this study was to perform a validation analysis of the NEO-PI neuroticism scale. Using a Norwegian general population study we examined the structure of the negatively and positively formulated items by principal component analysis (PCA). The scalability of the identified two groups of euthymia versus dysthymia items was examined by Mokken analysis. With a response rate of 90%, 1082 individuals with a completed NEO-PI were available. The PCA identified the neuroticism scale as the most distinct where 14 items had acceptable loadings for the euthymia subscale, another 14 items for the dysthymia subscale. However, the Mokken analysis coefficient of homogeneity only found acceptable scalability for the euthymia subscale. A comparison with the Eysenck Neuroticism Scale was not performed. The NEO-PI neuroticism scale contains two subscales consisting of items worded in an opposite direction where only the positive euthymia items have an acceptable scalability. Copyright © 2016 Elsevier B.V. All rights reserved.

A new Integrated Negative Symptom structure of the Positive and Negative Syndrome Scale (PANSS) in schizophrenia using item response analysis.

PubMed

Khan, Anzalee; Lindenmayer, Jean-Pierre; Opler, Mark; Yavorsky, Christian; Rothman, Brian; Lucic, Luka

2013-10-01

Debate persists with regard to how best to categorize the syndromal dimension of negative symptoms in schizophrenia. The aim was to first review published Principle Components Analysis (PCA) of the PANSS, and extract items most frequently included in the negative domain, and secondly, to examine the quality of items using Item Response Theory (IRT) to select items that best represent a measurable dimension (or dimensions) of negative symptoms. First, 22 factor analyses and PCA met were included. Second, using a large dataset (n=7187) of participants in clinical trials with chronic schizophrenia, we extracted items loading on one or more PCA. Third, items not loading with a value of ≥ 0.5, or loading on more than one component with values of ≥ 0.5 were discarded. Fourth, resulting items were included in a non-parametric IRT and retained based on Option Characteristic Curves (OCCs) and Item Characteristic Curves (ICCs). 15 items loaded on a negative domain in at least one study, with Emotional Withdrawal loading on all studies. Non-parametric IRT retained nine items as an Integrated Negative Factor: Emotional Withdrawal, Blunted Affect, Passive/Apathetic Social Withdrawal, Poor Rapport, Lack of Spontaneity/Conversation Flow, Active Social Avoidance, Disturbance of Volition, Stereotyped Thinking and Difficulty in Abstract Thinking. This is the first study to use a psychometric IRT process to arrive at a set of negative symptom items. Future steps will include further examination of these nine items in terms of their stability, sensitivity to change, and correlations with functional and cognitive outcomes. © 2013 Elsevier B.V. All rights reserved.
Comparison of Factor Simplicity Indices for Dichotomous Data: DETECT R, Bentler's Simplicity Index, and the Loading Simplicity Index

ERIC Educational Resources Information Center

Finch, Holmes; Stage, Alan Kirk; Monahan, Patrick

2008-01-01

A primary assumption underlying several of the common methods for modeling item response data is unidimensionality, that is, test items tap into only one latent trait. This assumption can be assessed several ways, using nonlinear factor analysis and DETECT, a method based on the item conditional covariances. When multidimensionality is identified,…
Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

ERIC Educational Resources Information Center

Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

2011-01-01

A psychometric analysis of 2 interview-based measures of cognitive deficits was conducted: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on 2 occasions to a sample of people with…
Using the Self-Directed Search in Research: Selecting a Representative Pool of Items to Measure Vocational Interests

ERIC Educational Resources Information Center

Poitras, Sarah-Caroline; Guay, Frederic; Ratelle, Catherine F.

2012-01-01

Using Item Response Theory (IRT) and Confirmatory Factor Analysis (CFA), the goal of this study was to select a reduced pool of items from the French Canadian version of the Self-Directed Search--Activities Section (Holland, Fritzsche, & Powell, 1994). Two studies were conducted. Results of Study 1, involving 727 French Canadian students,…
Development and Validation of a Brief Version of the Dyadic Adjustment Scale With a Nonparametric Item Analysis Model

ERIC Educational Resources Information Center

Sabourin, Stephane; Valois, Pierre; Lussier, Yvan

2005-01-01

The main purpose of the current research was to develop an abbreviated form of the Dyadic Adjustment Scale (DAS) with nonparametric item response theory. The authors conducted 5 studies, with a total participation of 8,256 married or cohabiting individuals. Results showed that the item characteristic curves behaved in a monotonically increasing…
Performance of the Generalized S-X[squared] Item Fit Index for the Graded Response Model

ERIC Educational Resources Information Center

Kang, Taehoon; Chen, Troy T.

2011-01-01

The utility of Orlando and Thissen's ("2000", "2003") S-X[squared] fit index was extended to the model-fit analysis of the graded response model (GRM). The performance of a modified S-X[squared] in assessing item-fit of the GRM was investigated in light of empirical Type I error rates and power with a simulation study having…
An Analysis of Differential Response Patterns on the Peabody Picture Vocabulary Test-IIIB in Struggling Adult Readers and Third-Grade Children

ERIC Educational Resources Information Center

Pae, Hye K.; Greenberg, Daphne; Williams, Rihana S.

2012-01-01

This study examines the Peabody Picture Vocabulary Test-IIIB (PPVT-IIIB) performance of 130 adults identified as struggling readers, in comparison to 175 third-grade children. Response patterns to the items on the PPVT-IIIB by these two groups were investigated, focusing on items, semantic categories, and lexical features, including word length,…
Multivariate Generalizability Analysis of Automated Scoring for Short Answer Items of Social Studies in Large-Scale Assessment

ERIC Educational Resources Information Center

Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee

2017-01-01

With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
An Information-Correction Method for Testlet-Based Test Analysis: From the Perspectives of Item Response Theory and Generalizability Theory. Research Report. ETS RR-17-27

ERIC Educational Resources Information Center

Li, Feifei

2017-01-01

An information-correction method for testlet-based tests is introduced. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By…
Conners' Teacher Rating Scale for Preschool Children: A Revised, Brief, Age-Specific Measure

ERIC Educational Resources Information Center

Purpura, David J.; Lonigan, Christopher J.

2009-01-01

The Conners' Teacher Rating Scale-Revised (CTRS-R) is one of the most commonly used measures of child behavior problems. However, the scale length and the appropriateness of some of the items on the scale may reduce the usefulness of the CTRS-R for use with preschoolers. In this study, a Graded Response Model analysis based on Item Response Theory…
Comparison of Self-Reported Telephone Interviewing and Web-Based Survey Responses: Findings From the Second Australian Young and Well National Survey

PubMed Central

Davenport, Tracey A; Burns, Jane M; Hickie, Ian B

2017-01-01

Background Web-based self-report surveying has increased in popularity, as it can rapidly yield large samples at a low cost. Despite this increase in popularity, in the area of youth mental health, there is a distinct lack of research comparing the results of Web-based self-report surveys with the more traditional and widely accepted computer-assisted telephone interviewing (CATI). Objective The Second Australian Young and Well National Survey 2014 sought to compare differences in respondent response patterns using matched items on CATI versus a Web-based self-report survey. The aim of this study was to examine whether responses varied as a result of item sensitivity, that is, the item’s susceptibility to exaggeration on underreporting and to assess whether certain subgroups demonstrated this effect to a greater extent. Methods A subsample of young people aged 16 to 25 years (N=101), recruited through the Second Australian Young and Well National Survey 2014, completed the identical items on two occasions: via CATI and via Web-based self-report survey. Respondents also rated perceived item sensitivity. Results When comparing CATI with the Web-based self-report survey, a Wilcoxon signed-rank analysis showed that respondents answered 14 of the 42 matched items in a significantly different way. Significant variation in responses (CATI vs Web-based) was more frequent if the item was also rated by the respondents as highly sensitive in nature. Specifically, 63% (5/8) of the high sensitivity items, 43% (3/7) of the neutral sensitivity items, and 0% (0/4) of the low sensitivity items were answered in a significantly different manner by respondents when comparing their matched CATI and Web-based question responses. The items that were perceived as highly sensitive by respondents and demonstrated response variability included the following: sexting activities, body image concerns, experience of diagnosis, and suicidal ideation. For high sensitivity items, a regression analysis showed respondents who were male (beta=−.19, P=.048) or who were not in employment, education, or training (NEET; beta=−.32, P=.001) were significantly more likely to provide different responses on matched items when responding in the CATI as compared with the Web-based self-report survey. The Web-based self-report survey, however, demonstrated some evidence of avidity and attrition bias. Conclusions Compared with CATI, Web-based self-report surveys are highly cost-effective and had higher rates of self-disclosure on sensitive items, particularly for respondents who identify as male and NEET. A drawback to Web-based surveying methodologies, however, includes the limited control over avidity bias and the greater incidence of attrition bias. These findings have important implications for further development of survey methods in the area of health and well-being, especially when considering research topics (in this case diagnosis, suicidal ideation, sexting, and body image) and groups that are being recruited (young people, males, and NEET). PMID:28951382
Psychometrical assessment and item analysis of the General Health Questionnaire in victims of terrorism.

PubMed

Delgado-Gomez, David; Lopez-Castroman, Jorge; de Leon-Martinez, Victoria; Baca-Garcia, Enrique; Cabanas-Arrate, Maria Luisa; Sanchez-Gonzalez, Antonio; Aguado, David

2013-03-01

There is a need to assess the psychiatric morbidity that appears as a consequence of terrorist attacks. The General Health Questionnaire (GHQ) has been used to this end, but its psychometric properties have never been evaluated in a population affected by terrorism. A sample of 891 participants included 162 direct victims of terrorist attacks and 729 relatives of the victims. All participants were evaluated using the 28-item version of the GHQ (GHQ-28). We examined the reliability and external validity of scores on the scale using Cronbach's alpha and Pearson correlation with the State-Trait Anxiety Inventory (STAI), respectively. The factor structure of the scale was analyzed with varimax rotation. Samejima's (1969) graded response model was used to explore the item properties. The GHQ-28 scores showed good reliability and item-scale correlations. The factor analysis identified 3 factors: anxious-somatic symptoms, social dysfunction, and depression symptoms. All factors showed good correlation with the STAI. Before rotation, the first, second, and third factor explained 44.0%, 6.4%, and 5.0% of the variance, respectively. Varimax rotation redistributed the percentages of variance accounted for to 28.4%, 13.8%, and 13.2%, respectively. Items with the highest loadings in the first factor measured anxiety symptoms, whereas items with the highest loadings in the third factor measured suicide ideation. Samejima's model found that high scores in suicide-related items were associated with severe depression. The factor structure of the GHQ-28 found in this study underscores the preeminence of anxiety symptoms among victims of terrorism and their relatives. Item response analysis identified the most difficult and significant items for each factor. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Gender Invariance of the Gambling Behavior Scale for Adolescents (GBS-A): An Analysis of Differential Item Functioning Using Item Response Theory.

PubMed

Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina

2017-01-01

As there is a lack of evidence attesting the equivalent item functioning across genders for the most employed instruments used to measure pathological gambling in adolescence, the present study was aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to relieve truly gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.
The development and exploratory analysis of the Back Pain Attitudes Questionnaire (Back-PAQ)

PubMed Central

Darlow, Ben; Perry, Meredith; Mathieson, Fiona; Stanley, James; Melloh, Markus; Marsh, Reginald; Baxter, G David; Dowell, Anthony

2014-01-01

Objectives To develop an instrument to assess attitudes and underlying beliefs about back pain, and subsequently investigate its internal consistency and underlying structures. Design The instrument was developed by a multidisciplinary team of clinicians and researchers based on analysis of qualitative interviews with people experiencing acute and chronic back pain. Exploratory analysis was conducted using data from a population-based cross-sectional survey. Setting Qualitative interviews with community-based participants and subsequent postal survey. Participants Instrument development informed by interviews with 12 participants with acute back pain and 11 participants with chronic back pain. Data for exploratory analysis collected from New Zealand residents and citizens aged 18 years and above. 1000 participants were randomly selected from the New Zealand Electoral Roll. 602 valid responses were received. Measures The 34-item Back Pain Attitudes Questionnaire (Back-PAQ) was developed. Internal consistency was evaluated by the Cronbach α coefficient. Exploratory analysis investigated the structure of the data using Principal Component Analysis. Results The 34-item long form of the scale had acceptable internal consistency (α=0.70; 95% CI 0.66 to 0.73). Exploratory analysis identified five two-item principal components which accounted for 74% of the variance in the reduced data set: ‘vulnerability of the back’; ‘relationship between back pain and injury’; ‘activity participation while experiencing back pain’; ‘prognosis of back pain’ and ‘psychological influences on recovery’. Internal consistency was acceptable for the reduced 10-item scale (α=0.61; 95% CI 0.56 to 0.66) and the identified components (α between 0.50 and 0.78). Conclusions The 34-item long form of the scale may be appropriate for use in future cross-sectional studies. The 10-item short form may be appropriate for use as a screening tool, or an outcome assessment instrument. Further testing of the 10-item Back-PAQ's construct validity, reliability, responsiveness to change and predictive ability needs to be conducted. PMID:24860003
Development and validation of brief scales to measure emotional and behavioural problems among Chinese adolescents

PubMed Central

Shen, Minxue; Hu, Ming; Sun, Zhenqiu

2017-01-01

Objectives To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting Middle schools in Hunan province. Participants 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
What can we learn from PISA?: Investigating PISA's approach to scientific literacy

NASA Astrophysics Data System (ADS)

Schwab, Cheryl Jean

This dissertation is an investigation of the relationship between the multidimensional conception of scientific literacy and its assessment. The Programme for International Student Assessment (PISA), developed under the auspices of the Organization for Economic Cooperation and Development (OECD), offers a unique opportunity to evaluate the assessment of scientific literacy. PISA developed a continuum of performance for scientific literacy across three competencies (i.e., process, content, and situation). Foundational to the interpretation of PISA science assessment is PISA's definition of scientific literacy, which I argue incorporates three themes drawn from history: (a) scientific way of thinking, (b) everyday relevance of science, and (c) scientific literacy for all students. Three coordinated studies were conducted to investigate the validity of PISA science assessment and offer insight into the development of items to assess scientific 2 literacy. Multidimensional models of the internal structure of the PISA 2003 science items were found not to reflect the complex character of PISA's definition of scientific literacy. Although the multidimensional models across the three competencies significantly decreased the G2 statistic from the unidimensional model, high correlations between the dimensions suggest that the dimensions are similar. A cognitive analysis of student verbal responses to PISA science items revealed that students were using competencies of scientific literacy, but the competencies were not elicited by the PISA science items at the depth required by PISA's definition of scientific literacy. Although student responses contained only knowledge of scientific facts and simple scientific concepts, students were using more complex skills to interpret and communicate their responses. Finally the investigation of different scoring approaches and item response models illustrated different ways to interpret student responses to assessment items. These analyses highlighted the complexities of students' responses to the PISA science items and the use of the ordered partition model to accommodate different but equal item responses. The results of the three investigations are used to discuss ways to improve the development and interpretation of PISA's science items.
Measuring anxiety after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Anxiety item bank and linkage with GAD-7.

PubMed

Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W

2015-05-01

To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form.

PubMed

Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W

2015-05-01

To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provide flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form

PubMed Central

Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.

2015-01-01

Objective To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Stigma Item Bank Results A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provide flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973
Development and Validation of the Adolescent Psychological Need Support in Exercise Questionnaire.

PubMed

Emm-Collison, Lydia G; Standage, Martyn; Gillison, Fiona B

2016-10-01

Grounded within self-determination theory (SDT; Deci & Ryan, 2000; Ryan & Deci, in press), three studies were conducted to develop and psychometrically test a measure of adolescents' perceptions of psychological need support for exercise (viz., for autonomy, competence, and relatedness): the Adolescent Psychological Need Support in Exercise Questionnaire (APNSEQ). In Study 1, 34 items were developed in collaboration with an expert panel. Through categorical confirmatory factor analysis and item response theory, responses from 433 adolescents were used to identify the best fitting and performing items in Study 2. Here, a three-factor nine-item measure showed good fit to the data. In Study 3, responses from an independent sample of 373 adolescents provided further evidence for the nine-item solution as well as for internal consistency, criterion validity, and invariance across gender and social agent (friends, family, and physical education teacher). The APNSEQ was supported as a measure of adolescents' perceptions of psychological need support within the context of exercise.

Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory.

PubMed

Fajrianthi; Zein, Rizqy Amelia

2017-01-01

This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (level of ability = -2). It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth to note that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and dimensionality test of each TKEA subset.
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14).

PubMed

Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi

2015-01-01

Nurses and other healthcare workers frequently experience belief conflict, one of the most important, new stress-related problems in both academic and clinical fields. In this study, using a sample of 1,683 nursing practitioners, we developed The Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach's alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 was suggested by evidence from high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice.
Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory.

PubMed

Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C

2017-02-01

The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
Measurement equivalence of the KINDL questionnaire across child self-reports and parent proxy-reports: a comparison between item response theory and ordinal logistic regression.

PubMed

Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara

2014-06-01

Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50 %. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
A new course and textbook on Physical Models of Living Systems, for science and engineering undergraduates

NASA Astrophysics Data System (ADS)

Nelson, Philip

2015-03-01

I'll describe an intermediate-level course on ``Physical Models of Living Systems.'' The only prerequisite is first-year university physics and calculus. The course is a response to rapidly growing interest among undergraduates in a broad range of science and engineering majors. Students acquire several research skills that are often not addressed in traditional courses: Basic modeling skills Probabilistic modeling skills Data analysis methods Computer programming using a general-purpose platform like MATLAB or Python Dynamical systems, particularly feedback control. These basic skills, which are relevant to nearly any field of science or engineering, are presented in the context of case studies from living systems, including: Virus dynamics Bacterial genetics and evolution of drug resistance Statistical inference Superresolution microscopy Synthetic biology Naturally evolved cellular circuits. Work supported by NSF Grants EF-0928048 and DMR-0832802.
Item Response Theory as an Efficient Tool to Describe a Heterogeneous Clinical Rating Scale in De Novo Idiopathic Parkinson's Disease Patients.

PubMed

Buatois, Simon; Retout, Sylvie; Frey, Nicolas; Ueckert, Sebastian

2017-10-01

This manuscript aims to precisely describe the natural disease progression of Parkinson's disease (PD) patients and evaluate approaches to increase the drug effect detection power. An item response theory (IRT) longitudinal model was built to describe the natural disease progression of 423 de novo PD patients followed during 48 months while taking into account the heterogeneous nature of the MDS-UPDRS. Clinical trial simulations were then used to compare drug effect detection power from IRT and sum of item scores based analysis under different analysis endpoints and drug effects. The IRT longitudinal model accurately describes the evolution of patients with and without PD medications while estimating different progression rates for the subscales. When comparing analysis methods, the IRT-based one consistently provided the highest power. IRT is a powerful tool which enables to capture the heterogeneous nature of the MDS-UPDRS.
Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

PubMed

de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

2018-07-01

This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
An item response theory analysis of the Psychological Inventory of Criminal Thinking Styles: comparing male and female probationers and prisoners.

PubMed

Walters, Glenn D

2014-09-01

An item response theory (IRT) analysis of the Psychological Inventory of Criminal Thinking Styles (PICTS) was performed on 26,831 (19,067 male and 7,764 female) federal probationers and compared with results obtained on 3,266 (3,039 male and 227 female) prisoners from previous research. Despite the fact male and female federal probationers scored significantly lower on the PICTS thinking style scales than male and female prisoners, discrimination and location parameter estimates for the individual PICTS items were comparable across sex and setting. Consistent with the results of a previous IRT analysis conducted on the PICTS, the current results did not support sentimentality as a component of general criminal thinking. Findings from this study indicate that the discriminative power of the individual PICTS items is relatively stable across sex (male, female) and correctional setting (probation, prison) and that the PICTS may be measuring the same criminal thinking construct in male and female probationers and prisoners. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Assessing items on the SF-8 Japanese version for health-related quality of life: a psychometric analysis based on the nominal categories model of item response theory.

PubMed

Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya

2009-06-01

The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on nominal categories model. Using data from an adjusted random sample from a nationally representative panel, the nominal categories modeling was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions on the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53%, women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption in accurate order with corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems] ) cover most of the negative range of theta. Information function for all items combined peaked at -0.7 of theta (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can recognize more effectively persons with relatively poorer QOL than those with relatively better QOL.
Development of a scale for attitude toward condom use for migrant workers in India.

PubMed

Talukdar, Arunansu; Bal, Runa; Sanyal, Debasis; Roy, Krishnendu; Talukdar, Payel Sengupta

2008-02-01

The propaganda for the use of condoms remains one of the mainstay for prevention of human immunodeficiency virus (HIV) transmission. In spite of the proven efficacy of condom, some moral, social and psychological obstacles are still prevalent, hindering the use of condoms. The study tried to construct a short condom-attitude scale for use among the migrant workers, a major bridge population in India. The study was conducted among the male migrant workers who were 18-49 years old, sexually active and had heard about condoms and were engaged in nonformal jobs. We recruited 234 and 280 candidates for Phase 1 and Phase 2 respectively. Ten items from the original 40-item Brown's ATC (attitude towards condom) scale were selected in Phase 1. After analysis of Phase 1 results, using principal component analysis six items were found appropriate for measuring attitude towards condom use. These six items were then administered in another group in Phase 2. Utilizing Pearson's correlations, scale items were examined in terms of their mean response scores and the correlation matrix between items. Cornbach's alpha and construct validity were also assessed for the entire sample. Study subjects were categorized as condom users and nonusers. The scale structure was explored by analyzing response scores with respect to the items, using principal component analysis followed by varimax rotation analysis. Principal component analysis revealed that the first factor accounted for 71% of the variance, with eigenvalue greater than one. Eigenvalues of the second factor was less than one. Application of screen test suggests only one factor was dominant. Mean score of six items among condom users was 20.45 and that among nonusers was 16.67, which was statistically significant (P<0.01). Cornbach's alpha coefficient was 0.92. This tailor-made attitude-toward-condom-use scale, targeted for most vulnerable people in India, can be included in any rapid survey for assessing the existing beliefs and attitudes toward condoms and also for evaluating efficacy of an intervention program.
Multidimensional student skills with collaborative filtering

NASA Astrophysics Data System (ADS)

Bergner, Yoav; Rayyan, Saif; Seaton, Daniel; Pritchard, David E.

2013-01-01

Despite the fact that a physics course typically culminates in one final grade for the student, many instructors and researchers believe that there are multiple skills that students acquire to achieve mastery. Assessment validation and data analysis in general may thus benefit from extension to multidimensional ability. This paper introduces an approach for model determination and dimensionality analysis using collaborative filtering (CF), which is related to factor analysis and item response theory (IRT). Model selection is guided by machine learning perspectives, seeking to maximize the accuracy in predicting which students will answer which items correctly. We apply the CF to response data for the Mechanics Baseline Test and combine the results with prior analysis using unidimensional IRT.
Validation of the brief version of the Recovery Self-Assessment (RSA-B) using Rasch measurement theory.

PubMed

Barbic, Skye P; Kidd, Sean A; Davidson, Larry; McKenzie, Kwame; O'Connell, Maria J

2015-12-01

In psychiatry, the recovery paradigm is increasingly identified as the overarching framework for service provision. Currently, the Recovery Self-Assessment (RSA), a 36-item rating scale, is commonly used to assess the uptake of a recovery orientation in clinical services. However, the consumer version of the RSA has been found challenging to complete because of length and the reading level required. In response to this feedback, a brief 12-item version of the RSA was developed (RSA-B). This article describes the development of the modified instrument and the application of traditional psychometric analysis and Rasch Measurement Theory to test the psychometrics properties of the RSA-B. Data from a multisite study of adults with serious mental illnesses (n = 1256) who were followed by assertive community treatment teams were examined for reliability, clinical meaning, targeting, response categories, model fit, reliability, dependency, and raw interval-level measurement. Analyses were performed using the Rasch Unidimensional Measurement Model (RUMM 2030). Adequate fit to the Rasch model was observed (χ2 = 112.46, df = 90, p = .06) and internal consistency was good (r = .86). However, Rasch analysis revealed limitations of the 12-item version, with items covering only 39% of the targeted theoretical continuum, 2 misfitting items, and strong evidence for the 5 option response categories not working as intended. This study revealed areas for improvement in the shortened version of the 12-item RSA-B. A revisit of the conceptual model and original 36-item rating scale is encouraged to select items that will help practitioners and researchers measure the full range of recovery orientation. (c) 2015 APA, all rights reserved).
Development of the multiple sclerosis (MS) early mobility impairment questionnaire (EMIQ).

PubMed

Ziemssen, Tjalf; Phillips, Glenn; Shah, Ruchit; Mathias, Adam; Foley, Catherine; Coon, Cheryl; Sen, Rohini; Lee, Andrew; Agarwal, Sonalee

2016-10-01

The Early Mobility Impairment Questionnaire (EMIQ) was developed to facilitate early identification of mobility impairments in multiple sclerosis (MS) patients. We describe the initial development of the EMIQ with a focus on the psychometric evaluation of the questionnaire using classical and item response theory methods. The initial 20-item EMIQ was constructed by clinical specialists and qualitatively tested among people with MS and physicians via cognitive interviews. Data from an observational study was used to make additional updates to the instrument based on exploratory factor analysis (EFA) and item response theory (IRT) analysis, and psychometric analyses were performed to evaluate the reliability and validity of the final instrument's scores and screening properties (i.e., sensitivity and specificity). Based on qualitative interview analyses, a revised 15-item EMIQ was included in the observational study. EFA, IRT and item-to-item correlation analyses revealed redundant items which were removed leading to the final nine-item EMIQ. The nine-item EMIQ performed well with respect to: test-retest reliability (ICC = 0.858); internal consistency (α = 0.893); convergent validity; and known-groups methods for construct validity. A cut-point of 41 on the 0-to-100 scale resulted in sufficient sensitivity and specificity statistics for viably identifying patients with mobility impairment. The EMIQ is a content valid and psychometrically sound instrument for capturing MS patients' experience with mobility impairments in a clinical practice setting. Additional research is suggested to further confirm the EMIQ's screening properties over time.
Multiscale Measurement of Extreme Response Style

ERIC Educational Resources Information Center

Bolt, Daniel M.; Newton, Joseph R.

2011-01-01

This article extends a methodological approach considered by Bolt and Johnson for the measurement and control of extreme response style (ERS) to the analysis of rating data from multiple scales. Specifically, it is shown how the simultaneous analysis of item responses across scales allows for more accurate identification of ERS, and more effective…
The value of children to gay and heterosexual fathers.

PubMed

Bigner, J J; Jacobsen, R B

1989-01-01

Responses of 33 gay fathers were compared with those of 33 heterosexual fathers on the Value of Children scale, an empirical measure of the reasons for wanting to become a parent. Responses of gay fathers did not differ significantly from heterosexual fathers on the majority of the items of the inventory, but differences were found on two subscales, Tradition-Continuity-Security and Social Status. Item analysis of responses shows that gay fathers may have particularly significant reasons motivating them to become parents.
Clarifying and Measuring Filial Concepts across Five Cultural Groups

PubMed Central

Jones, Patricia S.; Lee, Jerry W.; Zhang, Xinwei E.

2011-01-01

Literature on responsibility of adult children for aging parents reflects lack of conceptual clarity. We examined filial concepts across five cultural groups: African-, Asian-, Euro-, Latino-, and Native Americans. Data were randomly divided for scale development (n = 285) and cross-validation (n = 284). Exploratory factor analysis on 59 items identified three filial concepts: Responsibility, Respect, and Care. Confirmatory factor analysis on a 12-item final scale showed data fit the three-factor model better than the single factor solution despite substantial correlations between the factors (.82, .82 for Care with Responsibility and Respect, and .74 for Responsibility with Respect). The scale can be used in cross-cultural research to test hypotheses that predict associations among filial values, filial caregiving, and caregiver health outcomes. PMID:21618557
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.

PubMed

Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland

2014-04-01

To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
Item analysis of examinations in the Faculty of Medicine of Tunis.

PubMed

Hermi, Amene; Achour, Wafa

2016-04-01

Introduction Item analysis is the process of collecting, summarizing and using information from students' responses to assess test items' quality. This study used this approach to evaluate the quality of items and examinations given in the Faculty of Medicine of Tunis (FMT). Methods This study concerned the examinations of 2012-2013 (principal session). It analyzed 3138 items from 66 examinations, of which, 46 were multidisciplinary (187 disciplines). A total of 2515 students took the examinations. "AnItem.xls" file was used for the analysis that focused on difficulty, discrimination and internal consistency. Results Mean difficulty for all examinations was optimum (mean difficulty index: 0.59). Majority of items (89.17%) were either easy or of acceptable difficulty. Mean discrimination for all examinations was moderate (mean item discrimination coefficient: 0.28) with poor discrimination in 23.62% of items. Maximal discrimination occurred with disciplines of difficulty index between 0.4-0.6. « Ideal » items represented 27.02%. Mean internal consistency for all examinations was acceptable (Cronbach's alpha: 0.79). Disciplines with nonacceptable internal consistency (68.45%) contained a maximum of 33 items (each one) and a positive correlation between their alpha and the number of their questions. Distributions were mostly (72.73%) platykurtic and negatively asymmetric (89.39%). First year of studies had the best parameters. Conclusion Our examinations had an acceptable internal consistency, and a good level of difficulty and discrimination. They tended to facility and discriminated basically students of medium level. Item analysis is useful as a guide to item writers to improve the overall quality of questions in the future.
What Do You Think You Are Measuring? A Mixed-Methods Procedure for Assessing the Content Validity of Test Items and Theory-Based Scaling

PubMed Central

Koller, Ingrid; Levenson, Michael R.; Glück, Judith

2017-01-01

The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis. PMID:28270777
A large-scale, long-term study of scale drift: The micro view and the macro view

NASA Astrophysics Data System (ADS)

He, W.; Li, S.; Kingsbury, G. G.

2016-11-01

The development of measurement scales for use across years and grades in educational settings provides unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level analysing change in item parameter estimates that have occurred since the items were first calibrated. It was also examined at the macro level, involving groups of items and overall test scores for students. Results indicated that individual items had changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.

Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children.

PubMed

Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra

2012-03-13

Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed Central

Feinberg, Richard A.; Clauser, Amanda L.

2016-01-01

ABSTRACT Background In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. Objective The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Methods Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Results Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Conclusions Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation. PMID:27777664
Comparison of Self-Reported Telephone Interviewing and Web-Based Survey Responses: Findings From the Second Australian Young and Well National Survey.

PubMed

Milton, Alyssa C; Ellis, Louise A; Davenport, Tracey A; Burns, Jane M; Hickie, Ian B

2017-09-26

Web-based self-report surveying has increased in popularity, as it can rapidly yield large samples at a low cost. Despite this increase in popularity, in the area of youth mental health, there is a distinct lack of research comparing the results of Web-based self-report surveys with the more traditional and widely accepted computer-assisted telephone interviewing (CATI). The Second Australian Young and Well National Survey 2014 sought to compare differences in respondent response patterns using matched items on CATI versus a Web-based self-report survey. The aim of this study was to examine whether responses varied as a result of item sensitivity, that is, the item's susceptibility to exaggeration on underreporting and to assess whether certain subgroups demonstrated this effect to a greater extent. A subsample of young people aged 16 to 25 years (N=101), recruited through the Second Australian Young and Well National Survey 2014, completed the identical items on two occasions: via CATI and via Web-based self-report survey. Respondents also rated perceived item sensitivity. When comparing CATI with the Web-based self-report survey, a Wilcoxon signed-rank analysis showed that respondents answered 14 of the 42 matched items in a significantly different way. Significant variation in responses (CATI vs Web-based) was more frequent if the item was also rated by the respondents as highly sensitive in nature. Specifically, 63% (5/8) of the high sensitivity items, 43% (3/7) of the neutral sensitivity items, and 0% (0/4) of the low sensitivity items were answered in a significantly different manner by respondents when comparing their matched CATI and Web-based question responses. The items that were perceived as highly sensitive by respondents and demonstrated response variability included the following: sexting activities, body image concerns, experience of diagnosis, and suicidal ideation. For high sensitivity items, a regression analysis showed respondents who were male (beta=-.19, P=.048) or who were not in employment, education, or training (NEET; beta=-.32, P=.001) were significantly more likely to provide different responses on matched items when responding in the CATI as compared with the Web-based self-report survey. The Web-based self-report survey, however, demonstrated some evidence of avidity and attrition bias. Compared with CATI, Web-based self-report surveys are highly cost-effective and had higher rates of self-disclosure on sensitive items, particularly for respondents who identify as male and NEET. A drawback to Web-based surveying methodologies, however, includes the limited control over avidity bias and the greater incidence of attrition bias. These findings have important implications for further development of survey methods in the area of health and well-being, especially when considering research topics (in this case diagnosis, suicidal ideation, sexting, and body image) and groups that are being recruited (young people, males, and NEET). ©Alyssa C Milton, Louise A Ellis, Tracey A Davenport, Jane M Burns, Ian B Hickie. Originally published in JMIR Mental Health (http://mental.jmir.org), 26.09.2017.
Evaluation of diagnostic criteria for panic attack using item response theory: findings from the National Comorbidity Survey in USA.

PubMed

Ietsugu, Tetsuji; Sukigara, Masune; Furukawa, Toshiaki A

2007-12-01

The dichotomous diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases (ICD) lose much important information concerning what each symptom can offer. This study explored the characteristics and performances of DSM-IV and ICD-10 diagnostic criteria items for panic attack using modern item response theory (IRT). The National Comorbidity Survey used the Composite International Diagnostic Interview to assess 14 DSM-IV and ICD-10 panic attack diagnostic criteria items in the general population in the USA. The dimensionality and measurement properties of these items were evaluated using dichotomous factor analysis and the two-parameter IRT model. A total of 1213 respondents reported at least one subsyndromal or syndromal panic attack in their lifetime. Factor analysis indicated that all items constitute a unidimensional construct. The two-parameter IRT model produced meaningful and interpretable results. Among items with high discrimination parameters, the difficulty parameter for "palpitation" was relatively low, while those for "choking," "fear of dying" and "paresthesia" were relatively high. Several items including "dry mouth" and "fear of losing control" had low discrimination parameters. The item characteristics of diagnostic criteria among help-seeking clinical populations may be different from those that we observed in the general population and deserve further examination. "Paresthesia," "choking" and "fear of dying" can be thought to be good indicators of severe panic attacks, while "palpitation" can discriminate well between cases and non-cases at low level of panic attack severity. Items such as "dry mouth" would contribute less to the discrimination.
Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain

PubMed Central

Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.

2015-01-01

The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178
Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain.

PubMed

Crins, Martine H P; Roorda, Leo D; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B

2015-01-01

The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
A validation study of public health knowledge, skills, social responsibility and applied learning.

PubMed

Vackova, Dana; Chen, Coco K; Lui, Juliana N M; Johnston, Janice M

2018-06-22

To design and validate a questionnaire to measure medical students' Public Health (PH) knowledge, skills, social responsibility and applied learning as indicated in the four domains recommended by the Association of Schools & Programmes of Public Health (ASPPH). A cross-sectional study was conducted to develop an evaluation tool for PH undergraduate education through item generation, reduction, refinement and validation. The 74 preliminary items derived from the existing literature were reduced to 55 items based on expert panel review which included those with expertise in PH, psychometrics and medical education, as well as medical students. Psychometric properties of the preliminary questionnaire were assessed as follows: frequency of endorsement for item variance; principal component analysis (PCA) with varimax rotation for item reduction and factor estimation; Cronbach's Alpha, item-total correlation and test-retest validity for internal consistency and reliability. PCA yielded five factors: PH Learning Experience (6 items); PH Risk Assessment and Communication (5 items); Future Use of Evidence in Practice (6 items); Recognition of PH as a Scientific Discipline (4 items); and PH Skills Development (3 items), explaining 72.05% variance. Internal consistency and reliability tests were satisfactory (Cronbach's Alpha ranged from 0.87 to 0.90; item-total correlation > 0.59). Lower paired test-retest correlations reflected instability in a social science environment. An evaluation tool for community-centred PH education has been developed and validated. The tool measures PH knowledge, skills, social responsibilities and applied learning as recommended by the internationally recognised Association of Schools & Programmes of Public Health (ASPPH).
Validation of self-directed learning instrument and establishment of normative data for nursing students in taiwan: using polytomous item response theory.

PubMed

Cheng, Su-Fen; Lee-Hsieh, Jane; Turton, Michael A; Lin, Kuan-Chia

2014-06-01

Little research has investigated the establishment of norms for nursing students' self-directed learning (SDL) ability, recognized as an important capability for professional nurses. An item response theory (IRT) approach was used to establish norms for SDL abilities valid for the different nursing programs in Taiwan. The purposes of this study were (a) to use IRT with a graded response model to reexamine the SDL instrument, or the SDLI, originally developed by this research team using confirmatory factor analysis and (b) to establish SDL ability norms for the four different nursing education programs in Taiwan. Stratified random sampling with probability proportional to size was used. A minimum of 15% of students from the four different nursing education degree programs across Taiwan was selected. A total of 7,879 nursing students from 13 schools were recruited. The research instrument was the 20-item SDLI developed by Cheng, Kuo, Lin, and Lee-Hsieh (2010). IRT with the graded response model was used with a two-parameter logistic model (discrimination and difficulty) for the data analysis, calculated using MULTILOG. Norms were established using percentile rank. Analysis of item information and test information functions revealed that 18 items exhibited very high discrimination and two items had high discrimination. The test information function was higher in this range of scores, indicating greater precision in the estimate of nursing student SDL. Reliability fell between .80 and .94 for each domain and the SDLI as a whole. The total information function shows that the SDLI is appropriate for all nursing students, except for the top 2.5%. SDL ability norms were established for each nursing education program and for the nation as a whole. IRT is shown to be a potent and useful methodology for scale evaluation. The norms for SDL established in this research will provide practical standards for nursing educators and students in Taiwan.
Effects of bilateral eye movements on the retrieval of item, associative, and contextual information.

PubMed

Parker, Andrew; Relph, Sarah; Dagnall, Neil

2008-01-01

Two experiments are reported that investigate the effects of saccadic bilateral eye movements on the retrieval of item, associative, and contextual information. Experiment 1 compared the effects of bilateral versus vertical versus no eye movements on tests of item recognition, followed by remember-know responses and associative recognition. Supporting previous research, bilateral eye movements enhanced item recognition by increasing the hit rate and decreasing the false alarm rate. Analysis of remember-know responses indicated that eye movement effects were accompanied by increases in remember responses. The test of associative recognition found that bilateral eye movements increased correct responses to intact pairs and decreased false alarms to rearranged pairs. Experiment 2 assessed the effects of eye movements on the recall of intrinsic (color) and extrinsic (spatial location) context. Bilateral eye movements increased correct recall for both types of context. The results are discussed within the framework of dual-process models of memory and the possible neural underpinnings of these effects are considered.
An Item Response Theory (IRT) analysis of the Short Inventory of Problems-Alcohol and Drugs (SIP-AD) among non-treatment seeking men-who-have-sex-with-men: evidence for a shortened 10-item SIP-AD.

PubMed

Hagman, Brett T; Kuerbis, Alexis N; Morgenstern, Jon; Bux, Donald A; Parsons, Jeffrey T; Heidinger, Bram E

2009-11-01

The Short Inventory of Problems-Alcohol and Drugs (SIP-AD) is a 15-item measure that assesses concurrently negative consequences associated with alcohol and illicit drug use. Current psychometric evaluation has been limited to classical test theory (CTT) statistics, and it has not been validated among non-treatment seeking men-who-have-sex-with-men (MSM). Methods from Item Response Theory (IRT) can improve upon CTT by providing an in-depth analysis of how each item performs across the underlying latent trait that it is purported to measure. The present study examined the psychometric properties of the SIP-AD using methods from both IRT and CTT among a non-treatment seeking MSM sample (N=469). Participants were recruited from the New York City area and were asked to participate in a series of studies examining club drug use. Results indicated that five items on the SIP-AD demonstrated poor item misfit or significant differential item functioning (DIF) across race/ethnicity and HIV status. These five items were dropped and two-parameter IRT analyses were conducted on the remaining 10 items, which indicated a restricted range of item location parameters (-.15 to -.99) plotted at the lower end of the latent negative consequences severity continuum, and reasonably high discrimination parameters (1.30 to 2.22). Additional CTT statistics were compared between the original 15-item SIP-AD and the refined 10-item SIP-AD and suggest that the differences were negligible with the refined 10-item SIP-AD indicating a high degree of reliability and validity. Findings suggest the SIP-AD can be shortened to 10 items and appears to be a non-biased reliable and valid measure among non-treatment seeking MSM.
Nonresponse bias in randomized controlled experiments in criminology: Putting the Queensland Community Engagement Trial (QCET) under a microscope.

PubMed

Antrobus, Emma; Elffers, Henk; White, Gentry; Mazerolle, Lorraine

2013-01-01

The goal of this article is to examine whether or not the results of the Queensland Community Engagement Trial (QCET)-a randomized controlled trial that tested the impact of procedural justice policing on citizen attitudes toward police-were affected by different types of nonresponse bias. We use two methods (Cochrane and Elffers methods) to explore nonresponse bias: First, we assess the impact of the low response rate by examining the effects of nonresponse group differences between the experimental and control conditions and pooled variance under different scenarios. Second, we assess the degree to which item response rates are influenced by the control and experimental conditions. Our analysis of the QCET data suggests that our substantive findings are not influenced by the low response rate in the trial. The results are robust even under extreme conditions, and statistical significance of the results would only be compromised in cases where the pooled variance was much larger for the nonresponse group and the difference between experimental and control conditions was greatly diminished. We also find that there were no biases in the item response rates across the experimental and control conditions. RCTs that involve field survey responses-like QCET-are potentially compromised by low response rates and how item response rates might be influenced by the control or experimental conditions. Our results show that the QCET results were not sensitive to the overall low response rate across the experimental and control conditions and the item response rates were not significantly different across the experimental and control groups. Overall, our analysis suggests that the results of QCET are robust and any biases in the survey responses do not significantly influence the main experimental findings.
The Usability of CAT System for Assessing the Depressive Level of Japanese-A Study on Psychometric Properties and Response Behavior.

PubMed

Iwata, Noboru; Kikuchi, Kenichi; Fujihara, Yuya

2016-08-01

An innovative measurement system using a computerized adaptive testing technique based on the item response theory (CAT) has been expanding to measure mental health status. However, little is known about details in its measurement properties based on the empirical data. Moreover, the response time (RT) data, which are not available by a paper-and-pencil measurement but available by a computerized measurement, would be worth investigating for exploring the response behavior. We aimed at constructing the CAT to measure depressive symptomatology in a community population and exploring its measurement properties. Also, we examined the relationships between RTs, individual item responses, and depressive levels. For constructing the CAT system, responses of 2061 workers and university students to 24 depression scale plus four negatively revised positive affect items were subjected to a polytomous IRT analysis. The stopping rule was set for standard error of estimation < 0.30 or the maximum 15 items displayed. The CAT and non-adaptive computer-based test (CBT) were administered to 209 undergraduates, and 168 of them administered again after 1 week. On average, the CAT was converged by 10.4 items. The θ values estimated by CAT and CBT were highly correlated (r = 0.94 and 0.95 for the 1st and 2nd measurements) and with the traditional scoring procedures (r's > 0.90). The test-retest reliability was at a satisfactory level (r = 0.86). RTs to some items significantly correlated with the θ estimates. The mean RT varied by the item contents and wording, i.e., the RT to positive affect items required additional 2 s or longer than the other subscale items. The CAT would be a reliable and practical measurement tool for various purposes including stress check at workplace.
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods

PubMed Central

Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-01-01

Background The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356
Psychometric properties of the neck disability index amongst patients with chronic neck pain using item response theory.

PubMed

Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri

2017-05-13

The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
Evaluation of the Fecal Incontinence Quality of Life Scale (FIQL) using item response theory reveals limitations and suggests revisions.

PubMed

Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A

2018-06-01

The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions.

PubMed

Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike

2018-01-01

To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
Development and Evaluation of the PROMIS® Pediatric Positive Affect Item Bank, Child-Report and Parent-Proxy Editions.

PubMed

Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B

2018-03-01

The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
The development and psychometric validation of the Ethical Awareness Scale.

PubMed

Milliken, Aimee; Ludlow, Larry; DeSanto-Madeya, Susan; Grace, Pamela

2018-04-19

To develop and psychometrically assess the Ethical Awareness Scale using Rasch measurement principles and a Rasch item response theory model. Critical care nurses must be equipped to provide good (ethical) patient care. This requires ethical awareness, which involves recognizing the ethical implications of all nursing actions. Ethical awareness is imperative in successfully addressing patient needs. Evidence suggests that the ethical import of everyday issues may often go unnoticed by nurses in practice. Assessing nurses' ethical awareness is a necessary first step in preparing nurses to identify and manage ethical issues in the highly dynamic critical care environment. A cross-sectional design was used in two phases of instrument development. Using Rasch principles, an item bank representing nursing actions was developed (33 items). Content validity testing was performed. Eighteen items were selected for face validity testing. Two rounds of operational testing were performed with critical care nurses in Boston between February-April 2017. A Rasch analysis suggests sufficient item invariance across samples and sufficient construct validity. The analysis further demonstrates a progression of items uniformly along a hierarchical continuum; items that match respondent ability levels; response categories that are sufficiently used; and adequate internal consistency. Mean ethical awareness scores were in the low/moderate range. The results suggest the Ethical Awareness Scale is a psychometrically sound, reliable and valid measure of ethical awareness in critical care nurses. © 2018 John Wiley & Sons Ltd.
Development of the Sexual Minority Adolescent Stress Inventory

PubMed Central

Schrager, Sheree M.; Goldbach, Jeremy T.; Mamey, Mary Rose

2018-01-01

Although construct measurement is critical to explanatory research and intervention efforts, rigorous measure development remains a notable challenge. For example, though the primary theoretical model for understanding health disparities among sexual minority (e.g., lesbian, gay, bisexual) adolescents is minority stress theory, nearly all published studies of this population rely on minority stress measures with poor psychometric properties and development procedures. In response, we developed the Sexual Minority Adolescent Stress Inventory (SMASI) with N = 346 diverse adolescents ages 14–17, using a comprehensive approach to de novo measure development designed to produce a measure with desirable psychometric properties. After exploratory factor analysis on 102 candidate items informed by a modified Delphi process, we applied item response theory techniques to the remaining 72 items. Discrimination and difficulty parameters and item characteristic curves were estimated overall, within each of 12 initially derived factors, and across demographic subgroups. Two items were removed for excessive discrimination and three were removed following reliability analysis. The measure demonstrated configural and scalar invariance for gender and age; a three-item factor was excluded for demonstrating substantial differences by sexual identity and race/ethnicity. The final 64-item measure comprised 11 subscales and demonstrated excellent overall (α = 0.98), subscale (α range 0.75–0.96), and test–retest (scale r > 0.99; subscale r range 0.89–0.99) reliabilities. Subscales represented a mix of proximal and distal stressors, including domains of internalized homonegativity, identity management, intersectionality, and negative expectancies (proximal) and social marginalization, family rejection, homonegative climate, homonegative communication, negative disclosure experiences, religion, and work domains (distal). Thus, the SMASI development process illustrates a method to incorporate information from multiple sources, including item response theory models, to guide item selection in building a psychometrically sound measure. We posit that similar methods can be used to improve construct measurement across all areas of psychological research, particularly in areas where a strong theoretical framework exists but existing measures are limited. PMID:29599737
Development of a questionnaire for assessing the childbirth experience (QACE).

PubMed

Carquillat, Pierre; Vendittelli, Françoise; Perneger, Thomas; Guittier, Marie-Julia

2017-08-30

Due to its potential impact on women's psychological health, assessing perceptions of their childbirth experience is important. The aim of this study was to develop a multidimensional self-reporting questionnaire to evaluate the childbirth experience. Factors influencing the childbirth experience were identified from a literature review and the results of a previous qualitative study. A total of 25 items were combined from existing instruments or were created de novo. A draft version was pilot tested for face validity with 30 women and submitted for evaluation of its construct validity to 477 primiparous women at one-month post-partum. The recruitment took place in two obstetric clinics from Swiss and French university hospitals. To evaluate the content validity, we compared item responses to general childbirth experience assessments on a numeric, 0 to 10 rating scale. We dichotomized two group assessment scores: "0 to 7" and "8 to 10". We performed an exploratory factor analysis to identify underlying dimensions. In total, 291 women completed the questionnaire (response rate = 61%). The responses to 22 items were statistically significant between the 0 to 7 and 8 to 10 groups for the general childbirth experience assessments. An exploratory factor analysis yielded four sub-scales, which were labelled "relationship with staff" (4 items), "emotional status" (3 items), "first moments with the new born," (3 items) and "feelings at one month postpartum" (3 items). All 4 scales had satisfactory internal consistency levels (alpha coefficients from 0.70 to 0.85). The full 25-item version can be used to analyse each item by itself, and the short 4-dimension version can be scored to summarize the general assessment of the childbirth experience. The Questionnaire for Assessing the Childbirth Experience (QACE) could be useful as a screening instrument to identify women with negative childbirth experiences. It can be used as both a research instrument in its short version and a questionnaire for use in clinical practice in its full version.

Investigating diagnostic bias in autism spectrum conditions: An item response theory analysis of sex bias in the AQ-10.

PubMed

Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie

2017-05-01

Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, suggest that caution should be exercised when interpreting responses to individual items. The nature of the item level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Three-dimensional structural representation of the sleep-wake adaptability.

PubMed

Putilov, Arcady A

2016-01-01

Various characteristics of the sleep-wake cycle can determine the success or failure of individual adjustment to certain temporal conditions of the today's society. However, it remains to be explored how many such characteristics can be self-assessed and how they are inter-related one to another. The aim of the present report was to apply a three-dimensional structural representation of the sleep-wake adaptability in the form of "rugby cake" (scalene or triaxial ellipsoid) to explain the results of analysis of the pattern of correlations of the responses to the initial 320-item list of a new inventory with scores on the six scales designed for multidimensional self-assessment of the sleep-wake adaptability (Morning and Evening Lateness, Anytime and Nighttime Sleepability, and Anytime and Daytime Wakeability). The results obtained for sample consisting of 149 respondents were confirmed by the results of similar analysis of earlier collected responses of 139 respondents to the same list of 320 items and responses of 1213 respondents to the 72 items of one of the earlier established questionnaire tools. Empirical evidence was provided in support of the model-driven prediction of the possibility to identify items linked to as many as 36 narrow (6 core and 30 mixed) adaptabilities of the sleep-wake cycle. The results enabled the selection of 168 items for self-assessment of all these adaptabilities predicted by the rugby cake model.
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14)

PubMed Central

Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi

2015-01-01

Purpose Nurses and other healthcare workers frequently experience belief conflict, one of the most important, new stress-related problems in both academic and clinical fields. Methods In this study, using a sample of 1,683 nursing practitioners, we developed The Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach’s alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). Results The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 was suggested by evidence from high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. Conclusion The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice. PMID:26247356
Genes, Culture and Conservatism-A Psychometric-Genetic Approach.

PubMed

Schwabe, Inga; Jonker, Wilfried; van den Berg, Stéphanie M

2016-07-01

The Wilson-Patterson conservatism scale was psychometrically evaluated using homogeneity analysis and item response theory models. Results showed that this scale actually measures two different aspects in people: on the one hand people vary in their agreement with either conservative or liberal catch-phrases and on the other hand people vary in their use of the "?" response category of the scale. A 9-item subscale was constructed, consisting of items that seemed to measure liberalism, and this subscale was subsequently used in a biometric analysis including genotype-environment interaction, correcting for non-homogeneous measurement error. Biometric results showed significant genetic and shared environmental influences, and significant genotype-environment interaction effects, suggesting that individuals with a genetic predisposition for conservatism show more non-shared variance but less shared variance than individuals with a genetic predisposition for liberalism.
[Mokken scaling of the Cognitive Screening Test].

PubMed

Diesfeldt, H F A

2009-10-01

The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).
Calibration of the Spanish PROMIS Smoking Item Banks.

PubMed

Huang, Wenjing; Stucky, Brian D; Edelen, Maria O; Tucker, Joan S; Shadel, William G; Hansen, Mark; Cai, Li

2016-07-01

The Patient-Reported Outcomes Measurement Information System (PROMIS) Smoking Initiative has developed item banks for assessing six smoking behaviors and biopsychosocial correlates of smoking among adult cigarette smokers. The goal of this study is to evaluate the performance of the Spanish version of the PROMIS smoking item banks as compared to the original banks developed in English. The six PROMIS banks for daily smokers were translated into Spanish and administered to a sample of Spanish-speaking adult daily smokers in the United States (N = 302). We first evaluated the unidimensionality of each bank using confirmatory factor analysis. We then conducted a two-group item response theory calibration, including an item response theory-based Differential Item Functioning (DIF) analysis by language of administration (Spanish vs. English). Finally, we generated full bank and short form scores for the translated banks and evaluated their psychometric performance. Unidimensionality of the Spanish smoking item banks was supported by confirmatory factor analysis results. Out of a total of 109 items that were evaluated for language DIF, seven items in three of the six banks were identified as having levels of DIF that exceeded an established criterion. The psychometric performance of the Spanish daily smoker banks is largely comparable to that of the English versions. The Spanish PROMIS smoking item banks are highly similar, but not entirely equivalent, to the original English versions. The parameters from these two-group calibrations can be used to generate comparable bank scores across the two language versions. In this study, we developed a Spanish version of the PROMIS smoking toolkit, which was originally designed and developed for English speakers. With the growing Spanish-speaking population, it is important to make the toolkit more accessible by translating the items and calibrating the Spanish version to be comparable with English-language scores. This study provided the translated item banks and short forms, comparable unbiased scores for Spanish speakers and evaluations of the psychometric properties of the new Spanish toolkit. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
An evaluation of computerized adaptive testing for general psychological distress: combining GHQ-12 and Affectometer-2 in an item bank for public mental health research.

PubMed

Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J

2016-05-20

Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
The Dispositions for Culturally Responsive Pedagogy Scale

ERIC Educational Resources Information Center

Whitaker, Manya C.; Valtierra, Kristina Marie

2018-01-01

Purpose: The purpose of this study is to develop and validate the dispositions for culturally responsive pedagogy scale (DCRPS). Design/methodology/approach: Scale development consisted of a six-step process including item development, expert review, exploratory factor analysis, factor interpretation, confirmatory factor analysis and convergent…
An Item Bank for Abuse of Prescription Pain Medication from the Patient-Reported Outcomes Measurement Information System (PROMIS®).

PubMed

Pilkonis, Paul A; Yu, Lan; Dodds, Nathan E; Johnston, Kelly L; Lawrence, Suzanne M; Hilton, Thomas F; Daley, Dennis C; Patkar, Ashwin A; McCarty, Dennis

2017-08-01

There is a need to monitor patients receiving prescription opioids to detect possible signs of abuse. To address this need, we developed and calibrated an item bank for severity of abuse of prescription pain medication as part of the Patient-Reported Outcomes Measurement Information System (PROMIS ® ). Comprehensive literature searches yielded an initial bank of 5,310 items relevant to substance use and abuse, including abuse of prescription pain medication, from over 80 unique instruments. After qualitative item analysis (i.e., focus groups, cognitive interviewing, expert review, and item revision), 25 items for abuse of prescribed pain medication were included in field testing. Items were written in a first-person, past-tense format, with a three-month time frame and five response options reflecting frequency or severity. The calibration sample included 448 respondents, 367 from the general population (ascertained through an internet panel) and 81 from community treatment programs participating in the National Drug Abuse Treatment Clinical Trials Network. A final bank of 22 items was calibrated using the two-parameter graded response model from item response theory. A seven-item static short form was also developed. The test information curve showed that the PROMIS ® item bank for abuse of prescription pain medication provided substantial information in a broad range of severity. The initial psychometric characteristics of the item bank support its use as a computerized adaptive test or short form, with either version providing a brief, precise, and efficient measure relevant to both clinical and community samples. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
A model for incomplete longitudinal multivariate ordinal data.

PubMed

Liu, Li C

2008-12-30

In studies where multiple outcome items are repeatedly measured over time, missing data often occur. A longitudinal item response theory model is proposed for analysis of multivariate ordinal outcomes that are repeatedly measured. Under the MAR assumption, this model accommodates missing data at any level (missing item at any time point and/or missing time point). It allows for multiple random subject effects and the estimation of item discrimination parameters for the multiple outcome items. The covariates in the model can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is described utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher-scoring solution, which provides standard errors for all model parameters, is used. A data set from a longitudinal prevention study is used to motivate the application of the proposed model. In this study, multiple ordinal items of health behavior are repeatedly measured over time. Because of a planned missing design, subjects answered only two-third of all items at a given point. Copyright 2008 John Wiley & Sons, Ltd.
Item Response Theory to Quantify Longitudinal Placebo and Paliperidone Effects on PANSS Scores in Schizophrenia.

PubMed

Krekels, Ehj; Novakovic, A M; Vermeulen, A M; Friberg, L E; Karlsson, M O

2017-08-01

As biomarkers are lacking, multi-item questionnaire-based tools like the Positive and Negative Syndrome Scale (PANSS) are used to quantify disease severity in schizophrenia. Analyzing composite PANSS scores as continuous data discards information and violates the numerical nature of the scale. Here a longitudinal analysis based on Item Response Theory is presented using PANSS data from phase III clinical trials. Latent disease severity variables were derived from item-level data on the positive, negative, and general PANSS subscales each. On all subscales, the time course of placebo responses were best described with Weibull models, and dose-independent functions with exponential models to describe the onset of the full effect were used to describe paliperidone's effect. Placebo and drug effect were most pronounced on the positive subscale. The final model successfully describes the time course of treatment effects on the individual PANSS item-levels, on all PANSS subscale levels, and on the total score level. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

PubMed

Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

2015-12-01

The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory

PubMed Central

Fajrianthi; Zein, Rizqy Amelia

2017-01-01

This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = −2), and 2.398 for subset 3 (level of ability = −2). It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth to note that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA’s item analysis and dimensionality test of each TKEA subset. PMID:29238234
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.

PubMed

Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro

2013-01-01

the objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). investigation of the psychometric properties of the mentioned questionnaire through Rasch analysis. the data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the categories of the responses given to the items were compiled from 7 to 3 levels and the items fit the model well, except for the following/leading item, in which the infit and outfit values were above 1.4; this item has also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was of 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. the PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty in the scale need to be created and the differential functioning of some items has to be further investigated.
Rasch Analysis for Binary Data with Nonignorable Nonresponses

ERIC Educational Resources Information Center

Bertoli-Barsotti, Lucio; Punzo, Antonio

2013-01-01

This paper introduces a two-dimensional Item Response Theory (IRT) model to deal with nonignorable nonresponses in tests with dichotomous items. One dimension provides information about the omitting behavior, while the other dimension is related to the person's "ability". The idea of embedding an IRT model for missingness into the measurement…
A Cost Analysis of Forward Positioning Material in the Fifth Fleet Area of Responsibility

DTIC Science & Technology

2014-12-01

CTF combined task force CY calendar year DDC defense distribution center DDNB Defense Depot Navy Detachment, Bahrain DHL DHL Airways DLA...items from Bahrain to the requesting unit.  Defense Distribution Center ( DDC )– DDC is concerned with getting the requested national item
Combination of classical test theory (CTT) and item response theory (IRT) analysis to study the psychometric properties of the French version of the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF).

PubMed

Bourion-Bédès, Stéphanie; Schwan, Raymund; Epstein, Jonathan; Laprevote, Vincent; Bédès, Alex; Bonnet, Jean-Louis; Baumann, Cédric

2015-02-01

The study aimed to examine the construct validity and reliability of the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF) according to both classical test and item response theories. The psychometric properties of the French version of this instrument were investigated in a cross-sectional, multicenter study. A total of 124 outpatients with a substance dependence diagnosis participated in the study. Psychometric evaluation included descriptive analysis, internal consistency, test-retest reliability, and validity. The dimensionality of the instrument was explored using a combination of the classical test, confirmatory factor analysis (CFA), and an item response theory analysis, the Person Separation Index (PSI), in a complementary manner. The results of the Q-LES-Q-SF revealed that the questionnaire was easy to administer and the acceptability was good. The internal consistency and the test-retest reliability were 0.9 and 0.88, respectively. All items were significantly correlated with the total score and the SF-12 used in the study. The CFA with one factor model was good, and for the unidimensional construct, the PSI was found to be 0.902. The French version of the Q-LES-Q-SF yielded valid and reliable clinical assessments of the quality of life for future research and clinical practice involving French substance abusers. In response to recent questioning regarding the unidimensionality or bidimensionality of the instrument and according to the underlying theoretical unidimensional construct used for its development, this study suggests the Q-LES-Q-SF as a one-dimension questionnaire in French QoL studies.
The development and exploratory analysis of the Back Pain Attitudes Questionnaire (Back-PAQ).

PubMed

Darlow, Ben; Perry, Meredith; Mathieson, Fiona; Stanley, James; Melloh, Markus; Marsh, Reginald; Baxter, G David; Dowell, Anthony

2014-05-23

To develop an instrument to assess attitudes and underlying beliefs about back pain, and subsequently investigate its internal consistency and underlying structures. The instrument was developed by a multidisciplinary team of clinicians and researchers based on analysis of qualitative interviews with people experiencing acute and chronic back pain. Exploratory analysis was conducted using data from a population-based cross-sectional survey. Qualitative interviews with community-based participants and subsequent postal survey. Instrument development informed by interviews with 12 participants with acute back pain and 11 participants with chronic back pain. Data for exploratory analysis collected from New Zealand residents and citizens aged 18 years and above. 1000 participants were randomly selected from the New Zealand Electoral Roll. 602 valid responses were received. The 34-item Back Pain Attitudes Questionnaire (Back-PAQ) was developed. Internal consistency was evaluated by the Cronbach α coefficient. Exploratory analysis investigated the structure of the data using Principal Component Analysis. The 34-item long form of the scale had acceptable internal consistency (α=0.70; 95% CI 0.66 to 0.73). Exploratory analysis identified five two-item principal components which accounted for 74% of the variance in the reduced data set: 'vulnerability of the back'; 'relationship between back pain and injury'; 'activity participation while experiencing back pain'; 'prognosis of back pain' and 'psychological influences on recovery'. Internal consistency was acceptable for the reduced 10-item scale (α=0.61; 95% CI 0.56 to 0.66) and the identified components (α between 0.50 and 0.78). The 34-item long form of the scale may be appropriate for use in future cross-sectional studies. The 10-item short form may be appropriate for use as a screening tool, or an outcome assessment instrument. Further testing of the 10-item Back-PAQ's construct validity, reliability, responsiveness to change and predictive ability needs to be conducted. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Confirmatory Factor Analysis of the Finnish Job Content Questionnaire (JCQ) in 590 Professional Musicians.

PubMed

Vastamäki, Heidi; Vastamäki, Martti; Laimi, Katri; Saltychev, Michail

2017-07-01

Poorly functioning work environments may lead to dissatisfaction for the employees and financial loss for the employers. The Job Content Questionnaire (JCQ) was designed to measure social and psychological characteristics of work environments. To investigate the factor construct of the Finnish 14-item version of JCQ when applied to professional orchestra musicians. In a cross-sectional survey, the questionnaire was sent by mail to 1550 orchestra musicians and students. 630 responses were received. Full data were available for 590 respondents (response rate 38%).The questionnaire also contained questions on demographics, job satisfaction, health status, health behaviors, and intensity of playing music. Confirmatory factor analysis of the 2-factor model of JCQ was conducted. Of the 5 estimates, JCQ items in the "job demand" construct, the "conflicting demands" (question 5) explained most of the total variance in this construct (79%) demonstrating almost perfect correlation of 0.63. In the construct of "job control," "opinions influential" (question 10) demonstrated a perfect correlation index of 0.84 and the items "little decision freedom" (question 14) and "allows own decisions" (question 6) showed substantial correlations of 0.77 and 0.65. The 2-factor model of the Finnish 14-item version of JCQ proposed in this study fitted well into the observed data. The "conflicting demands," "opinions influential," "little decision freedom," and "allows own decisions" items demonstrated the strongest correlations with latent factors suggesting that in a population similar to the studied one, especially these items should be taken into account when observed in the response of a population.
Development and validation of a patient-reported outcome measure for stroke patients.

PubMed

Luo, Yanhong; Yang, Jie; Zhang, Yanbo

2015-05-08

Family support and patient satisfaction with treatment are crucial for aiding in the recovery from stroke. However, current validated stroke-specific questionnaires may not adequately capture the impact of these two variables on patients undergoing clinical trials of new drugs. Therefore, the aim of this study was to develop and evaluate a new stroke patient-reported outcome measure (Stroke-PROM) instrument for capturing more comprehensive effects of stroke on patients participating in clinical trials of new drugs. A conceptual framework and a pool of items for the preliminary Stroke-PROM were generated by consulting the relevant literature and other questionnaires created in China and other countries, and interviewing 20 patients and 4 experts to ensure that all germane parameters were included. During the first item-selection phase, classical test theory and item response theory were applied to an initial scale completed by 133 patients with stroke. During the item-revaluation phase, classical test theory and item response theory were used again, this time with 475 patients with stroke and 104 healthy participants. During the scale assessment phase, confirmatory factor analysis was applied to the final scale of the Stroke-PROM using the same study population as in the second item-selection phase. Reliability, validity, responsiveness and feasibility of the final scale were tested. The final scale of Stroke-PROM contained 46 items describing four domains (physiology, psychology, society and treatment). These four domains were subdivided into 10 subdomains. Cronbach's α coefficients for the four domains ranged from 0.861 to 0.908. Confirmatory factor analysis supported the validity of the final scale, and the model fit index satisfied the criterion. Differences in the Stroke-PROM mean scores were significant between patients with stroke and healthy participants in nine subdomains (P < 0.001), indicating that the scale showed good responsiveness. The Stroke-PROM is a patient-reported outcome multidimensional questionnaire developed especially for clinical trials of new drugs and is focused on issues of family support and patient satisfaction with treatment. Extensive data analyses supported the validity, reliability and responsiveness of the Stroke-PROM.

A Bivariate Generalized Linear Item Response Theory Modeling Framework to the Analysis of Responses and Response Times.

PubMed

Molenaar, Dylan; Tuerlinckx, Francis; van der Maas, Han L J

2015-01-01

A generalized linear modeling framework to the analysis of responses and response times is outlined. In this framework, referred to as bivariate generalized linear item response theory (B-GLIRT), separate generalized linear measurement models are specified for the responses and the response times that are subsequently linked by cross-relations. The cross-relations can take various forms. Here, we focus on cross-relations with a linear or interaction term for ability tests, and cross-relations with a curvilinear term for personality tests. In addition, we discuss how popular existing models from the psychometric literature are special cases in the B-GLIRT framework depending on restrictions in the cross-relation. This allows us to compare existing models conceptually and empirically. We discuss various extensions of the traditional models motivated by practical problems. We also illustrate the applicability of our approach using various real data examples, including data on personality and cognitive ability.
Modeling Composite Assessment Data Using Item Response Theory

PubMed Central

Ueckert, Sebastian

2018-01-01

Composite assessments aim to combine different aspects of a disease in a single score and are utilized in a variety of therapeutic areas. The data arising from these evaluations are inherently discrete with distinct statistical properties. This tutorial presents the framework of the item response theory (IRT) for the analysis of this data type in a pharmacometric context. The article considers both conceptual (terms and assumptions) and practical questions (modeling software, data requirements, and model building). PMID:29493119
Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

PubMed

Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

2017-01-01

The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
Development and validation of a professionalism assessment scale for medical students

PubMed Central

Klemenc-Ketis, Zalika; Vrecko, Helena

2014-01-01

Objectives To develop and validate a scale for the assess-ment of professionalism in medical students based on students' perceptions of and attitudes towards professional-ism in medicine. Methods This was a mixed methods study with under-graduate medical students. Two focus groups were carried out with 12 students, followed by a transcript analysis (grounded theory method with open coding). Then, a 3-round Delphi with 20 family medicine experts was carried out. A psychometric assessment of the scale was performed with a group of 449 students. The items of the Professional-ism Assessment Scale could be answered on a five-point Likert scale. Results After the focus groups, the first version of the PAS consisted of 56 items and after the Delphi study, 30 items remained. The final sample for quantitative study consisted of 122 students (27.2% response rate). There were 95 (77.9%) female students in the sample. The mean age of the sample was 22.1 ± 2.1 years. After the principal component analysis, we removed 8 items and produced the final version of the PAS (22 items). The Cronbach's alpha of the scale was 0.88. Factor analysis revealed three factors: empathy and humanism, professional relationships and development and responsibility. Conclusions The new Professionalism Assessment Scale proved to be valid and reliable. It can be used for the assessment of professionalism in undergraduate medical students. PMID:25382090
Confirmatory factor analysis and measurement invariance of the Child Feeding Questionnaire in low-income Hispanic and African-American mothers with preschool-age children.

PubMed

Kong, Angela; Vijayasiri, Ganga; Fitzgibbon, Marian L; Schiffer, Linda A; Campbell, Richard T

2015-07-01

Validation work of the Child Feeding Questionnaire (CFQ) in low-income minority samples suggests a need for further conceptual refinement of this instrument. Using confirmatory factor analysis, this study evaluated 5- and 6-factor models on a large sample of African-American and Hispanic mothers with preschool-age children (n = 962). The 5-factor model included: 'perceived responsibility', 'concern about child's weight', 'restriction', 'pressure to eat', and 'monitoring' and the 6-factor model also tested 'food as a reward'. Multi-group analysis assessed measurement invariance by race/ethnicity. In the 5-factor model, two low-loading items from 'restriction' and one low-variance item from 'perceived responsibility' were dropped to achieve fit. Only removal of the low-variance item was needed to achieve fit in the 6-factor model. Invariance analyses demonstrated differences in factor loadings. This finding suggests African-American and Hispanic mothers may vary in their interpretation of some CFQ items and use of cognitive interviews could enhance item interpretation. Our results also demonstrated that 'food as a reward' is a plausible construct among a low-income minority sample and adds to the evidence that this factor resonates conceptually with parents of preschoolers; however, further testing is needed to determine the validity of this factor with older age groups. Copyright © 2015 Elsevier Ltd. All rights reserved.
Overview and current management of computerized adaptive testing in licensing/certification examinations.

PubMed

Seo, Dong Gi

2017-01-01

Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees' ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations.
Overview and current management of computerized adaptive testing in licensing/certification examinations

PubMed Central

2017-01-01

Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees’ ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations. PMID:28811394
Trends in public perceptions and preferences on energy and environmental policy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Farhar, B.C.

1993-02-01

This report presents selected results from a secondary analysis of public opinion surveys, taken at the national and state/local levels, relevant to energy and environmental policy choices. The data base used in the analysis includes about 2000 items from nearly 600 separate surveys conducted between 1979 and 1992. Answers to word-for-word questions were traced over time, permitting trend analysis. Patterns of response were also identified for findings from similarly worded survey items. The analysis identifies changes in public opinion concerning energy during the past 10 to 15 years.
Students' proficiency scores within multitrait item response theory

NASA Astrophysics Data System (ADS)

Scott, Terry F.; Schumayer, Daniel

2015-12-01

In this paper we present a series of item response models of data collected using the Force Concept Inventory. The Force Concept Inventory (FCI) was designed to poll the Newtonian conception of force viewed as a multidimensional concept, that is, as a complex of distinguishable conceptual dimensions. Several previous studies have developed single-trait item response models of FCI data; however, we feel that multidimensional models are also appropriate given the explicitly multidimensional design of the inventory. The models employed in the research reported here vary in both the number of fitting parameters and the number of underlying latent traits assumed. We calculate several model information statistics to ensure adequate model fit and to determine which of the models provides the optimal balance of information and parsimony. Our analysis indicates that all item response models tested, from the single-trait Rasch model through to a model with ten latent traits, satisfy the standard requirements of fit. However, analysis of model information criteria indicates that the five-trait model is optimal. We note that an earlier factor analysis of the same FCI data also led to a five-factor model. Furthermore the factors in our previous study and the traits identified in the current work match each other well. The optimal five-trait model assigns proficiency scores to all respondents for each of the five traits. We construct a correlation matrix between the proficiencies in each of these traits. This correlation matrix shows strong correlations between some proficiencies, and strong anticorrelations between others. We present an interpretation of this correlation matrix.
Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQLTM 4.0 Generic Core Scales in school children

PubMed Central

2012-01-01

Background Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQLTM 4.0 Generic Core Scales. Methods The PedsQLTM 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. Conclusions This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

PubMed

Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-04-11

The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.
Rasch analysis of the Mini-Mental Adjustment to Cancer Scale (mini-MAC) among a heterogeneous sample of long-term cancer survivors: A cross-sectional study

PubMed Central

2012-01-01

Background The mini-Mental Adjustment to Cancer Scale (mini-MAC) is a well-recognised, popular measure of coping in psycho-oncology and assesses five cancer-specific coping strategies. It has been suggested that these five subscales could be grouped to form the over-arching adaptive and maladptive coping subscales to facilitate the interpretation and clinical application of the scale. Despite the popularity of the mini-MAC, few studies have examined its psychometric properties among long-term cancer survivors, and further validation of the mini-MAC is needed to substantiate its use with the growing population of survivors. Therefore, this study examined the psychometric properties and dimensionality of the mini-MAC in a sample of long-term cancer survivors using Rasch analysis. Methods RUMM 2030 was used to analyse the mini-MAC data (n=851). Separate Rasch analyses were conducted for each of the original mini-MAC subscales as well as the over-arching adaptive and maladaptive coping subscales to examine summary and individual model fit statistics, person separation index (PSI), response format, local dependency, targeting, item bias (or differential item functioning -DIF), and dimensionality. Results For the fighting spirit, fatalism, and helplessness-hopelessness subscales, a revised three-point response format seemed more optimal than the original four-point response. To achieve model fit, items were deleted from four of the five subscales – Anxious Preoccupation items 7, 25, and 29; Cognitive Avoidance items 11 and 17; Fighting Spirit item 18; and Helplessness-Hopelessness items 16 and 20. For those subscales with sufficient items, analyses supported unidimensionality. Combining items to form the adaptive and maladaptive subscales was partially supported. Conclusions The original five subscales required item deletion and/or rescaling to improve goodness of fit to the Rasch model. While evidence was found for overarching subscales of adaptive and maladaptive coping, extensive modifications were necessary to achieve this result. Further exploration and validation of over-arching subscales assessing adaptive and maladaptive coping is necessary with cancer survivors. PMID:22607052
Dyadic confirmatory factor analysis of the inflammatory bowel disease family responsibility questionnaire.

PubMed

Greenley, Rachel Neff; Reed-Knight, Bonney; Blount, Ronald L; Wilson, Helen W

2013-09-01

Evaluate the factor structure of youth and maternal involvement ratings on the Inflammatory Bowel Disease Family Responsibility Questionnaire, a measure of family allocation of condition management responsibilities in pediatric inflammatory bowel disease. Participants included 251 youth aged 11-18 years with inflammatory bowel disease and their mothers. Item-level descriptive analyses, subscale internal consistency estimates, and confirmatory factor analyses of youth and maternal involvement were conducted using a dyadic data-analytic approach. Results supported the validity of 4 conceptually derived subscales including general health maintenance, social aspects, condition management tasks, and nutrition domains. Additionally, results indicated adequate support for the factor structure of a 21-item youth involvement measure and strong support for a 16-item maternal involvement measure. Additional empirical support for the validity of the Inflammatory Bowel Disease Family Responsibility Questionnaire was provided. Future research to replicate current findings and to examine the measure's clinical utility is warranted.
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.

PubMed

Gibbons, C J; Skevington, S M

2018-04-01

Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four items banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
75 FR 57742 - Notice of Submission for OMB Review

Federal Register 2010, 2011, 2012, 2013, 2014

2010-09-22

... diminish response ambiguity, to simplify data entry and analysis, and to delete two items that were no...-friendly and diminish response ambiguity. Requests for copies of the information collection submission for...
Response Mixture Modeling: Accounting for Heterogeneity in Item Characteristics across Response Times.

PubMed

Molenaar, Dylan; de Boeck, Paul

2018-06-01

In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
Computerized Adaptive Testing Provides Reliable and Efficient Depression Measurement Using the CES-D Scale

PubMed Central

2017-01-01

Background The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered using CAT using an American sample group. Methods We obtained a sample of 2060 responses to the CESD-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. Estimates of the level of CES-D trait score provided by the simulated CAT algorithm and the original CES-D trait score derived from original scale were correlated highly. The second CAT simulation conducted using real participant data demonstrated higher precision at the higher levels of depression spectrum. Conclusions Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
Measuring the effects of online health information for patients: Item generation for an e-health impact questionnaire

PubMed Central

Kelly, Laura; Jenkinson, Crispin; Ziebland, Sue

2013-01-01

Objective The internet is a valuable resource for accessing health information and support. We are developing an instrument to assess the effects of websites with experiential and factual health information. This study aimed to inform an item pool for the proposed questionnaire. Methods Items were informed through a review of relevant literature and secondary qualitative analysis of 99 narrative interviews relating to patient and carer experiences of health. Statements relating to identified themes were re-cast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n = 21) were used to assess items for face and content validity. Results Eighty-two generic items were identified following secondary qualitative analysis and expert review. Cognitive interviewing confirmed the questionnaire instructions, 62 items and the response options were acceptable to patients and carers. Conclusion Using a clear conceptual basis to inform item generation, 62 items have been identified as suitable to undergo further psychometric testing. Practice implications The final questionnaire will initially be used in a randomized controlled trial examining the effects of online patient's experiences. This will inform recommendations on the best way to present patients’ experiences within health information websites. PMID:23598293
Rasch analysis resulted in an improved Norwegian version of the Pain Attitudes and Beliefs Scale (PABS).

PubMed

Eland, Nicolaas D; Kvåle, Alice; Ostelo, Raymond W J G; Strand, Liv Inger

2016-10-01

There is evidence that clinicians' pain attitudes and beliefs are associated with the pain beliefs and illness perceptions of their patients and furthermore influence their recommendations for activity and work to patients with back pain. The Pain Attitudes and Beliefs Scale (PABS) is a questionnaire designed to differentiate between biomedical and biopsychosocial pain attitudes among health care providers regarding common low back pain. The original version had 36 items, and several shorter versions have been developed. Concern has been raised over the PABS' internal construct validity because of low internal consistency and low explained variance. The aim of this study was to examine and improve the scale's measurement properties and item performance. A convenience sample of 667 Norwegian physiotherapists provided data for Rasch analysis. The biomedical and biopsychosocial subscales of the PABS were examined for unidimensionality, local response independency, invariance, response category function and targeting of persons and items. Reliability was measured with the person separation index (PSI). Items originally excluded by the developers of the scale because of skewness were re-introduced in a second analysis. Our analysis suggested that both subscales required removal of several psychometrically redundant and misfitting items to satisfy the requirements of the Rasch measurement model. Most biopsychosocial items needed revision of their scoring structure. Furthermore, we identified two items originally excluded because of skewness that improved the reliability of the subscales after re-introduction. The ultimate result was two strictly unidimensional subscales, each consisting of seven items, with invariant item ordering and free from any form of misfit. The unidimensionality implies that summation of items to valid total scores is justified. Transformation tables are provided to convert raw ordinal scores to unbiased interval-level scores. Both subscales were adequately targeted at the ability level of our physiotherapist population. Reliability of the biomedical subscale as measured with the PSI was 0.69. A low PSI of 0.64 for the biopsychosocial subscale indicated limitations with regard to its discriminative ability. Rasch analysis produced an improved Norwegian version of the PABS which represents true (fundamental) measurement of clinicians' biomedical and biopsychosocial treatment orientation. However, researchers should be aware of the low discriminative ability of the biopsychosocial subscale when analyzing differences and effect changes. The study presents a revised PABS that provides interval-level measurement of clinicians' pain beliefs. The revision allows for confident use of parametric statistical analysis. Further examination of discriminative validity is required. Copyright Â© 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Effect of the framing of questionnaire items regarding satisfaction with training on residents' responses.

PubMed

Guyatt, G H; Cook, D J; King, D; Norman, G R; Kane, S L; van Ineveld, C

1999-02-01

To determine whether framing questions positively or negatively influences residents' apparent satisfaction with their training. In 1993-94, 276 residents at five Canadian internal medicine residency programs responded to 53 Likert-scale items designed to determine sources of the residents' satisfaction and stress. Two versions of the questionnaire were randomly distributed: one in which half the items were stated positively and the other half negatively, the other version in which the items were stated in the opposite way. The residents scored 43 of the 53 items higher when stated positively and scored ten higher when stated negatively (p < .0001). When analyzed using an analysis-of-variance model, the effect of positive versus negative framing was highly significant (F = 129.81, p < .0001). While the interaction between item and framing was also significant, the effect was much less strong (F = 5.56, p < .0001). On a scale where 1 represented the lowest possible level of satisfaction and 7 the highest, the mean score of the positively stated items was 4.1 and that of the negatively stated items, 3.8, an effect of 0.3. These results suggest a significant "response acquiescence bias." To minimize this bias, questionnaires assessing attitudes toward educational programs should include a mix of positively and negatively stated items.

Validation of a condition-specific measure for women having an abnormal screening mammography.

PubMed

Brodersen, John; Thorsen, Hanne; Kreiner, Svend

2007-01-01

The aim of this study is to assess the validity of a new condition-specific instrument measuring psychosocial consequences of abnormal screening mammography (PCQ-DK33). The draft version of the PCQ-DK33 was completed on two occasions by 184 women who had received an abnormal screening mammography and on one occasion by 240 women who had received a normal screening result. Item Response Theories and Classical Test Theories were used to analyze data. Construct validity, concurrent validity, known group validity, objectivity and reliability were established by item analysis examining the fit between item responses and Rasch models. Six dimensions covering anxiety, behavioral impact, sense of dejection, impact on sleep, breast examination, and sexuality were identified. One item belonging to the dejection dimension had uniform differential item functioning. Two items not fitting the Rasch models were retained because of high face validity. A sick leave item added useful information when measuring side effects and socioeconomic consequences of breast cancer screening. Five "poor items" were identified and should be deleted from the final instrument. Preliminary evidence for a valid and reliable condition-specific measure for women having an abnormal screening mammography was established. The measure includes 27 "good" items measuring different attributes of the same overall latent structure-the psychosocial consequences of abnormal screening mammography.
Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.

PubMed

Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J

2018-02-01

Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items and were removed in stages, creating a 8- and 3-item Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
An Item Response Analysis of the Motor and Behavioral Subscales of the Unified Huntington's Disease Rating Scale in Huntington Disease Gene Expansion Carriers

PubMed Central

Vaccarino, Anthony L.; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K.; Orth, Michael; Paulsen, Jane S.; Sills, Terrence; van Kammen, Daniel P.; Evans, Kenneth R.

2011-01-01

Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (i.e., prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a non-parametric item response analysis was performed on retrospective data from two multicentre, longitudinal studies. Motor and Behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent saccade velocity, dysarthia, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. PMID:21370269
An item response analysis of the motor and behavioral subscales of the unified Huntington's disease rating scale in huntington disease gene expansion carriers.

PubMed

Vaccarino, Anthony L; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K; Orth, Michael; Paulsen, Jane S; Sills, Terrence; van Kammen, Daniel P; Evans, Kenneth R

2011-04-01

Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (ie, prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a nonparametric item response analysis was performed on retrospective data from 2 multicenter longitudinal studies. Motor and behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low, and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent, saccade velocity, dysarthria, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait, and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. Copyright © 2011 Movement Disorder Society.
A Comparison of Three Test Formats to Assess Word Difficulty

ERIC Educational Resources Information Center

Culligan, Brent

2015-01-01

This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
Guessing and the Rasch Model

ERIC Educational Resources Information Center

Holster, Trevor A.; Lake, J.

2016-01-01

Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Systems Analysis Directorate Activities Summary August 1977

DTIC Science & Technology

1977-09-01

are: x a. Cataloging direction b. Requirements computation c. Procurement direction d. Distribution management e. Disposal direction f...34inventory management," as a responsibility of NICP’s, includes cataloging, requirements computation, procurement direction, distribution management , maintenance...functions are cataloging, major item management, secondary item management, procurement direction, distribution management , overhaul and rebuild
A Substantive Process Analysis of Responses to Items from the Multistate Bar Examination

ERIC Educational Resources Information Center

Bonner, Sarah M.; D'Agostino, Jerome V.

2012-01-01

We investigated examinees' cognitive processes while they solved selected items from the Multistate Bar Exam (MBE), a high-stakes professional certification examination. We focused on ascertaining those mental processes most frequently used by examinees, and the most common types of errors in their thinking. We compared the relationships between…
Outlier Detection in High-Stakes Certification Testing. Research Report.

ERIC Educational Resources Information Center

Meijer, Rob R.

Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…
Improving the evaluation of therapeutic interventions in multiple sclerosis: the role of new psychometric methods.

PubMed

Hobart, J; Cano, S

2009-02-01

In this monograph we examine the added value of new psychometric methods (Rasch measurement and Item Response Theory) over traditional psychometric approaches by comparing and contrasting their psychometric evaluations of existing sets of rating scale data. We have concentrated on Rasch measurement rather than Item Response Theory because we believe that it is the more advantageous method for health measurement from a conceptual, theoretical and practical perspective. Our intention is to provide an authoritative document that describes the principles of Rasch measurement and the practice of Rasch analysis in a clear, detailed, non-technical form that is accurate and accessible to clinicians and researchers in health measurement. A comparison was undertaken of traditional and new psychometric methods in five large sets of rating scale data: (1) evaluation of the Rivermead Mobility Index (RMI) in data from 666 participants in the Cannabis in Multiple Sclerosis (CAMS) study; (2) evaluation of the Multiple Sclerosis Impact Scale (MSIS-29) in data from 1725 people with multiple sclerosis; (3) evaluation of test-retest reliability of MSIS-29 in data from 150 people with multiple sclerosis; (4) examination of the use of Rasch analysis to equate scales purporting to measure the same health construct in 585 people with multiple sclerosis; and (5) comparison of relative responsiveness of the Barthel Index and Functional Independence Measure in data from 1400 people undergoing neurorehabilitation. Both Rasch measurement and Item Response Theory are conceptually and theoretically superior to traditional psychometric methods. Findings from each of the five studies show that Rasch analysis is empirically superior to traditional psychometric methods for evaluating rating scales, developing rating scales, analysing rating scale data, understanding and measuring stability and change, and understanding the health constructs we seek to quantify. There is considerable added value in using Rasch analysis rather than traditional psychometric methods in health measurement. Future research directions include the need to reproduce our findings in a range of clinical populations, detailed head-to-head comparisons of Rasch analysis and Item Response Theory, and the application of Rasch analysis to clinical practice.
Getting the story straight: evaluating the test-retest reliability of a university health history questionnaire.

PubMed

Gilkison, C R; Fenton, M V; Lester, J W

1992-05-01

This study was designed to establish the reliability of a health history questionnaire used as a screening tool for incoming university students. The authors used a test-retest design, with a test interval of 6 months, on a sample of medical and nursing students. The analysis focused on overall reliability of the questionnaire and reproducibility of specific items, based on question format. Questionnaire items of specific interest were those with dichotomous yes/no response options versus open-ended format questions, those using the words frequently or recently, or those that asked multiple questions. Demographic characteristics of the subjects were considered in the evaluation of reliability. Overall reliability of the questionnaire (93.6%) was above the anticipated level of 90%, and subject sex or program of study did not show any significant differences in reproducibility of responses. Although wording of questions did not affect item reliability, dichotomous format questions demonstrated a higher degree of reliability (96.4%) than the overall reliability of the questionnaire. Recommendations for enhancing the reliability of the questionnaire are based on item analysis and information gathered from interviews with subjects.
Development and evaluation of a thermochemistry concept inventory for college-level general chemistry

NASA Astrophysics Data System (ADS)

Wren, David A.

The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item-level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in expert response process validity survey (N = 12), to provide expert feedback on item content, format and consensus of the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of the a 12-item TCI (N = 485), and a large data collection using a 10-item TCI ( N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used for evaluation of testing data at the test and item level. The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).
Comparing the Personality Disorder Interview for DSM-IV (PDI-IV) and SCID-II borderline personality disorder scales: an item-response theory analysis.

PubMed

Huprich, Steven K; Paggeot, Amy V; Samuel, Douglas B

2015-01-01

One-hundred sixty-nine psychiatric outpatients and 171 undergraduate students were assessed with the Personality Disorder Interview-IV (PDI-IV; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995) and the Structured Clinical Interview for DSM-IV Axis II disorders (SCID-II; First, Gibbon, Spitzer, Williams, & Benjamin, 1997) for borderline personality disorder (BPD). Eighty individuals met PDI-IV BPD criteria, whereas 34 met SCID-II BPD criteria. Dimensional ratings of both measures were highly intercorrelated (rs = .78, .75), and item-level interrater reliability fell in the good to excellent range. An item-response theory analysis was performed to investigate whether properties of the items from each interview could help understand these differences. The limited agreement seemed to be explained by differences in the response options across the two interviews. We found that suicidal behavior was among the most discriminating criteria on both instruments, whereas dissociation and difficulty controlling anger had the 2 lowest alpha parameter values. Finally, those meeting BPD criteria on both interviews had higher levels of anxiety, depression, and more impairments in object relations than those meeting criteria on just the PDI-IV. These findings suggest that the choice of measure has a notable effect on the obtained diagnostic prevalence and the level of BPD severity that is detected.
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

PubMed Central

Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between no-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

PubMed

Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between no-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction.
Classical test theory and Rasch analysis validation of the Upper Limb Functional Index in subjects with upper limb musculoskeletal disorders.

PubMed

Bravini, Elisabetta; Franchignoni, Franco; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano; Foti, Calogero

2015-01-01

To perform a comprehensive analysis of the psychometric properties and dimensionality of the Upper Limb Functional Index (ULFI) using both classical test theory and Rasch analysis (RA). Prospective, single-group observational design. Freestanding rehabilitation center. Convenience sample of Italian-speaking subjects with upper limb musculoskeletal disorders (N=174). Not applicable. The Italian version of the ULFI. Data were analyzed using parallel analysis, exploratory factor analysis, and RA for evaluating dimensionality, functioning of rating scale categories, item fit, hierarchy of item difficulties, and reliability indices. Parallel analysis revealed 2 factors explaining 32.5% and 10.7% of the response variance. RA confirmed the failure of the unidimensionality assumption, and 6 items out of the 25 misfitted the Rasch model. When the analysis was rerun excluding the misfitting items, the scale showed acceptable fit values, loading meaningfully to a single factor. Item separation reliability and person separation reliability were .98 and .89, respectively. Cronbach alpha was .92. RA revealed weakness of the scale concerning dimensionality and internal construct validity. However, a set of 19 ULFI items defined through the statistical process demonstrated a unidimensional structure, good psychometric properties, and clinical meaningfulness. These findings represent a useful starting point for further analyses of the tool (based on modern psychometric approaches and confirmatory factor analysis) in larger samples, including different patient populations and nationalities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Analysis of Validity and Reliability of the Health Literacy Index for Female Marriage Immigrants (HLI-FMI).

PubMed

Yang, Sook Ja; Chee, Yeon Kyung; An, Jisook; Park, Min Hee; Jung, Sunok

2016-05-01

The purpose of this study was to obtain an independent evaluation of the factor structure of the 12-item Health Literacy Index for Female Marriage Immigrants (HLI-FMI), the first measure for assessing health literacy for FMIs in Korea. Participants were 250 Asian women who migrated from China, Vietnam, and the Philippines to marry. The HLI-FMI was originally developed and administered in Korean, and other questionnaires were translated into participants' native languages. The HLI-FMI consisted of 2 factors: (1) Access-Understand Health Literacy (7 items) and (2) Appraise-Apply Health Literacy (5 items); Cronbach's α = .73. Confirmatory factor analysis indicated adequate fit for the 2-factor model. HLI-FMI scores were positively associated with time since immigration and Korean proficiency. Based on classical test theory and item response theory, strong support was provided for item discrimination and item difficulty. Findings suggested that the HLI-FMI is an easily administered, reliable, and valid scale. © 2016 APJPH.
Assessing quality of maternity care in Hungary: expert validation and testing of the mother-centered prenatal care (MCPC) survey instrument.

PubMed

Rubashkin, Nicholas; Szebik, Imre; Baji, Petra; Szántó, Zsuzsa; Susánszky, Éva; Vedam, Saraswathi

2017-11-16

Instruments to assess quality of maternity care in Central and Eastern European (CEE) region are scarce, despite reports of poor doctor-patient communication, non-evidence-based care, and informal cash payments. We validated and tested an online questionnaire to study maternity care experiences among Hungarian women. Following literature review, we collated validated items and scales from two previous English-language surveys and adapted them to the Hungarian context. An expert panel assessed items for clarity and relevance on a 4-point ordinal scale. We calculated item-level Content Validation Index (CVI) scores. We designed 9 new items concerning informal cash payments, as well as 7 new "model of care" categories based on mode of payment. The final questionnaire (N = 111 items) was tested in two samples of Hungarian women, representative (N = 600) and convenience (N = 657). We conducted bivariate analysis and thematic analysis of open-ended responses. Experts rated pre-existing English-language items as clear and relevant to Hungarian women's maternity care experiences with an average CVI for included questions of 0.97. Significant differences emerged across the model of care categories in terms of informal payments, informed consent practices, and women's perceptions of autonomy. Thematic analysis (N = 1015) of women's responses identified 13 priority areas of the maternity care experience, 9 of which were addressed by the questionnaire. We developed and validated a comprehensive questionnaire that can be used to evaluate respectful maternity care, evidence-based practice, and informal cash payments in CEE region and beyond.
Development of a tool to assess adherence to a model of the division of responsibility in feeding young children: using response mapping to capacitate validation measures.

PubMed

Lohse, Barbara; Satter, Ellyn; Arnold, Kristen

2014-04-01

Accurate early assessment and targeted intervention with problematic parent/child feeding dynamics is critical for the prevention and treatment of child obesity. The division of responsibility in feeding (sDOR), articulated by the Satter Feeding Dynamics Model (fdSatter), has been demonstrated clinically as an effective approach to reduce child feeding problems, including those leading to obesity. Lack of a tested instrument to examine adherence to fdSatter stimulated initial construction of the Satter Feeding Dynamics Inventory (fdSI). The aim of this project was to refine the item pool to establish translational validity, making the fdSI suitable for advanced psychometric analysis. Cognitive interviews (n = 80) with caregivers of varied socioeconomic strata informed revisions that demonstrated face and content validity. fdSI responses were mapped to interviews using an iterative, multi-phase thematic approach to provide an instrument ready for construct validation. fdSI development required five interview phases over 32 months: Foundational; Refinement; Transitional; Assurance; and Launching. Each phase was associated with item reduction and revision. Thirteen items were removed from the 38-item Foundational phase and seven were revised in the Refinement phase. Revisions, deletions, and additions prompted by Transitional and Assurance phase interviews resulted in the 15-item Launching phase fdSI. Only one Foundational phase item was carried through all development phases, emphasizing the need to test for item comprehension and interpretation before psychometric analyses. Psychometric studies of item pools without encrypted meanings will facilitate progress toward a tool that accurately detects adherence to sDOR. Ability to measure sDOR will facilitate focus on feeding behaviors associated with reduced risk of childhood obesity.
Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: a comparison of worked examples.

PubMed

Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D

2015-01-01

To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development-classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT)-in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.

The Nursing Home Physical Performance Test: A Secondary Data Analysis of Women in Long-Term Care Using Item Response Theory.

PubMed

Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L

2017-04-11

The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at item level and opportunities for improvements. We used data from long-term care women. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. Participants were 86 years old, had more than three comorbidities, and a NHPPT of 19.4. All items had high discrimination and were targeted toward the lower middle range of performance continuum. After revision, sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. Concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL and explained variability (R2) from 22% to 36% for frailty. NHPPT has good measurement characteristics at the item level. NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Validation of the conceptual research utilization scale: an application of the standards for educational and psychological testing in healthcare.

PubMed

Squires, Janet E; Estabrooks, Carole A; Newburn-Cook, Christine V; Gierl, Mark

2011-05-19

There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale). We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Internal structure and relations to other variables validity was examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression. Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis predicted a 5-item 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU scale score and all five CRU items. The CRU scale score was also shown to be significant predictor of overall research utilization in multivariate linear regression. The CRU scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes. Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU scale to yield sound assessments of CRU by healthcare aides. Refinement to the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes to conceptual research use, and longitudinal work to determine CRU Scale sensitivity to change.
Validation of a Health Literacy Measure for Adolescents and Young Adults Diagnosed with Cancer.

PubMed

McDonald, Fiona E J; Patterson, Pandora; Costa, Daniel S J; Shepherd, Heather L

2016-03-01

Health literacy can influence long-term health outcomes. This study aimed to validate an adapted version of the Functional, Communicative and Critical Health Literacy measure for adolescent and young adult (AYA) cancer patients and survivors (N = 105; age 12-24 years). Exploratory factor analysis was used to validate the measure, and indicated that a slightly modified item structure better fit the results. Furthermore, item response theory analysis highlighted location and discrimination parameter differences among items. Acceptability of the measure was high. This is the first validation of a health literacy measure among AYAs with an illness such as cancer.
The Caregiver Contribution to Heart Failure Self-Care (CACHS): Further Psychometric Testing of a Novel Instrument.

PubMed

Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael

2017-04-01

Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize it, account for its impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014 and results evaluated using classical test theory and item response theory. Items would be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance eight items; monitoring seven items; and management five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05), and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Measuring grief and loss after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Grief and Loss item bank and short form

PubMed Central

Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.

2015-01-01

Objective To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design A literature review guided framework development of grief/loss. New items were created from focus groups. Items were revised based on expert review and patient feedback and were then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI answered the grief and loss questions. Results The final calibrated item bank resulted in 17 retained items. A unidimensional model was observed (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range between −1.48 to 2.48). Ten items were flagged for DIF, however, after examination of effect sizes found this to be negligible with little practical impact on score estimates. Conclusions This study indicates that the SCI-QOL Grief and Loss item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010969
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

NASA Astrophysics Data System (ADS)

Liou, Pey-Yan; Bulut, Okan

2017-12-01

The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
75 FR 42725 - Notice of Proposed Information Collection Requests

Federal Register 2010, 2011, 2012, 2013, 2014

2010-07-22

... reduce grantee work burden and to diminish response ambiguity, to simplify data entry and analysis, and... items to make the form more user- friendly and diminish response ambiguity. Requests for copies of the...
Measuring the quality of life in hypertension according to Item Response Theory.

PubMed

Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; Andrade, Dalton Francisco de; Barbetta, Pedro Alberto; Souza, Ana Célia Caetano de; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

2017-05-04

To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL - Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used the Gradual Response Model of Samejima. The analyses were conducted using the free software R with the aid of psych and mirt. The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies. Analisar o Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL) por meio da Teoria da Resposta ao Item. Estudo analítico realizado com 712 pessoas com hipertensão arterial atendidas em 13 unidades de atenção primária em saúde de Fortaleza, CE, em 2015. As etapas da análise pela Teoria da Resposta ao Item foram: avaliação da dimensionalidade, estimação dos parâmetros dos itens e construção da escala. O estudo da dimensionalidade foi realizado sobre a matriz de correlação policórica e análise fatorial confirmatória. Para a estimação dos parâmetros dos itens, foi utilizado o Modelo de Resposta Gradual de Samejima. As análises foram conduzidas no software livre R com o auxílio dos pacotes psych e mirt. A análise permitiu a visualização dos parâmetros dos itens e suas contribuições individuais na mensuração do traço latente, gerando mais informação, permitindo a construção de uma escala com um modelo interpretativo que demonstra a evolução da piora da qualidade de vida em cinco níveis. Quanto aos parâmetros dos itens, houve bom desempenho daqueles referentes ao estado somático, pois apresentaram melhor poder de discriminar os indivíduos com pior qualidade de vida. Os itens relacionados ao estado mental foram os que contribuíram com menor quantidade de informação psicométrica no MINICHAL. Conclui-se que o instrumento é indicado para a identificação da deterioração da qualidade de vida em hipertensão arterial. A análise do MINICHAL pela Teoria da Resposta ao Item permitiu identificar novas facetas desse instrumento ainda não abordadas em estudos anteriores.
Refining the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) item candidates: interpretation of a self-reported outcome measure of functional performance by young people with neurodevelopmental disabilities.

PubMed

Kramer, Jessica M; Schwartz, Ariel

2017-10-01

This study examined the item interpretability and rating scale use of the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) by young people with developmental disabilities. The PEDI-PRO assesses the functional performance of discrete functional tasks in the context of everyday life situations. A two-phase cognitive interview design was implemented with a convenience sample of 37 young people (mean age 19y, SD 2y 5mo; 13 males and 24 females; 68% with intellectual disability) with developmental disabilities. In phase I, 182 item candidates were each reviewed by an average of four young people. In phase II, 103 items were carried forward or revised and each reviewed by an average of seven additional young people. Two raters coded responses for intended item interpretation and performance quality; codes were analysed using descriptive statistics. Qualitative analysis explored young people's self-evaluation process. Items were interpreted as intended by most young people (mean 86%). Young people can use PEDI-PRO response categories appropriately to describe their performance: 94% of positive performance descriptions coincided with a positive response category choice; 73% of negative descriptions coincided with a negative response category choice. Young people interpreted items in a literal manner, and their self-evaluation incorporated the use of supports that facilitate functional performance. The PEDI-PRO's measurement framework appears to support the self-evaluation of functional performance of young people with developmental disabilities. © 2017 Mac Keith Press.
Factor Analysis and Item Reduction of the Banff Patella Instability Instrument (BPII): Introduction of BPII 2.0.

PubMed

Lafave, Mark R; Hiemstra, Laurie; Kerslake, Sarah

2016-08-01

Clinical management of patellofemoral (PF) instability is a challenge, particularly considering the number of variables that should be taken into consideration for treatment. Quality of life is an important measure to consider with this patient population. To factor analyze and reduce the total number of items in the Banff Patella Instability Instrument (BPII). Subsequent to the factor analysis, the new, item-reduced BPII 2.0 was tested for validity, reliability, and responsiveness. Cohort study (diagnosis); Level of evidence, 2. Quality of life was measured for PF instability patients (N = 223) through use of the original BPII at their initial consultation. Data from the BPII scores were used in a principal components analysis (PCA) to factor analyze and reduce the total number of items in the original BPII, to create a revised BPII 2.0. The BPII 2.0 underwent content validation (Cronbach alpha, patient interviews, and grade-level checking), construct validation (analysis of variance comparing the initial visit and the 6-, 12-, and 24-month postoperative visits, eta-square), convergent validation (Pearson r correlation to the original BPII), responsiveness testing (eta-square, anchor-based distribution testing), and reliability testing (intraclass correlation coefficient [ICC]). The BPII was successfully reduced from 32 to 23 items with excellent Cronbach alpha values in the new BPII 2.0: initial visit = 0.91; 6-month postoperative visit = 0.96; 12-month postoperative visit = 0.97; and 24-month postoperative visit = 0.76. Grade-level reading for all items was assessed as below grade 12. The BPII 2.0 was able to discriminate between all time periods with significant differences between groups (P < .05). Eta-square was 0.40, demonstrating a medium to large effect size. The BPII significantly correlated with the BPII 2.0 (0.82, 0.90, 0.90, and 0.94 at the initial visit and 6-, 12-, and 24-month postoperative visits, respectively), providing evidence of convergent validity. A significant correlation was found between the 7-point scale and 24-month postoperative BPII 2.0 scores, a sign of anchor-based responsiveness. ICC (2,k) was 0.97, indicating strong reliability. The BPII 2.0 is valid, reliable, and responsive for assessment of patients with PF instability, both surgically and nonsurgically treated. © 2016 The Author(s).
Assessment of health surveys: fitting a multidimensional graded response model.

PubMed

Depaoli, Sarah; Tiemensma, Jitske; Felt, John M

The multidimensional graded response model, an item response theory (IRT) model, can be used to improve the assessment of surveys, even when sample sizes are restricted. Typically, health-based survey development utilizes classical statistical techniques (e.g. reliability and factor analysis). In a review of four prominent journals within the field of Health Psychology, we found that IRT-based models were used in less than 10% of the studies examining scale development or assessment. However, implementing IRT-based methods can provide more details about individual survey items, which is useful when determining the final item content of surveys. An example using a quality of life survey for Cushing's syndrome (CushingQoL) highlights the main components for implementing the multidimensional graded response model. Patients with Cushing's syndrome (n = 397) completed the CushingQoL. Results from the multidimensional graded response model supported a 2-subscale scoring process for the survey. All items were deemed as worthy contributors to the survey. The graded response model can accommodate unidimensional or multidimensional scales, be used with relatively lower sample sizes, and is implemented in free software (example code provided in online Appendix). Use of this model can help to improve the quality of health-based scales being developed within the Health Sciences.
Shortening of an existing generic online health-related quality of life instrument for dogs.

PubMed

Reid, J; Wiseman-Orr, L; Scott, M

2017-10-11

Development, initial validation and reliability testing of a shortened version of a web-based questionnaire instrument to measure generic health-related quality of life in companion dogs, to facilitate smartphone and online use. The original 46 items were reduced using expert judgment and factor analysis. Items were removed on the basis of item loadings and communalities on factors identified through factor analysis of responses from owners of healthy and unwell dogs, intrafactor item correlations, readability of items in the UK, USA and Australia and ability of individual items to discriminate between healthy and unwell dogs. Validity was assessed through factor analysis and a field trial using a "known groups" approach. Test-retest reliability was assessed using intraclass correlation coefficients. The new instrument comprises 22 items, each of which was rated by dog owners using a 7-point Likert scale. Factor analysis revealed a structure with four health-related quality of life domains (energetic/enthusiastic, happy/content, active/comfortable, and calm/relaxed) accounting for 72% of the variability in the data compared with 64% for the original instrument. The field test involving 153 healthy and unwell dogs demonstrated good discriminative properties and high intraclass correlation coefficients. The 22-item shortened form is superior to the original instrument and can be accessed via a mobile phone app. This is likely to increase the acceptability to dog owners as a routine wellness measure in health care packages and as a therapeutic monitoring tool. © 2017 British Small Animal Veterinary Association.
The Social Physique Anxiety Scale: an example of the potential consequence of negatively worded items in factorial validity studies.

PubMed

Motl, R W; Conroy, D E; Horan, P M

2000-01-01

Social physique anxiety (SPA) based on Hart, Leary, and Rejeski's (1989) Social Physique Anxiety Scale (SPAS) was originally conceptualized to be a unidimensional construct. Empirical evidence on the factorial validity of the SPAS has been contradictory, yielding both one- and two-factor models. The two-factor model, which consists of separate factors associated with positively and negatively worded items, has stimulated an ongoing debate about the dimensionality and content of the SPAS. The present study employed confirmatory factor analysis (CFA) to examine whether the two-factor solution to the 12-item SPAS was substantively meaningful or a methodological artifact. Results of the CFAs, which were performed on responses from four different samples (Eklund, Kelley, and Wilson, 1997; Eklund, Mack, and Hart, 1996), supported the existence of a single substantive SPA factor underlying responses to the 12-item SPAS. There were, in addition, method effects associated with the negatively worded items that could be modeled to achieve good fit. Therefore, it was concluded that a single substantive factor and a non-substantive method effect primarily related to the negatively worded items best represented the 12-item SPAS.
Measurement properties of the WOMAC LK 3.1 pain scale.

PubMed

Stratford, P W; Kennedy, D M; Woodhouse, L J; Spadoni, G F

2007-03-01

The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is applied extensively to patients with osteoarthritis of the hip or knee. Previous work has challenged the validity of its physical function scale however an extensive evaluation of its pain scale has not been reported. Our purpose was to estimate internal consistency, factorial validity, test-retest reliability, and the standard error of measurement (SEM) of the WOMAC LK 3.1 pain scale. Four hundred and seventy-four patients with osteoarthritis of the hip or knee awaiting arthroplasty were administered the WOMAC. Estimates of internal consistency (coefficient alpha), factorial validity (confirmatory factor analysis), and the SEM based on internal consistency (SEM(IC)) were obtained. Test-retest reliability [Type 2,1 intraclass correlation coefficients (ICC)] and a corresponding SEM(TRT) were estimated on a subsample of 36 patients. Our estimates were: internal consistency alpha=0.84; SEM(IC)=1.48; Type 2,1 ICC=0.77; SEM(TRT)=1.69. Confirmatory factor analysis failed to support a single factor structure of the pain scale with uncorrelated error terms. Two comparable models provided excellent fit: (1) a model with correlated error terms between the walking and stairs items, and between night and sit items (chi2=0.18, P=0.98); (2) a two factor model with walking and stairs items loading on one factor, night and sit items loading on a second factor, and the standing item loading on both factors (chi2=0.18, P=0.98). Our examination of the factorial structure of the WOMAC pain scale failed to support a single factor and internal consistency analysis yielded a coefficient less than optimal for individual patient use. An alternate strategy to summing the five-item responses when considering individual patient application would be to interpret item responses separately or to sum only those items which display homogeneity.
Validating the European Health Literacy Survey Questionnaire in people with type 2 diabetes: Latent trait analyses applying multidimensional Rasch modelling and confirmatory factor analysis.

PubMed

Finbråten, Hanne Søberg; Pettersen, Kjell Sverre; Wilde-Larsson, Bodil; Nordström, Gun; Trollvik, Anne; Guttersrud, Øystein

2017-11-01

To validate the European Health Literacy Survey Questionnaire (HLS-EU-Q47) in people with type 2 diabetes mellitus. The HLS-EU-Q47 latent variable is outlined in a framework with four cognitive domains integrated in three health domains, implying 12 theoretically defined subscales. Valid and reliable health literacy measurers are crucial to effectively adapt health communication and education to individuals and groups of patients. Cross-sectional study applying confirmatory latent trait analyses. Using a paper-and-pencil self-administered approach, 388 adults responded in March 2015. The data were analysed using the Rasch methodology and confirmatory factor analysis. Response violation (response dependency) and trait violation (multidimensionality) of local independence were identified. Fitting the "multidimensional random coefficients multinomial logit" model, 1-, 3- and 12-dimensional Rasch models were applied and compared. Poor model fit and differential item functioning were present in some items, and several subscales suffered from poor targeting and low reliability. Despite multidimensional data, we did not observe any unordered response categories. Interpreting the domains as distinct but related latent dimensions, the data fit a 12-dimensional Rasch model and a 12-factor confirmatory factor model best. Therefore, the analyses did not support the estimation of one overall "health literacy score." To support the plausibility of claims based on the HLS-EU score(s), we suggest: removing the health care aspect to reduce the magnitude of multidimensionality; rejecting redundant items to avoid response dependency; adding "harder" items and applying a six-point rating scale to improve subscale targeting and reliability; and revising items to improve model fit and avoid bias owing to person factors. © 2017 John Wiley & Sons Ltd.
Exploring the measurement properties of the osteopathy clinical teaching questionnaire using Rasch analysis.

PubMed

Vaughan, Brett

2018-01-01

Clinical teaching evaluations are common in health profession education programs to ensure students are receiving a quality clinical education experience. Questionnaires students use to evaluate their clinical teachers have been developed in professions such as medicine and nursing. The development of a questionnaire that is specifically for the osteopathy on-campus, student-led clinic environment is warranted. Previous work developed the 30-item Osteopathy Clinical Teaching Questionnaire. The current study utilised Rasch analysis to investigate the construct validity of the Osteopathy Clinical Teaching Questionnaire and provide evidence for the validity argument through fit to the Rasch model. Senior osteopathy students at four institutions in Australia, New Zealand and the United Kingdom rated their clinical teachers using the Osteopathy Clinical Teaching Questionnaire. Three hundred and ninety-nine valid responses were received and the data were evaluated for fit to the Rasch model. Reliability estimations (Cronbach's alpha and McDonald's omega) were also evaluated for the final model. The initial analysis demonstrated the data did not fit the Rasch model. Accordingly, modifications to the questionnaire were made including removing items, removing person responses, and rescoring one item. The final model contained 12 items and fit to the Rasch model was adequate. Support for unidimensionality was demonstrated through both the Principal Components Analysis/t-test, and the Cronbach's alpha and McDonald's omega reliability estimates. Analysis of the questionnaire using McDonald's omega hierarchical supported a general factor (quality of clinical teaching in osteopathy). The evidence for unidimensionality and the presence of a general factor support the calculation of a total score for the questionnaire as a sufficient statistic. Further work is now required to investigate the reliability of the 12-item Osteopathy Clinical Teaching Questionnaire to provide evidence for the validity argument.
Refinements of Stout’s Procedure for Assessing Latent Trait Unidimensionality

DTIC Science & Technology

1992-08-01

in the presence of guessing when coupled with many high-discriminating items. A revision of DIMTEST is proposed to overcome this limitation. Also, an...used for factor analysis. When guessing is present in the responses to items, however, linear factor analysis of tetrachoric correlations can produce...significance when d=1 and maintaining good power when d=2, even when the correlation between the abilities is as high as .7. The present study provides a
Analysis of the psychometric properties of the Multiple Sclerosis Impact Scale-29 (MSIS-29) in relapsing–remitting multiple sclerosis using classical and modern test theory

PubMed Central

Wyrwich, KW; Phillips, GA; Vollmer, T; Guo, S

2016-01-01

Background Investigations using classical test theory support the psychometric properties of the original version of the Multiple Sclerosis Impact Scale (MSIS-29v1), a disease-specific measure of multiple sclerosis (MS) impact (physical and psychological subscales). Later, assessments of the MSIS-29v1 in an MS community-based sample using Rasch analysis led to revisions of the instrument’s response options (MSIS-29v2). Objective The objective of this paper is to evaluate the psychometric properties of the MSIS-29v1 in a clinical trial cohort of relapsing–remitting MS patients (RRMS). Methods Data from 600 patients with RRMS enrolled in the SELECT clinical trial were used. Assessments were performed at baseline and at Weeks 12, 24, and 52. In addition to traditional psychometric analyses, Item Response Theory (IRT) and Rasch analysis were used to evaluate the measurement properties of the MSIS-29v1. Results Both MSIS-29v1 subscales demonstrated strong reliability, construct validity, and responsiveness. The IRT and Rasch analysis showed overall support for response category threshold ordering, person-item fit, and item fit for both subscales. Conclusions Both MSIS-29v1 subscales demonstrated robust measurement properties using classical, IRT, and Rasch techniques. Unlike previous research using a community-based sample, the MSIS-29v1 was found to be psychometrically sound to assess physical and psychological impairments in a clinical trial sample of patients with RRMS. PMID:28607741
Analysis of the psychometric properties of the Multiple Sclerosis Impact Scale-29 (MSIS-29) in relapsing-remitting multiple sclerosis using classical and modern test theory.

PubMed

Bacci, E D; Wyrwich, K W; Phillips, G A; Vollmer, T; Guo, S

2016-01-01

Investigations using classical test theory support the psychometric properties of the original version of the Multiple Sclerosis Impact Scale (MSIS-29v1), a disease-specific measure of multiple sclerosis (MS) impact (physical and psychological subscales). Later, assessments of the MSIS-29v1 in an MS community-based sample using Rasch analysis led to revisions of the instrument's response options (MSIS-29v2). The objective of this paper is to evaluate the psychometric properties of the MSIS-29v1 in a clinical trial cohort of relapsing-remitting MS patients (RRMS). Data from 600 patients with RRMS enrolled in the SELECT clinical trial were used. Assessments were performed at baseline and at Weeks 12, 24, and 52. In addition to traditional psychometric analyses, Item Response Theory (IRT) and Rasch analysis were used to evaluate the measurement properties of the MSIS-29v1. Both MSIS-29v1 subscales demonstrated strong reliability, construct validity, and responsiveness. The IRT and Rasch analysis showed overall support for response category threshold ordering, person-item fit, and item fit for both subscales. Both MSIS-29v1 subscales demonstrated robust measurement properties using classical, IRT, and Rasch techniques. Unlike previous research using a community-based sample, the MSIS-29v1 was found to be psychometrically sound to assess physical and psychological impairments in a clinical trial sample of patients with RRMS.
Development of the functional vision questionnaire for children and young people with visual impairment: the FVQ_CYP.

PubMed

Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S

2013-12-01

To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Questionnaire development. A representative patient sample of VI children and young people aged 10 to 15 years, visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by presence of missing data and individual item response pattern. Exploratory factor analysis (FA) and parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for high percentage of missing data, 4 for skewness, and 1 for inadequate item infit and outfit values in RA, 3 having shown differential item functioning across age groups and 1 across gender in RA. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. We have developed a novel, psychometrically robust self-report questionnaire for children and young people-the FVQ_CYP-that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

Rasch measurement: the Arm Activity measure (ArmA) passive function sub-scale.

PubMed

Ashford, Stephen; Siegert, Richard J; Alexandrescu, Roxana

2016-01-01

To evaluate the conformity of the Arm Activity measure (ArmA) passive function sub-scale to the Rasch model. A consecutive cohort of patients (n = 92) undergoing rehabilitation, including upper limb rehabilitation and spasticity management, at two specialist rehabilitation units were included. Rasch analysis was used to examine scaling and conformity to the model. Responses were analysed using Rasch unidimensional measurement models (RUMM 2030). The following aspects were considered: overall model and individual item fit statistics and fit residuals, internal reliability, item response threshold ordering, item bias, local dependency and unidimensionality. ArmA contains both active and passive function sub-scales, but in this analysis only the passive function sub-scale was considered. Four of the seven items in the ArmA passive function sub-scale initially had disordered thresholds. These items were rescored to four response options, which resulted in ordered thresholds for all items. Once the items with disordered thresholds had been rescored, item bias was not identified for age, global disability level or diagnosis, but with a small difference in difficulty between males and females for one item of the scale. Local dependency was not observed and the unidimensionality of the sub-scale was supported and good fit to the Rasch model was identified. The person separation index (PSI) was 0.95 indicating that the scale is able to reliably differentiate at least two groups of patients. The ArmA passive function sub-scale was shown in this evaluation to conform to the Rasch model once disordered thresholds had been addressed. Using the logit scores produced by the Rasch model it was possible to convert this back to the original scale range. Implications for Rehabilitation The ArmA passive function sub-scale was shown, in this evaluation, to conform to the Rasch model once disordered thresholds had been addressed and therefore to be a clinically applicable and potentially useful hierarchical measure. Using Rasch logit scores it has be possible to convert back to the original ordinal scale range and provide an indication of real change to enable evaluation of clinical outcome of importance to patients and clinicians.
The Attitudes to Ageing Questionnaire: Mokken Scaling Analysis

PubMed Central

Shenkin, Susan D.; Watson, Roger; Laidlaw, Ken; Starr, John M.; Deary, Ian J.

2014-01-01

Background Hierarchical scales are useful in understanding the structure of underlying latent traits in many questionnaires. The Attitudes to Ageing Questionnaire (AAQ) explored the attitudes to ageing of older people themselves, and originally described three distinct subscales: (1) Psychosocial Loss (2) Physical Change and (3) Psychological Growth. This study aimed to use Mokken analysis, a method of Item Response Theory, to test for hierarchies within the AAQ and to explore how these relate to underlying latent traits. Methods Participants in a longitudinal cohort study, the Lothian Birth Cohort 1936, completed a cross-sectional postal survey. Data from 802 participants were analysed using Mokken Scaling analysis. These results were compared with factor analysis using exploratory structural equation modelling. Results Participants were 51.6% male, mean age 74.0 years (SD 0.28). Three scales were identified from 18 of the 24 items: two weak Mokken scales and one moderate Mokken scale. (1) ‘Vitality’ contained a combination of items from all three previously determined factors of the AAQ, with a hierarchy from physical to psychosocial; (2) ‘Legacy’ contained items exclusively from the Psychological Growth scale, with a hierarchy from individual contributions to passing things on; (3) ‘Exclusion’ contained items from the Psychosocial Loss scale, with a hierarchy from general to specific instances. All of the scales were reliable and statistically significant with ‘Legacy’ showing invariant item ordering. The scales correlate as expected with personality, anxiety and depression. Exploratory SEM mostly confirmed the original factor structure. Conclusions The concurrent use of factor analysis and Mokken scaling provides additional information about the AAQ. The previously-described factor structure is mostly confirmed. Mokken scaling identifies a new factor relating to vitality, and a hierarchy of responses within three separate scales, referring to vitality, legacy and exclusion. This shows what older people themselves consider important regarding their own ageing. PMID:24892302
Dimensionality of the Knee Numeric-Entity Evaluation Score (KNEES-ACL): a condition-specific questionnaire.

PubMed

Comins, J D; Krogsgaard, M R; Kreiner, S; Brodersen, J

2013-10-01

The benefit of anterior cruciate ligament (ACL) reconstruction has been questioned based on patient-reported outcome measures (PROMs). Valid interpretation of such results requires confirmation of the psychometric properties of the PROM. Rasch analysis is the gold standard for validation of PROMs, yet PROMs used for ACL reconstruction have not been validated using Rasch analysis. We used Rasch analysis to investigate the psychometric properties of the Knee Numeric-Entity Evaluation Score (KNEES-ACL), a newly developed PROM for patients treated for ACL deficiency. Two-hundred forty-two patients pre- and post-ACL reconstruction completed the pilot PROM. Rasch models were used to assess the psychometric properties (e.g., unidimensionality, local response dependency, and differential item functioning). Forty-one items distributed across seven unidimensional constructs measuring impairment, functional limitations, and psychosocial consequences were confirmed to fit Rasch models. Fourteen items were removed because of statistical lack of fit and inadequate face validity. Local response dependency and differential item functioning were identified and adjusted. The KNEES-ACL is the first Rasch-validated condition-specific PROM constructed for patients with ACL deficiency and patients with ACL reconstruction. Thus, this instrument can be used for within- and between-group comparisons. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Evaluating HIV Knowledge Questionnaires Among Men Who Have Sex with Men: A Multi-Study Item Response Theory Analysis.

PubMed

Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian

2018-01-01

Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.
An NCME Instructional Module on Polytomous Item Response Theory Models

ERIC Educational Resources Information Center

Penfield, Randall David

2014-01-01

A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
Using root cause analysis to promote critical thinking in final year Bachelor of Midwifery students.

PubMed

Carter, Amanda G; Sidebotham, Mary; Creedy, Debra K; Fenwick, Jennifer; Gamble, Jenny

2014-06-01

Midwives require well developed critical thinking to practice autonomously. However, multiple factors impinge on students' deep learning in the clinical context. Analysis of actual case scenarios using root cause analysis may foster students' critical thinking and application of 'best practice' principles in complex clinical situations. To examine the effectiveness of an innovative teaching strategy involving root cause analysis to develop students' perceptions of their critical thinking abilities. A descriptive, mixed methods design was used. Final 3rd year undergraduate midwifery students (n=22) worked in teams to complete and present an assessment item based on root cause analysis. The cases were adapted from coroners' reports. After graduation, 17 (77%) students evaluated the course using a standard university assessment tool. In addition 12 (54%) students provided specific feedback on the teaching strategy using a 16-item survey tool based on the domain concepts of Educational Acceptability, Educational Impact, and Preparation for Practice. Survey responses were on a 5-point Likert scale and analysed using descriptive statistics. Open-ended responses were analysed using content analysis. The majority of students perceived the course and this teaching strategy positively. The domain mean scores were high for Educational Acceptability (mean=4.3, SD=.49) and Educational Impact (mean=4.19, SD=.75) but slightly lower for Preparation for Practice (mean=3.7, SD=.77). Overall student responses to each item were positive with no item mean less than 3.42. Students found the root cause analysis challenging and time consuming but reported development of critical thinking skills about the complexity of practice, clinical governance and risk management principles. Analysing complex real life clinical cases to determine a root cause enhanced midwifery students' perceptions of their critical thinking. Teaching and assessment strategies to promote critical thinking need to be made explicit to students in order to foster ongoing development. © 2013.
"When I saw walking I just kind of took it as wheeling": interpretations of mobility-related items in generic, preference-based health state instruments in the context of spinal cord injury.

PubMed

Michel, Yvonne Anne; Engel, Lidia; Rand-Hendriksen, Kim; Augestad, Liv Ariane; Whitehurst, David Gt

2016-11-28

In health economic analyses, health states are typically valued using instruments with few items per dimension. Due to the generic (and often reductionist) nature of such instruments, certain groups of respondents may experience challenges in describing their health state. This study is concerned with generic, preference-based health state instruments that provide information for decisions about the allocation of resources in health care. Unlike physical measurement instruments, preference-based health state instruments provide health state values that are dependent on how respondents interpret the items. This study investigates how individuals with spinal cord injury (SCI) interpret mobility-related items contained within six preference-based health state instruments. Secondary analysis of focus group transcripts originally collected in Vancouver, Canada, explored individuals' perceptions and interpretations of mobility-related items contained within the 15D, Assessment of Quality of Life 8-dimension (AQoL-8D), EQ-5D-5L, Health Utilities Index (HUI), Quality of Well-Being Scale Self-Administered (QWB-SA), and the 36-item Short Form health survey version 2 (SF-36v2). Ritchie and Spencer's 'Framework Approach' was used to perform thematic analysis that focused on participants' comments concerning the mobility-related items only. Fifteen individuals participated in three focus groups (five per focus group). Four themes emerged: wording of mobility (e.g., 'getting around' vs 'walking'), reference to aids and appliances, lack of suitable response options, and reframing of items (e.g., replacing 'walking' with 'wheeling'). These themes reflected item features that respondents perceived as relevant in enabling them to describe their mobility, and response strategies that respondents could use when faced with inaccessible items. Investigating perceptions to mobility-related items within the context of SCI highlights substantial variation in item interpretation across six preference-based health state instruments. Studying respondents' interpretations of items can help to understand discrepancies in the health state descriptions and values obtained from different instruments. This line of research warrants closer attention in the health economics and quality of life literature.
Recommendations to improve the positive and negative syndrome scale (PANSS) based on item response theory.

PubMed

Levine, Stephen Z; Rabinowitz, Jonathan; Rizopoulos, Dimitris

2011-08-15

The adequacy of the Positive and Negative Syndrome Scale (PANSS) items in measuring symptom severity in schizophrenia was examined using Item Response Theory (IRT). Baseline PANSS assessments were analyzed from two multi-center clinical trials of antipsychotic medication in chronic schizophrenia (n=1872). Generally, the results showed that the PANSS (a) item ratings discriminated symptom severity best for the negative symptoms; (b) has an excess of "Severe" and "Extremely severe" rating options; and (c) assessments are more reliable at medium than very low or high levels of symptom severity. Analysis also showed that the detection of statistically and non-statistically significant differences in treatment were highly similar for the original and IRT-modified PANSS. In clinical trials of chronic schizophrenia, the PANSS appears to require the following modifications: fewer rating options, adjustment of 'Lack of judgment and insight', and improved severe symptom assessment. 2011 Elsevier Ltd. All rights reserved.
Item response theory analysis of the Lichtenberg Financial Decision Screening Scale.

PubMed

Teresi, Jeanne A; Ocepek-Welikson, Katja; Lichtenberg, Peter A

2017-01-01

The focus of these analyses was to examine the psychometric properties of the Lichtenberg Financial Decision Screening Scale (LFDSS). The purpose of the screen was to evaluate the decisional abilities and vulnerability to exploitation of older adults. Adults aged 60 and over were interviewed by social, legal, financial, or health services professionals who underwent in-person training on the administration and scoring of the scale. Professionals provided a rating of the decision-making abilities of the older adult. The analytic sample included 213 individuals with an average age of 76.9 (SD = 10.1). The majority (57%) were female. Data were analyzed using item response theory (IRT) methodology. The results supported the unidimensionality of the item set. Several IRT models were tested. Ten ordinal and binary items evidenced a slightly higher reliability estimate (0.85) than other versions and better coverage in terms of the range of reliable measurement across the continuum of financial incapacity.
Rasch-family models are more valuable than score-based approaches for analysing longitudinal patient-reported outcomes with missing data.

PubMed

de Bock, Élodie; Hardouin, Jean-Benoit; Blanchin, Myriam; Le Neel, Tanguy; Kubis, Gildas; Bonnaud-Antignac, Angélique; Dantan, Étienne; Sébille, Véronique

2016-10-01

The objective was to compare classical test theory and Rasch-family models derived from item response theory for the analysis of longitudinal patient-reported outcomes data with possibly informative intermittent missing items. A simulation study was performed in order to assess and compare the performance of classical test theory and Rasch model in terms of bias, control of the type I error and power of the test of time effect. The type I error was controlled for classical test theory and Rasch model whether data were complete or some items were missing. Both methods were unbiased and displayed similar power with complete data. When items were missing, Rasch model remained unbiased and displayed higher power than classical test theory. Rasch model performed better than the classical test theory approach regarding the analysis of longitudinal patient-reported outcomes with possibly informative intermittent missing items mainly for power. This study highlights the interest of Rasch-based models in clinical research and epidemiology for the analysis of incomplete patient-reported outcomes data. © The Author(s) 2013.
Psychometric properties of a revised version of the Assisting Hand Assessment (Kids-AHA 5.0).

PubMed

Holmefur, Marie M; Krumlinde-Sundholm, Lena

2016-06-01

The aim of this study was to scrutinize the Assisting Hand Assessment (AHA) version 4.4 for possible improvements and to evaluate the psychometric properties regarding internal scale validity and aspects of reliability of a revised version of the AHA. In collaboration with experts, scoring criteria were changed for four items, and one fully new item was constructed. Twenty-two original, one new, and four revised items were scored for 164 assessments of children with unilateral cerebral palsy aged 18 months to 12 years. Rasch measurement analysis was used to evaluate internal scale validity by exploring rating-scale functioning, item and person goodness-of-fit, and principal component analysis. Targeting and scale reliability were also evaluated. After removal of misfitting items, a 20-item scale showed satisfactory goodness-of-fit. Unidimensionality was confirmed by principal component analysis. The rating scale functioned well for the 20 items, and the item difficulty was well suited to the ability level of the sample. The person reliability coefficient was 0.98, indicating high separation ability of the scale. A conversion table of AHA scores between the previous version (4.4) and the new version (5.0) was constructed. The new, 20-item version of the Kids-AHA (version 5.0), demonstrated excellent internal scale validity, suggesting improved responsiveness to changes and shortened scoring time. For comparison of scores from version 4.4 to 5.0, a transformation table is presented. © 2015 Mac Keith Press.
Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

ERIC Educational Resources Information Center

Woods, Carol M.

2008-01-01

In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

ERIC Educational Resources Information Center

Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

2011-01-01

The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
Development and Validation of a Disease-Specific Instrument to Measure Diet-Targeted Quality of Life for Postoperative Patients with Esophagogastric Cancer.

PubMed

Honda, Michitaka; Wakita, Takafumi; Onishi, Yoshihiro; Nunobe, Souya; Miura, Akinori; Nishigori, Tatsuto; Kusanagi, Hiroshi; Yamamoto, Takatsugu; Boddy, Alexander; Fukuhara, Shunichi

2015-12-01

Patients who have undergone esophagectomy or gastrectomy have certain dietary limitations because of changes to the alimentary tract. This study attempted to develop a psychometric scale, named "Esophago-Gastric surgery and Quality of Dietary life (EGQ-D)," for assessment of impact of upper gastrointestinal surgery on diet-targeted quality of life. Using qualitative methods, the study team interviewed both patients and surgeons involved in esophagogastric cancer surgery, and we prepared an item pool and a draft scale. To evaluate the scale's psychometric reliability and validity, a survey involving a large number of patients was conducted. Items for the final scale were selected by factor analysis and item response theory. Cronbach's alpha was used for assessment of reliability, and correlations with the short form (SF)-12, esophagus and stomach surgery symptom scale (ES(4)), and nutritional indicators were analyzed to assess the criterion-related validity. Through multifaceted discussion and the pilot study, a draft questionnaire comprising 14 items was prepared, and a total of 316 patients were enrolled. On the basis of factor analysis and item response theory, six items were excluded, and the remaining eight items demonstrated strong unidimensionality for the final scale. Cronbach's alpha was 0.895. There were significant associations with all the subscale scores for SF-12, ES(4), and nutritional indicators. The EGQ-D scale has good contents and psychometric validity and can be used to evaluate disease-specific instrument to measure diet-targeted quality of life for postoperative patients with esophagogastric cancer.
The Value of Response Times in Item Response Modeling

ERIC Educational Resources Information Center

Molenaar, Dylan

2015-01-01

A new and very interesting approach to the analysis of responses and response times is proposed by Goldhammer (this issue). In his approach, differences in the speed-ability compromise within respondents are considered to confound the differences in ability between respondents. These confounding effects of speed on the inferences about ability can…
Cross-cultural adaptation and construct validity of the Korean version of a physical activity measure for community-dwelling elderly.

PubMed

Choi, Bongsam

2018-01-01

[Purpose] This study aimed to cross-cultural adapt and validate the Korean version of an physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred and thirty eight community-dwelling elderlies, 32 males and 106 female, participated in the study. All participants were asked to fill out a fifty-one item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. One-parameter model of item response theory (Rasch analysis) was applied to determine the construct validity and to inspect item-level psychometric properties of 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency was ranging 0.93 to 0.94. A total of 16 items was misfit to the Rasch model. After misfit item deletion, 35 ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map analysis delineated that the item difficulty was well matched for the elderlies with moderate and low ability except for high ceilings. [Conclusion] Cross-cultural adapted K-PAM was shown to be sufficient for establishing construct validity and stable psychometric properties confirmed by person separation reliability and fit statistics.
Item response theory-based validation of a short form of the Eating Behavior Scale for Japanese adults

PubMed Central

Tayama, Jun; Ogawa, Sayaka; Takeoka, Atsushi; Kobayashi, Masakazu; Shirabe, Susumu

2017-01-01

Abstract Obesity has become a serious social problem in industrialized countries in recent years. Clinically, although the evaluation of dietary behavior abnormalities is as important as any method of risk assessment for obesity, almost all the existing scales with many items may have numerous practical clinical difficulties. In this study, we aimed to prepare a short questionnaire to assess the dietary behavior abnormalities related to obesity. A total of 1032 individuals aged 20 to 59 years participated in the present study. Using item response theory (IRT), we selected the items for a short version from among 30 items of Sakata Eating Behavior Scale (EBS), which is widely used in Japan. As a result of the IRT-based analysis on the original 30-item version, 7 items were adopted as the short version. The correlation between the total score of the original EBS and the EBS short form was extremely high (r = 0.93, P = .001). In examining the criterion validity, for all participants (n = 1032), male (n = 516), and female (n = 516), the correlation coefficients between the total score of the EBS short form and body mass index (BMI) were r = 0.26, r = 0.28, and r = 0.28, respectively. The results of the receiver operating characteristic analysis was performed with obesity BMI > 25 kg/m2 as a dependent variable, the value of the area under the curve in the ROC was significantly higher in the 7-item version than in the total score of the original items (P = .0005). In conclusion, the 7-item EBS short form was created. Furthermore, it was found that the EBS short form is a reliable and valid measure that can be used as an indicator of obesity in both clinical and research settings. PMID:29049248
Testing the PROMIS® Depression measures for monitoring depression in a clinical sample outside the US.

PubMed

Vilagut, G; Forero, C G; Adroher, N D; Olariu, E; Cella, D; Alonso, J

2015-09-01

The Patient Reported Outcomes Measurement Information System (PROMIS) was devised to facilitate assessment of patient self-reported health status, taking advantage of Item Response Theory. We aimed to assess measurement properties of the PROMIS Depression item bank and an 8-item static short form in a Spanish clinical sample. A three-month follow-up study of patients with active mood/anxiety symptoms (n = 218) was carried out. We assessed model unidimensionality (Confirmatory Item Factor Analysis), reliability (internal consistency and Item Information Curves), and validity (convergent-discriminant with correlations; known-groups with comparison of means and effect sizes; and criterion validity with Receiver operating Characteristics (ROC) analysis). We also assessed 3-month responsiveness to change (Cohen's effect sizes (d) in stable and recovered patients). The unidimensional model showed adequate fit (CFI = 0.97, RMSEA = 0.08). Information Curves had reliabilities over 0.90 throughout most of the score continuum. As expected, we observed high correlations with external self-reported depression, and moderate with self-reported anxiety and clinical measures. The item bank showed an increasing severity gradient from no disorder (mean = 48, SE = 0.6) to depression with comorbid anxiety (mean = 55.8, SE = 0.4). PROMIS detected depression disorder with great accuracy according to the area under the curve (AUC = 0.89). Both formats, item bank and short form, were highly responsive to change in recovered patients (d > 0.7) and had small changes in stable patients (d < 0.2). The good metric properties of the Spanish PROMIS Depression measures provide further evidence of their adequacy for monitoring depression levels of patients in clinical settings. This double check of quality (within countries and populations) supports the ability of PROMIS measures for guaranteeing fair comparisons across languages and countries in specific clinical populations. Copyright © 2015 Elsevier Ltd. All rights reserved.
Item Response Models for Examinee-Selected Items

ERIC Educational Resources Information Center

Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

2012-01-01

In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

ERIC Educational Resources Information Center

Lee, Woo-yeol; Cho, Sun-Joo

2017-01-01

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…

Evaluating Differential Item Functioning in the English General Practice Patient Survey: Comparison of South Asian and White British Subgroups.

PubMed

Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John

2015-09-01

To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classic test and item response theory analysis to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.
Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

PubMed

Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

2018-06-01

This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
Validation of Catquest-9SF-A Visual Disability Instrument to Evaluate Patient Function After Corneal Transplantation.

PubMed

Claesson, Margareta; Armitage, W John; Byström, Berit; Montan, Per; Samolov, Branka; Stenvi, Ulf; Lundström, Mats

2017-09-01

Catquest-9SF is a 9-item visual disability questionnaire developed for evaluating patient-reported outcome measures after cataract surgery. The aim of this study was to use Rasch analysis to determine the responsiveness of Catquest-9SF for corneal transplant patients. Patients who underwent corneal transplantation primarily to improve vision were included. One group (n = 199) completed the Catquest-9SF questionnaire before corneal transplantation and a second independent group (n = 199) completed the questionnaire 2 years after surgery. All patients were recorded in the Swedish Cornea Registry, which provided clinical and demographic data for the study. Winsteps software v.3.91.0 (Winsteps.com, Beaverton, OR) was used to assess the fit of the Catquest-9SF data to the Rasch model. Rasch analysis showed that Catquest-9SF applied to corneal transplant patients was unidimensional (infit range, 0.73-1.32; outfit range, 0.81-1.35), and therefore, measured a single underlying construct (visual disability). The Rasch model explained 68.5% of raw variance. The response categories of the 9-item questionnaire were ordered, and the category thresholds were well defined. Item difficulty matched the level of patients' ability (0.36 logit difference between the means). Precision in terms of person separation (3.09) and person reliability (0.91) was good. Differential item functioning was notable for only 1 item (satisfaction with vision), which had a differential item functioning contrast of 1.08 logit. Rasch analysis showed that Catquest-9SF is a valid instrument for measuring visual disability in patients who have undergone corneal transplantation primarily to improve vision.
Evaluation of the Intermittent Exotropia Questionnaire using Rasch analysis.

PubMed

Leske, David A; Holmes, Jonathan M; Melia, B Michele

2015-04-01

The Intermittent Exotropia Questionnaire (IXTQ) is a patient, proxy, and parental report of quality of life specific to children with intermittent exotropia. We refine the IXTQ using Rasch analysis to improve reliability and validity. Rasch analysis was performed on responses of 575 patients with intermittent exotropia enrolled from May 15, 2008, through July 24, 2013, and their parents from each of the 4 IXTQ health-related quality-of-life questionnaires (child 5 through 7 years of age and child 8 through 17 years of age, proxy, and parent questionnaires). Questionnaire performance and structure were confirmed in a separate cohort of 379 patients with intermittent exotropia. One item was removed from the 12-item child and proxy questionnaires, and response options in the 8- to 17-year-old child IXTQ and proxy IXTQ were combined into 3 response options for both questionnaires. Targeting was relatively poor for the child and proxy questionnaires. For the parent questionnaire, 3 subscales (psychosocial, function, and surgery) were evident. One item was removed from the psychosocial subscale. Resulting subscales had appropriate targeting. The Rasch-revised IXTQ may be a useful instrument for determining how intermittent exotropia affects health-related quality of life of children with intermittent exotropia and their parents, particularly for cohort studies.
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis.

PubMed

Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén

2017-01-01

Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spaniard university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each sub scale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four sub-scales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results evidence the need and adequacy of this complementary psychometric analysis strategy, in relation to the CFA to enhance the instrument.
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis

PubMed Central

Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén

2017-01-01

Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spaniard university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each sub scale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four sub-scales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results evidence the need and adequacy of this complementary psychometric analysis strategy, in relation to the CFA to enhance the instrument. PMID:28298898
Revised Olweus Bully/Victim Questionnaire: evaluation in visually impaired.

PubMed

Gothwal, Vijaya K; Sumalini, Rebecca; Irfan, Shaik Mohammad; Giridhar, Avula; Bharani, Seelam

2013-08-01

To explore the psychometric properties of the revised Olweus Bully/Victim Questionnaire (OBVQ) in children with visual impairment (VI) using Rasch analysis. One hundred fifty Indian children with VI between 8 and 16 years (mean age, 11.6 years; 69% male; mean acuity in the better eye of 0.80 logMAR [Snellen, 20/126]) were administered the revised OBVQ. The 40-item revised OBVQ was developed to assess victimization (i.e., being bullied) and bullying (bullying others) in normally sighted schoolchildren. Only 16 items are used for Rasch analysis and are divided into two parts: I (victimization, eight items) and II (bullying others, eight items). Separate Rasch analysis was conducted for both parts, and the psychometric properties investigated included behavior of rating scale, extent to which the items measured a single construct (unidimensionality by fit statistics and principal component analysis [PCA] of residuals); ability to discriminate among participants' victimization and bullying behaviors (measurement precision as assessed by person separation reliability [PSR] minimum recommended value, 0.80); and targeting of items to participants' victimization and bullying. Response categories were misused for both parts I and II, which required repair before further analysis. Measurement precision was inadequate for both parts (PSR, 0.64 for part I and 0.19 for part II), indicating poor discriminatory ability. All items fit the Rasch model well in part I, indicating unidimensionality that was further confirmed using PCA of residuals. However, an item misfit in part II that required deletion following which the remaining items fit and PCA of residuals also supported unidimensionality. Targeting was -0.58 logits for part I, indicating that the items were matched well with the participants' victimization. By comparison, targeting was suboptimal for part II (-1.97 logits). In its current state, the revised OBVQ is not a valid psychometric instrument to assess victimization and bullying among children with VI.
Which kind of psychometrics is adequate for patient satisfaction questionnaires?

PubMed

Konerding, Uwe

2016-01-01

The construction and psychometric analysis of patient satisfaction questionnaires are discussed. The discussion is based upon the classification of multi-item questionnaires into scales or indices. Scales consist of items that describe the effects of the latent psychological variable to be measured, and indices consist of items that describe the causes of this variable. Whether patient satisfaction questionnaires should be constructed and analyzed as scales or as indices depends upon the purpose for which these questionnaires are required. If the final aim is improving care with regard to patients' preferences, then these questionnaires should be constructed and analyzed as indices. This implies two requirements: 1) items for patient satisfaction questionnaires should be selected in such a way that the universe of possible causes of patient satisfaction is covered optimally and 2) Cronbach's alpha, principal component analysis, exploratory factor analysis, confirmatory factor analysis, and analyses with models from item response theory, such as the Rasch Model, should not be applied for psychometric analyses. Instead, multivariate regression analyses with a direct rating of patient satisfaction as the dependent variable and the individual questionnaire items as independent variables should be performed. The coefficients produced by such an analysis can be applied for selecting the best items and for weighting the selected items when a sum score is determined. The lower boundaries of the validity of the unweighted and the weighted sum scores can be estimated by their correlations with the direct satisfaction rating. While the first requirement is fulfilled in the majority of the previous patient satisfaction questionnaires, the second one deviates from previous practice. Hence, if patient satisfaction is actually measured with the final aim of improving care with regard to patients' preferences, then future practice should be changed so that the second requirement is also fulfilled.
Measuring Intervention Effectiveness: The Benefits of an Item Response Theory Approach

ERIC Educational Resources Information Center

McEldoon, Katherine; Cho, Sun-Joo; Rittle-Johnson, Bethany

2012-01-01

Assessing the effectiveness of educational interventions relies on quantifying differences between interventions groups over time in a between-within design. Binary outcome variables (e.g., correct responses versus incorrect responses) are often assessed. Widespread approaches use percent correct on assessments, and repeated measures analysis of…
Depression symptoms across cultures: an IRT analysis of standard depression symptoms using data from eight countries.

PubMed

Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J

2016-07-01

Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting illustrating a possible source of measurement invariance in prevalence estimates.
Development and refinement of the WAItE: a new obesity-specific quality of life measure for adolescents.

PubMed

Oluboyede, Yemi; Hulme, Claire; Hill, Andrew

2017-08-01

Few weight-specific outcome measures, developed specifically for obese and overweight adolescents, exist and none are suitable for the elicitation of utility values used in the assessment of cost effectiveness. The development of a descriptive system for a new weight-specific measure. Qualitative interviews were conducted with 31 treatment-seeking (above normal weight status) and non-treatment-seeking (school sample) adolescents aged 11-18 years, to identify a draft item pool and associated response options. 315 eligible consenting adolescents, aged 11-18 years, enrolled in weight management services and recruited via an online panel, completed two version of a long-list 29-item descriptive system (consisting of frequency and severity response scales). Psychometric assessments and Rasch analysis were applied to the draft 29-item instrument to identify a brief tool containing the best performing items and associated response options. Seven items were selected, for the final item set; all displayed internal consistency, moderate floor effects and the ability to discriminate between weight categories. The assessment of unidimensionality was supported (t test statistic of 0.024, less than the 0.05 threshold value). The Weight-specific Adolescent Instrument for Economic-evaluation focuses on aspects of life affected by weight that are important to adolescents. It has the potential for adding key information to the assessment of weight management interventions aimed at the younger population.
Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks

PubMed Central

Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.

2013-01-01

Objective To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results 2 pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904
A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

PubMed

Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

2018-04-10

To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Development of a Self-Determination Measure for College Students: Validity Evidence for the Basic Needs Satisfaction at College Scale

ERIC Educational Resources Information Center

Jenkins-Guarnieri, Michael A.; Vaughan, Angela L.; Wright, Stephen L.

2015-01-01

We adapted a work self-determination measure to create the Basic Needs Satisfaction at College Scale. Confirmatory factor analysis and item response theory analyses with data from 525 adults supported a 3-factor model with 13 items most sensitive for lower to middle range levels of the autonomy, competence, and relatedness constructs.
An Information Analysis of 2-, 3-, and 4-Word Verbal Discrimination Learning.

ERIC Educational Resources Information Center

Arima, James K.; Gray, Francis D.

Information theory was used to qualify the difficulty of verbal discrimination (VD) learning tasks and to measure VD performance. Words for VD items were selected with high background frequency and equal a priori probabilities of being selected as a first response. Three VD lists containing only 2-, 3-, or 4-word items were created and equated for…
Impact of Missing Data on the Detection of Differential Item Functioning: The Case of Mantel-Haenszel and Logistic Regression Analysis

ERIC Educational Resources Information Center

Robitzsch, Alexander; Rupp, Andre A.

2009-01-01

This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
Developing the Impossible Figures Task to Assess Visual-Spatial Talents among Chinese Students: A Rasch Measurement Model Analysis

ERIC Educational Resources Information Center

Chan, David W.

2010-01-01

Data of item responses to the Impossible Figures Task (IFT) from 492 Chinese primary, secondary, and university students were analyzed using the dichotomous Rasch measurement model. Item difficulty estimates and person ability estimates located on the same logit scale revealed that the pooled sample of Chinese students, who were relatively highly…
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis

PubMed Central

Handren, Lindsay; Crano, William D.

2018-01-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of “dry” weekdays and “wet” weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns. PMID:27488456
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis.

PubMed

Lac, Andrew; Handren, Lindsay; Crano, William D

2016-10-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of "dry" weekdays and "wet" weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns.
Development and validation of a new survey: Perceptions of Teaching as a Profession (PTaP)

NASA Astrophysics Data System (ADS)

Adams, Wendy

2017-01-01

To better understand the impact of efforts to train more science teachers such as the PhysTEC Project and to help with early identification of future teachers, we are developing the survey of Perceptions of Teaching as a Profession (PTaP) to measure students' views of teaching as a career, their interest in teaching and the perceived climate of physics departments towards teaching as a profession. The instrument consists of a series of statements which require a response using a 5-point Likert-scale and can be easily administered online. The survey items were drafted by a team of researchers and physics teacher candidates and then reviewed by an advisory committee of 20 physics teacher educators and practicing teachers. We conducted 27 interviews with both teacher candidates and non-teaching STEM majors. The survey was refined through an iterative process of student interviews and item clarification until all items were interpreted consistently and answered for consistent reasons. In this presentation the preliminary results from the student interviews as well as the results of item analysis and a factor analysis on 900 student responses will be shared.

Development of a disease-specific quality of life questionnaire in Addison's disease.

PubMed

Løvås, Kristian; Curran, Suzanne; Oksnes, Marianne; Husebye, Eystein S; Huppert, Felicia A; Chatterjee, V Krishna K

2010-02-01

Patients with Addison's disease reproducibly self-report impairment in specific dimensions of general well-being questionnaires, suggesting particular deficiencies in health-related quality-of-life (HRQoL). We sought to develop an Addison's disease-specific questionnaire (AddiQoL) that could better quantify altered well-being and treatment effects. Design, Setting, Patients, Intervention, and Outcomes: We reviewed the literature to identify HRQoL issues in Addison's disease and interviewed patients and their partners in-depth to explore various symptom domains. A list of items was generated, and nine expert clinicians and five expert patients assessed the list for impact and clarity. A preliminary questionnaire was presented to 100 Addison's outpatients; the number of items was reduced after analysis of the distribution of the responses. The final questionnaire responses were assessed by Cronbach's alpha and Rasch analysis. Published studies of HRQoL in Addison's disease indicated reduced vitality and general health perception and limitations in physical and emotional functioning. In-depth interviews of 14 patients and seven partners emphasized the impact of the disease on the emotional domain. Seventy HRQoL items were generated; after the expert consultation process and pretesting in 100 patients, the number of items was reduced to 36. Eighty-six patients completed the final questionnaire; the responses showed high internal consistency with Cronbach's alpha 0.95 and Person Separation Index 0.94 (Rasch analysis). We envisage AddiQoL having utility in trials of hormone replacement and management of patients with Addison's disease, analogous to similar questionnaires in GH deficiency (AGHDA) and acromegaly (AcroQoL).
The psychometric properties of the "Reading the Mind in the Eyes" Test: an item response theory (IRT) analysis.

PubMed

Preti, Antonio; Vellante, Marcello; Petretto, Donatella R

2017-05-01

The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-takingthat is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants to the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.
Three pedagogical approaches to introductory physics labs and their effects on student learning outcomes

NASA Astrophysics Data System (ADS)

Chambers, Timothy

This dissertation presents the results of an experiment that measured the learning outcomes associated with three different pedagogical approaches to introductory physics labs. These three pedagogical approaches presented students with the same apparatus and covered the same physics content, but used different lab manuals to guide students through distinct cognitive processes in conducting their laboratory investigations. We administered post-tests containing multiple-choice conceptual questions and free-response quantitative problems one week after students completed these laboratory investigations. In addition, we collected data from the laboratory practical exam taken by students at the end of the semester. Using these data sets, we compared the learning outcomes for the three curricula in three dimensions of ability: conceptual understanding, quantitative problem-solving skill, and laboratory skills. Our three pedagogical approaches are as follows. Guided labs lead students through their investigations via a combination of Socratic-style questioning and direct instruction, while students record their data and answers to written questions in the manual during the experiment. Traditional labs provide detailed written instructions, which students follow to complete the lab objectives. Open labs provide students with a set of apparatus and a question to be answered, and leave students to devise and execute an experiment to answer the question. In general, we find that students performing Guided labs perform better on some conceptual assessment items, and that students performing Open labs perform significantly better on experimental tasks. Combining a classical test theory analysis of post-test results with in-lab classroom observations allows us to identify individual components of the laboratory manuals and investigations that are likely to have influenced the observed differences in learning outcomes associated with the different pedagogical approaches. Due to the novel nature of this research and the large number of item-level results we produced, we recommend additional research to determine the reproducibility of our results. Analyzing the data with item response theory yields additional information about the performance of our students on both conceptual questions and quantitative problems. We find that performing lab activities on a topic does lead to better-than-expected performance on some conceptual questions regardless of pedagogical approach, but that this acquired conceptual understanding is strongly context-dependent. The results also suggest that a single "Newtonian reasoning ability" is inadequate to explain student response patterns to items from the Force Concept Inventory. We develop a framework for applying polytomous item response theory to the analysis of quantitative free-response problems and for analyzing how features of student solutions are influenced by problem-solving ability. Patterns in how students at different abilities approach our post-test problems are revealed, and we find hints as to how features of a free-response problem influence its item parameters. The item-response theory framework we develop provides a foundation for future development of quantitative free-response research instruments. Chapter 1 of the dissertation presents a brief history of physics education research and motivates the present study. Chapter 2 describes our experimental methodology and discusses the treatments applied to students and the instruments used to measure their learning. Chapter 3 provides an introduction to the statistical and analytical methods used in our data analysis. Chapter 4 presents the full data set, analyzed using both classical test theory and item response theory. Chapter 5 contains a discussion of the implications of our results and a data-driven analysis of our experimental methods. Chapter 6 describes the importance of this work to the field and discusses the relevance of our research to curriculum development and to future work in physics education research.
Relationship between Item Responses of Negative Affect Items and the Distribution of the Sum of the Item Scores in the General Population

PubMed Central

Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

2016-01-01

Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
Development and Validation of a Six-Item Version of the Interpersonal Dependency Inventory.

PubMed

McClintock, Andrew S; McCarrick, Shannon M; Anderson, Timothy; Himawan, Lina; Hirschfeld, Robert

2017-04-01

The Interpersonal Dependency Inventory (IDI) is a frequently used, 48-item measure of maladaptive dependency. Our goal was to develop and psychometrically evaluate a very brief version of the IDI. An exploratory factor analysis of the IDI in Study 1 ( N = 838) yielded a six-item IDI (IDI-6), with three items loading on an emotional dependency factor (IDI-6-ED), and the other three items loading on a functional dependency factor (IDI-6-FD). This factor solution was validated by confirmatory factor analysis in Study 2 ( N = 916). The IDI-6-ED and IDI-6-FD demonstrated good convergent and divergent validity in Study 3 ( N = 100). In Study 4 ( N = 22-43), the IDI-6-ED and IDI-6-FD were generally stable over 4-week and 8-week intervals and were found to be responsive to the effects of psychological treatment. These results have implications for dependency conceptualizations and support the IDI-6 as a brief, psychometrically sound instrument.
Attitudes Toward Transgender Men and Women: Development and Validation of a New Measure

PubMed Central

Billard, Thomas J

2018-01-01

A series of three studies were conducted to generate, develop, and validate the Attitudes toward Transgender Men and Women (ATTMW) scale. In Study 1, 120 American adults responded to an open-ended questionnaire probing various dimensions of their perceptions of transgender individuals and identity. Qualitative thematic analysis generated 200 items based on their responses. In Study 2, 238 American adults completed a questionnaire consisting of the generated items. Exploratory factor analysis (EFA) revealed two non-identical 12-item subscales (ATTM and ATTW) of the full 24-item scale. In Study 3, 150 undergraduate students completed a survey containing the ATTMW and a number of validity-testing variables. Confirmatory factor analysis (CFA) verified the single-factor structures of the ATTM and ATTW subscales, and the convergent, discriminant, predictive, and concurrent validities of the ATTMW were also established. Together, our results demonstrate that the ATTMW is a reliable and valid measure of attitudes toward transgender individuals. PMID:29666595
Utility of the Mayo-Portland adaptability inventory-4 for self-reported outcomes in a military sample with traumatic brain injury.

PubMed

Kean, Jacob; Malec, James F; Cooper, Douglas B; Bowles, Amy O

2013-12-01

To investigate the psychometric properties of the Mayo-Portland Adaptability Inventory-4 (MPAI-4) obtained by self-report in a large sample of active duty military personnel with traumatic brain injury (TBI). Consecutive cohort who completed the MPAI-4 as a part of a larger battery of clinical outcome measures at the time of intake to an outpatient brain injury clinic. Medical center. Consecutively referred sample of active duty military personnel (N=404) who suffered predominantly mild (n=355), but also moderate (n=37) and severe (n=12), TBI. Not applicable. MPAI-4 RESULTS: Initial factor analysis suggested 2 salient dimensions. In subsequent analysis, the ratio of the first and second eigenvalues (6.84:1) and parallel analysis indicated sufficient unidimensionality in 26 retained items. Iterative Rasch analysis resulted in the rescaling of the measure and the removal of 5 additional items for poor fit. The items of the final 21-item Mayo-Portland Adaptability Inventory-military were locally independent, demonstrated monotonically increasing responses, adequately fit the item response model, and permitted the identification of nearly 5 statistically distinct levels of disability in the study population. Slight mistargeting of the population resulted in the global outcome, as measured by the Mayo-Portland Adaptability Inventory-military, tending to be less reflective of very mild levels of disability. These data collected in a relatively large sample of active duty service members with TBI provide insight into the ability of patients to self-report functional impairment and the distinct effects of military deployment on outcome, providing important guidance for the meaningful measurement of outcome in this population. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Measuring depression after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Depression item bank and linkage with PHQ-9.

PubMed

Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W

2015-05-01

To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. Spinal Cord Injury--Quality of Life (SCI-QOL) Depression Item Bank Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS(®)) in acute coronary syndrome patients: differential functioning of items and test.

PubMed

Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E

2015-08-01

The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for thresholds parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
Stochastic Approximation Methods for Latent Regression Item Response Models

ERIC Educational Resources Information Center

von Davier, Matthias; Sinharay, Sandip

2010-01-01

This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

ERIC Educational Resources Information Center

Aybek, Eren Can; Demirtasli, R. Nukhet

2017-01-01

This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
Item-level psychometrics and predictors of performance for Spanish/English bilingual speakers on an object and action naming battery.

PubMed

Edmonds, Lisa A; Donovan, Neila J

2012-04-01

There is a pressing need for psychometrically sound naming materials for Spanish/English bilingual adults. To address this need, in this study the authors examined the psychometric properties of An Object and Action Naming Battery (An O&A Battery; Druks & Masterson, 2000) in bilingual speakers. Ninety-one Spanish/English bilinguals named O&A Battery items in English and Spanish. Responses underwent a Rasch analysis. Using correlation and regression analyses, the authors evaluated the effect of psycholinguistic (e.g., imageability) and participant (e.g., proficiency ratings) variables on accuracy. Rasch analysis determined unidimensionality across English and Spanish nouns and verbs and robust item-level psychometric properties, evidence for content validity. Few items did not fit the model, there were no ceiling or floor effects after uninformative and misfit items were removed, and items reflected a range of difficulty. Reliability coefficients were high, and the number of statistically different ability levels provided indices of sensitivity. Regression analyses revealed significant correlations between psycholinguistic variables and accuracy, providing preliminary construct validity. The participant variables that contributed most to accuracy were proficiency ratings and time of language use. Results suggest adequate content and construct validity of O&A items retained in the analysis for Spanish/English bilingual adults and support future efforts to evaluate naming in older bilinguals and persons with bilingual aphasia.
An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

ERIC Educational Resources Information Center

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
A critique of Rasch residual fit statistics.

PubMed

Karabatsos, G

2000-01-01

In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that there is far more emphasis placed in the objective measurement of person and items than there is in the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
Assessing Hopelessness in Terminally Ill Cancer Patients: Development of the Hopelessness Assessment in Illness Questionnaire

PubMed Central

Rosenfeld, Barry; Pessin, Hayley; Lewis, Charles; Abbey, Jennifer; Olden, Megan; Sachs, Emily; Amakawa, Lia; Kolva, Elissa; Brescia, Robert; Breitbart, William

2013-01-01

Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses demonstrated strong support for the Hopelessness Assessment in Illness Questionnaire providing greater explanatory power than existing measures of hopelessness and found little evidence that this assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts. PMID:21443366
Rasch analysis of the UK Functional Assessment Measure in patients with complex disability after stroke.

PubMed

Medvedev, Oleg N; Turner-Stokes, Lynne; Ashford, Stephen; Siegert, Richard J

2018-02-28

To determine whether the UK Functional Assessment Measure (UK FIM+FAM) fits the Rasch model in stroke patients with complex disability and, if so, to derive a conversion table of Rasch-transformed interval level scores. The sample included a UK multicentre cohort of 1,318 patients admitted for specialist rehabilitation following a stroke. Rasch analysis was conducted for the 30-item scale including 3 domains of items measuring physical, communication and psychosocial functions. The fit of items to the Rasch model was examined using 3 different analytical approaches referred to as "pathways". The best fit was achieved in the pathway where responses from motor, communication and psychosocial domains were summarized into 3 super-items and where some items were split because of differential item functioning (DIF) relative to left and right hemisphere location (χ2 (10) = 14.48, p = 0.15). Re-scoring of items showing disordered thresholds did not significantly improve the overall model fit. The UK FIM+FAM with domain super-items satisfies expectations of the unidimensional Rasch model without the need for re-scoring. A conversion table was produced to convert the total scale scores into interval-level data based on person estimates of the Rasch model. The clinical benefits of interval-transformed scores require further evaluation.
Construct Validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R): Lessons Learned

NASA Astrophysics Data System (ADS)

Pruski, Linda A.; Blanco, Sharon L.; Riggs, Rosemary A.; Grimes, Kandi K.; Fordtran, Chase W.; Barbola, Gina M.; Cornell, John E.; Lichtenstein, Michael J.

2013-11-01

Described herein is the academic lineage and independent validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R). Data from 334 K-12 science teachers were analyzed using Partial Credit Rasch models. Principal components analysis on the person-item residuals suggest two latent dimensions: Knowledge and Teaching Self-Efficacies. Item-fit statistics were used to select items for each subscale. Person and item separation (reliability) indices were quite low, and we noted disordered response patterns on the person-item maps that revealed problems with item content and/or scaling for both subscales. These issues include the presence of: verbal negatives, ambiguous modifiers, counter-intuitive scaling, and an "undecided/uncertain" option. The SETAKIST-R, in its current form, cannot be recommended as a measure of science teacher self-efficacy.
Dimensions of vegetable parenting practices among preschoolers.

PubMed

Baranowski, Tom; Chen, Tzu-An; O'Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Frankel, Leslie; Diep, Cassandra; Baranowski, Janice C

2013-10-01

The objective of this study was to determine the factor structure of 31 effective and ineffective vegetable parenting practices used by parents of preschool children based on three theoretically proposed factors: responsiveness, control and structure. The methods employed included both corrected item-total correlations and confirmatory factor analysis. Acceptable fit was obtained only when effective and ineffective parenting practices were analyzed separately. Among effective items the model included one second order factor (effectiveness) and the three proposed first order factors. The same structure was revealed among ineffective items, but required correlated paths be specified among items. A theoretically specified three factor structure was obtained among 31 vegetable parenting practice items, but likely to be effective and ineffective items had to be analyzed separately. Research is needed on how these parenting practices factors predict child vegetable intake. Copyright © 2013 Elsevier Ltd. All rights reserved.
Item response theory and the measurement of motor behavior.

PubMed

Safrit, M J; Cohen, A S; Costa, M G

1989-12-01

Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
Calibration of the Dutch-Flemish PROMIS Pain Behavior item bank in patients with chronic pain.

PubMed

Crins, M H P; Roorda, L D; Smits, N; de Vet, H C W; Westhovens, R; Cella, D; Cook, K F; Revicki, D; van Leeuwen, J; Boers, M; Dekker, J; Terwee, C B

2016-02-01

The aims of the current study were to calibrate the item parameters of the Dutch-Flemish PROMIS Pain Behavior item bank using a sample of Dutch patients with chronic pain and to evaluate cross-cultural validity between the Dutch-Flemish and the US PROMIS Pain Behavior item banks. Furthermore, reliability and construct validity of the Dutch-Flemish PROMIS Pain Behavior item bank were evaluated. The 39 items in the bank were completed by 1042 Dutch patients with chronic pain. To evaluate unidimensionality, a one-factor confirmatory factor analysis (CFA) was performed. A graded response model (GRM) was used to calibrate the items. To evaluate cross-cultural validity, Differential item functioning (DIF) for language (Dutch vs. English) was evaluated. Reliability of the item bank was also examined and construct validity was studied using several legacy instruments, e.g. the Roland Morris Disability Questionnaire. CFA supported the unidimensionality of the Dutch-Flemish PROMIS Pain Behavior item bank (CFI = 0.960, TLI = 0.958), the data also fit the GRM, and demonstrated good coverage across the pain behavior construct (threshold parameters range: -3.42 to 3.54). Analysis showed good cross-cultural validity (only six DIF items), reliability (Cronbach's α = 0.95) and construct validity (all correlations ≥0.53). The Dutch-Flemish PROMIS Pain Behavior item bank was found to have good cross-cultural validity, reliability and construct validity. The development of the Dutch-Flemish PROMIS Pain Behavior item bank will serve as the basis for Dutch-Flemish PROMIS short forms and computer adaptive testing (CAT). © 2015 European Pain Federation - EFIC®

Evaluation properties of the French version of the OUT-PATSAT35 satisfaction with care questionnaire according to classical and item response theory analyses.

PubMed

Panouillères, M; Anota, A; Nguyen, T V; Brédart, A; Bosset, J F; Monnier, A; Mercier, M; Hardouin, J B

2014-09-01

The present study investigates the properties of the French version of the OUT-PATSAT35 questionnaire, which evaluates the outpatients' satisfaction with care in oncology using classical analysis (CTT) and item response theory (IRT). This cross-sectional multicenter study includes 692 patients who completed the questionnaire at the end of their ambulatory treatment. CTT analyses tested the main psychometric properties (convergent and divergent validity, and internal consistency). IRT analyses were conducted separately for each OUT-PATSAT35 domain (the doctors, the nurses or the radiation therapists and the services/organization) by models from the Rasch family. We examined the fit of the data to the model expectations and tested whether the model assumptions of unidimensionality, monotonicity and local independence were respected. A total of 605 (87.4%) respondents were analyzed with a mean age of 64 years (range 29-88). Internal consistency for all scales separately and for the three main domains was good (Cronbach's α 0.74-0.98). IRT analyses were performed with the partial credit model. No disordered thresholds of polytomous items were found. Each domain showed high reliability but fitted poorly to the Rasch models. Three items in particular, the item about "promptness" in the doctors' domain and the items about "accessibility" and "environment" in the services/organization domain, presented the highest default of fit. A correct fit of the Rasch model can be obtained by dropping these items. Most of the local dependence concerned items about "information provided" in each domain. A major deviation of unidimensionality was found in the nurses' domain. CTT showed good psychometric properties of the OUT-PATSAT35. However, the Rasch analysis revealed some misfitting and redundant items. Taking the above problems into consideration, it could be interesting to refine the questionnaire in a future study.
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure

PubMed Central

McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.

2013-01-01

Background Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342
Validation of the conceptual research utilization scale: an application of the standards for educational and psychological testing in healthcare

PubMed Central

2011-01-01

Background There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale). Methods We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Internal structure and relations to other variables validity was examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression. Results Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis predicted a 5-item 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU scale score and all five CRU items. The CRU scale score was also shown to be significant predictor of overall research utilization in multivariate linear regression. Conclusions The CRU scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes. Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU scale to yield sound assessments of CRU by healthcare aides. Refinement to the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes to conceptual research use, and longitudinal work to determine CRU Scale sensitivity to change. PMID:21595888
Response latencies are alive and well for identifying fakers on a self-report personality inventory: A reconsideration of van Hooft and Born (2012).

PubMed

Holden, Ronald R; Lambert, Christine E

2015-12-01

Van Hooft and Born (Journal of Applied Psychology 97:301-316, 2012) presented data challenging both the correctness of a congruence model of faking on personality test items and the relative merit (i.e., effect size) of response latencies for identifying fakers. We suggest that their analysis of response times was suboptimal, and that it followed neither from a congruence model of faking nor from published protocols on appropriately filtering the noise in personality test item answering times. Using new data and following recommended analytic procedures, we confirmed the relative utility of response times for identifying personality test fakers, and our obtained results, again, reinforce a congruence model of faking.
Sources of Response Bias in Older Ethnic Minorities: A Case of Korean American Elderly

PubMed Central

Kim, Miyong T.; Ko, Jisook; Yoon, Hyunwoo; Kim, Kim B.; Jang, Yuri

2015-01-01

The present study was undertaken to investigate potential sources of response bias in empirical research involving older ethnic minorities and to identify prudent strategies to reduce those biases, using Korean American elderly (KAE) as an example. Data were obtained from three independent studies of KAE (N=1,297; age ≥60) in three states (Florida, New York, and Maryland) from 2000 to 2008. Two common measures, Pearlin’s Mastery Scale and the CES-D scale, were selected for a series of psychometric tests based on classical measurement theory. Survey items were analyzed in depth, using psychometric properties generated from both exploratory factor analysis and confirmatory factor analysis as well as correlational analysis. Two types of potential sources of bias were identified as the most significant contributors to increases in error variances for these psychological instruments. Error variances were most prominent when (1) items were not presented in a manner that was culturally or contextually congruent with respect to the target population and/or (2) the response anchors for items were mixed (e.g., positive vs. negative). The systemic patterns and magnitudes of the biases were also cross-validated for the three studies. The results demonstrate sources and impacts of measurement biases in studies of older ethnic minorities. The identified response biases highlight the need for re-evaluation of current measurement practices, which are based on traditional recommendations that response anchors should be mixed or that the original wording of instruments should be rigidly followed. Specifically, systematic guidelines for accommodating cultural and contextual backgrounds into instrument design are warranted. PMID:26049971
Sources of Response Bias in Older Ethnic Minorities: A Case of Korean American Elderly.

PubMed

Kim, Miyong T; Lee, Ju-Young; Ko, Jisook; Yoon, Hyunwoo; Kim, Kim B; Jang, Yuri

2015-09-01

The present study was undertaken to investigate potential sources of response bias in empirical research involving older ethnic minorities and to identify prudent strategies to reduce those biases, using Korean American elderly (KAE) as an example. Data were obtained from three independent studies of KAE (N = 1,297; age ≥60) in three states (Florida, New York, and Maryland) from 2000 to 2008. Two common measures, Pearlin's Mastery Scale and the CES-D scale, were selected for a series of psychometric tests based on classical measurement theory. Survey items were analyzed in depth, using psychometric properties generated from both exploratory factor analysis and confirmatory factor analysis as well as correlational analysis. Two types of potential sources of bias were identified as the most significant contributors to increases in error variances for these psychological instruments. Error variances were most prominent when (1) items were not presented in a manner that was culturally or contextually congruent with respect to the target population and/or (2) the response anchors for items were mixed (e.g., positive vs. negative). The systemic patterns and magnitudes of the biases were also cross-validated for the three studies. The results demonstrate sources and impacts of measurement biases in studies of older ethnic minorities. The identified response biases highlight the need for re-evaluation of current measurement practices, which are based on traditional recommendations that response anchors should be mixed or that the original wording of instruments should be rigidly followed. Specifically, systematic guidelines for accommodating cultural and contextual backgrounds into instrument design are warranted.
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.

ERIC Educational Resources Information Center

Kim, Seock-Ho; Cohen, Allan S.

Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
Writing, Evaluating and Assessing Data Response Items in Economics.

ERIC Educational Resources Information Center

Trotman-Dickenson, D. I.

1989-01-01

Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…
Item Response Modeling with Sum Scores

ERIC Educational Resources Information Center

Johnson, Timothy R.

2013-01-01

One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…
Validation of Physics Standardized Test Items

NASA Astrophysics Data System (ADS)

Marshall, Jill

2008-10-01

The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing..
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

ERIC Educational Resources Information Center

Polak, Marike; De Rooij, Mark; Heiser, Willem J.

2012-01-01

In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Advancing the efficiency and efficacy of patient reported outcomes with multivariate computer adaptive testing.

PubMed

Morris, Scott; Bass, Mike; Lee, Mirinae; Neapolitan, Richard E

2017-09-01

The Patient Reported Outcomes Measurement Information System (PROMIS) initiative developed an array of patient reported outcome (PRO) measures. To reduce the number of questions administered, PROMIS utilizes unidimensional item response theory and unidimensional computer adaptive testing (UCAT), which means a separate set of questions is administered for each measured trait. Multidimensional item response theory (MIRT) and multidimensional computer adaptive testing (MCAT) simultaneously assess correlated traits. The objective was to investigate the extent to which MCAT reduces patient burden relative to UCAT in the case of PROs. One MIRT and 3 unidimensional item response theory models were developed using the related traits anxiety, depression, and anger. Using these models, MCAT and UCAT performance was compared with simulated individuals. Surprisingly, the root mean squared error for both methods increased with the number of items. These results were driven by large errors for individuals with low trait levels. A second analysis focused on individuals aligned with item content. For these individuals, both MCAT and UCAT accuracies improved with additional items. Furthermore, MCAT reduced the test length by 50%. For the PROMIS Emotional Distress banks, neither UCAT nor MCAT provided accurate estimates for individuals at low trait levels. Because the items in these banks were designed to detect clinical levels of distress, there is little information for individuals with low trait values. However, trait estimates for individuals targeted by the banks were accurate and MCAT asked substantially fewer questions. By reducing the number of items administered, MCAT can allow clinicians and researchers to assess a wider range of PROs with less patient burden. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
A Spanish-Language Risk Perception Survey for Developing Diabetes: Translation Process and Assessment of Psychometric Properties.

PubMed

Joiner, Kevin L; Sternberg, Rosa Maria; Kennedy, Christine; Chen, Jyu-Lin; Fukuoka, Yoshimi; Janson, Susan L

2016-12-01

Create a Spanish-language version of the Risk Perception Survey for Developing Diabetes (RPS-DD) and assess psychometric properties. The Spanish-language version was created through translation, harmonization, and presentation to the tool's original author. It was field tested in a foreignborn Latino sample and properties evaluated in principal components analysis. Personal Control, Optimistic Bias, and Worry multi-item Likert subscale responses did not cluster together. A clean solution was obtained after removing two Personal Control subscale items. Neither the Personal Disease Risk scale nor the Environmental Health Risk scale responses loaded onto single factors. Reliabilities ranged from .54 to .88. Test of knowledge performance varied by item. This study contributes to evidence of validation of a Spanish-language RPS-DD in foreign-born Latinos.
Have a little faith: measuring the impact of illness on positive and negative aspects of faith.

PubMed

Salsman, John M; Garcia, Sofia F; Lai, Jin-Shei; Cella, David

2012-12-01

The importance of faith and its associations with health are well documented. As part of the Patient Reported Outcomes Measurement Information System, items tapping positive and negative impact of illness (PII and NII) were developed across four content domains: Coping/Stress Response, Self-Concept, Social Connection/Isolation, and Meaning and Spirituality. Faith items were included within the concept of meaning and spirituality. This measurement model was tested on a heterogeneous group of 509 cancer survivors. To evaluate dimensionality, we applied two bi-factor models, specifying a general factor (PII or NII) and four local factors: Coping/Stress Response, Self-Concept, Social Connection/Isolation, and Meaning and Spirituality. Bi-factor analysis supported sufficient unidimensionality within PII and NII item sets. The unidimensionality of both PII and NII item sets was enhanced by extraction of the faith items from the rest of the questions. Of the 10 faith items, nine demonstrated higher local than general factor loadings (range for local factor loadings = 0.402 to 0.876), suggesting utility as a separate but related 'faith' factor. The same was true for only two of the remaining 63 items across the PII and NII item sets. Although conceptually and to a degree empirically related to Meaning and Spirituality, Faith appears to be a distinct subdomain of PII and NII, better handled by distinct assessment. A 10-item measure of the impact of illness upon faith (II-Faith) was therefore assembled. Copyright © 2011 John Wiley & Sons, Ltd.
RhinAsthma patient perspective: A Rasch validation study.

PubMed

Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara

2018-02-01

In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective-RAPP. The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items and test whether or not patients use the items response scale correctly. 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was used for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed a DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.
The Interpretation of Scholars' Interpretations of Confidence Intervals: Criticism, Replication, and Extension of Hoekstra et al. (2014)

PubMed Central

García-Pérez, Miguel A.; Alcalá-Quintana, Rocío

2016-01-01

Hoekstra et al. (Psychonomic Bulletin & Review, 2014, 21:1157–1164) surveyed the interpretation of confidence intervals (CIs) by first-year students, master students, and researchers with six items expressing misinterpretations of CIs. They asked respondents to answer all items, computed the number of items endorsed, and concluded that misinterpretation of CIs is robust across groups. Their design may have produced this outcome artifactually for reasons that we describe. This paper discusses first the two interpretations of CIs and, hence, why misinterpretation cannot be inferred from endorsement of some of the items. Next, a re-analysis of Hoekstra et al.'s data reveals some puzzling differences between first-year and master students that demand further investigation. For that purpose, we designed a replication study with an extended questionnaire including two additional items that express correct interpretations of CIs (to compare endorsement of correct vs. nominally incorrect interpretations) and we asked master students to indicate which items they would have omitted had they had the option (to distinguish deliberate from uninformed endorsement caused by the forced-response format). Results showed that incognizant first-year students endorsed correct and nominally incorrect items identically, revealing that the two item types are not differentially attractive superficially; in contrast, master students were distinctively more prone to endorsing correct items when their uninformed responses were removed, although they admitted to nescience more often that might have been expected. Implications for teaching practices are discussed. PMID:27458424
Preliminary development and psychometric evaluation of an unmet needs measure for adolescents and young adults with cancer: the Cancer Needs Questionnaire - Young People (CNQ-YP).

PubMed

Clinton-McHarg, Tara; Carey, Mariko; Sanson-Fisher, Rob; D'Este, Catherine; Shakeshaft, Anthony

2012-01-30

Adolescents and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken.
Preliminary development and psychometric evaluation of an unmet needs measure for adolescents and young adults with cancer: the Cancer Needs Questionnaire - Young People (CNQ-YP)

PubMed Central

2012-01-01

Background Adolescents and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Methods Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. Results The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. Conclusions The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken. PMID:22284545
Methodology for the development and calibration of the SCI-QOL item banks

PubMed Central

Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David

2015-01-01

Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM. PMID:26010963
Methodology for the development and calibration of the SCI-QOL item banks.

PubMed

Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David

2015-05-01

To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury--Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM.

The Cognitive-Miser Response Model: Testing for Intuitive and Deliberate Reasoning

ERIC Educational Resources Information Center

Bockenholt, Ulf

2012-01-01

In a number of psychological studies, answers to reasoning vignettes have been shown to result from both intuitive and deliberate response processes. This paper utilizes a psychometric model to separate these two response tendencies. An experimental application shows that the proposed model facilitates the analysis of dual-process item responses…
An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis.

PubMed

Tarrant, Marie; Ware, James; Mohammed, Ahmed M

2009-07-07

Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
Support for an auto-associative model of spoken cued recall: evidence from fMRI.

PubMed

de Zubicaray, Greig; McMahon, Katie; Eastburn, Mathew; Pringle, Alan J; Lorenz, Lina; Humphreys, Michael S

2007-03-02

Cued recall and item recognition are considered the standard episodic memory retrieval tasks. However, only the neural correlates of the latter have been studied in detail with fMRI. Using an event-related fMRI experimental design that permits spoken responses, we tested hypotheses from an auto-associative model of cued recall and item recognition [Chappell, M., & Humphreys, M. S. (1994). An auto-associative neural network for sparse representations: Analysis and application to models of recognition and cued recall. Psychological Review, 101, 103-128]. In brief, the model assumes that cues elicit a network of phonological short term memory (STM) and semantic long term memory (LTM) representations distributed throughout the neocortex as patterns of sparse activations. This information is transferred to the hippocampus which converges upon the item closest to a stored pattern and outputs a response. Word pairs were learned from a study list, with one member of the pair serving as the cue at test. Unstudied words were also intermingled at test in order to provide an analogue of yes/no recognition tasks. Compared to incorrectly rejected studied items (misses) and correctly rejected (CR) unstudied items, correctly recalled items (hits) elicited increased responses in the left hippocampus and neocortical regions including the left inferior prefrontal cortex (LIPC), left mid lateral temporal cortex and inferior parietal cortex, consistent with predictions from the model. This network was very similar to that observed in yes/no recognition studies, supporting proposals that cued recall and item recognition involve common rather than separate mechanisms.
An item response theory analysis of the Executive Interview and development of the EXIT8: A Project FRONTIER Study.

PubMed

Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E

2015-01-01

The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.
Rasch analysis and impact factor methods both yield valid and comparable measures of health status in interstitial lung disease.

PubMed

Patel, Amit S; Siegert, Richard J; Bajwah, Sabrina; Brignall, Kate; Gosker, Harry R; Moxham, John; Maher, Toby M; Renzoni, Elisabetta A; Wells, Athol U; Higginson, Irene J; Birring, Surinder S

2015-09-01

Rasch analysis has largely replaced impact factor methodology for developing health status measures. The aim of this study was to develop a health status questionnaire for patients with interstitial lung disease (ILD) using impact factor methodology and to compare its validity with that of another version developed using Rasch analysis. A preliminary 71-item questionnaire was developed and evaluated in 173 patients with ILD. Items were reduced by the impact factor method (King's Brief ILD questionnaire, KBILD-I) and Rasch analysis (KBILD-R). Both questionnaires were validated by assessing their relationship with forced vital capacity (FVC) and St Georges Respiratory Questionnaire (SGRQ) and by evaluating internal reliability, repeatability, and longitudinal responsiveness. The KBILD-R and KBILD-I comprised 15 items each. The content of eight items differed between the KBILD-R and KBILD-I. Internal and test-retest reliability was good for total scores of both questionnaires. There was a good relationship with SGRQ and moderate relationship with FVC for both questionnaires. Effect sizes were comparable. Both questionnaires discriminated patients with differing disease severity. Despite considerable differences in the content of retained items, both KBILD-R and KBILD-I questionnaires demonstrated acceptable measurement properties and performed comparably in a clinical setting. Copyright © 2015 Elsevier Inc. All rights reserved.
Prevalence of responsible hospitality policies in licensed premises that are associated with alcohol-related harm.

PubMed

Daly, Justine B; Campbell, Elizabeth M; Wiggers, John H; Considine, Robyn J

2002-06-01

This study aimed to determine the prevalence of responsible hospitality policies in a group of licensed premises associated with alcohol-related harm. During March 1999, 108 licensed premises with one or more police-identified alcohol-related incidents in the previous 3 months received a visit from a police officer. A 30-item audit checklist was used to determine the responsible hospitality policies being undertaken by each premises within eight policy domains: display required signage (three items); responsible host practices to prevent intoxication and under-age drinking (five items); written policies and guidelines for responsible service (three items); discouraging inappropriate promotions (three items); safe transport (two items); responsible management issues (seven items); physical environment (three items) and entry conditions (four items). No premises were undertaking all 30 items. Eighty per cent of the premises were undertaking 20 of the 30 items. All premises were undertaking at least 17 of the items. The proportion of premises undertaking individual items ranged from 16% to 100%. Premises were less likely to report having and providing written responsible hospitality documentation to staff, using door charges and having entry/re-entry rules. Significant differences between rural and urban premises were evident for four policies. Clubs were significantly more likely than hotels to have a written responsible service of alcohol policy and to clearly display codes of dress and conditions of entry. This study provides an indication of the extent and nature of responsible hospitality policies in a sample of licensed premises that are associated with a broad range of alcohol related harms. The finding that a large majority of such premises appear to adopt responsible hospitality policies suggests a need to assess the validity and reliability of tools used in the routine assessment of such policies, and of the potential for harm from licensed premises.
A Rasch-validated version of the upper extremity functional index for interval-level measurement of upper extremity function.

PubMed

Hamilton, Clayon B; Chesworth, Bert M

2013-11-01

The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0-100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity.
A Rasch-Validated Version of the Upper Extremity Functional Index for Interval-Level Measurement of Upper Extremity Function

PubMed Central

Chesworth, Bert M.

2013-01-01

Background The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. Objective The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. Design This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Methods Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. Results A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0–100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Limitations Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Conclusion Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity. PMID:23813086
Item Response Models for Local Dependence among Multiple Ratings

ERIC Educational Resources Information Center

Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan

2014-01-01

Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Lawton IADL scale in dementia: can item response theory make it more informative?

PubMed

McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M

2014-07-01

impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. this cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. a Mokken scale with good reliability (Molenaar Sijtsama statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@ oup.com.
Analysis of single items on the Self-Esteem and Relationship questionnaire in men treated with sildenafil citrate for erectile dysfunction: results of two double-blind, placebo-controlled trials.

PubMed

Cappelleri, Joseph C; Althof, Stanley E; O'Leary, Michael P; Tseng, Li-Jung

2008-04-01

To evaluate the effect of sildenafil citrate on each item of the 14-item Self-Esteem And Relationship (SEAR) questionnaire, which is used to measure self-esteem, confidence, satisfaction with sexual relationship, and overall relationship satisfaction in men with erectile dysfunction (ED). Data were combined from two 12-week, double-blind, placebo-controlled, flexible-dose sildenafil trials having identical protocols, one conducted in the USA and the other in Mexico, Brazil, Australia and Japan. All men had ED and were aged >or=18 years. Response categories of each SEAR item used a 4-week reference period and were based on a five-point scale (1, almost never/never; 2, a few times; 3, sometimes; 4, most times; 5, almost always/always). The difference (sildenafil vs placebo) in the change from baseline to week 12 was evaluated with a Wilcoxon rank sum test using ridit analysis, and an analysis of covariance model that included treatment group, centre, study and baseline item score. Compared with the 274 patients receiving placebo, the 279 receiving sildenafil reported significantly greater mean and median improvements (P < 0.001) in each of the 14 SEAR items. The probability of increased psychosocial benefit from baseline to week 12 was higher with sildenafil for each SEAR item, and ranged from 0.60 ('My partner was unhappy with the quality of our sexual relations'[item reverse-scored]) to 0.72 ('I was satisfied with my sexual performance'). Across all items, the mean (sd) probability was 0.67 (0.04) that a randomly selected patient in the sildenafil group would have a more favourable change relative to a randomly selected patient in the placebo group. Sildenafil produced substantial and meaningful improvements at the item-specific level. This analysis complements previously published work on self-esteem, confidence and relationship satisfaction.
Health- and vision-related quality of life in intellectually disabled children.

PubMed

Cui, Yu; Stapleton, Fiona; Suttle, Catherine; Bundy, Anita

2010-01-01

To investigate the psychometric properties of instruments for the assessment of self-reported functional vision performance and health-related quality of life in children with intellectual disabilities (IDs). Two instruments [Autoquestionnaire Enfant Image (AUQUEI), LV Prasad-Functional Vision Questionnaire (LVP-FVQ)] designed for the assessment of functional vision and health-related quality of life were adapted and administered to 168 school children with ID, aged 8 to 18 years. Rasch analysis was used to determine the appropriateness of the rating scales of these instruments and to identify any redundant items. Redundant items were excluded based on descriptive statistics and Rasch analysis, leaving 17 of 23 items in the revised AUQUEI and 16 of 22 in the LVP-FVQ. The AUQUEI items showed disordered thresholds on the rating scale. A modified step calibration (collapsed from four categories to three categories) resulted in ordered response thresholds for all items. The adjusted instrument produced an overall fit to the model (mean item infit = 1.06, SD = 0.32; mean item outfit = 1.11, SD = 0.35), indicating good construct validity. After Rasch analysis, the AUQUEI showed good content validity (person separation = 2.18; item reliability = 0.99; Cronbach alpha = 0.89). Increased similarity of person and item means and SDs on the logit scale after modification would indicate that the instrument was more applicable to the target population in its modified form. In contrast, the LVP-FVQ had a low person separation (1.35), suggesting that a more appropriate instrument is needed for assessment of vision-related quality of life in children with ID. The psychometric properties of two instruments were explored using Rasch analysis. By rescaling and reduction of items, the instruments were modified for use in a population of children with at least mild to moderate ID. However, an alternative instrument is needed for the assessment of vision-related quality of life in intellectually disabled children with normal vision or mild visual abnormalities.
Assessing psychological well-being: self-report instruments for the NIH Toolbox.

PubMed

Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David

2014-02-01

Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item function (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
Evaluation of five guidelines for option development in multiple-choice item-writing.

PubMed

Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva

2009-05-01

This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Intelligent topical sentiment analysis for the classification of e-learners and their topics of interest.

PubMed

Ravichandran, M; Kulanthaivel, G; Chellatamilan, T

2015-01-01

Every day, huge numbers of instant tweets (messages) are published on Twitter as it is one of the massive social media for e-learners interactions. The options regarding various interesting topics to be studied are discussed among the learners and teachers through the capture of ideal sources in Twitter. The common sentiment behavior towards these topics is received through the massive number of instant messages about them. In this paper, rather than using the opinion polarity of each message relevant to the topic, authors focus on sentence level opinion classification upon using the unsupervised algorithm named bigram item response theory (BIRT). It differs from the traditional classification and document level classification algorithm. The investigation illustrated in this paper is of threefold which are listed as follows: (1) lexicon based sentiment polarity of tweet messages; (2) the bigram cooccurrence relationship using naïve Bayesian; (3) the bigram item response theory (BIRT) on various topics. It has been proposed that a model using item response theory is constructed for topical classification inference. The performance has been improved remarkably using this bigram item response theory when compared with other supervised algorithms. The experiment has been conducted on a real life dataset containing different set of tweets and topics.
Introduction to bifactor polytomous item response theory analysis.

PubMed

Toland, Michael D; Sulis, Isabella; Giambona, Francesca; Porcu, Mariano; Campbell, Jonathan M

2017-02-01

A bifactor item response theory model can be used to aid in the interpretation of the dimensionality of a multifaceted questionnaire that assumes continuous latent variables underlying the propensity to respond to items. This model can be used to describe the locations of people on a general continuous latent variable as well as on continuous orthogonal specific traits that characterize responses to groups of items. The bifactor graded response (bifac-GR) model is presented in contrast to a correlated traits (or multidimensional GR model) and unidimensional GR model. Bifac-GR model specification, assumptions, estimation, and interpretation are demonstrated with a reanalysis of data (Campbell, 2008) on the Shared Activities Questionnaire. We also show the importance of marginalizing the slopes for interpretation purposes and we extend the concept to the interpretation of the information function. To go along with the illustrative example analyses, we have made available supplementary files that include command file (syntax) examples and outputs from flexMIRT, IRTPRO, R, Mplus, and STATA. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jsp.2016.11.001. Data needed to reproduce analyses in this article are available as supplemental materials (online only) in the Appendix of this article. Copyright © 2016 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
Investigating burden of informal caregivers in England, Finland and Greece: an analysis with the short form of the Burden Scale for Family Caregivers (BSFC-s).

PubMed

Konerding, Uwe; Bowen, Tom; Forte, Paul; Karampli, Eleftheria; Malmström, Tomi; Pavi, Elpida; Torkki, Paulus; Graessel, Elmar

2018-02-01

The burden of informal caregivers might show itself in different ways in different cultures. Understanding these differences is important for developing culture-specific measures aimed at alleviating caregiver burden. Hitherto, no findings regarding such cultural differences between different European countries were available. In this paper, differences between English, Finnish and Greek informal caregivers of people with dementia are investigated. A secondary analysis was performed with data from 36 English, 42 Finnish and 46 Greek caregivers obtained with the short form of the Burden Scale for Family Caregivers (BSFC-s). The probabilities of endorsing the BSFC-s items were investigated by computing a logit model with items and countries as categorical factors. Statistically significant deviation of data from this model was taken as evidence for country-specific response patterns. The two-factorial logit model explains the responses to the items quite well (McFadden's pseudo-R-square: 0.77). There are, however, also statistically significant deviations (p < 0.05). English caregivers have a stronger tendency to endorse items addressing impairments in individual well-being; Finnish caregivers have a stronger tendency to endorse items addressing the conflict between the demands resulting from care and demands resulting from the remaining social life and Greek caregivers have a stronger tendency to endorse items addressing impairments in physical health. Caregiver burden shows itself differently in English, Finnish and Greek caregivers. Accordingly, measures for alleviating caregiver burden in these three countries should address different aspects of the caregivers' lives.
Attitude of dental hygienists, general practitioners and periodontists towards preventive oral care: an exploratory study.

PubMed

Thevissen, Eric; De Bruyn, Hugo; Colman, Roos; Koole, Sebastiaan

2017-08-01

Promoting oral hygiene and stimulating patient's responsibility for his/her personal health remain challenging objectives. The presence of dental hygienists has led to delegation of preventive tasks. However, in some countries, such as Belgium, this profession is not yet legalized. The aim of this exploratory study was to compare the attitude towards oral-hygiene instructions and patient motivational actions by dental hygienists and by general practitioners/periodontists in a context without dental hygienists. A questionnaire on demographics (six items), oral-hygiene instructions (eight items) and patient motivational actions (six items) was distributed to 241 Dutch dental hygienists, 692 general practitioners and 32 periodontists in Flanders/Belgium. Statistical analysis included Fisher's exact-test, Pearson's chi-square test and multiple (multinomial) logistic regression analysis to observe the influence of profession, age, workload, practice area and chair-assistance. Significant variance was found between general practitioners and dental hygienists (in 13 of 14 items), between general practitioners and periodontists (in nine of 14 items) and between dental hygienists and periodontists (in five of 14 items). In addition to qualification, chair-assistance was also identified as affecting the attitude towards preventive oral care. The present study identified divergence in the application of, and experienced barriers and opinions about, oral-hygiene instructions and patient motivational actions between dental hygienists and general practitioners/periodontists in a context without dental hygienists. In response to the barriers reported it is suggested that preventive oriented care may benefit from the deployment of dental hygienists to increase access to qualified preventive oral care. © 2017 FDI World Dental Federation.
Using item response theory to address vulnerabilities in FFQ.

PubMed

Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A

2017-09-01

The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.
Construct Validity of the Spanish Versions of the Memorial Symptom Assessment Scale Short Form and Condensed Form: Rasch Analysis of Responses in Oncology Outpatients.

PubMed

Llamas-Ramos, Inés; Llamas-Ramos, Rocío; Buz, José; Cortés-Rodríguez, María; Martín-Nogueras, Ana María

2018-06-01

The Memorial Symptom Assessment Scale (MSAS) is a self-rating instrument for the assessment of symptom distress in cancer patients. The Spanish version of the MSAS has recently been validated. However, we lack evidence of the internal construct validity of the shorter versions (short form [MSAS-SF] and condensed form [CMSAS]). In addition, rigorous testing of these scales with modern psychometric methods is needed. The aim of this study was to evaluate the internal construct validity and reliability of the Spanish versions of the MSAS-SF and CMSAS in oncology outpatients using Rasch analysis. Data from a convenience sample of oncology outpatients receiving chemotherapy (n = 306; mean age 60 years; 63% women) at a university hospital were analyzed. The Rasch unidimensional measurement model was used to examine response category functioning, item hierarchy, targeting, unidimensionality, reliability, and differential item functioning by age, gender, and marital status. The response category structure of the symptom distress items was improved by collapsing two categories. The scales were adequately targeted to the study patients, showed overall Rasch model fit (mean Infit MnSq ranged from 0.98 to 1.05), met criteria for unidimensionality, and the reliability of scores was good (person reliability > 0.80), except for the CMSAS prevalence scale. Only four items showed differential item functioning. The present study demonstrated that the Spanish versions of the MSAS-SF and CMSAS have adequate psychometric properties to evaluate symptom distress in oncology outpatients. Additional studies of the CMSAS are recommended. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

ECT Has Greater Efficacy Than Fluoxetine in Alleviating the Burden of Illness for Patients with Major Depressive Disorder: A Taiwanese Pooled Analysis

PubMed Central

Huang, Chun-Jen; Chen, Cheng-Chung

2018-01-01

Abstract Background The burden of major depressive disorder includes suffering due to symptom severity, functional impairment, and quality of life deficits. The aim of this study was to compare the differences between electroconvulsive therapy and pharmacotherapy in reducing such burdens. Methods This was a pooled analysis study including 2 open-label trials for major depressive disorder inpatients receiving either standard bitemporal and modified electroconvulsive therapy with a maximum of 12 sessions or 20 mg/d of fluoxetine for 6 weeks. Symptom severity, functioning, and quality of life were assessed using the 17-item Hamilton Rating Scale for Depression, the Modified Work and Social Adjustment Scale, and SF-36. Side effects following treatment, including subjective memory impairment, nausea/vomiting, and headache, were recorded. The differences between these 2 groups in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, quality of life, side effects, and time to response (at least a 50% reduction of 17-item Hamilton Rating Scale for Depression) and remission (17-item Hamilton Rating Scale for Depression ≤7) following treatment were analyzed. Results Electroconvulsive therapy (n=116) showed a significantly greater reduction in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, and quality of life deficits and had significantly shorter time to response/remission than fluoxetine (n=126). However, the electroconvulsive therapy group was more likely to experience subjective memory impairment and headache. Conclusions Compared with fluoxetine, electroconvulsive therapy was more effective in alleviating the burden of major depressive disorder and had a substantially increased speed of response/remission in the acute phase. Increased education and information about electroconvulsive therapy for clinicians, patients, and their families and the general public is warranted. PMID:29228200
Development of life story experience (LSE) scales for migrant dentists in Australia: a sequential qualitative-quantitative study.

PubMed

Balasubramanian, M; Spencer, A J; Short, S D; Watkins, K; Chrisopoulos, S; Brennan, D S

2016-09-01

The integration of qualitative and quantitative approaches introduces new avenues to bridge strengths, and address weaknesses of both methods. To develop measure(s) for migrant dentist experiences in Australia through a mixed methods approach. The sequential qualitative-quantitative design involved first the harvesting of data items from qualitative study, followed by a national survey of migrant dentists in Australia. Statements representing unique experiences in migrant dentists' life stories were deployed the survey questionnaire, using a five-point Likert scale. Factor analysis was used to examine component factors. Eighty-two statements from 51 participants were harvested from the qualitative analysis. A total of 1,022 of 1,977 migrant dentists (response rate 54.5%) returned completed questionnaires. Factor analysis supported an initial eight-factor solution; further scale development and reliability analysis led to five scales with a final list of 38 life story experience (LSE) items. Three scales were based on home country events: health system and general lifestyle concerns (LSE1; 10 items), society and culture (LSE4; 4 items) and career development (LSE5; 4 items). Two scales included migrant experiences in Australia: appreciation towards Australian way of life (LSE2; 13 items) and settlement concerns (LSE3; 7 items). The five life story experience scales provided necessary conceptual clarity and empirical grounding to explore migrant dentist experiences in Australia. Being based on original migrant dentist narrations, these scales have the potential to offer in-depth insights for policy makers and support future research on dentist migration. Copyright© 2016 Dennis Barber Ltd
[Development of the role scale for municipal supervising public health nurses].

PubMed

Hatono, Yoko; Suzuki, Hiroko; Masaki, Naoko

2013-05-01

As public health nurses are becoming increasingly decentralized in municipalities, recommendations for allocating supervising public health nurses are being made. This study aimed to develop a scale for measuring the implementation of role of municipal supervising public health nurses and to test its reliability and validity. Scale items were developed using results of a qualitative inductive analysis of interview data, and the items were then revised following an examination of content validity by experts, resulting in a provisional scale of 17 items. A self-administered, written questionnaire was then completed by supervising public health nurses or public health nurses holding the most senior positions in all municipalities nationwide, with the exception of three prefectures in the Tohoku region (total 1,621 locations). In total, 1,036 responses were received, and 931 were used for analysis (valid response rate = 57.4%). Of these, 406 were completed by supervising public health nurses. After deleting one item as a result of item analysis and conducting principal component analysis, factor analysis was conducted using the major factor method and Promax rotation. One item with high loading on multiple factors was deleted, resulting in a scale comprising 15 items and 3 factors. The cumulative contribution ratio was 56.10%. The three factors were labeled "Promotion of health activities across the whole locality," "Coordination as a PHN role leader," and "Development of the skills of public health nurses". The reliability coefficient of the RMSP (Role Scale for Municipal Supervising Public Health Nurses) as a whole was 0.84 using the split-half method (Spearman-Brown formula) and 0.91 using Cronbach's alpha, confirming internal consistency. In terms of validity, an examination was conducted of the correlation of two RMSP scale scores (strength of awareness of role as a supervising public health nurse and confidence as a supervising public health nurse) and scores on existing scales assessing management abilities, and a significant correlation (P < 0.01) was obtained. Additionally, a comparison of the RMSP scores of decentralized local public health nurses according to rank and years of service in areas where there were no supervising public health nurses with the RMSP scores of supervising public health nurses showed that the scores of supervising public health nurses were higher. The developed scale was found to be reliable and valid for measuring the implementation of supervising public health nurses' role.
A Multidimensional Ideal Point Item Response Theory Model for Binary Data

ERIC Educational Resources Information Center

Maydeu-Olivares, Albert; Hernandez, Adolfo; McDonald, Roderick P.

2006-01-01

We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model…
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda.

PubMed

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-12-02

The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluates the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda

PubMed Central

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-01-01

Background The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. Methods A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluates the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. Results The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. Conclusion This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda. PMID:19055716
Rasch analyses of the Activities-specific Balance Confidence Scale with individuals 50 years and older with lower limb amputations

PubMed Central

Sakakibara, Brodie M.; Miller, William C.; Backman, Catherine L.

2012-01-01

Objective To explore shortened response formats for use with the Activities-specific Balance Confidence scale and then: 1) evaluate the unidimensionality of the scale; 2) evaluate the item difficulty; 3) evaluate the scale for redundancy and content gaps; and 4) evaluate the item standard error of measurement (SEM) and internal consistency reliability among aging individuals (≥50 years) with a lower-limb amputation living in the community. Design Secondary analysis of cross-sectional survey and chart review data. Setting Out-patient amputee clinics, Ontario, Canada. Participants Four hundred forty eight community living adults, at least 50 years old (mean = 68 years), who have used a prosthesis for at least 6 months for a major unilateral lower limb amputation. Three hundred twenty five (72.5%) were men. Intervention N/a Main Outcome Measure(s) Activities-specific Balance Confidence Scale. Results A 5-option response format outperformed 4- and 6-option formats. Factor analyses confirmed a unidimensional scale. The distance between response options is not the same for all items on the scale, evident by the Partial Credit Model (PCM) having a better fit to the data than the Rating Scale Model. Two items, however, did not fit the PCM within statistical reason. Revising the wording of the two items may resolve the misfit, and improve the construct validity and lower the SEM. Overall, the difficulty of the scale’s items is appropriate for use with aging individuals with lower-limb amputation, and is most reliable (Cronbach ∝ = 0.94) for use with individuals with moderately low balance confidence levels. Conclusions The ABC-scale with a simplified 5-option response format is a valid and reliable measure of balance confidence for use with individuals aging with a lower limb amputation. PMID:21704978
Retrospectively exploring the importance of items in the decision to leave the emergency medical services (EMS) profession and their relationships to life satisfaction after leaving EMS and likelihood of returning to EMS.

PubMed

Blau, Gary; Chapman, Susan

2011-01-01

An exit survey was returned by a sample of 127 respondents in fully compensated positions who left the EMS profession, most within 12 months prior to filling out the exit survey. A very high percentage continued to work after leaving EMS. Respondents were asked to rate the importance of each of 17 items in affecting their decision to leave EMS. A higher than anticipated response to a "not applicable" response choice affected the usability of 8 of these items. Nine of the 17 items had at least 65 useable responses and were used for further analysis. Within these 9, stress/burnout and lack of job challenges had the highest importance in affecting the decision to leave EMS, while desire for better pay and benefits had the lowest importance. Desire for career change was positively related to life satisfaction after leaving EMS and negatively related to likelihood of returning to EMS. Stress/burnout was positively related to life satisfaction after leaving EMS. Study limitations and future research issues are briefly discussed.
Factor analysis of responses to the Irrational Beliefs Scale in a sample of Iraqi university students.

PubMed

Hassan, Namir; Ismail, Hairul Nizam

2004-06-01

In a study of irrational beliefs within a university population, 282 male and 238 female students responded to the 33-item Students' Irrational Beliefs Scale, and their responses were factor analyzed. Analysis suggested six dimensions could explain 39.5% of the variance. These dimensions were Perfectionism, Negativism, Blame Proneness, Escapism, Anxious Over Concern, and Absolute Demands.
Reliability and validity evidence of the Assessment of Language Use in Social Contexts for Adults (ALUSCA).

PubMed

Valente, Ana Rita S; Hall, Andreia; Alvelos, Helena; Leahy, Margaret; Jesus, Luis M T

2018-04-12

The appropriate use of language in context depends on the speaker's pragmatic language competencies. A coding system was used to develop a specific and adult-focused self-administered questionnaire to adults who stutter and adults who do not stutter, The Assessment of Language Use in Social Contexts for Adults, with three categories: precursors, basic exchanges, and extended literal/non-literal discourse. This paper presents the content validity, item analysis, reliability coefficients and evidences of construct validity of the instrument. Content validity analysis was based on a two-stage process: first, 11 pragmatic questionnaires were assessed to identify items that probe each pragmatic competency and to create the first version of the instrument; second, items were assessed qualitatively by an expert panel composed by adults who stutter and controls, and quantitatively and qualitatively by an expert panel composed by clinicians. A pilot study was conducted with five adults who stutter and five controls to analyse items and calculate reliability. Construct validity evidences were obtained using the hypothesized relationships method and factor analysis with 28 adults who stutter and 28 controls. Concerning content validity, the questionnaires assessed up to 13 pragmatic competencies. Qualitative and quantitative analysis revealed ambiguities in items construction. Disagreement between experts was solved through item modification. The pilot study showed that the instrument presented internal consistency and temporal stability. Significant differences between adults who stutter and controls and different response profiles revealed the instrument's underlying construct. The instrument is reliable and presented evidences of construct validity.
Exploratory Factor Analysis of the Beck Anxiety Inventory and the Beck Depression Inventory-II in a Psychiatric Outpatient Population

PubMed Central

2018-01-01

Background To further understand the relationship between anxiety and depression, this study examined the factor structure of the combined items from two validated measures for anxiety and depression. Methods The participants were 406 patients with mixed psychiatric diagnoses including anxiety and depressive disorders from a psychiatric outpatient unit at a university-affiliated medical center. Responses of the Beck Anxiety Inventory (BAI), Beck Depression Inventory (BDI)-II, and Symptom Checklist-90-Revised (SCL-90-R) were analyzed. We conducted an exploratory factor analysis of 42 items from the BAI and BDI-II. Correlational analyses were performed between subscale scores of the SCL-90-R and factors derived from the factor analysis. Scores of individual items of the BAI and BDI-II were also compared between groups of anxiety disorder (n = 185) and depressive disorder (n = 123). Results Exploratory factor analysis revealed the following five factors explaining 56.2% of the total variance: somatic anxiety (factor 1), cognitive depression (factor 2), somatic depression (factor 3), subjective anxiety (factor 4), and autonomic anxiety (factor 5). The depression group had significantly higher scores for 12 items on the BDI while the anxiety group demonstrated higher scores for six items on the BAI. Conclusion Our results suggest that anxiety and depressive symptoms as measured by the BAI and BDI-II can be empirically differentiated and that particularly items of the cognitive domain in depression and those of physical domain in anxiety are noteworthy. PMID:29651821
Identifying the readiness of patients in implementing telemedicine in northern Louisiana for an oncology practice.

PubMed

Gurupur, Varadraj; Shettian, Kruparaj; Xu, Peixin; Hines, Scott; Desselles, Mitzi; Dhawan, Manish; Wan, Thomas Th; Raffenaud, Amanda; Anderson, Lindsey

2017-09-01

This study identified the readiness factors that may create challenges in the use of telemedicine among patients in northern Louisiana with cancer. To identify these readiness factors, the team of investigators developed 19 survey questions that were provided to the patients or to their caregivers. The team collected responses from 147 respondents from rural and urban residential backgrounds. These responses were used to identify the individuals' readiness for utilising telemedicine through factor analysis, Cronbach's alpha reliability test, analysis of variance and ordinary least squares regression. The analysis results indicated that the favourable factor (positive readiness item) had a mean value of 3.47, whereas the unfavourable factor (negative readiness item) had a mean value of 2.76. Cronbach's alpha reliability test provided an alpha value of 0.79. Overall, our study indicated a positive attitude towards the use of telemedicine in northern Louisiana.
Differential Item Functioning Analysis Using a Mixture 3-Parameter Logistic Model with a Covariate on the TIMSS 2007 Mathematics Test

ERIC Educational Resources Information Center

Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.

2015-01-01

The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…
[Cross-cultural adaptation and validation of the PROMIS Global Health scale in the Portuguese language].

PubMed

Zumpano, Camila Eugênia; Mendonça, Tânia Maria da Silva; Silva, Carlos Henrique Martins da; Correia, Helena; Arnold, Benjamin; Pinto, Rogério de Melo Costa

2017-01-23

This study aimed to perform the cross-cultural adaptation and validation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Global Health scale in the Portuguese language. The ten Global Health items were cross-culturally adapted by the method proposed in the Functional Assessment of Chronic Illness Therapy (FACIT). The instrument's final version in Portuguese was self-administered by 1,010 participants in Brazil. The scale's precision was verified by floor and ceiling effects analysis, reliability of internal consistency, and test-retest reliability. Exploratory and confirmatory factor analyses were used to assess the construct's validity and instrument's dimensionality. Calibration of the items used the Gradual Response Model proposed by Samejima. Four global items required adjustments after the pretest. Analysis of the psychometric properties showed that the Global Health scale has good reliability, with Cronbach's alpha of 0.83 and intra-class correlation of 0.89. Exploratory and confirmatory factor analyses showed good fit in the previously established two-dimensional model. The Global Physical Health and Global Mental Health scale showed good latent trait coverage according to the Gradual Response Model. The PROMIS Global Health items showed equivalence in Portuguese compared to the original version and satisfactory psychometric properties for application in clinical practice and research in the Brazilian population.
Item response theory analysis applied to the Spanish version of the Personal Outcomes Scale.

PubMed

Guàrdia-Olmos, J; Carbó-Carreté, M; Peró-Cebollero, M; Giné, C

2017-11-01

The study of measurements of quality of life (QoL) is one of the great challenges of modern psychology and psychometric approaches. This issue has greater importance when examining QoL in populations that were historically treated on the basis of their deficiency, and recently, the focus has shifted to what each person values and desires in their life, as in cases of people with intellectual disability (ID). Many studies of QoL scales applied in this area have attempted to improve the validity and reliability of their components by incorporating various sources of information to achieve consistency in the data obtained. The adaptation of the Personal Outcomes Scale (POS) in Spanish has shown excellent psychometric attributes, and its administration has three sources of information: self-assessment, practitioner and family. The study of possible congruence or incongruence of observed distributions of each item between sources is therefore essential to ensure a correct interpretation of the measure. The aim of this paper was to analyse the observed distribution of items and dimensions from the three Spanish POS information sources cited earlier, using the item response theory. We studied a sample of 529 people with ID and their respective practitioners and family member, and in each case, we analysed items and factors using Samejima's model of polytomic ordinal scales. The results indicated an important number of items with differential effects regarding sources, and in some cases, they indicated significant differences in the distribution of items, factors and sources of information. As a result of this analysis, we must affirm that the administration of the POS, considering three sources of information, was adequate overall, but a correct interpretation of the results requires that it obtain much more information to consider, as well as some specific items in specific dimensions. The overall ratings, if these comments are considered, could result in bias. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Rasch analysis of the hospital anxiety and depression scale among Chinese cataract patients.

PubMed

Lin, Xianchai; Chen, Ziyan; Jin, Ling; Gao, Wuyou; Qu, Bo; Zuo, Yajing; Liu, Rongjiao; Yu, Minbin

2017-01-01

To analyze the validity of the Hospital Anxiety and Depression Scale (HADS) among Chinese cataract population. A total of 275 participants with unilateral or bilateral cataract were recruited to complete the Chinese version of HADS. The patients' demographic and ophthalmic characteristics were documented. Rasch analysis was conducted to examine the model fit statistics, the thresholds ordering of the polytomous items, targeting, person separation index and reliability, local dependency, unidimentionality, differential item functioning (DIF) and construct validity of the HADS individual and summary measures. Rasch analysis was performed on anxiety and depression subscales as well as HADS-Total score respectively. The items of original HADS-Anxiety, HADS-Depression and HADS-Total demonstrated evidence of misfit of the Rasch model. Removing items A7 for anxiety subscale and rescoring items D14 for depression subscale significantly improved Rasch model fit. A 12-item higher order total scale with further removal of D12 was found to fit the Rasch model. The modified items had ordered response thresholds. No uniform DIF was detected, whereas notable non-uniform DIF in high-ability group was found. The revised cut-off points were given for the modified anxiety and depression subscales. The modified version of HADS with HADS-A and HADS-D as subscale and HADS-T as a higher-order measure is a reliable and valid instrument that may be useful for assessing anxiety and depression states in Chinese cataract population.
Psychometric Principles in Measurement for Geoscience Education Research: A Climate Change Example

NASA Astrophysics Data System (ADS)

Libarkin, J. C.; Gold, A. U.; Harris, S. E.; McNeal, K.; Bowles, R.

2015-12-01

Understanding learning in geoscience classrooms requires that we use valid and reliable instruments aligned with intended learning outcomes. Nearly one hundred instruments assessing conceptual understanding in undergraduate science and engineering classrooms (often called concept inventories) have been published and are actively being used to investigate learning. The techniques used to develop these instruments vary widely, often with little attention to psychometric principles of measurement. This paper will discuss the importance of using psychometric principles to design, evaluate, and revise research instruments, with particular attention to the validity and reliability steps that must be undertaken to ensure that research instruments are providing meaningful measurement. An example from a climate change inventory developed by the authors will be used to exemplify the importance of validity and reliability, including the value of item response theory for instrument development. A 24-item instrument was developed based on published items, conceptions research, and instructor experience. Rasch analysis of over 1000 responses provided evidence for the removal of 5 items for misfit and one item for potential bias as measured via differential item functioning. The resulting 18-item instrument can be considered a valid and reliable measure based on pre- and post-implementation metrics. Consideration of the relationship between respondent demographics and concept inventory scores provides unique insight into the relationship between gender, religiosity, values and climate change understanding.
World Health Assembly agendas and trends of international health issues for the last 43 years: analysis of World Health Assembly agendas between 1970 and 2012.

PubMed

Kitamura, Tomomi; Obara, Hiromi; Takashima, Yoshihiro; Takahashi, Kenzo; Inaoka, Kimiko; Nagai, Mari; Endo, Hiroyoshi; Jimba, Masamine; Sugiura, Yasuo

2013-05-01

To analyse the trends and characteristics of international health issues through agenda items of the World Health Assembly (WHA) from 1970 to 2012. Agendas in Committees A/B of the WHA were classified as Administrative or Technical and Health Matters. Agenda items of Health Matters were sorted into five categories by the WHO reform in the 65th WHA. The agenda items in each category and sub-category were counted. There were 1647 agenda items including 423 Health Matters, which were sorted into five categories: communicable diseases (107, 25.3%), health systems (81, 19.1%), noncommunicable diseases (59, 13.9%), preparedness surveillance and response (58, 13.7%), and health through the life course (36, 8.5%). Among the sub-categories, HIV/AIDS, noncommunicable diseases in general, health for all, millennium development goals, influenza, and international health regulations, were discussed frequently and appeared associated with the public health milestones, but maternal and child health were discussed three times. The number of the agenda items differed for each Director-General's term of office. The WHA agendas cover a variety of items, but not always reflect international health issues in terms of disease burden. The Member States of WHO should take their responsive roles in proposing more balanced agenda items. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
[The application of diminished criminal responsibility rating scale to mental retardation offenders].

PubMed

Guan, Wei; Cai, Wei-Xiong; Huang, Fu-Yin; Wu, Jia-Sheng

2009-10-01

To explore the application of Diminished Criminal Responsibility Rating Scale (DCRRS) to mental retardation offenders. The DCRRS was used to 121 cases of mental retardation offenders who were divided into three groups according to the degree of their diminished criminal responsibility. There were significant differences in rating score among the three groups (mild group 22.12+/-4.69, moderate group 25.50+/-5.48, major group 27.59+/-5.69), and 17 items had good correlation with the total score of the scale with the correlation coefficient from 0.289 to 0.665. Six factors were extracted by the factor analysis, and 69.392% variation could be explained. The DCRRS has rational items, its total score could show the difference among the three degree diminished criminal responsibility of mental retardation offenders.
A Two-Decision Model for Responses to Likert-Type Items

ERIC Educational Resources Information Center

Thissen-Roe, Anne; Thissen, David

2013-01-01

Extreme response set, the tendency to prefer the lowest or highest response option when confronted with a Likert-type response scale, can lead to misfit of item response models such as the generalized partial credit model. Recently, a series of intrinsically multidimensional item response models have been hypothesized, wherein tendency toward…

A novel nonparametric item response theory approach to measuring socioeconomic position: a comparison using household expenditure data from a Vietnam health survey, 2003

PubMed Central

2014-01-01

Background Measures of household socio-economic position (SEP) are widely used in health research. There exist a number of approaches to their measurement, with Principal Components Analysis (PCA) applied to a basket of household assets being one of the most common. PCA, however, carries a number of assumptions about the distribution of the data which may be untenable, and alternative, non-parametric, approaches may be preferred. Mokken scale analysis is a non-parametric, item response theory approach to scale development which appears never to have been applied to household asset data. A Mokken scale can be used to rank order items (measures of wealth) as well as households. Using data on household asset ownership from a national sample of 4,154 consenting households in the World Health Survey from Vietnam, 2003, we construct two measures of household SEP. Seventeen items asking about assets, and utility and infrastructure use were used. Mokken Scaling and PCA were applied to the data. A single item measure of total household expenditure is used as a point of contrast. Results An 11 item scale, out of the 17 items, was identified that conformed to the assumptions of a Mokken Scale. All the items in the scale were identified as strong items (Hi > .5). Two PCA measures of SEP were developed as a point of contrast. One PCA measure was developed using all 17 available asset items, the other used the reduced set of 11 items identified in the Mokken scale analaysis. The Mokken Scale measure of SEP and the 17 item PCA measure had a very high correlation (r = .98), and they both correlated moderately with total household expenditure: r = .59 and r = .57 respectively. In contrast the 11 item PCA measure correlated moderately with the Mokken scale (r = .68), and weakly with the total household expenditure (r = .18). Conclusion The Mokken scale measure of household SEP performed at least as well as PCA, and outperformed the PCA measure developed with the 11 items used in the Mokken scale. Unlike PCA, Mokken scaling carries no assumptions about the underlying shape of the distribution of the data, and can be used simultaneous to order household SEP and items. The approach, however, has not been tested with data from other countries and remains an interesting, but under researched approach. PMID:25126103
A novel nonparametric item response theory approach to measuring socioeconomic position: a comparison using household expenditure data from a Vietnam health survey, 2003.

PubMed

Reidpath, Daniel D; Ahmadi, Keivan

2014-01-01

Measures of household socio-economic position (SEP) are widely used in health research. There exist a number of approaches to their measurement, with Principal Components Analysis (PCA) applied to a basket of household assets being one of the most common. PCA, however, carries a number of assumptions about the distribution of the data which may be untenable, and alternative, non-parametric, approaches may be preferred. Mokken scale analysis is a non-parametric, item response theory approach to scale development which appears never to have been applied to household asset data. A Mokken scale can be used to rank order items (measures of wealth) as well as households. Using data on household asset ownership from a national sample of 4,154 consenting households in the World Health Survey from Vietnam, 2003, we construct two measures of household SEP. Seventeen items asking about assets, and utility and infrastructure use were used. Mokken Scaling and PCA were applied to the data. A single item measure of total household expenditure is used as a point of contrast. An 11 item scale, out of the 17 items, was identified that conformed to the assumptions of a Mokken Scale. All the items in the scale were identified as strong items (Hi > .5). Two PCA measures of SEP were developed as a point of contrast. One PCA measure was developed using all 17 available asset items, the other used the reduced set of 11 items identified in the Mokken scale analaysis. The Mokken Scale measure of SEP and the 17 item PCA measure had a very high correlation (r = .98), and they both correlated moderately with total household expenditure: r = .59 and r = .57 respectively. In contrast the 11 item PCA measure correlated moderately with the Mokken scale (r = .68), and weakly with the total household expenditure (r = .18). The Mokken scale measure of household SEP performed at least as well as PCA, and outperformed the PCA measure developed with the 11 items used in the Mokken scale. Unlike PCA, Mokken scaling carries no assumptions about the underlying shape of the distribution of the data, and can be used simultaneous to order household SEP and items. The approach, however, has not been tested with data from other countries and remains an interesting, but under researched approach.
The development of an instrument to assess chemistry perceptions

NASA Astrophysics Data System (ADS)

Wells, Raymond R.

The instrument, developed in this study, attempted to correct the deficiencies of previous instruments. Statements of belief and opinion can be validly included under the construct of chemistry perceptions. Further, statements that might be better characterized as science attitudes, math attitudes, or attitudes toward a specific course or program were not included. Eliminating statements of math anxiety and test anxiety insured that responses to statements of anxiety were perceptions of anxiety solely related to chemistry. The results of the expert judges' responses to the Validation of Proposed Perception Statements forms were detailed to establish construct and content validity. The nature of Likert scale construction and calculation of internal consistency also supported the validity of the instrument. A pilot Chemistry Perception Questionnaire (CPQ) was then constructed based on agreement of the appropriate subscale and mean importance of the perception statements. The pilot CPQ results were subjected to an item analysis based on three sets of statistics: the frequency of each response and the percentage of respondents making each response for each perception statement, the mean and standard deviations for each item, and the item discrimination index which correlated the item scores with the subscale scores. With no zero or negative correlations to the subscale scores, it was not necessary to replace any of the perception statements contained in the pilot instrument. Therefore, the piloted Chemistry Perception Questionnaire became the final instrument. Factor analysis confirmed the multidimensionality of the instrument. The instrument was administered twice with a separation interval of approximately one month in order to perform a test-retest reliability analysis. One hundred and forty-one pairs were matched and results detailed. The correlation between forms, for the total instrument, was 0.9342. The mean coefficient alpha, for the total instrument, was 0.9495. With test-retest correlations and alphas exceeding 0.70 for all seven subscales and the total instrument, it was determined that the Chemistry Perception Questionnaire instrument achieved reasonably high reliability estimations.
Validation of the Kohnen Restless Legs Syndrome-Quality of Life instrument.

PubMed

Kohnen, Ralf; Martinez-Martin, Pablo; Benes, Heike; Trenkwalder, Claudia; Högl, Birgit; Dunkl, Elmar; Walters, Arthur S

2016-08-01

Due to the symptoms and the sleep disturbances it causes, Restless Legs Syndrome (RLS) has a negative impact on quality of life. Measurement of such impact can be performed by means of questionnaires, such as the Kohnen Restless Legs Syndrome-Quality of Life questionnaire (KRLS-QoL), a specific 12-item instrument that is self-applied by patients. The present study is aimed at performing a first formal validation study of this instrument. Eight hundred ninety-one patients were included for analysis. RLS severity was assessed by the International Restless Legs Scale (IRLS), Restless Legs Syndrome-6 scales (RLS-6), and Clinical Global Impression of Severity. In addition the Epworth Sleepiness Scale (ESS) was assessed. Acceptability, dimensionality, scaling assumptions, reliability, precision, hypotheses-related validity, and responsiveness were tested. There were missing data in 3.58% patients. Floor and ceiling effects were low for the subscales, global evaluation, and summary index derived from items 1 to 11 after checking that scaling assumptions were met. Exploratory parallel factor analysis showed that the KRLS-QoL may be deemed unidimensional, ie, that all components of the scale are part of one overall general quality of life factor. Indexes of internal consistency (alpha = 0.88), item-total correlation (r S = 0.32-0.71), item homogeneity coefficient (0.41), and scale stability (ICC = 0.73) demonstrated a satisfactory reliability of the KRLS-QoL. Moderate or high correlations were obtained between KRLS-QoL scores and the IRLS, some components of the RLS-6, inter-KRLS-QoL domains, and global evaluations. Known-groups validity for severity levels grouping and responsiveness analysis results were satisfactory, the latter showing higher magnitudes of response for treated than for placebo arms. The KRLS-QoL was proven an acceptable, reliable, valid, and responsive measure to assess the impact of the RLS on quality of life. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Capturing the true burden of dystonia on patients: the Cervical Dystonia Impact Profile (CDIP-58).

PubMed

Cano, S J; Warner, T T; Linacre, J M; Bhatia, K P; Thompson, A J; Fitzpatrick, R; Hobart, J C

2004-11-09

To develop a new rating scale for measuring the health impact of cervical dystonia (CD) that includes patients' perceptions and complements existing observer dependent clinician rating scales. Scale development was in three stages. In Stage 1, a large pool of items was generated from patient interviews (n = 25), expert opinion, and literature review. In Stage 2, these items were administered by postal survey to people with CD. The resulting data were analyzed using Rasch item analysis to construct, from the item pool, a rating scale that satisfied criteria for rigorous measurement. In Stage 3, the measurement properties of this rating scale were examined in an independent sample of people with CD. In Stage 1, 150 items concerning the health impact of CD were generated. In Stage 2, 556 people completed questionnaires (87% response rate) and a 58-item rating scale measuring the health impact of CD in eight areas was constructed (CD Impact Profile, CDIP-58). In Stage 3, CDIP-58 data from 391 people (87% response rate) were received. Analyses supported the measurement of eight unidimensional constructs (infit mean square range 0.62 to 1.50), item calibration (33.37 to 67.56), and patient separation statistics (2.59 to 3.38). Items demonstrated stable calibrations in subgroups of people with CD supporting the stability of the CDIP-58. The CDIP-58 is a reliable and valid patient-based rating scale measuring the health impact of CD in eight health dimensions.
Evaluating Shortened Versions of the AUDIT as Screeners for Alcohol Use Problems in a General Population Study.

PubMed

Nayak, Madhabika B; Bond, Jason C; Greenfield, Thomas K

2015-01-01

Efficient alcohol screening measures are important to prevent or treat alcohol use disorders (AUDs). We studied different versions of the Alcohol Use Disorders Identification Test (AUDIT) comparing their performance to the full AUDIT and an AUD measure as screeners for alcohol use problems in Goa, India. Data from a general population study on 743 male drinkers aged 18-49 years are reported. Drinkers completed the AUDIT and an AUD measure. We created shorter versions of the AUDIT by (a) collapsing AUDIT item responses into three and two categories and (b) deleting two items with the lowest factor loadings. Each version was evaluated using factor, reliability and validity, and differential item functioning (DIF) analysis by age, education, standard of living index (SLI), and area of residence. A single factor solution was found for each version with lower factor loadings for items on guilt and concern. There were no significant differences among the different AUDIT versions in predicting AUD. No significant DIF was found by education, SLI or area of residence. DIF was observed for the alcohol frequency item by age. The AUDIT may be used with dichotomized response options without loss of predictive validity. A shortened eight-item dichotomized scale can adequately screen for AUDs in Goa when brevity is of paramount importance, although with lower predictive validity. Although the frequency item was endorsed more by older men, there is no evidence that the AUDIT items perform differently in other groups of male drinkers in Goa.
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

PubMed

Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

2016-04-01

We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

PubMed

Raykov, Tenko; Marcoulides, George A

2016-04-01

The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
The therapeutic factor inventory-8: Using item response theory to create a brief scale for continuous process monitoring for group psychotherapy.

PubMed

Tasca, Giorgio A; Cabrera, Christine; Kristjansson, Elizabeth; MacNair-Semands, Rebecca; Joyce, Anthony S; Ogrodniczuk, John S

2016-01-01

We tested a very brief version of the 23-item Therapeutic Factors Inventory-Short Form (TFI-S), and describe the use of Item Response Theory (IRT) for the purpose of developing short and reliable scales for group psychotherapy. Group therapy patients (N = 578) completed the TFI-S on one occasion, and their data were used for the IRT analysis. Of those, 304 completed the TFI-S and other measures on more than one occasion to assess sensitivity to change, concurrent, and predictive validity of the brief version. Results suggest that the new TFI-8 is a brief, reliable, and valid measure of a higher-order group therapeutic factor. The TFI-8 may be used for continuous process measurement and feedback to improve the functioning of therapy groups.
Pesticide applicators questionnaire content validation: A fuzzy delphi method.

PubMed

Manakandan, S K; Rosnah, I; Mohd Ridhuan, J; Priya, R

2017-08-01

The most crucial step in forming a set of survey questionnaire is deciding the appropriate items in a construct. Retaining irrelevant items and removing important items will certainly mislead the direction of a particular study. This article demonstrates Fuzzy Delphi method as one of the scientific analysis technique to consolidate consensus agreement within a panel of experts pertaining to each item's appropriateness. This method reduces the ambiguity, diversity, and discrepancy of the opinions among the experts hence enhances the quality of the selected items. The main purpose of this study was to obtain experts' consensus on the suitability of the preselected items on the questionnaire. The panel consists of sixteen experts from the Occupational and Environmental Health Unit of Ministry of Health, Vector-borne Disease Control Unit of Ministry of Health and Occupational and Safety Health Unit of both public and private universities. A set of questionnaires related to noise and chemical exposure were compiled based on the literature search. There was a total of six constructs with 60 items in which three constructs for knowledge, attitude, and practice of noise exposure and three constructs for knowledge, attitude, and practice of chemical exposure. The validation process replicated recent Fuzzy Delphi method that using a concept of Triangular Fuzzy Numbers and Defuzzification process. A 100% response rate was obtained from all the sixteen experts with an average Likert scoring of four to five. Post FDM analysis, the first prerequisite was fulfilled with a threshold value (d) ≤ 0.2, hence all the six constructs were accepted. For the second prerequisite, three items (21%) from noise-attitude construct and four items (40%) from chemical-practice construct had expert consensus lesser than 75%, which giving rise to about 12% from the total items in the questionnaire. The third prerequisite was used to rank the items within the constructs by calculating the average fuzzy numbers. The seven items which did not fulfill the second prerequisite similarly had lower ranks during the analysis, therefore those items were discarded from the final draft. Post FDM analysis, the experts' consensus on the suitability of the pre-selected items on the questionnaire set were obtained, hence it is now ready for further construct validation process.
Clusters of cultures: diversity in meaning of family value and gender role items across Europe.

PubMed

van Vlimmeren, Eva; Moors, Guy B D; Gelissen, John P T M

2017-01-01

Survey data are often used to map cultural diversity by aggregating scores of attitude and value items across countries. However, this procedure only makes sense if the same concept is measured in all countries. In this study we argue that when (co)variances among sets of items are similar across countries, these countries share a common way of assigning meaning to the items. Clusters of cultures can then be observed by doing a cluster analysis on the (co)variance matrices of sets of related items. This study focuses on family values and gender role attitudes. We find four clusters of cultures that assign a distinct meaning to these items, especially in the case of gender roles. Some of these differences reflect response style behavior in the form of acquiescence. Adjusting for this style effect impacts on country comparisons hence demonstrating the usefulness of investigating the patterns of meaning given to sets of items prior to aggregating scores into cultural characteristics.
An in-depth psychometric analysis of the Connor-Davidson Resilience Scale: calibration with Rasch-Andrich model.

PubMed

Arias González, Víctor B; Crespo Sierra, María Teresa; Arias Martínez, Benito; Martínez-Molina, Agustín; Ponce, Fernando P

2015-09-23

The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory. The aim of this paper has focused on the calibration of the CD-RISC with a nonclinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the level of item. Two items showed misfit to the model and were eliminated. The remaining 22 items form basically a unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories were functioning properly. Two of the items showed differential item functioning. The CD-RISC has an obvious ceiling effect, which suggests to include more difficult items in future versions of the scale.
The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

ERIC Educational Resources Information Center

Lee, Wooyeol; Cho, Sun-Joo

2017-01-01

Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

ERIC Educational Resources Information Center

Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

2013-01-01

We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
The late posterior negativity in ERP studies of episodic memory: action monitoring and retrieval of attribute conjunctions.

PubMed

Johansson, Mikael; Mecklinger, Axel

2003-10-01

The focus of the present paper is a late posterior negative slow wave (LPN) that has frequently been reported in event-related potential (ERP) studies of memory. An overview of these studies suggests that two broad classes of experimental conditions tend to elicit this component: (a) item recognition tasks associated with enhanced action monitoring demands arising from response conflict and (b) memory tasks that require the binding of items with contextual information specifying the study episode. A combined stimulus- and response-locked analysis of data from two studies mapping onto these classes allowed a temporal and functional decomposition of the LPN. While only the LPN observed in the item recognition task could be attributed to the involvement of a posteriorly distributed response-locked error-related negativity (or error negativity; ERN/Ne) occurring immediately after the response, the source-memory task was associated with a stimulus-locked negative slow wave occurring prior and during response execution that was evident when data were matched for response latencies. We argue that the presence of the former reflects action monitoring due to high levels of response conflict, whereas the latter reflects retrieval processes that may act to reconstruct the prior study episode when task-relevant attribute conjunctions are not readily recovered or need continued evaluation.
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test

PubMed Central

Moore, Mariah; Gordon, Peter C.

2015-01-01

In the Author Recognition Test (ART) participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and to optimize scoring of the ART. Factor analysis suggests a potential two factor structure of the ART differentiating between literary vs. popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
Reading ability and print exposure: item response theory analysis of the author recognition test.

PubMed

Moore, Mariah; Gordon, Peter C

2015-12-01

In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
On Multidimensional Item Response Theory: A Coordinate-Free Approach. Research Report. ETS RR-07-30

ERIC Educational Resources Information Center

Antal, Tamás

2007-01-01

A coordinate-free definition of complex-structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the classical unidimensional item response theory models. The main theorem of the…
Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory

ERIC Educational Resources Information Center

Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.

2016-01-01

Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…

The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

ERIC Educational Resources Information Center

Bennett, Randy Elliot; And Others

1990-01-01

The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Using Hospital Anxiety and Depression Scale (HADS) on patients with epilepsy: Confirmatory factor analysis and Rasch models.

PubMed

Lin, Chung-Ying; Pakpour, Amir H

2017-02-01

The problems of mood disorders are critical in people with epilepsy. Therefore, there is a need to validate a useful tool for the population. The Hospital Anxiety and Depression Scale (HADS) has been used on the population, and showed that it is a satisfactory screening tool. However, more evidence on its construct validity is needed. A total of 1041 people with epilepsy were recruited in this study, and each completed the HADS. Confirmatory factor analysis (CFA) and Rasch analysis were used to understand the construct validity of the HADS. In addition, internal consistency was tested using Cronbachs' α, person separation reliability, and item separation reliability. Ordering of the response descriptors and the differential item functioning (DIF) were examined using the Rasch models. The HADS showed that 55.3% of our participants had anxiety; 56.0% had depression based on its cutoffs. CFA and Rasch analyses both showed the satisfactory construct validity of the HADS; the internal consistency was also acceptable (α=0.82 in anxiety and 0.79 in depression; person separation reliability=0.82 in anxiety and 0.73 in depression; item separation reliability=0.98 in anxiety and 0.91 in depression). The difficulties of the four-point Likert scale used in the HADS were monotonically increased, which indicates no disordering response categories. No DIF items across male and female patients and across types of epilepsy were displayed in the HADS. The HADS has promising psychometric properties on construct validity in people with epilepsy. Moreover, the additive item score is supported for calculating the cutoff. Copyright © 2016 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.
Construct validity of the Heart Failure Screening Tool (Heart-FaST) to identify heart failure patients at risk of poor self-care: Rasch analysis.

PubMed

Reynolds, Nicholas A; Ski, Chantal F; McEvedy, Samantha M; Thompson, David R; Cameron, Jan

2018-02-14

The aim of this study was to psychometrically evaluate the Heart Failure Screening Tool (Heart-FaST) via: (1) examination of internal construct validity; (2) testing of scale function in accordance with design; and (3) recommendation for change/s, if items are not well adjusted, to improve psychometric credential. Self-care is vital to the management of heart failure. The Heart-FaST may provide a prospective assessment of risk, regarding the likelihood that patients with heart failure will engage in self-care. Psychometric validation of the Heart-FaST using Rasch analysis. The Heart-FaST was administered to 135 patients (median age = 68, IQR = 59-78 years; 105 males) enrolled in a multidisciplinary heart failure management program. The Heart-FaST is a nurse-administered tool for screening patients with HF at risk of poor self-care. A Rasch analysis of responses was conducted which tested data against Rasch model expectations, including whether items serve as unbiased, non-redundant indicators of risk and measure a single construct and that rating scales operate as intended. The results showed that data met Rasch model expectations after rescoring or deleting items due to poor discrimination, disordered thresholds, differential item functioning, or response dependence. There was no evidence of multidimensionality which supports the use of total scores from Heart-FaST as indicators of risk. Aggregate scores from this modified screening tool rank heart failure patients according to their "risk of poor self-care" demonstrating that the Heart-FaST items constitute a meaningful scale to identify heart failure patients at risk of poor engagement in heart failure self-care. © 2018 John Wiley & Sons Ltd.
Rasch Analysis of the Edmonton Symptom Assessment System.

PubMed

Sprague, Emma; Siegert, Richard J; Medvedev, Oleg; Roberts, Margaret H

2018-05-01

The Edmonton Symptom Assessment System (ESAS) is a widely used multisymptom assessment tool in cancer and palliative care settings, but its psychometric properties have not been widely tested using modern psychometric methods such as Rasch analysis. To apply Rasch analysis to the ESAS in a community palliative care setting and determine its suitability for assessing symptom burden in this group. ESAS data collected from 229 patients enrolled in a community hospice service were evaluated using a partial credit Rasch model with RUMM2030 software (RUMM Laboratory Pty, Ltd., Duncraig, WA). Where disordered thresholds were discovered, item rescoring was undertaken. Rasch model fit and differential item functioning were evaluated after each iterative phase. Uniform rescoring was necessary for all 12 items to display ordered thresholds. The best model fit was achieved after item rescoring and combining three pairs of locally dependent items into three superitems (χ 2 = 29.56 [27]; P = 0.33) that permitted ordinal-to-interval conversion. The ESAS satisfied unidimensional Rasch model expectations in a 12-item format after minor modifications. This included uniform rescoring of the disordered response categories and creating superitems to improve model fit and clinical utility. The accuracy of the ESAS scores can be improved by using ordinal-to-interval conversion tables published in the article. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
The medial tibial stress syndrome score: a new patient-reported outcome measure.

PubMed

Winters, Marinus; Moen, Maarten H; Zimmermann, Wessel O; Lindeboom, Robert; Weir, Adam; Backx, Frank Jg; Bakker, Eric Wp

2016-10-01

At present, there is no validated patient-reported outcome measure (PROM) for patients with medial tibial stress syndrome (MTSS). Our aim was to select and validate previously generated items and create a valid, reliable and responsive PROM for patients with MTSS: the MTSS score. A prospective cohort study was performed in multiple sports medicine, physiotherapy and military facilities in the Netherlands. Participants with MTSS filled out the previously generated items for the MTSS score on 3 occasions. From previously generated items, we selected the best items. We assessed the MTSS score for its validity, reliability and responsiveness. The MTSS score was filled out by 133 participants with MTSS. Factor analysis showed the MTSS score to exhibit a single-factor structure with acceptable internal consistency (α=0.58) and good test-retest reliability (intraclass correlation coefficient=0.81). The MTSS score ranges from 0 to 10 points. The smallest detectable change in our sample was 0.69 at the group level and 4.80 at the individual level. Construct validity analysis showed significant moderate-to-large correlations (r=0.34-0.52, p<0.01). Responsiveness of the MTSS score was confirmed by a significant relation with the global perceived effect scale (β=-0.288, R(2)=0.21, p<0.001). The MTSS score is a valid, reliable and responsive PROM to measure the severity of MTSS. It is designed to evaluate treatment outcomes in clinical studies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Analysis of structural relationship among the occupational dysfunction on the psychological problem in healthcare workers: a study using structural equation modeling

PubMed Central

Kyougoku, Makoto

2015-01-01

Purpose. The purpose of this study is to demonstrate the hypothetical model based on structural relationship with the occupational dysfunction on psychological problems (stress response, burnout syndrome, and depression) in healthcare workers. Method. Three cross sectional studies were conducted to assess the following relations: (1) occupational dysfunction on stress response (n = 468), (2) occupational dysfunction on burnout syndrome (n = 1,142), and (3) occupational dysfunction on depression (n = 687). Personal characteristics were collected through a questionnaire (such as age, gender, and job category, opportunities for refreshment, time spent on leisure activities, and work relationships) as well as the Classification and Assessment of Occupational Dysfunction (CAOD). Furthermore, study 1 included the Stress Response Scale-18 (SRS-18), study 2 used the Japanese Burnout Scale (JBS), and study 3 employed the Center for Epidemiological Studies Depression Scale (CES-D). The Kolmogorov–Smirnov test, confirmatory factor analysis (CFA), exploratory factor analysis (EFA), and path analysis of structural equation modeling (SEM) analysis were used in all of the studies. EFA and CFA were used to measure structural validity of four assessments; CAOD, SRS-18, JBS, and CES-D. For examination of a potential covariate, we assessed the correlation of the total and factor score of CAOD and personal factors in all studies. Moreover, direct and indirect effects of occupational dysfunction on stress response (Study 1), burnout syndrome (Study 2), and depression (Study 3) were also analyzed. Results. In study 1, CAOD had 16 items and 4 factors. In Study 2 and 3, CAOD had 16 items and 5 factors. SRS-18 had 18 items and 3 factors, JBS had 17 items and 3 factors, and CES-D had 20 items and 4 factors. All studies found that there were significant correlations between the CAOD total score and the personal factor that included opportunities for refreshment, time spent on leisure activities, and work relationships (p < 0.01). The hypothesis model results suggest that the classification of occupational dysfunction had good fit on the stress response (RMSEA = 0.061, CFI = 0.947, and TLI = 0.943), burnout syndrome (RMSEA = 0.076, CFI = 0.919, and TLI = 0.913), and depression (RMSEA = 0.060, CFI = 0.922, TLI = 0.917). Moreover, the detected covariates include opportunities for refreshment, time spent on leisure activities, and work relationships on occupational dysfunction. Conclusion. Our findings indicate that psychological problems are associated with occupational dysfunction in healthcare workers. Reduction of occupational dysfunction might be a strategy of better preventive occupational therapies for healthcare workers with psychological problems. However, longitudinal studies will be needed to determine a causal relationship. PMID:26618078
Analysis of structural relationship among the occupational dysfunction on the psychological problem in healthcare workers: a study using structural equation modeling.

PubMed

Teraoka, Mutsumi; Kyougoku, Makoto

2015-01-01

Purpose. The purpose of this study is to demonstrate the hypothetical model based on structural relationship with the occupational dysfunction on psychological problems (stress response, burnout syndrome, and depression) in healthcare workers. Method. Three cross sectional studies were conducted to assess the following relations: (1) occupational dysfunction on stress response (n = 468), (2) occupational dysfunction on burnout syndrome (n = 1,142), and (3) occupational dysfunction on depression (n = 687). Personal characteristics were collected through a questionnaire (such as age, gender, and job category, opportunities for refreshment, time spent on leisure activities, and work relationships) as well as the Classification and Assessment of Occupational Dysfunction (CAOD). Furthermore, study 1 included the Stress Response Scale-18 (SRS-18), study 2 used the Japanese Burnout Scale (JBS), and study 3 employed the Center for Epidemiological Studies Depression Scale (CES-D). The Kolmogorov-Smirnov test, confirmatory factor analysis (CFA), exploratory factor analysis (EFA), and path analysis of structural equation modeling (SEM) analysis were used in all of the studies. EFA and CFA were used to measure structural validity of four assessments; CAOD, SRS-18, JBS, and CES-D. For examination of a potential covariate, we assessed the correlation of the total and factor score of CAOD and personal factors in all studies. Moreover, direct and indirect effects of occupational dysfunction on stress response (Study 1), burnout syndrome (Study 2), and depression (Study 3) were also analyzed. Results. In study 1, CAOD had 16 items and 4 factors. In Study 2 and 3, CAOD had 16 items and 5 factors. SRS-18 had 18 items and 3 factors, JBS had 17 items and 3 factors, and CES-D had 20 items and 4 factors. All studies found that there were significant correlations between the CAOD total score and the personal factor that included opportunities for refreshment, time spent on leisure activities, and work relationships (p < 0.01). The hypothesis model results suggest that the classification of occupational dysfunction had good fit on the stress response (RMSEA = 0.061, CFI = 0.947, and TLI = 0.943), burnout syndrome (RMSEA = 0.076, CFI = 0.919, and TLI = 0.913), and depression (RMSEA = 0.060, CFI = 0.922, TLI = 0.917). Moreover, the detected covariates include opportunities for refreshment, time spent on leisure activities, and work relationships on occupational dysfunction. Conclusion. Our findings indicate that psychological problems are associated with occupational dysfunction in healthcare workers. Reduction of occupational dysfunction might be a strategy of better preventive occupational therapies for healthcare workers with psychological problems. However, longitudinal studies will be needed to determine a causal relationship.
Behaviors in Advance Care Planning and ACtions Survey (BACPACS): development and validation part 1.

PubMed

Kassam, Aliya; Douglas, Maureen L; Simon, Jessica; Cunningham, Shannon; Fassbender, Konrad; Shaw, Marta; Davison, Sara N

2017-11-22

Although advance care planning (ACP) is fairly well understood, significant barriers to patient participation remain. As a result, tools to assess patient behaviour are required. The objective of this study was to improve the measurement of patient engagement in ACP by detecting existing survey design issues and establishing content and response process validity for a new survey entitled Behaviours in Advance Care Planning and ACtions Survey (BACPACS). We based our new tool on that of an existing ACP engagement survey. Initial item reduction was carried out using behavior change theories by content and design experts to help reduce response burden and clarify questions. Thirty-two patients with chronic diseases (cancer, heart failure or renal failure) were recruited for the think aloud cognitive interviewing with the new, shortened survey evaluating patient engagement with ACP. Of these, n = 27 had data eligible for analysis (n = 8 in round 1 and n = 19 in rounds 2 and 3). Interviews were audio-recorded and analyzed using the constant comparison method. Three reviewers independently listened to the interviews, summarized findings and discussed discrepancies until consensus was achieved. Item reduction from key content expert review and conversation analysis helped decrease number of items from 116 in the original ACP Engagement Survey to 24-38 in the new BACPACS depending on branching of responses. For the think aloud study, three rounds of interviews were needed until saturation for patient clarity was achieved. The understanding of ACP as a construct, survey response options, instructions and terminology pertaining to patient engagement in ACP warranted further clarification. Conversation analysis, content expert review and think aloud cognitive interviewing were useful in refining the new survey instrument entitled BACPACS. We found evidence for both content and response process validity for this new tool.
Perceived freedom-responsibility covariation among Cypriot adolescents.

PubMed

Frangou, Georgia; Wilkerson, Keith; McGahan, Joseph R

2008-04-01

Participants were 67 Cypriot adolescents who responded to propositions regarding positive, negative, and noncontingent relations between freedom and responsibility. The authors framed items so that half dealt with freedom given responsibility, and the other half dealt with responsibility given freedom. Results indicated participants were more likely to endorse positive-contingency items than they were negative and noncontingency items when items were framed around freedom given responsibility. However, when items were framed around responsibility given freedom, no such differences emerged. The authors discuss results relative to cultural and sociopolitical differences and similarities between children in Cypress and participants in the United States and implications concerning the present study and previous studies regarding these constructs.
Rasch analysis of the Trypophobia Questionnaire.

PubMed

Imaizumi, Shu; Tanno, Yoshihiko

2018-02-14

This study aimed to assess Rasch-based psychometric properties of the Trypophobia Questionnaire measuring proneness to trypophobia, which refers to disgust and unpleasantness induced by the observation of clusters of objects (e.g., lotus seed pods). Rasch analysis was performed on data from 582 healthy Japanese adults. The results suggested that Trypophobia Questionnaire has a unidimensional structure with ordered response categories and sufficient person and item reliabilities, and that it does not have differential item functioning across sexes and age groups, whereas the targeting of the scale leaves room for improvements. When items that did not fit the Rasch model were removed, the shortened version showed slightly improved psychometric properties. However, results were not conclusive in determining whether the full or shortened version is better for practical use. Further assessment and validation are needed.
Comparison of composite measures of disease activity in an early seropositive rheumatoid arthritis cohort

PubMed Central

Ranganath, Veena K; Yoon, Jeonglim; Khanna, Dinesh; Park, Grace S; Furst, Daniel E; Elashoff, David A; Jawaheer, Damini; Sharp, John T; Gold, Richard H; Keystone, Edward C; Paulus, Harold E

2007-01-01

Objective To evaluate concordance and agreement of the original DAS44/ESR‐4 item composite disease activity status measure with nine simpler derivatives when classifying patient responses by European League of Associations for Rheumatology (EULAR) criteria, using an early rheumatoid factor positive (RF+) rheumatoid arthritis (RA) patient cohort. Methods Disease‐modifying anti‐rheumatic drug‐naïve RF+ patients (n = 223; mean duration of symptoms, 6 months) were categorised as ACR none/20/50/70 responders. One‐way analysis of variance and two‐sample t tests were used to investigate the relationship between the ACR response groups and each composite measure. EULAR reached/change cut‐point scores were calculated for each composite measure. EULAR (good/moderate/none) responses for each composite measure and the degree of agreement with the DAS44/ESR‐4 item were calculated for 203 patients. Results Patients were mostly female (78%) with moderate to high disease activity. A centile‐based nomogram compared equivalent composite measure scores. Changes from baseline in the composite measures in patients with ACRnone were significantly less than those of ACR20/50/70 responders, and those for ACR50 were significantly different from those for ACR70. EULAR reached/change cut‐point scores for our cohort were similar to published cut‐points. When compared with the DAS44/ESR‐4 item, EULAR (good/moderate/none) percentage agreements were 92 with the DAS44/ESR‐3 item, 74 with the Clinical Disease Activity Index, and 80 with the DAS28/ESR‐4 item, the DAS28/CRP‐4 item and the Simplified Disease Activity Index. Conclusion The relationships of nine different RA composite measures against the DAS44/ESR‐4 item when applied to a cohort of seropositive patients with early RA are described. Each of these simplified status and response measures could be useful in assessing patients with RA, but the specific measure selected should be pre‐specified and described for each study. PMID:17472996
Development and psychometric testing of the Canine Owner-Reported Quality of Life questionnaire, an instrument designed to measure quality of life in dogs with cancer.

PubMed

Giuffrida, Michelle A; Brown, Dorothy Cimino; Ellenberg, Susan S; Farrar, John T

2018-05-01

OBJECTIVE To describe development and initial psychometric testing of an owner-reported questionnaire designed to standardize measurement of general quality of life (QOL) in dogs with cancer. DESIGN Key-informant interviews, questionnaire development, and field trial. SAMPLE Owners of 25 dogs with cancer for item development and pretesting and owners of 90 dogs with cancer for reliability and validity testing. PROCEDURES Standard methods for development and testing of questionnaire instruments intended to measure subjective states were used. Items were generated, selected, scaled, and pretested for content, meaning, and readability. Response items were evaluated with exploratory factor analysis and by assessing internal consistency (Cronbach α) and convergence with global QOL as determined with a visual analog scale. Preliminary tests of stability and responsiveness were performed. RESULTS The final questionnaire-which was named the Canine Owner-Reported Quality of Life (CORQ) questionnaire-contained 17 items related to observable behaviors commonly used by owners to evaluate QOL in their dogs. Several items pertaining to physical symptoms performed poorly and were omitted. The 17 items were assigned to 4 factors-vitality, companionship, pain, and mobility-on the basis of the items they contained. The CORQ questionnaire and its factors had high internal consistency (Cronbach α = 0.68 to 0.90) and moderate to strong correlations (r = 0.49 to 0.71) with global QOL as measured on a visual analog scale. Preliminary testing indicated good test-retest reliability and responsiveness to improvements in overall QOL. CONCLUSIONS AND CLINICAL RELEVANCE The CORQ questionnaire was a valid, reliable owner-reported questionnaire that measured general QOL in dogs with cancer and showed promise as a clinical trial outcome measure for quantifying changes in individual dog QOL occurring in response to cancer treatment and progression.
Answering the call: a tool that measures functional breast cancer literacy.

PubMed

Williams, Karen Patricia; Templin, Thomas N; Hines, Resche D

2013-01-01

There is a need for health care providers and health care educators to ensure that the messages they communicate are understood. The purpose of this research was to test the reliability and validity, in a culturally diverse sample of women, of a revised Breast Cancer Literacy Assessment Tool (Breast-CLAT) designed to measure functional understanding of breast cancer in English, Spanish, and Arabic. Community health workers verbally administered the 35-item Breast-CLAT to 543 Black, Latina, and Arab American women. A confirmatory factor analysis using a 2-parameter item response theory model was used to test the proposed 3-factor Breast-CLAT (awareness, screening and knowledge, and prevention and control). The confirmatory factor analysis using a 2-parameter item response theory model had a good fit (TLI = .91, RMSEA = .04) to the proposed 3-factor structure. The total scale reliability ranged from .80 for Black participants to .73 for total culturally diverse sample. The three subscales were differentially predictive of family history of cancer. The revised Breast-CLAT scales demonstrated internal consistency reliability and validity in this multiethnic, community-based sample.
Grooming a CAT: customizing CAT administration rules to increase response efficiency in specific research and clinical settings.

PubMed

Kallen, Michael A; Cook, Karon F; Amtmann, Dagmar; Knowlton, Elizabeth; Gershon, Richard C

2018-05-05

To evaluate the degree to which applying alternative stopping rules would reduce response burden while maintaining score precision in the context of computer adaptive testing (CAT). Analyses were conducted on secondary data comprised of CATs administered in a clinical setting at multiple time points (baseline and up to two follow ups) to 417 study participants who had back pain (51.3%) and/or depression (47.0%). Participant mean age was 51.3 years (SD = 17.2) and ranged from 18 to 86. Participants tended to be white (84.7%), relatively well educated (77% with at least some college), female (63.9%), and married or living in a committed relationship (57.4%). The unit of analysis was individual assessment histories (i.e., CAT item response histories) from the parent study. Data were first aggregated across all individuals, domains, and time points in an omnibus dataset of assessment histories and then were disaggregated by measure for domain-specific analyses. Finally, assessment histories within a "clinically relevant range" (score ≥ 1 SD from the mean in direction of poorer health) were analyzed separately to explore score level-specific findings. Two different sets of CAT administration rules were compared. The original CAT (CAT ORIG ) rules required at least four and no more than 12 items be administered. If the score standard error (SE) reached a value < 3 points (T score metric) before 12 items were administered, the CAT was stopped. We simulated applying alternative stopping rules (CAT ALT ), removing the requirement that a minimum four items be administered, and stopped a CAT if responses to the first two items were both associated with best health, if the SE was < 3, if SE change < 0.1 (T score metric), or if 12 items were administered. We then compared score fidelity and response burden, defined as number of items administered, between CAT ORIG and CAT ALT . CAT ORIG and CAT ALT scores varied little, especially within the clinically relevant range, and response burden was substantially lower under CAT ALT (e.g., 41.2% savings in omnibus dataset). Alternate stopping rules result in substantial reductions in response burden with minimal sacrifice in score precision.
Development and validation of a new condition-specific instrument for evaluation of smile esthetics-related quality of life.

PubMed

Saltovic, Ema; Lajnert, Vlatka; Saltovic, Sabina; Kovacevic Pavicic, Daniela; Pavlic, Andrej; Spalj, Stjepan

2018-03-01

Orofacial esthetics raises psychosocial issues. The purpose was to create and validate new short instrument for psychosocial impacts of altered smile esthetics. A team of an orthodontist, two prosthodontists, psychologist, and a dental student generated items that could draw up specific hypothetical psychosocial dimensions (69 items initially, 39 in final analysis). The sample consisted of 261 Caucasian subjects attending local high schools and university (26% male) aged 14 to 28 years that have self-administrated the designed questionnaire. Factorial analysis, Cronbach's alpha, Pearson correlation, paired samples t-test and analysis of variance were used for analyses of internal consistency, construct validity, responsiveness, and test-retest. Three dimensions of psychosocial impacts of altered smile esthetics were identified: dental self-consciousness, dental self-confidence and social contacts that can be best fitted by 12 items, 4 items in each dimension. Internal consistency was good (α in range 0.85-0.89). Good stability in test-retest was confirmed. In responsiveness testing, tooth whitening induced increase in dental self-confidence (P = 0.002), but no significant changes in other dimensions. The new instrument, Smile Esthetics-Related Quality of Life (SERQoL), is short and has proven to be a good indicator of psychosocial dimensions related to perception of smile esthetics. Smile Esthetics-Related Quality of Life questionnaire might have practical validity when applied in esthetic dental clinical procedures. © 2017 Wiley Periodicals, Inc.
Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

ERIC Educational Resources Information Center

Pohl, Steffi; Gräfe, Linda; Rose, Norman

2014-01-01

Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
Rasch Model Analysis Gives New Insights Into the Structural Validity of the QuickDASH in Patients With Musculoskeletal Shoulder Pain.

PubMed

Jerosch-Herold, Christina; Chester, Rachel; Shepstone, Lee

2017-09-01

Study Design Cross-sectional secondary analysis of a prospective cohort study. Background The shortened version of the Disabilities of the Arm, Shoulder and Hand questionnaire (QuickDASH) is a widely used outcome measure that has been extensively evaluated using classical test theory. Rasch model analysis can identify strengths and weaknesses of rating scales and goes beyond classical test theory approaches. It uses a mathematical model to test the fit between the observed data and expected responses and converts ordinal-level scores into interval-level measurement. Objective To test the structural validity of the QuickDASH using Rasch analysis. Methods A prospective cohort study of 1030 patients with shoulder pain provided baseline data. Rasch analysis was conducted to (1) assess how the QuickDASH fits the Rasch model, (2) identify sources of misfit, and (3) explore potential solutions to these. Results There was evidence of multidimensionality and significant misfit to the Rasch model (χ 2 = 331.09, P<.001). Two items had disordered threshold responses with strong floor effects. Response bias was detected in most items for age and sex. Rescoring resulted in ordered thresholds; however, the 11-item scale still did not meet the expectations of the Rasch model. Conclusion Rasch model analysis on the QuickDASH has identified a number of problems that cannot be easily detected using traditional analyses. While revisions to the QuickDASH resulted in better fit, a "shoulder-specific" version is not advocated at present. Caution needs to be exercised when interpreting results of the QuickDASH outcome measure, as it does not meet the criteria for interval-level measurement and shows significant response bias by age and sex. J Orthop Sports Phys Ther 2017;47(9):664-672. Epub 13 Jul 2017. doi:10.2519/jospt.2017.7288.
Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

PubMed

Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

2015-07-01

The Geriatric Anxiety Scale (GAS; Segal et al. (Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
Minimal impact of response shift for SF-12 mental and physical health status in homeless and vulnerably housed individuals: an item-level multi-group analysis.

PubMed

Gadermann, Anne M; Sawatzky, Richard; Palepu, Anita; Hubley, Anita M; Zumbo, Bruno D; Aubry, Tim; Farrell, Susan; Hwang, Stephen W

2017-06-01

The purpose of this study was to examine whether homeless or vulnerably housed individuals experienced response shift over a 12-month time period in their self-reported physical and mental health status. Data were obtained from the Health and Housing in Transition study, a longitudinal multi-site cohort study in Canada (N = 1190 at baseline). Multi-group confirmatory factor analysis (MG-CFA) and methods for response shift detection at the item level, based on the approach by Oort, were used to test for reconceptualization, reprioritization, and recalibration response shift on the SF-12 in four groups of individuals who were homeless (n = 170), housed (n = 437), or who reported a change in their housing status [from homeless to housed (n = 285) or housed to homeless (n = 73)] over a 12-month time period. Mean and variance adjusted weighted-least squares estimation was used to accommodate the ordinal and binary distributions of the SF-12 items. Using MG-CFA, a strict invariance model showed that the measurement model was equivalent for the four groups at baseline. Although we found small but statistically significant response shift for several measurement model parameters, the impact on the predicted average mental and physical health scores within each of the groups was small. Response shift does not appear to be a significant concern when using the SF-12 to obtain change scores over a 12-month period in this population.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

PubMed

Schweizer, Karl; Troche, Stefan

2018-02-01

In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, the statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of items with various difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor now identified as item-position factor, was observed in virtually all simulated datasets.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.