Sample records for applying item response

  1. Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

    ERIC Educational Resources Information Center

    Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

    2017-01-01

    Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…

  2. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    ERIC Educational Resources Information Center

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used to provide results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items were…

  3. Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

    ERIC Educational Resources Information Center

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-01-01

    This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fitted with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…

  4. Measuring Student Learning with Item Response Theory

    ERIC Educational Resources Information Center

    Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.

    2008-01-01

    We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students working on for-credit homework items in an introductory Newtonian physics…

  5. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    NASA Astrophysics Data System (ADS)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fitted with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test, since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by classical test analysis methods. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
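
As context for the 3PL model and the IRC technique mentioned in this record, the sketch below is a minimal, hypothetical illustration (not the authors' analysis): it evaluates the three-parameter logistic item response function and builds an empirical item response curve by binning examinees on estimated ability. Item parameters, sample size, and data are invented.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Three-parameter logistic item response function:
    P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item parameters (discrimination a, difficulty b, guessing c)
a, b, c = 1.2, 0.3, 0.2

# Hypothetical ability estimates and scored responses for one item
rng = np.random.default_rng(0)
theta = rng.normal(size=2392)                       # ability estimates
responses = rng.binomial(1, p_3pl(theta, a, b, c))  # simulated 0/1 answers

# Empirical item response curve: proportion correct within ability bins
bins = np.linspace(-3, 3, 13)
centers = 0.5 * (bins[:-1] + bins[1:])
idx = np.digitize(theta, bins) - 1
empirical = [responses[idx == k].mean() if np.any(idx == k) else np.nan
             for k in range(len(centers))]

for t, p_obs in zip(centers, empirical):
    print(f"theta ~ {t:+.2f}  observed {p_obs:.2f}  model {p_3pl(t, a, b, c):.2f}")
```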

  6. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    PubMed

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g., Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
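
The residual described above contrasts a model-based item characteristic curve with a ratio (observed-proportion) estimate. The sketch below is a simplified, hypothetical illustration of that general idea for a 2PL item, not the authors' statistic: examinees are grouped by estimated ability, and standardized residuals compare observed proportions correct with model-implied probabilities.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_fit_residuals(theta_hat, u, a, b, n_groups=10):
    """Standardized residuals for one dichotomous item.

    theta_hat : estimated abilities, u : 0/1 responses,
    a, b      : calibrated item parameters (assumed known here).
    """
    order = np.argsort(theta_hat)
    groups = np.array_split(order, n_groups)   # equal-size ability groups
    residuals = []
    for g in groups:
        n = len(g)
        p_obs = u[g].mean()                    # ratio estimate of the ICC
        p_mod = p_2pl(theta_hat[g], a, b).mean()
        se = np.sqrt(p_mod * (1 - p_mod) / n)  # approximate binomial SE
        residuals.append((p_obs - p_mod) / se)
    return np.array(residuals)

# Hypothetical data for a well-fitting item
rng = np.random.default_rng(1)
theta = rng.normal(size=5000)
u = rng.binomial(1, p_2pl(theta, a=1.0, b=0.0))
print(np.round(item_fit_residuals(theta, u, a=1.0, b=0.0), 2))
```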

  7. Analyzing Longitudinal Item Response Data via the Pairwise Fitting Method

    ERIC Educational Resources Information Center

    Fu, Zhi-Hui; Tao, Jian; Shi, Ning-Zhong; Zhang, Ming; Lin, Nan

    2011-01-01

    Multidimensional item response theory (MIRT) models can be applied to longitudinal educational surveys where a group of individuals are administered different tests over time with some common items. However, computational problems typically arise as the dimension of the latent variables increases. This is especially true when the latent variable…

  8. Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory

    ERIC Educational Resources Information Center

    Lee, Won-Chan

    2010-01-01

    In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…

  9. Applying Item Response Theory Methods to Design a Learning Progression-Based Science Assessment

    ERIC Educational Resources Information Center

    Chen, Jing

    2012-01-01

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1)…

  10. Response Mixture Modeling: Accounting for Heterogeneity in Item Characteristics across Response Times.

    PubMed

    Molenaar, Dylan; de Boeck, Paul

    2018-06-01

    In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.

  11. A validation study of public health knowledge, skills, social responsibility and applied learning.

    PubMed

    Vackova, Dana; Chen, Coco K; Lui, Juliana N M; Johnston, Janice M

    2018-06-22

    To design and validate a questionnaire to measure medical students' Public Health (PH) knowledge, skills, social responsibility and applied learning as indicated in the four domains recommended by the Association of Schools & Programmes of Public Health (ASPPH). A cross-sectional study was conducted to develop an evaluation tool for PH undergraduate education through item generation, reduction, refinement and validation. The 74 preliminary items derived from the existing literature were reduced to 55 items based on expert panel review which included those with expertise in PH, psychometrics and medical education, as well as medical students. Psychometric properties of the preliminary questionnaire were assessed as follows: frequency of endorsement for item variance; principal component analysis (PCA) with varimax rotation for item reduction and factor estimation; Cronbach's Alpha, item-total correlation and test-retest validity for internal consistency and reliability. PCA yielded five factors: PH Learning Experience (6 items); PH Risk Assessment and Communication (5 items); Future Use of Evidence in Practice (6 items); Recognition of PH as a Scientific Discipline (4 items); and PH Skills Development (3 items), explaining 72.05% variance. Internal consistency and reliability tests were satisfactory (Cronbach's Alpha ranged from 0.87 to 0.90; item-total correlation > 0.59). Lower paired test-retest correlations reflected instability in a social science environment. An evaluation tool for community-centred PH education has been developed and validated. The tool measures PH knowledge, skills, social responsibilities and applied learning as recommended by the internationally recognised Association of Schools & Programmes of Public Health (ASPPH).
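
The abstract above reports Cronbach's alpha and item-total correlations for subscale evaluation. As a generic, hypothetical sketch of those two statistics (simulated Likert data, not the study's instrument), the code below computes both for an arbitrary item-score matrix.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)

def corrected_item_total(scores):
    """Correlation of each item with the total of the remaining items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                     for j in range(scores.shape[1])])

# Hypothetical five-point Likert responses to a 6-item subscale
rng = np.random.default_rng(2)
latent = rng.normal(size=300)
scores = np.clip(np.round(3 + latent[:, None] + rng.normal(scale=0.8, size=(300, 6))), 1, 5)

print(f"alpha = {cronbach_alpha(scores):.2f}")
print("corrected item-total r:", np.round(corrected_item_total(scores), 2))
```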

  12. Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.

    ERIC Educational Resources Information Center

    Muraki, Eiji

    1999-01-01

    Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…

  13. Modelling Mathematics Problem Solving Item Responses Using a Multidimensional IRT Model

    ERIC Educational Resources Information Center

    Wu, Margaret; Adams, Raymond

    2006-01-01

    This research examined students' responses to mathematics problem-solving tasks and applied a general multidimensional IRT model at the response category level. In doing so, cognitive processes were identified and modelled through item response modelling to extract more information than would be provided using conventional practices in scoring…

  14. Application of Group-Level Item Response Models in the Evaluation of Consumer Reports about Health Plan Quality

    ERIC Educational Resources Information Center

    Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.

    2006-01-01

    Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS[R]) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…

  15. Adaptive Quadrature for Item Response Models. Research Report. ETS RR-06-29

    ERIC Educational Resources Information Center

    Haberman, Shelby J.

    2006-01-01

    Adaptive quadrature is applied to marginal maximum likelihood estimation for item response models with normal ability distributions. Even in one dimension, significant gains in speed and accuracy of computation may be achieved.
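
The report concerns adaptive quadrature for marginal maximum likelihood (MML) estimation. As a simplified reference point, the sketch below uses ordinary (non-adaptive) Gauss-Hermite quadrature to approximate the marginal probability of one response pattern under a 2PL model with a standard-normal ability distribution; adaptive quadrature would additionally recenter and rescale the nodes around each pattern's posterior mode. All item parameters are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def marginal_likelihood(u, a, b, n_nodes=21):
    """Marginal probability of response pattern u under a 2PL model with a
    standard-normal ability, via Gauss-Hermite quadrature:
    integral f(theta) phi(theta) dtheta ~ (1/sqrt(pi)) * sum_k w_k f(sqrt(2) x_k)."""
    x, w = hermgauss(n_nodes)
    theta = np.sqrt(2.0) * x
    p = p_2pl(theta[:, None], a[None, :], b[None, :])            # (nodes, items)
    like = np.prod(np.where(u[None, :] == 1, p, 1.0 - p), axis=1)
    return np.sum(w * like) / np.sqrt(np.pi)

# Hypothetical three-item test and one observed response pattern
a = np.array([1.0, 1.5, 0.8])
b = np.array([-0.5, 0.0, 1.0])
u = np.array([1, 1, 0])
print(f"P(u) = {marginal_likelihood(u, a, b):.4f}")
```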

  16. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
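
As a hypothetical illustration of the regression idea referred to in this record (not necessarily the authors' exact procedure), the sketch below regresses new-form difficulty estimates of the common (anchor) items on old-form estimates and reports standardized residuals; anchors with unusually large residuals are candidates for removal from the equating link.

```python
import numpy as np

def anchor_item_residuals(b_old, b_new):
    """Regress new-form difficulty estimates on old-form estimates for the
    common items and return standardized residuals of the fit."""
    b_old, b_new = np.asarray(b_old, float), np.asarray(b_new, float)
    slope, intercept = np.polyfit(b_old, b_new, 1)
    resid = b_new - (slope * b_old + intercept)
    return (resid - resid.mean()) / resid.std(ddof=1)

# Hypothetical anchor-item difficulties from two separate calibrations;
# the fifth item has drifted between administrations.
b_old = np.array([-1.50, -1.00, -0.50, -0.20, 0.10, 0.50, 1.00, 1.40])
b_new = np.array([-1.45, -0.98, -0.52, -0.15, 0.90, 0.47, 1.05, 1.38])
print(np.round(anchor_item_residuals(b_old, b_new), 2))
```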

  17. Estimating the Nominal Response Model under Nonnormal Conditions

    ERIC Educational Resources Information Center

    Preston, Kathleen Suzanne Johnson; Reise, Steven Paul

    2014-01-01

    The nominal response model (NRM), a much understudied polytomous item response theory (IRT) model, provides researchers the unique opportunity to evaluate within-item category distinctions. Polytomous IRT models, such as the NRM, are frequently applied to psychological assessments representing constructs that are unlikely to be normally…

  18. Item Response Modeling: An Evaluation of the Children's Fruit and Vegetable Self-Efficacy Questionnaire

    ERIC Educational Resources Information Center

    Watson, Kathy; Baranowski, Tom; Thompson, Debbe

    2006-01-01

    Perceived self-efficacy (SE) for eating fruit and vegetables (FV) is a key variable mediating FV change in interventions. This study applies item response modeling (IRM) to a fruit, juice and vegetable self-efficacy questionnaire (FVSEQ) previously validated with classical test theory (CTT) procedures. The 24-item (five-point Likert scale) FVSEQ…

  19. Characterizing Sources of Uncertainty in Item Response Theory Scale Scores

    ERIC Educational Resources Information Center

    Yang, Ji Seung; Hansen, Mark; Cai, Li

    2012-01-01

    Traditional estimators of item response theory scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, the authors review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical…

  20. Real and Artificial Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Andrich, David; Hagquist, Curt

    2015-01-01

    Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
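
The (truncated) abstract refers to the Mantel-Haenszel method for dichotomous items, in which persons are matched on a stratifying score. As a generic, hypothetical illustration of that method (not the authors' procedure), the sketch below computes the MH common odds ratio and the ETS delta for a single studied item, matching on an anchor total score.

```python
import numpy as np

def mantel_haenszel_dif(u_item, matching, group):
    """Mantel-Haenszel common odds ratio and ETS delta for one dichotomous item.

    u_item   : 0/1 responses to the studied item
    matching : matching variable (e.g., total test score) defining strata
    group    : 0 = reference group, 1 = focal group
    """
    num = den = 0.0
    for s in np.unique(matching):
        m = matching == s
        a = np.sum((group[m] == 0) & (u_item[m] == 1))   # reference, correct
        b = np.sum((group[m] == 0) & (u_item[m] == 0))   # reference, incorrect
        c = np.sum((group[m] == 1) & (u_item[m] == 1))   # focal, correct
        d = np.sum((group[m] == 1) & (u_item[m] == 0))   # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha_mh = num / den
    return alpha_mh, -2.35 * np.log(alpha_mh)  # ETS delta (MH D-DIF); near 0 = negligible DIF

# Hypothetical data: 1000 examinees, a 20-item anchor, and one studied item
# whose difficulty is shifted for the focal group (i.e., the item shows DIF).
rng = np.random.default_rng(7)
theta = rng.normal(size=1000)
group = rng.integers(0, 2, size=1000)
anchor_b = np.linspace(-1.5, 1.5, 20)
anchor = rng.binomial(1, 1 / (1 + np.exp(-(theta[:, None] - anchor_b))))
u_item = rng.binomial(1, 1 / (1 + np.exp(-(theta - 0.5 * group))))
print(mantel_haenszel_dif(u_item, anchor.sum(axis=1), group))
```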

  1. Applying Multidimensional Item Response Theory Models in Validating Test Dimensionality: An Example of K-12 Large-Scale Science Assessment

    ERIC Educational Resources Information Center

    Li, Ying; Jiao, Hong; Lissitz, Robert W.

    2012-01-01

    This study investigated the application of multidimensional item response theory (IRT) models to validate test structure and dimensionality. Multiple content areas or domains within a single subject often exist in large-scale achievement tests. Such areas or domains may cause multidimensionality or local item dependence, which both violate the…

  2. The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

    NASA Astrophysics Data System (ADS)

    Liou, Pey-Yan; Bulut, Okan

    2017-12-01

    The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components: item format and cognitive domain. The Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. Item difficulty analysis was first applied to show the proportion of correct responses for each item. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that items in the reasoning cognitive domain were more difficult than items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of their effects on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirically based evidence for test developers, teachers, and stakeholders to be aware of the differential functioning of item format, cognitive domain, and their interaction in students' science performance.

  3. The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model

    ERIC Educational Resources Information Center

    Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet

    2012-01-01

    Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance, whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…

  4. Model-Based Collaborative Filtering Analysis of Student Response Data: Machine-Learning Item Response Theory

    ERIC Educational Resources Information Center

    Bergner, Yoav; Droschler, Stefan; Kortemeyer, Gerd; Rayyan, Saif; Seaton, Daniel; Pritchard, David E.

    2012-01-01

    We apply collaborative filtering (CF) to dichotomously scored student response data (right, wrong, or no interaction), finding optimal parameters for each student and item based on cross-validated prediction accuracy. The approach is naturally suited to comparing different models, both unidimensional and multidimensional in ability, including a…

  5. Applying the Nominal Response Model within a Longitudinal Framework to Construct the Positive Family Relationships Scale

    ERIC Educational Resources Information Center

    Preston, Kathleen Suzanne Johnson; Parral, Skye N.; Gottfried, Allen W.; Oliver, Pamella H.; Gottfried, Adele Eskeles; Ibrahim, Sirena M.; Delany, Danielle

    2015-01-01

    A psychometric analysis was conducted using the nominal response model under the item response theory framework to construct the Positive Family Relationships scale. Using data from the Fullerton Longitudinal Study, this scale was constructed within a long-term longitudinal framework spanning middle childhood through adolescence. Items tapping…

  6. Evaluating Job Demands and Control Measures for Use in Farm Worker Health Surveillance.

    PubMed

    Alterman, Toni; Gabbard, Susan; Grzywacz, Joseph G; Shen, Rui; Li, Jia; Nakamoto, Jorge; Carroll, Daniel J; Muntaner, Carles

    2015-10-01

    Workplace stress likely plays a role in health disparities; however, applying standard measures to studies of immigrants requires thoughtful consideration. The goal of this study was to determine the appropriateness of two measures of occupational stressors ('decision latitude' and 'job demands') for use with mostly immigrant Latino farm workers. Cross-sectional data from a pilot module containing a four-item measure of decision latitude and a two-item measure of job demands were obtained from a subsample (N = 409) of farm workers participating in the National Agricultural Workers Survey. Responses to items for both constructs were clustered toward the low end of the structured response-set. Percentages of responses of 'very often' and 'always' for each of the items were examined by educational attainment, birth country, dominant language spoken, task, and crop. Cronbach's α values, when stratified by subgroups of workers, ranged from 0.65 to 0.90 for the decision latitude items but were less robust for the job demands items (0.25-0.72). The four-item decision latitude scale can be applied to occupational stress research with immigrant farm workers, and potentially other immigrant Latino worker groups. The short job demands scale requires further investigation and evaluation before suggesting widespread use.

  7. A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.

    PubMed

    Polak, Marike; de Rooij, Mark; Heiser, Willem J

    2012-09-01

    In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.

  8. Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

    NASA Astrophysics Data System (ADS)

    Chiu, Tina

    This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.

  9. A note on monotonicity of item response functions for ordered polytomous item response theory models.

    PubMed

    Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua

    2018-03-08

    A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
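
For reference, the generalized partial credit model named in this note (Muraki, 1992) defines its category probabilities and item response function (the expected item score) roughly as follows, with the convention that the k = 0 term in each sum is zero; the note's claim is that T_j(θ) is strictly increasing in θ. Notation here is a standard rendering rather than a quotation of the paper.

```latex
P_{jx}(\theta) =
  \frac{\exp\Bigl(\sum_{k=0}^{x} a_j\,(\theta - b_{jk})\Bigr)}
       {\sum_{r=0}^{m_j} \exp\Bigl(\sum_{k=0}^{r} a_j\,(\theta - b_{jk})\Bigr)},
\qquad
T_j(\theta) = \sum_{x=0}^{m_j} x\, P_{jx}(\theta)
```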

  10. A Combined IRT and SEM Approach for Individual-Level Assessment in Test-Retest Studies

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2015-01-01

    The standard two-wave multiple-indicator model (2WMIM) commonly used to analyze test-retest data provides information at both the group and item level. Furthermore, when applied to binary and graded item responses, it is related to well-known item response theory (IRT) models. In this article the IRT-2WMIM relations are used to obtain additional…

  11. Development of a Computerized Adaptive Testing for Diagnosing the Cognitive Process of Grade 7 Students in Learning Algebra, Using Multidimensional Item Response Theory

    ERIC Educational Resources Information Center

    Senarat, Somprasong; Tayraukham, Sombat; Piyapimonsit, Chatsiri; Tongkhambanjong, Sakesan

    2013-01-01

    The purpose of this research is to develop a multidimensional computerized adaptive test for diagnosing the cognitive process of grade 7 students in learning algebra by applying multidimensional item response theory. The research is divided into 4 steps: 1) the development of item bank of algebra, 2) the development of the multidimensional…

  12. Item Response Theory and Health Outcomes Measurement in the 21st Century

    PubMed Central

    Hays, Ron D.; Morales, Leo S.; Reise, Steve P.

    2006-01-01

    Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088

  13. Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

    PubMed

    de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

    2018-07-01

    This study examines the response pattern of depressive symptoms in a nationwide student sample through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through the CTT and IRT approaches. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was endorsed easily, 'loss of interest' moderately, and 'suicidal thoughts' rarely. Graphical representation of the BDI-II under both methods showed substantial equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only to college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. Item response theory and the measurement of motor behavior.

    PubMed

    Safrit, M J; Cohen, A S; Costa, M G

    1989-12-01

    Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.

  15. The Long-Term Sustainability of Different Item Response Theory Scaling Methods

    ERIC Educational Resources Information Center

    Keller, Lisa A.; Keller, Robert R.

    2011-01-01

    This article investigates the accuracy of examinee classification into performance categories and the estimation of the theta parameter for several item response theory (IRT) scaling techniques when applied to six administrations of a test. Previous research has investigated only two administrations; however, many testing programs equate tests…

  16. Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

    2013-01-01

    Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…

  17. Evaluating Job Demands and Control Measures for Use in Farm Worker Health Surveillance

    PubMed Central

    Alterman, Toni; Gabbard, Susan; Grzywacz, Joseph G.; Shen, Rui; Li, Jia; Nakamoto, Jorge; Carroll, Daniel J.; Muntaner, Carles

    2015-01-01

    Workplace stress likely plays a role in health disparities; however, applying standard measures to studies of immigrants requires thoughtful consideration. The goal of this study was to determine the appropriateness of two measures of occupational stressors (‘decision latitude’ and ‘job demands’) for use with mostly immigrant Latino farm workers. Cross-sectional data from a pilot module containing a four-item measure of decision latitude and a two-item measure of job demands were obtained from a subsample (N = 409) of farm workers participating in the National Agricultural Workers Survey. Responses to items for both constructs were clustered toward the low end of the structured response-set. Percentages of responses of ‘very often’ and ‘always’ for each of the items were examined by educational attainment, birth country, dominant language spoken, task, and crop. Cronbach’s α values, when stratified by subgroups of workers, ranged from 0.65 to 0.90 for the decision latitude items but were less robust for the job demands items (0.25–0.72). The four-item decision latitude scale can be applied to occupational stress research with immigrant farm workers, and potentially other immigrant Latino worker groups. The short job demands scale requires further investigation and evaluation before suggesting widespread use. PMID:25138138

  18. Fitting a Mixture Item Response Theory Model to Personality Questionnaire Data: Characterizing Latent Classes and Investigating Possibilities for Improving Prediction

    ERIC Educational Resources Information Center

    Maij-de Meij, Annette M.; Kelderman, Henk; van der Flier, Henk

    2008-01-01

    Mixture item response theory (IRT) models aid the interpretation of response behavior on personality tests and may provide possibilities for improving prediction. Heterogeneity in the population is modeled by identifying homogeneous subgroups that conform to different measurement models. In this study, mixture IRT models were applied to the…

  19. Applying Item Response Theory Methods to Examine the Impact of Different Response Formats

    ERIC Educational Resources Information Center

    Hohensinn, Christine; Kubinger, Klaus D.

    2011-01-01

    In aptitude and achievement tests, different response formats are usually used. A fundamental distinction must be made between the class of multiple-choice formats and the constructed response formats. Previous studies have examined the impact of different response formats applying traditional statistical approaches, but these influences can also…

  20. Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

    PubMed

    Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

    2018-02-01

    The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigation. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of the FTCD. A Graded Response Model was applied to the FTCD and the HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. The HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, and 5 of the FTCD and for both items of the HSI. The HSI seems highly recommended in clinical settings targeting heavy smokers, while the FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
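
As background for the Mokken scale analysis mentioned above, the sketch below computes Loevinger's scalability coefficient H for a set of dichotomous items on invented data. The FTCD and HSI contain polytomous items, so this is a simplified, hypothetical illustration of the coefficient rather than the authors' analysis.

```python
import numpy as np
from itertools import combinations

def loevinger_h(scores):
    """Loevinger's scalability coefficient H for dichotomous (0/1) items:
    the sum of observed inter-item covariances divided by the sum of the
    maximum covariances attainable given the item difficulties."""
    scores = np.asarray(scores, dtype=float)
    p = scores.mean(axis=0)
    obs = mx = 0.0
    for i, j in combinations(range(scores.shape[1]), 2):
        obs += np.cov(scores[:, i], scores[:, j], ddof=0)[0, 1]
        mx += min(p[i], p[j]) - p[i] * p[j]
    return obs / mx

# Hypothetical 0/1 scores on a 6-item scale driven by one latent trait
rng = np.random.default_rng(4)
theta = rng.normal(size=800)
b = np.linspace(-1.5, 1.5, 6)
scores = rng.binomial(1, 1 / (1 + np.exp(-(theta[:, None] - b))))
print(f"scale H = {loevinger_h(scores):.2f}")   # roughly 0.3+ suggests a usable scale
```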

  1. An Evaluation of Item Response Theory Classification Accuracy and Consistency Indices

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Hao, Shiqi

    2012-01-01

    This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…
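
The article relates its new indices to Rudner's classification accuracy index. As a hypothetical sketch of a Rudner-style computation (normal approximation for the ability estimate, invented cut scores and standard errors, scipy used only for the normal CDF), the code below estimates single-administration classification accuracy and consistency.

```python
import numpy as np
from scipy.stats import norm

def rudner_indices(theta_hat, se, cuts):
    """Rudner-style single-administration classification accuracy and
    consistency, assuming theta_hat ~ Normal(theta, se) for each examinee.

    cuts : interior cut scores separating performance categories.
    """
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))
    acc, cons = [], []
    for t, s in zip(theta_hat, se):
        p_cat = norm.cdf(edges[1:], loc=t, scale=s) - norm.cdf(edges[:-1], loc=t, scale=s)
        observed = np.searchsorted(cuts, t)   # category containing the estimate
        acc.append(p_cat[observed])           # P(true category == observed category)
        cons.append(np.sum(p_cat ** 2))       # P(same category on a parallel form)
    return float(np.mean(acc)), float(np.mean(cons))

# Hypothetical ability estimates, conditional SEMs, and two cut scores
rng = np.random.default_rng(5)
theta_hat = rng.normal(size=1000)
se = np.full(1000, 0.3)
print(rudner_indices(theta_hat, se, cuts=[-0.5, 0.8]))
```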

  2. Item Response Theory: Overview, Applications, and Promise for Institutional Research

    ERIC Educational Resources Information Center

    Bowman, Nicholas A.; Herzog, Serge; Sharkness, Jessica

    2014-01-01

    Item Response Theory (IRT) is a measurement theory that is ideal for scale and test development in institutional research, but it is not without its drawbacks. This chapter provides an overview of IRT, describes an example of its use, and highlights the pros and cons of using IRT in applied settings.

  3. Validating the European Health Literacy Survey Questionnaire in people with type 2 diabetes: Latent trait analyses applying multidimensional Rasch modelling and confirmatory factor analysis.

    PubMed

    Finbråten, Hanne Søberg; Pettersen, Kjell Sverre; Wilde-Larsson, Bodil; Nordström, Gun; Trollvik, Anne; Guttersrud, Øystein

    2017-11-01

    To validate the European Health Literacy Survey Questionnaire (HLS-EU-Q47) in people with type 2 diabetes mellitus. The HLS-EU-Q47 latent variable is outlined in a framework with four cognitive domains integrated in three health domains, implying 12 theoretically defined subscales. Valid and reliable health literacy measures are crucial to effectively adapt health communication and education to individuals and groups of patients. Cross-sectional study applying confirmatory latent trait analyses. Using a paper-and-pencil self-administered approach, 388 adults responded in March 2015. The data were analysed using the Rasch methodology and confirmatory factor analysis. Response violation (response dependency) and trait violation (multidimensionality) of local independence were identified. Fitting the "multidimensional random coefficients multinomial logit" model, 1-, 3- and 12-dimensional Rasch models were applied and compared. Poor model fit and differential item functioning were present in some items, and several subscales suffered from poor targeting and low reliability. Despite multidimensional data, we did not observe any unordered response categories. Interpreting the domains as distinct but related latent dimensions, the data fit a 12-dimensional Rasch model and a 12-factor confirmatory factor model best. Therefore, the analyses did not support the estimation of one overall "health literacy score." To support the plausibility of claims based on the HLS-EU score(s), we suggest: removing the health care aspect to reduce the magnitude of multidimensionality; rejecting redundant items to avoid response dependency; adding "harder" items and applying a six-point rating scale to improve subscale targeting and reliability; and revising items to improve model fit and avoid bias owing to person factors. © 2017 John Wiley & Sons Ltd.

  4. Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index*

    PubMed Central

    Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

    2011-01-01

    Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212

  5. Do large-scale assessments measure students' ability to integrate scientific knowledge?

    NASA Astrophysics Data System (ADS)

    Lee, Hee-Sun

    2010-03-01

    Large-scale assessments are used as means to diagnose the current status of student achievement in science and compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as the Trends in International Math and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of students' open-ended responses do open-ended items become a valid and reliable tool for assessing students' knowledge integration ability.

  6. Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

    PubMed

    Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

    2017-06-15

    Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
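
One of the techniques this abstract names is computerised adaptive testing. As a minimal, hypothetical sketch (not the authors' implementation), the code below performs maximum-information item selection for a calibrated 2PL item bank, the core step of a simple CAT loop.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Maximum-information item selection for one CAT step."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf          # do not reuse items
    return int(np.argmax(info))

# Hypothetical calibrated item bank of 50 items
rng = np.random.default_rng(6)
a = rng.uniform(0.8, 2.0, size=50)
b = rng.uniform(-2.0, 2.0, size=50)
print("next item:", select_next_item(theta_hat=0.4, a=a, b=b, administered={3, 17}))
```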

  7. Comparing Vertical Scales Derived from Dichotomous and Polytomous IRT Models for a Test Composed of Testlets.

    ERIC Educational Resources Information Center

    Bishop, N. Scott; Omar, Md Hafidz

    Previous research has shown that testlet structures often violate important assumptions of dichotomous item response theory (D-IRT) models applied to item-level scores, violations that can in turn affect the results of many measurement applications. In this situation, polytomous IRT (P-IRT) models, applied to testlet-level scores, have been used as an…

  8. An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).

    PubMed

    Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul

    2018-03-01

    Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.

  9. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    PubMed

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.

  10. Fit of Item Response Theory Models: A Survey of Data from Several Operational Tests. Research Report. ETS RR-11-29

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Haberman, Shelby J.; Jia, Helena

    2011-01-01

    Standard 3.9 of the "Standards for Educational and Psychological Testing" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) demands evidence of model fit when an item response theory (IRT) model is used to make inferences from a data set. We applied two recently…

  11. An Item Response Theory Analysis of the Community of Inquiry Scale

    ERIC Educational Resources Information Center

    Horzum, Mehmet Baris; Uyanik, Gülden Kaya

    2015-01-01

    The aim of this study is to examine the validity and reliability of the Community of Inquiry Scale, commonly used in online learning, by means of Item Response Theory. For this purpose, Community of Inquiry Scale version 14 is applied to 1,499 students of a distance education center's online learning programs at a Turkish state university via the internet.…

  12. An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.

    2014-01-01

    As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…

  13. Application of Item Response Theory to Tests of Substance-related Associative Memory

    PubMed Central

    Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

    2015-01-01

    A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WATs, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051

  14. Assessing the Evaluative Content of Personality Questionnaires Using Bifactor Models.

    PubMed

    Biderman, Michael D; McAbee, Samuel T; Job Chen, Zhuo; Hendy, Nhung T

    2018-01-01

    Exploratory bifactor models with keying factors were applied to item response data for the NEO-FFI-3 and HEXACO-PI-R questionnaires. Loadings on a general factor and positive and negative keying factors correlated with independent estimates of item valence, suggesting that item valence influences responses to these questionnaires. Correlations between personality domain scores and measures of self-esteem, depression, and positive and negative affect were all reduced significantly when the influence of evaluative content represented by the general and keying factors was removed. Findings support the need to model personality inventories in ways that capture reactions to evaluative item content.

  15. An Application of Unfolding and Cumulative Item Response Theory Models for Noncognitive Scaling: Examining the Assumptions and Applicability of the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Sgammato, Adrienne N.

    2009-01-01

    This study examined the applicability of a relatively new unidimensional, unfolding item response theory (IRT) model called the generalized graded unfolding model (GGUM; Roberts, Donoghue, & Laughlin, 2000). A total of four scaling methods were applied. Two commonly used cumulative IRT models for polytomous data, the Partial Credit Model and…

  16. Improving Measurement in Health Education and Health Behavior Research Using Item Response Modeling: Comparison with the Classical Test Theory Approach

    ERIC Educational Resources Information Center

    Wilson, Mark; Allen, Diane D.; Li, Jun Corser

    2006-01-01

    This paper compares the approach and resultant outcomes of item response models (IRMs) and classical test theory (CTT). First, it reviews basic ideas of CTT, and compares them to the ideas about using IRMs introduced in an earlier paper. It then applies a comparison scheme based on the AERA/APA/NCME "Standards for Educational and…

  17. Probing the Relative Importance of Different Attributes in L2 Reading and Listening Comprehension Items: An Application of Cognitive Diagnostic Models

    ERIC Educational Resources Information Center

    Yi, Yeon-Sook

    2017-01-01

    The present study examines the relative importance of attributes within and across items by applying four cognitive diagnostic assessment models. The current study utilizes the function of the models that can indicate inter-attribute relationships that reflect the response behaviors of examinees to analyze scored test-taker responses to four forms…

  18. Multivariate Generalizability Analysis of Automated Scoring for Short Answer Items of Social Studies in Large-Scale Assessment

    ERIC Educational Resources Information Center

    Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee

    2017-01-01

    With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…

  19. An item response curves analysis of the Force Concept Inventory

    NASA Astrophysics Data System (ADS)

    Morris, Gary A.; Harshman, Nathan; Branum-Martin, Lee; Mazur, Eric; Mzoughi, Taha; Baker, Stephen D.

    2012-09-01

    Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields similar conclusions in the analysis of the performance of items on the FCI to the more sophisticated and complex IRT analyses but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers to improve their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels.

  20. Evaluation of the Psychometric Properties of the Asian Adolescent Depression Scale and Construction of a Short Form: An Item Response Theory Analysis.

    PubMed

    Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen

    2017-07-01

    The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.

  1. Cross-Cultural Validation of the Quality of Life in Hand Eczema Questionnaire (QOLHEQ).

    PubMed

    Ofenloch, Robert F; Oosterhaven, Jart A F; Susitaival, Päivikki; Svensson, Åke; Weisshaar, Elke; Minamoto, Keiko; Onder, Meltem; Schuttelaar, Marie Louise A; Bulbul Baskan, Emel; Diepgen, Thomas L; Apfelbacher, Christian

    2017-07-01

    The Quality of Life in Hand Eczema Questionnaire (QOLHEQ) is the only instrument assessing disease-specific health-related quality of life in patients with hand eczema. It is available in eight language versions. In this study we assessed if the items of different language versions of the QOLHEQ yield comparable values across countries. An international multicenter study was conducted with participating centers in Finland, Germany, Japan, The Netherlands, Sweden, and Turkey. Methods of item response theory were applied to each subscale to assess differential item functioning for items among countries. Overall, 662 hand eczema patients were recruited into the study. Single items were removed or split according to the item response theory model by country to resolve differential item functioning. After this adjustment, none of the four subscales of the QOLHEQ showed significant misfit to the item response theory model (P < 0.01), and a Person Separation Index of greater than 0.7 showed good internal consistency for each subscale. By adapting the scoring of the QOLHEQ using the methods of item response theory, it was possible to obtain QOLHEQ values that are comparable across countries. Cross-cultural variations in the interpretation of single items were resolved. The QOLHEQ is now ready to be used in international studies assessing the health-related quality of life impact of hand eczema. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  2. Applying mixed methods to pretest the Pressure Ulcer Quality of Life (PU-QOL) instrument.

    PubMed

    Gorecki, C; Lamping, D L; Nixon, J; Brown, J M; Cano, S

    2012-04-01

    Pretesting is key in the development of patient-reported outcome (PRO) instruments. We describe a mixed-methods approach based on interviews and Rasch measurement methods in the pretesting of the Pressure Ulcer Quality of Life (PU-QOL) instrument. We used cognitive interviews to pretest the PU-QOL in 35 patients with pressure ulcers with a view to identifying problematic items, followed by Rasch analysis to examine response options, appropriateness of the item series and biases due to question ordering (item fit). We then compared findings in an interactive and iterative process to identify potential strengths and weaknesses of PU-QOL items, and guide decision-making about further revisions to items and design/layout. Although cognitive interviews largely supported the items, they highlighted problems with layout, response options and comprehension. Findings from the Rasch analysis identified problems with response options through reversed thresholds. The use of a mixed-methods approach in pretesting the PU-QOL instrument proved beneficial for identifying problems with scale layout, response options and framing/wording of items. Rasch measurement methods are a useful addition to standard qualitative pretesting for evaluating strengths and weaknesses of early-stage PRO instruments.

  3. A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

    PubMed

    Park, Jong Cook; Kim, Kwang Sig

    2012-03-01

    The reliability of a test is determined by the characteristics of its items. Item analysis is achieved by classical test theory and item response theory. The purpose of the study was to compare discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point biserial correlation coefficient (C(pbs)) was compared to the method of extreme groups (D), the biserial correlation coefficient (C(bs)), the item-total correlation coefficient (C(it)), and the corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r²) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of the difficulty logits and standard errors and of the ability logits and standard errors were -0.82 to 0.80 and 0.37 to 0.76, and -3.69 to 3.19 and 0.45 to 1.03, respectively. Items 9 and 23 had outfit ≥ 1.3. Students 1, 5, 7, 18, 26, 30, and 32 had fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee ability parameter with standard errors. The fit statistics can identify bad items and unpredictable examinee responses.
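    A minimal sketch of the classical discrimination indices compared in this study: for dichotomously scored items, the Pearson correlation between an item and the total score is the point biserial coefficient C(pbs), and removing the item from the total gives the corrected coefficient C(cit). The Rasch joint maximum likelihood step is not reproduced here, and the function and variable names are illustrative.

    ```python
    import numpy as np

    def discrimination_indices(X):
        """X: persons x items matrix of 0/1 scored responses.
        Returns (C_pbs, C_cit): item-total and corrected item-total correlations."""
        X = np.asarray(X, dtype=float)
        total = X.sum(axis=1)
        c_pbs = np.array([np.corrcoef(X[:, j], total)[0, 1]
                          for j in range(X.shape[1])])
        c_cit = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]   # item removed from the total
                          for j in range(X.shape[1])])
        return c_pbs, c_cit
    ```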

  4. Examining student heuristic usage in a hydrogen bonding assessment.

    PubMed

    Miller, Kathryn; Kim, Thomas

    2017-09-01

    This study investigates the role of representational competence in student responses to an assessment of hydrogen bonding. The assessment couples the use of a multiple-select item ("Choose all that apply") with an open-ended item to allow for an examination of students' cognitive processes as they relate to the assignment of hydrogen bonding within a structural representation. Response patterns from the multiple-select item implicate heuristic usage as a contributing factor to students' incorrect responses. The use of heuristics is further supported by the students' corresponding responses to the open-ended assessment item. Taken together, these data suggest that poor representational competence may contribute to students' previously observed inability to correctly navigate the concept of hydrogen bonding. © 2017 by The International Union of Biochemistry and Molecular Biology, 45(5):411-416, 2017.

  5. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    NASA Astrophysics Data System (ADS)

    Greenberg, Ariela Caren

    Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF depend on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
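    A minimal sketch of the Mantel-Haenszel log-odds ratio used here to flag DIF in dichotomously scored items, with examinees stratified on a matching variable such as the total score. Penfield's DDF extension to individual distractors is not shown, and the function and argument names are illustrative.

    ```python
    import numpy as np

    def mh_log_odds_ratio(correct, group, matching_score):
        """Mantel-Haenszel log-odds ratio for one dichotomously scored item.

        correct        : 0/1 item responses
        group          : 0 = reference group, 1 = focal group
        matching_score : stratifying variable, e.g. total test score
        """
        correct, group, matching_score = map(np.asarray, (correct, group, matching_score))
        num = den = 0.0
        for s in np.unique(matching_score):
            k = matching_score == s
            a = np.sum((group[k] == 0) & (correct[k] == 1))   # reference, correct
            b = np.sum((group[k] == 0) & (correct[k] == 0))   # reference, incorrect
            c = np.sum((group[k] == 1) & (correct[k] == 1))   # focal, correct
            d = np.sum((group[k] == 1) & (correct[k] == 0))   # focal, incorrect
            t = a + b + c + d
            if t > 0:
                num += a * d / t
                den += b * c / t
        # ~0 means no DIF; positive values favour the reference group
        return np.log(num / den)
    ```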

  6. Comparing Different Approaches of Bias Correction for Ability Estimation in IRT Models. Research Report. ETS RR-08-13

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2008-01-01

    The method of maximum-likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…

  7. Development of a Microsoft Excel tool for one-parameter Rasch model of continuous items: an application to a safety attitude survey.

    PubMed

    Chien, Tsair-Wei; Shao, Yang; Kuo, Shu-Chun

    2017-01-10

    Many continuous item responses (CIRs) are encountered in healthcare settings, but item response theory's (IRT) probabilistic modeling is rarely used to produce graphical presentations for interpreting CIR results. A computer module that is programmed to deal with CIRs is required. The aims were to present a computer module, validate it, and verify its usefulness in dealing with CIR data, and then to apply the model to real healthcare data in order to show how CIRs can be applied in healthcare settings, with an example regarding a safety attitude survey. Using Microsoft Excel VBA (Visual Basic for Applications), we designed a computer module that minimizes the residuals and calculates the model's expected scores according to person responses across items. Rasch models based on a Wright map and on KIDMAP were demonstrated to interpret results of the safety attitude survey. The author-made CIR module yielded OUTFIT mean square (MNSQ) and person measures equivalent to those yielded by the professional Rasch Winsteps software. The probabilistic modeling of the CIR module provides messages that are much more valuable to users and shows the advantage of the CIR approach over classical test theory. Because of advances in computer technology, healthcare users who are familiar with MS Excel can easily apply the study's CIR module to deal with continuous variables, benefiting comparisons of data with a logistic distribution and model fit statistics.
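    The module itself is Excel VBA and is not reproduced here; the snippet below is only a minimal sketch, in Python, of the general idea the abstract describes: fitting person and item parameters for continuous responses (rescaled to the 0-1 interval) by minimizing squared residuals against a logistic expected-score curve. The exact model and identification constraints in the published tool may differ.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def fit_continuous_responses(Y):
        """Y: persons x items matrix of responses rescaled to the open interval (0, 1).
        Finds theta (persons) and b (items) so that the expected score
        sigmoid(theta - b) minimizes the sum of squared residuals."""
        n, m = Y.shape

        def loss(params):
            theta, b = params[:n], params[n:]
            expected = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
            penalty = b.sum() ** 2            # pins the scale origin (theta and b can otherwise shift together)
            return np.sum((Y - expected) ** 2) + 1e-3 * penalty

        result = minimize(loss, np.zeros(n + m), method="L-BFGS-B")
        return result.x[:n], result.x[n:]
    ```

    From the fitted expected scores, residual-based fit statistics such as OUTFIT mean squares can then be computed and displayed, which is what the Wright map and KIDMAP output in the tool summarize.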

  8. Combining agreement and frequency rating scales to optimize psychometrics in measuring behavioral health functioning.

    PubMed

    Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M

    2014-07-01

    The goal of this article was to investigate the optimal functioning of frequency vs. agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. A psychometric study of rating scale performance was embedded in a cross-sectional survey used to develop a new instrument measuring behavioral health functioning among adults applying for disability benefits in the United States. Within the sample of 1,017 respondents, the range of response category endorsement was similar for both frequency and agreement item types for both scales. There were fewer missing values in the frequency items than in the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items demonstrated optimal effectiveness around the mean ± 1-2 standard deviation score range; the agreement items performed better at the extreme score ranges. Findings suggest an optimal response format requires a mix of both agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for those whose scores are more extreme and capture subjective content related to general attitudes, behaviors, or feelings of work-related behavioral health functioning. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Development and validation of an item response theory-based Social Responsiveness Scale short form.

    PubMed

    Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T

    2017-09-01

    Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.

  10. A new item response theory model to adjust data allowing examinee choice

    PubMed Central

    Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

    2018-01-01

    In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996

  11. Cognitive Diagnostic Models for Tests with Multiple-Choice and Constructed-Response Items

    ERIC Educational Resources Information Center

    Kuo, Bor-Chen; Chen, Chun-Hua; Yang, Chih-Wei; Mok, Magdalena Mo Ching

    2016-01-01

    Traditionally, teachers evaluate students' abilities via their total test scores. Recently, cognitive diagnostic models (CDMs) have begun to provide information about the presence or absence of students' skills or misconceptions. Nevertheless, CDMs are typically applied to tests with multiple-choice (MC) items, which provide less diagnostic…

  12. The Discriminating Power of Items that Measure More than One Dimension.

    ERIC Educational Resources Information Center

    Reckase, Mark D.

    The work presented in this paper defined conceptually the concepts of multidimensional discrimination and information, derived mathematical expressions for the concepts for a particular multidimensional item response theory (IRT) model, and applied the concepts to actual test data. Multidimensional discrimination was defined as a function of the…

  13. Cognitive Diagnostic Attribute-Level Discrimination Indices

    ERIC Educational Resources Information Center

    Henson, Robert; Roussos, Louis; Douglas, Jeff; He, Xuming

    2008-01-01

    Cognitive diagnostic models (CDMs) model the probability of correctly answering an item as a function of an examinee's attribute mastery pattern. Because estimation of the mastery pattern involves more than a continuous measure of ability, reliability concepts introduced by classical test theory and item response theory do not apply. The cognitive…

  14. Discriminant content validity: a quantitative methodology for assessing content of theory-based measures, with illustrative applications.

    PubMed

    Johnston, Marie; Dixon, Diane; Hart, Jo; Glidewell, Liz; Schröder, Carin; Pollard, Beth

    2014-05-01

    In studies involving theoretical constructs, it is important that measures have good content validity and that there is not contamination of measures by content from other constructs. While reliability and construct validity are routinely reported, to date, there has not been a satisfactory, transparent, and systematic method of assessing and reporting content validity. In this paper, we describe a methodology of discriminant content validity (DCV) and illustrate its application in three studies. Discriminant content validity involves six steps: construct definition, item selection, judge identification, judgement format, single-sample test of content validity, and assessment of discriminant items. In three studies, these steps were applied to a measure of illness perceptions (IPQ-R) and control cognitions. The IPQ-R performed well with most items being purely related to their target construct, although timeline and consequences had small problems. By contrast, the study of control cognitions identified problems in measuring constructs independently. In the final study, direct estimation response formats for theory of planned behaviour constructs were found to have as good DCV as Likert format. The DCV method allowed quantitative assessment of each item and can therefore inform the content validity of the measures assessed. The methods can be applied to assess content validity before or after collecting data to select the appropriate items to measure theoretical constructs. Further, the data reported for each item in Appendix S1 can be used in item or measure selection. Statement of contribution What is already known on this subject? There are agreed methods of assessing and reporting construct validity of measures of theoretical constructs, but not their content validity. Content validity is rarely reported in a systematic and transparent manner. What does this study add? The paper proposes discriminant content validity (DCV), a systematic and transparent method of assessing and reporting whether items assess the intended theoretical construct and only that construct. In three studies, DCV was applied to measures of illness perceptions, control cognitions, and theory of planned behaviour response formats. Appendix S1 gives content validity indices for each item of each questionnaire investigated. Discriminant content validity is ideally applied while the measure is being developed, before using to measure the construct(s), but can also be applied after using a measure. © 2014 The British Psychological Society.

  15. Use of non-parametric item response theory to develop a shortened version of the Positive and Negative Syndrome Scale (PANSS).

    PubMed

    Khan, Anzalee; Lewis, Charles; Lindenmayer, Jean-Pierre

    2011-11-16

    Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminated better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation were able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and in selecting items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity.

  16. Use of NON-PARAMETRIC Item Response Theory to develop a shortened version of the Positive and Negative Syndrome Scale (PANSS)

    PubMed Central

    2011-01-01

    Background Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Methods Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. Results The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminated better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation were able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. Conclusions The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and in selecting items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity. PMID:22087503

  17. Item Response Theory Applied to Factors Affecting the Patient Journey Towards Hearing Rehabilitation

    PubMed Central

    Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien

    2016-01-01

    To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to the hearing aid rehabilitation questionnaire scales aid stigma, pressure, and aid unwanted (addressing, respectively, hearing aid stigma, experienced pressure from others, and perceived hearing aid benefit) were evaluated with item response theory. The sample comprised 212 persons aged 55 years or more; 63 were hearing aid users, 64 had hearing impairment, and 85 had no hearing impairment according to the guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4 items, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and for application to further understanding of the help-seeking process of the hearing impaired.

  18. Measurement equivalence of the KINDL questionnaire across child self-reports and parent proxy-reports: a comparison between item response theory and ordinal logistic regression.

    PubMed

    Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara

    2014-06-01

    Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50 %. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.

  19. Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the ‘Claim Evaluation Tools’ database using Rasch modelling

    PubMed Central

    Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

    2017-01-01

    Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019

  20. Dental responsibility loadings and the relative value of dental services.

    PubMed

    Teusner, D N; Ju, X; Brennan, D S

    2017-09-01

    To estimate responsibility loadings for a comprehensive list of dental services, providing a standardized unit of clinical work effort. Dentists (n = 2500) randomly sampled from the Australian Dental Association membership (2011) were randomly assigned to one of 25 panels. Panels were surveyed by questionnaires eliciting responsibility loadings for eight common dental services (core items) and approximately 12 other items unique to that questionnaire. In total, loadings were elicited for 299 items listed in the Australian Dental Schedule 9th Edition. Data were weighted to reflect the age and sex distribution of the workforce. To assess reliability, regression models assessed differences in core item loadings by panel assignment. Estimated loadings were described by reporting the median and mean. Response rate was 37%. Panel composition did not vary by practitioner characteristics. Core item loadings did not vary by panel assignment. Oral surgery and endodontic service areas had the highest proportion (91%) of services with median loadings ≥1.5, followed by prosthodontics (78%), periodontics (76%), orthodontics (63%), restorative (62%) and diagnostic services (31%). Preventive services had median loadings ≤1.25. Dental responsibility loadings estimated by this study can be applied in the development of relative value scales. © 2017 Australian Dental Association.

  1. Examining Student Heuristic Usage in a Hydrogen Bonding Assessment

    ERIC Educational Resources Information Center

    Miller, Kathryn; Kim, Thomas

    2017-01-01

    This study investigates the role of representational competence in student responses to an assessment of hydrogen bonding. The assessment couples the use of a multiple-select item ("Choose all that apply") with an open-ended item to allow for an examination of students' cognitive processes as they relate to the assignment of hydrogen…

  2. Comparing five depression measures in depressed Chinese patients using item response theory: an examination of item properties, measurement precision and score comparability.

    PubMed

    Zhao, Yue; Chan, Wai; Lo, Barbara Chuen Yee

    2017-04-04

    Item response theory (IRT) has been increasingly applied to patient-reported outcome (PRO) measures. The purpose of this study is to apply IRT to examine item properties (discrimination and severity of depressive symptoms), measurement precision and score comparability across five depression measures, which is the first study of its kind in the Chinese context. A clinical sample of 207 Hong Kong Chinese outpatients was recruited. Data analyses were performed including classical item analysis, IRT concurrent calibration and IRT true score equating. The IRT assumptions of unidimensionality and local independence were tested respectively using confirmatory factor analysis and chi-square statistics. The IRT linking assumptions of construct similarity, equity and subgroup invariance were also tested. The graded response model was applied to concurrently calibrate all five depression measures in a single IRT run, resulting in the item parameter estimates of these measures being placed onto a single common metric. IRT true score equating was implemented to perform the outcome score linking and construct score concordances so as to link scores from one measure to corresponding scores on another measure for direct comparability. Findings suggested that (a) symptoms on depressed mood, suicidality and feeling of worthlessness served as the strongest discriminating indicators, and symptoms concerning suicidality, changes in appetite, depressed mood, feeling of worthlessness and psychomotor agitation or retardation reflected high levels of severity in the clinical sample. (b) The five depression measures contributed to various degrees of measurement precision at varied levels of depression. (c) After outcome score linking was performed across the five measures, the cut-off scores led to either consistent or discrepant diagnoses for depression. The study provides additional evidence regarding the psychometric properties and clinical utility of the five depression measures, offers methodological contributions to the appropriate use of IRT in PRO measures, and helps elucidate cultural variation in depressive symptomatology. The approach of concurrently calibrating and linking multiple PRO measures can be applied to the assessment of PROs other than the depression context.
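    As a worked illustration of the graded response model used for the concurrent calibration, the sketch below computes category probabilities for a single polytomous item; the discrimination and threshold values are made up for illustration and are not the estimates reported in the study.

    ```python
    import numpy as np

    def grm_category_probs(theta, a, thresholds):
        """Graded response model for one item.
        a          : discrimination
        thresholds : increasing category boundaries b_1 < ... < b_{K-1}
        Returns the K category probabilities P(X = k | theta)."""
        thresholds = np.asarray(thresholds, dtype=float)
        # cumulative probabilities P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0
        p_ge = 1.0 / (1.0 + np.exp(-a * (theta - thresholds)))
        p_ge = np.concatenate(([1.0], p_ge, [0.0]))
        return p_ge[:-1] - p_ge[1:]

    # a fairly 'severe' 4-category item: probability mass shifts upward as theta rises
    for theta in (-1.0, 0.0, 1.5):
        print(theta, grm_category_probs(theta, a=2.0, thresholds=[0.0, 1.0, 2.0]).round(3))
    ```

    In IRT true score equating, a summed score on one measure is mapped to the theta whose expected summed score matches it, and the expected summed score of the other measure at that theta gives the linked score.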

  3. Generalized Full-Information Item Bifactor Analysis

    PubMed Central

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682

  4. Application of the IRT and TRT Models to a Reading Comprehension Test

    ERIC Educational Resources Information Center

    Kim, Weon H.

    2017-01-01

    The purpose of the present study is to apply the item response theory (IRT) and testlet response theory (TRT) models to a reading comprehension test. This study applied the TRT models and the traditional IRT model to a seventh-grade reading comprehension test (n = 8,815) with eight testlets. These three models were compared to determine the best…

  5. Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

    PubMed

    Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

    2017-05-25

    The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  6. The Impact of Kentucky's Educational Reform Act on Writing throughout the Commonwealth.

    ERIC Educational Resources Information Center

    Harnack, Andrew; And Others

    1994-01-01

    The central role of writing in Kentucky's Education Reform Act is most evident in Kentucky's new assessment system, which employs writing on all levels. Even tests that have recently included multiple-choice items may be replaced by response items that require students to apply knowledge, concepts, and skills in a writing format. Writing itself is…

  7. Assessing items on the SF-8 Japanese version for health-related quality of life: a psychometric analysis based on the nominal categories model of item response theory.

    PubMed

    Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya

    2009-06-01

    The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on the nominal categories model. Using data from an adjusted random sample from a nationally representative panel, the nominal categories model was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions of the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53%, women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption, with category order consistent with the corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems]) cover most of the negative range of theta. The information function for all items combined peaked at theta = -0.7 (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can more effectively recognize persons with relatively poorer QOL than those with relatively better QOL.

  8. Grooming a CAT: customizing CAT administration rules to increase response efficiency in specific research and clinical settings.

    PubMed

    Kallen, Michael A; Cook, Karon F; Amtmann, Dagmar; Knowlton, Elizabeth; Gershon, Richard C

    2018-05-05

    To evaluate the degree to which applying alternative stopping rules would reduce response burden while maintaining score precision in the context of computer adaptive testing (CAT). Analyses were conducted on secondary data comprising CATs administered in a clinical setting at multiple time points (baseline and up to two follow-ups) to 417 study participants who had back pain (51.3%) and/or depression (47.0%). Participant mean age was 51.3 years (SD = 17.2) and ranged from 18 to 86. Participants tended to be white (84.7%), relatively well educated (77% with at least some college), female (63.9%), and married or living in a committed relationship (57.4%). The unit of analysis was individual assessment histories (i.e., CAT item response histories) from the parent study. Data were first aggregated across all individuals, domains, and time points in an omnibus dataset of assessment histories and then were disaggregated by measure for domain-specific analyses. Finally, assessment histories within a "clinically relevant range" (score ≥ 1 SD from the mean in direction of poorer health) were analyzed separately to explore score level-specific findings. Two different sets of CAT administration rules were compared. The original CAT (CAT-ORIG) rules required at least four and no more than 12 items be administered. If the score standard error (SE) reached a value < 3 points (T score metric) before 12 items were administered, the CAT was stopped. We simulated applying alternative stopping rules (CAT-ALT), removing the requirement that a minimum of four items be administered, and stopped a CAT if responses to the first two items were both associated with best health, if the SE was < 3, if SE change < 0.1 (T score metric), or if 12 items were administered. We then compared score fidelity and response burden, defined as the number of items administered, between CAT-ORIG and CAT-ALT. CAT-ORIG and CAT-ALT scores varied little, especially within the clinically relevant range, and response burden was substantially lower under CAT-ALT (e.g., 41.2% savings in the omnibus dataset). Alternate stopping rules result in substantial reductions in response burden with minimal sacrifice in score precision.
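    A minimal sketch of the alternative stopping logic described above. The scoring engine that produces the standard errors is assumed to exist elsewhere, the function and argument names are illustrative, and the dropped four-item minimum is why no minimum-item check appears.

    ```python
    def should_stop(responses, best_responses, se_history, max_items=12):
        """Alternative CAT stopping rules (sketch).

        responses      : item responses administered so far, in order
        best_responses : for each administered item, the response indicating best health
        se_history     : score standard error (T-score metric) after each response
        """
        n = len(responses)
        if n >= max_items:
            return True                                        # ceiling of 12 items
        if n >= 2 and responses[0] == best_responses[0] and responses[1] == best_responses[1]:
            return True                                        # first two answers at the healthy extreme
        if se_history and se_history[-1] < 3.0:
            return True                                        # precision target reached
        if len(se_history) >= 2 and abs(se_history[-1] - se_history[-2]) < 0.1:
            return True                                        # SE no longer improving meaningfully
        return False
    ```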

  9. Development and validation of a patient-reported outcome measure for stroke patients.

    PubMed

    Luo, Yanhong; Yang, Jie; Zhang, Yanbo

    2015-05-08

    Family support and patient satisfaction with treatment are crucial for aiding in the recovery from stroke. However, current validated stroke-specific questionnaires may not adequately capture the impact of these two variables on patients undergoing clinical trials of new drugs. Therefore, the aim of this study was to develop and evaluate a new stroke patient-reported outcome measure (Stroke-PROM) instrument for capturing more comprehensive effects of stroke on patients participating in clinical trials of new drugs. A conceptual framework and a pool of items for the preliminary Stroke-PROM were generated by consulting the relevant literature and other questionnaires created in China and other countries, and interviewing 20 patients and 4 experts to ensure that all germane parameters were included. During the first item-selection phase, classical test theory and item response theory were applied to an initial scale completed by 133 patients with stroke. During the item-revaluation phase, classical test theory and item response theory were used again, this time with 475 patients with stroke and 104 healthy participants. During the scale assessment phase, confirmatory factor analysis was applied to the final scale of the Stroke-PROM using the same study population as in the second item-selection phase. Reliability, validity, responsiveness and feasibility of the final scale were tested. The final scale of Stroke-PROM contained 46 items describing four domains (physiology, psychology, society and treatment). These four domains were subdivided into 10 subdomains. Cronbach's α coefficients for the four domains ranged from 0.861 to 0.908. Confirmatory factor analysis supported the validity of the final scale, and the model fit index satisfied the criterion. Differences in the Stroke-PROM mean scores were significant between patients with stroke and healthy participants in nine subdomains (P < 0.001), indicating that the scale showed good responsiveness. The Stroke-PROM is a patient-reported outcome multidimensional questionnaire developed especially for clinical trials of new drugs and is focused on issues of family support and patient satisfaction with treatment. Extensive data analyses supported the validity, reliability and responsiveness of the Stroke-PROM.

  10. Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.

    PubMed

    Shou, Yiyun; Sellbom, Martin; Xu, Jing

    2018-05-01

    There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of the items in the TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and U.S. samples. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  11. Multiple-Choice versus Constructed-Response Tests in the Assessment of Mathematics Computation Skills.

    ERIC Educational Resources Information Center

    Gadalla, Tahany M.

    The equivalence of multiple-choice (MC) and constructed response (discrete) (CR-D) response formats as applied to mathematics computation at grade levels two to six was tested. The difference between total scores from the two response formats was tested for statistical significance, and the factor structure of items in both response formats was…

  12. A novel nonparametric item response theory approach to measuring socioeconomic position: a comparison using household expenditure data from a Vietnam health survey, 2003

    PubMed Central

    2014-01-01

    Background Measures of household socio-economic position (SEP) are widely used in health research. There exist a number of approaches to their measurement, with Principal Components Analysis (PCA) applied to a basket of household assets being one of the most common. PCA, however, carries a number of assumptions about the distribution of the data which may be untenable, and alternative, non-parametric, approaches may be preferred. Mokken scale analysis is a non-parametric, item response theory approach to scale development which appears never to have been applied to household asset data. A Mokken scale can be used to rank order items (measures of wealth) as well as households. Using data on household asset ownership from a national sample of 4,154 consenting households in the World Health Survey from Vietnam, 2003, we construct two measures of household SEP. Seventeen items asking about assets, and utility and infrastructure use were used. Mokken Scaling and PCA were applied to the data. A single item measure of total household expenditure is used as a point of contrast. Results An 11 item scale, out of the 17 items, was identified that conformed to the assumptions of a Mokken Scale. All the items in the scale were identified as strong items (Hi > .5). Two PCA measures of SEP were developed as a point of contrast. One PCA measure was developed using all 17 available asset items, the other used the reduced set of 11 items identified in the Mokken scale analysis. The Mokken Scale measure of SEP and the 17 item PCA measure had a very high correlation (r = .98), and they both correlated moderately with total household expenditure: r = .59 and r = .57 respectively. In contrast the 11 item PCA measure correlated moderately with the Mokken scale (r = .68), and weakly with the total household expenditure (r = .18). Conclusion The Mokken scale measure of household SEP performed at least as well as PCA, and outperformed the PCA measure developed with the 11 items used in the Mokken scale. Unlike PCA, Mokken scaling carries no assumptions about the underlying shape of the distribution of the data, and can be used simultaneously to order households by SEP and items. The approach, however, has not been tested with data from other countries and remains an interesting, but under-researched approach. PMID:25126103
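    A minimal sketch of the Loevinger scalability coefficients that Mokken scale analysis builds on, for dichotomous asset-ownership items. The automatic item selection, monotonicity checks, and standard errors of a full Mokken analysis are not shown, and the function name is illustrative.

    ```python
    import numpy as np

    def loevinger_h(X):
        """X: households x items matrix of 0/1 asset indicators.
        Returns (H_items, H_scale): 1 - observed / expected Guttman errors."""
        X = np.asarray(X, dtype=float)
        n, m = X.shape
        p = X.mean(axis=0)                       # item 'popularities'
        F = np.zeros((m, m))                     # observed Guttman errors per ordered pair
        E = np.zeros((m, m))                     # expected errors under marginal independence
        for i in range(m):
            for j in range(m):
                if i == j or p[i] >= p[j]:
                    continue                     # count each pair once, with i the rarer item
                F[i, j] = np.sum((X[:, i] == 1) & (X[:, j] == 0))   # owns the rare item but not the common one
                E[i, j] = n * p[i] * (1 - p[j])
        F_pair, E_pair = F + F.T, E + E.T        # symmetrize so each item sees all of its pairs
        H_items = 1.0 - F_pair.sum(axis=1) / E_pair.sum(axis=1)
        H_scale = 1.0 - F_pair.sum() / E_pair.sum()
        return H_items, H_scale
    ```

    Items with H values above roughly 0.5 are conventionally treated as strong, which is the criterion the abstract refers to.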

  13. A novel nonparametric item response theory approach to measuring socioeconomic position: a comparison using household expenditure data from a Vietnam health survey, 2003.

    PubMed

    Reidpath, Daniel D; Ahmadi, Keivan

    2014-01-01

    Measures of household socio-economic position (SEP) are widely used in health research. There exist a number of approaches to their measurement, with Principal Components Analysis (PCA) applied to a basket of household assets being one of the most common. PCA, however, carries a number of assumptions about the distribution of the data which may be untenable, and alternative, non-parametric, approaches may be preferred. Mokken scale analysis is a non-parametric, item response theory approach to scale development which appears never to have been applied to household asset data. A Mokken scale can be used to rank order items (measures of wealth) as well as households. Using data on household asset ownership from a national sample of 4,154 consenting households in the World Health Survey from Vietnam, 2003, we construct two measures of household SEP. Seventeen items asking about assets, and utility and infrastructure use were used. Mokken Scaling and PCA were applied to the data. A single item measure of total household expenditure is used as a point of contrast. An 11 item scale, out of the 17 items, was identified that conformed to the assumptions of a Mokken Scale. All the items in the scale were identified as strong items (Hi > .5). Two PCA measures of SEP were developed as a point of contrast. One PCA measure was developed using all 17 available asset items, the other used the reduced set of 11 items identified in the Mokken scale analysis. The Mokken Scale measure of SEP and the 17 item PCA measure had a very high correlation (r = .98), and they both correlated moderately with total household expenditure: r = .59 and r = .57 respectively. In contrast the 11 item PCA measure correlated moderately with the Mokken scale (r = .68), and weakly with the total household expenditure (r = .18). The Mokken scale measure of household SEP performed at least as well as PCA, and outperformed the PCA measure developed with the 11 items used in the Mokken scale. Unlike PCA, Mokken scaling carries no assumptions about the underlying shape of the distribution of the data, and can be used simultaneously to order households by SEP and items. The approach, however, has not been tested with data from other countries and remains an interesting, but under-researched approach.

  14. General mixture item response models with different item response structures: Exposition with an application to Likert scales.

    PubMed

    Tijmstra, Jesper; Bolsinova, Maria; Jeon, Minjeong

    2018-01-10

    This article proposes a general mixture item response theory (IRT) framework that allows for classes of persons to differ with respect to the type of processes underlying the item responses. Through the use of mixture models, nonnested IRT models with different structures can be estimated for different classes, and class membership can be estimated for each person in the sample. If researchers are able to provide competing measurement models, this mixture IRT framework may help them deal with some violations of measurement invariance. To illustrate this approach, we consider a two-class mixture model, where a person's responses to Likert-scale items containing a neutral middle category are either modeled using a generalized partial credit model, or through an IRTree model. In the first model, the middle category ("neither agree nor disagree") is taken to be qualitatively similar to the other categories, and is taken to provide information about the person's endorsement. In the second model, the middle category is taken to be qualitatively different and to reflect a nonresponse choice, which is modeled using an additional latent variable that captures a person's willingness to respond. The mixture model is studied using simulation studies and is applied to an empirical example.
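    A minimal sketch, under assumed functional forms, of the IRTree idea used for the second latent class: the neutral middle category is produced by a separate willingness-to-respond variable, and the remaining substantive categories follow a partial credit model on the attitude trait. The parameterization in the article may differ, the names below are illustrative, and the mixture itself and the generalized partial credit class are not shown.

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def irtree_likert_probs(attitude, willingness, b_respond, steps):
        """Category probabilities for a 5-point Likert item with a neutral middle.

        attitude    : trait driving the substantive endorsement
        willingness : trait driving the choice to take a position at all
        b_respond   : item threshold for the willingness node
        steps       : three step parameters for the four substantive categories
        """
        p_respond = sigmoid(willingness - b_respond)                  # node 1: respond vs. pick the middle
        deltas = np.concatenate(([0.0], np.asarray(steps, dtype=float)))   # partial credit over the 4 categories
        numer = np.exp(np.cumsum(attitude - deltas))
        p_sub = numer / numer.sum()
        # order: strongly disagree, disagree, neither, agree, strongly agree
        return np.concatenate((p_respond * p_sub[:2], [1.0 - p_respond], p_respond * p_sub[2:]))

    print(irtree_likert_probs(attitude=0.5, willingness=-1.0, b_respond=0.0, steps=[-1.0, 0.0, 1.0]).round(3))
    ```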

  15. HIV/AIDS knowledge among men who have sex with men: applying the item response theory.

    PubMed

    Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland

    2014-04-01

    To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively; 40.7% of the sample had knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameters (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those at the median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
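    A worked sketch of the two-parameter logistic model named above, with made-up parameter values rather than the study's estimates; an item whose discrimination is below roughly 0.34 barely changes its correct-response probability across ability, which is what the low-discrimination flag refers to.

    ```python
    import numpy as np

    def two_pl(theta, a, b):
        """Two-parameter logistic model: probability of a correct (knowledgeable)
        response given ability theta, discrimination a and difficulty b."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    # a very easy, weakly discriminating item vs. a harder, sharper one
    for theta in (-1.0, 0.0, 1.0):
        print(theta, round(two_pl(theta, a=0.3, b=-1.5), 3), round(two_pl(theta, a=1.5, b=1.0), 3))
    ```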

  16. Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children.

    PubMed

    Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra

    2012-03-13

    Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.

  17. Cultural Consensus Theory: Aggregating Continuous Responses in a Finite Interval

    NASA Astrophysics Data System (ADS)

    Batchelder, William H.; Strashny, Alex; Romney, A. Kimball

    Cultural consensus theory (CCT) consists of cognitive models for aggregating responses of "informants" to test items about some domain of their shared cultural knowledge. This paper develops a CCT model for items requiring bounded numerical responses, e.g. probability estimates, confidence judgments, or similarity judgments. The model assumes that each item generates a latent random representation in each informant, with mean equal to the consensus answer and variance depending jointly on the informant and the location of the consensus answer. The manifest responses may reflect biases of the informants. Markov Chain Monte Carlo (MCMC) methods were used to estimate the model, and simulation studies validated the approach. The model was applied to an existing cross-cultural dataset involving native Japanese and English speakers judging the similarity of emotion terms. The results sharpened earlier studies that showed that both cultures appear to have very similar cognitive representations of emotion terms.

  18. Analysing task design and students' responses to context-based problems through different analytical frameworks

    NASA Astrophysics Data System (ADS)

    Broman, Karolina; Bernholt, Sascha; Parchmann, Ilka

    2015-05-01

    Background: Context-based learning approaches are used to enhance students' interest in, and knowledge about, science. According to different empirical studies, students' interest improves when these less conventional approaches are applied, while effects on learning outcomes are less coherent. Hence, further insights are needed into the structure of context-based problems in comparison to traditional problems, and into students' problem-solving strategies. Therefore, a suitable framework is necessary, both for the analysis of tasks and of strategies. Purpose: The aim of this paper is to explore traditional and context-based tasks, as well as students' responses to exemplary tasks, to identify a suitable framework for future design and analyses of context-based problems. The paper discusses different established frameworks and applies the Higher-Order Cognitive Skills/Lower-Order Cognitive Skills (HOCS/LOCS) taxonomy and the Model of Hierarchical Complexity in Chemistry (MHC-C) to analyse traditional tasks and students' responses. Sample: Upper secondary students (n=236) in the Natural Science Programme, i.e. possible future scientists, were investigated to explore learning outcomes when they solve chemistry tasks, both conventional and context-based chemistry problems. Design and methods: A typical chemistry examination test was analysed, first the test items themselves (n=36), and thereafter 236 students' responses to one representative context-based problem. Content analysis using the HOCS/LOCS and MHC-C frameworks was applied to analyse both quantitative and qualitative data, allowing us to describe different problem-solving strategies. Results: The empirical results show that both frameworks are suitable for identifying students' strategies, which mainly focus on recall of memorized facts when solving chemistry test items. Almost all test items also assessed lower-order thinking. Combining the frameworks with the chemistry syllabus proved successful for analysing both the test items and students' responses in a systematic way. The framework can therefore be applied in the design of new tasks, the analysis and assessment of students' responses, and as a tool for teachers to scaffold students in their problem-solving process. Conclusions: This paper gives implications for practice and for future research, both for developing new context-based problems in a structured way and for providing analytical tools for investigating students' higher-order thinking in their responses to these tasks.

  19. Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis

    PubMed Central

    Handren, Lindsay; Crano, William D.

    2018-01-01

    Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of “dry” weekdays and “wet” weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns. PMID:27488456

  20. Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis.

    PubMed

    Lac, Andrew; Handren, Lindsay; Crano, William D

    2016-10-01

    Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of "dry" weekdays and "wet" weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns.

  1. Assessing patients' experiences with communication across the cancer care continuum.

    PubMed

    Mazor, Kathleen M; Street, Richard L; Sue, Valerie M; Williams, Andrew E; Rabin, Borsika A; Arora, Neeraj K

    2016-08-01

    To evaluate the relevance, performance and potential usefulness of the Patient Assessment of cancer Communication Experiences (PACE) items. Items focusing on specific communication goals related to exchanging information, fostering healing relationships, responding to emotions, making decisions, enabling self-management, and managing uncertainty were tested via a retrospective, cross-sectional survey of adults who had been diagnosed with cancer. Analyses examined response frequencies, inter-item correlations, and coefficient alpha. A total of 366 adults were included in the analyses. Relatively few respondents selected "Does Not Apply", suggesting that the items tap relevant communication experiences. Ratings of whether specific communication goals were achieved were strongly correlated with overall ratings of communication, suggesting item content reflects important aspects of communication. Coefficient alpha was ≥.90 for each item set, indicating excellent reliability. Variations in the percentage of respondents selecting the most positive response across items suggest results can identify strengths and weaknesses. The PACE items tap relevant, important aspects of communication during cancer care, and may be useful to cancer care teams desiring detailed feedback. The PACE is a new tool for eliciting patients' perspectives on communication during cancer care. It is freely available online for practitioners, researchers and others. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
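
    Two of the summary statistics reported above, inter-item correlations and coefficient alpha, are simple to compute directly. The sketch below uses hypothetical ratings rather than the PACE data and shows one common way to obtain both from a respondents-by-items matrix.

    import numpy as np

    def cronbach_alpha(items):
        # Coefficient alpha for an (n_respondents x n_items) matrix of ratings.
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical 5-point ratings from 6 respondents on 4 communication items.
    ratings = np.array([[5, 4, 5, 4],
                        [3, 3, 2, 3],
                        [4, 4, 4, 5],
                        [2, 1, 2, 2],
                        [5, 5, 4, 4],
                        [3, 2, 3, 3]])
    print("alpha:", round(cronbach_alpha(ratings), 3))
    print("inter-item correlations:\n", np.corrcoef(ratings, rowvar=False).round(2))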

  2. Analyzing Multiple-Choice Questions by Model Analysis and Item Response Curves

    NASA Astrophysics Data System (ADS)

    Wattanakasiwich, P.; Ananta, S.

    2010-07-01

    In physics education research, the main goal is to improve physics teaching so that most students understand physics conceptually and are able to apply concepts in solving problems. Therefore, many multiple-choice instruments have been developed to probe students' conceptual understanding of various topics. Two techniques, model analysis and item response curves, were used to analyze students' responses to the Force and Motion Conceptual Evaluation (FMCE). For this study, FMCE data from more than 1000 students at Chiang Mai University were collected over the past three years. With model analysis, we can obtain students' alternative knowledge and the probabilities that students use such knowledge in a range of equivalent contexts. The model analysis consists of two algorithms: concentration factor and model estimation. This paper presents only results from using the model estimation algorithm to obtain a model plot. The plot helps to identify whether a class model state lies in the misconception region or not. An item response curve (IRC), derived from item response theory, plots the percentage of students selecting a particular choice against their total score. Pros and cons of both techniques are compared and discussed.
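
    The item response curve technique lends itself to a very small amount of code. The sketch below, with simulated responses standing in for the FMCE data, computes and plots the proportion of students choosing each option of one item as a function of total score.

    import numpy as np
    import matplotlib.pyplot as plt

    def item_response_curves(choices, total_scores, n_options):
        # For one multiple-choice item, return the fraction of students picking
        # each option at every total-score level (the item response curve technique).
        levels = np.sort(np.unique(total_scores))
        curves = np.zeros((n_options, len(levels)))
        for j, s in enumerate(levels):
            at_level = choices[total_scores == s]
            for opt in range(n_options):
                curves[opt, j] = np.mean(at_level == opt)
        return levels, curves

    # Hypothetical data: option chosen (0-3) and FMCE-style total score per student.
    rng = np.random.default_rng(0)
    total = rng.integers(0, 33, size=500)
    choice = np.where(rng.random(500) < total / 33, 0, rng.integers(1, 4, size=500))

    levels, curves = item_response_curves(choice, total, n_options=4)
    for opt in range(4):
        plt.plot(levels, curves[opt], marker="o", label=f"choice {chr(65 + opt)}")
    plt.xlabel("total score"); plt.ylabel("proportion choosing"); plt.legend(); plt.show()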

  3. Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.

    PubMed

    McCabe, Erin; Gross, Douglas P; Bulut, Okan

    2018-06-07

    The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales into a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlations with legacy measure scores). We demonstrated the feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
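
    To make the simulation step concrete, here is a minimal CAT sketch under the generalized partial credit model: items are selected by maximum Fisher information at the current provisional estimate, and ability is re-estimated after each response. The item bank, its parameters, and the fixed test length of 8 items are hypothetical stand-ins, not the study's bank of 31 physical-functioning items.

    import numpy as np

    def gpcm_probs(theta, a, b):
        # Generalized partial credit model category probabilities.
        # a: discrimination; b: step difficulties (len = n_categories - 1).
        cum = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
        e = np.exp(cum - cum.max())
        return e / e.sum()

    def gpcm_info(theta, a, b):
        # Item information: a^2 times the model variance of the category score.
        p = gpcm_probs(theta, a, b)
        k = np.arange(len(p))
        return a ** 2 * ((k ** 2 * p).sum() - ((k * p).sum()) ** 2)

    def simulate_cat(true_theta, bank_a, bank_b, max_items=8, rng=None):
        # Fixed-length CAT: pick the most informative unused item at the current
        # provisional estimate, simulate a response, re-estimate theta (EAP) on a grid.
        rng = rng or np.random.default_rng()
        grid = np.linspace(-4, 4, 81)
        used, responses, theta_hat = [], [], 0.0
        while len(used) < max_items:
            info = [gpcm_info(theta_hat, bank_a[i], bank_b[i]) if i not in used else -1
                    for i in range(len(bank_a))]
            i = int(np.argmax(info))
            p = gpcm_probs(true_theta, bank_a[i], bank_b[i])
            x = rng.choice(len(p), p=p)
            used.append(i); responses.append(x)
            logpost = -0.5 * grid ** 2          # standard normal prior
            for j, item in enumerate(used):
                probs = np.array([gpcm_probs(t, bank_a[item], bank_b[item])[responses[j]]
                                  for t in grid])
                logpost += np.log(probs)
            post = np.exp(logpost - logpost.max()); post /= post.sum()
            theta_hat = float((grid * post).sum())
        return theta_hat, used

    # Hypothetical bank of 31 five-category items.
    rng = np.random.default_rng(1)
    bank_a = rng.uniform(0.8, 2.0, 31)
    bank_b = [np.sort(rng.normal(0, 1, 4)) for _ in range(31)]
    print(simulate_cat(true_theta=0.5, bank_a=bank_a, bank_b=bank_b, rng=rng))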

  4. Measuring emotion socialization in families affected by pediatric cancer: Refinement and reduction of the Parents' Beliefs about Children's Emotions questionnaire.

    PubMed

    Beitra, Danette; El-Behadli, Ana F; Faith, Melissa A

    2018-01-01

    The aim of this study is to conduct a multimethod psychometric reduction of the Parents' Beliefs about Children's Emotions (PBCE) questionnaire using an item response theory framework with a pediatric oncology sample. Participants were 216 pediatric oncology caregivers who completed the PBCE. The PBCE contains 105 items (11 subscales) rated on a 6-point Likert-type scale. We evaluated the PBCE subscale performance by applying a partial credit model in WINSTEPS. Sixty-six statistically weak items were removed, creating a 44-item PBCE questionnaire with 10 subscales and 3 response options per item. The refined scale displayed good psychometric properties and correlated .910 with the original PBCE. Additional analyses examined dimensionality, item-level (e.g. difficulty), and person-level (e.g. ethnicity) characteristics. The refined PBCE questionnaire provides better test information, improves instrument reliability, and reduces burden on families, providers, and researchers. With this improved measure, providers can more easily identify families who may benefit from psychosocial interventions targeting emotion socialization. The results of the multistep approach presented should be considered preliminary, given the limited sample size.

  5. Testing manifest monotonicity using order-constrained statistical inference.

    PubMed

    Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas

    2013-01-01

    Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
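
    The observable implication being tested can be illustrated without the full order-constrained machinery. The sketch below groups respondents by rest score and checks, descriptively, whether the proportion answering an item positively is non-decreasing; the formal likelihood ratio test with a simulated p-value described above goes beyond this simple count of violations. Data and parameter values are simulated for illustration.

    import numpy as np

    def restscore_curve(X, item, min_group=50):
        # Proportion answering `item` positively within each rest-score group.
        X = np.asarray(X)
        rest = X.sum(axis=1) - X[:, item]
        curve = []
        for r in np.sort(np.unique(rest)):
            mask = rest == r
            if mask.sum() >= min_group:      # keep only adequately filled groups
                curve.append((int(r), X[mask, item].mean()))
        return curve

    def count_violations(curve):
        # Adjacent decreases in the curve; manifest monotonicity implies zero.
        values = [p for _, p in curve]
        return sum(1 for prev, nxt in zip(values, values[1:]) if nxt < prev)

    # Hypothetical dichotomous data generated from a monotone (Rasch-like) model.
    rng = np.random.default_rng(3)
    theta = rng.normal(size=2000)
    b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
    X = (rng.random((2000, 6)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)
    curve = restscore_curve(X, item=2)
    print(curve, "violations:", count_violations(curve))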

  6. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

    PubMed Central

    2012-01-01

    Background: Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods: Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12-item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions: After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items), we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware. PMID:22686586

  7. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

    PubMed

    Stochl, Jan; Jones, Peter B; Croudace, Tim J

    2012-06-11

    Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12-item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items), we show that all items from the 12-item General Health Questionnaire (GHQ-12), when binary scored, were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware.
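
    The scalability coefficient H reported in these Mokken analyses has a simple closed form for dichotomous items. The sketch below computes Loevinger's H from a small, hypothetical binary-scored item matrix; by the usual conventions, H between 0.3 and 0.4 indicates a weak scale, 0.4 to 0.5 a medium scale, and 0.5 or above a strong scale.

    import numpy as np

    def loevinger_H(X):
        # Loevinger's scalability coefficient H for dichotomous (0/1) item scores.
        # H pools, over all item pairs, the observed covariances divided by the
        # maximum covariances attainable given the item marginals.
        X = np.asarray(X, dtype=float)
        p = X.mean(axis=0)
        cov = np.cov(X, rowvar=False, bias=True)
        num = den = 0.0
        k = X.shape[1]
        for i in range(k):
            for j in range(i + 1, k):
                num += cov[i, j]
                den += min(p[i], p[j]) - p[i] * p[j]
        return num / den

    # Hypothetical binary-scored GHQ-style responses (rows = respondents).
    X = np.array([[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0],
                  [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1]])
    print(round(loevinger_H(X), 2))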

  8. Scale Refinement and Initial Evaluation of a Behavioral Health Function Measurement Tool for Work Disability Evaluation

    PubMed Central

    Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.

    2014-01-01

    Objectives: To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design: Cross-sectional survey followed by item response theory (IRT) calibration data simulations. Setting: Community. Participants: A sample of individuals applying for SSA disability benefits, claimants (N=1015), and a normative comparative sample of US adults (N=1000). Interventions: None. Main Outcome Measure: Social Security Administration Behavioral Health Function (SSA-BH) measurement instrument. Results: Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item CATs with the full item bank indicated a robust ability of the CAT approach to comprehensively characterize behavioral health function along four distinct dimensions. Conclusions: Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared to age- and sex-matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. Utilizing the CAT-based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the SSA's work disability programs. PMID:23542404

  9. Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children

    PubMed Central

    2012-01-01

    Background: Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods: The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results: The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that successive response categories for all items were not located in the expected order. Conclusions: This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135

  10. Response Latency Measures for Biographical Inventories

    DTIC Science & Technology

    1991-03-01

    research (Trent et al., 1989). Procedures: The ASAP, followed by one or more experimental cognitive tests, was computer administered to groups of... comprehension, and binary "true/false" decision about the item. This last stage, in turn, is divided into two substages: self-referent decision... apply stage. As a first step in partitioning latencies, it would be prudent to control experimentally for item length, as had been done in a few studies

  11. Response latencies are alive and well for identifying fakers on a self-report personality inventory: A reconsideration of van Hooft and Born (2012).

    PubMed

    Holden, Ronald R; Lambert, Christine E

    2015-12-01

    Van Hooft and Born (Journal of Applied Psychology 97:301-316, 2012) presented data challenging both the correctness of a congruence model of faking on personality test items and the relative merit (i.e., effect size) of response latencies for identifying fakers. We suggest that their analysis of response times was suboptimal, and that it followed neither from a congruence model of faking nor from published protocols on appropriately filtering the noise in personality test item answering times. Using new data and following recommended analytic procedures, we confirmed the relative utility of response times for identifying personality test fakers, and our obtained results, again, reinforce a congruence model of faking.

  12. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Applied Cognition – General Concerns, Short Forms in Ethnically Diverse Groups

    PubMed Central

    Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P.; Crane, Paul K.; Cella, David; Teresi, Jeanne A.

    2017-01-01

    Aims: The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition – General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. Methods: DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. Results: DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: “I have had to work really hard to pay attention or I would make a mistake” and “I have had trouble shifting back and forth between different activities that require thinking”. For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Conclusion: Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition – General Concerns short form item set. One item, “It has seemed like my brain was not working as well as usual” might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups. PMID:28523238

  13. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Applied Cognition - General Concerns, Short Forms in Ethnically Diverse Groups.

    PubMed

    Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P; Crane, Paul K; Cella, David; Teresi, Jeanne A

    2016-01-01

    The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition - General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: "I have had to work really hard to pay attention or I would make a mistake" and "I have had trouble shifting back and forth between different activities that require thinking". For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition - General Concerns short form item set. One item, "It has seemed like my brain was not working as well as usual" might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups.

  14. Applying Item Response Theory methods to design a learning progression-based science assessment

    NASA Astrophysics Data System (ADS)

    Chen, Jing

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT)-based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct, and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats, such as Constructed Response (CR), Ordered Multiple Choice (OMC), and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. Additional dimensions therefore need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary somewhat, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that, ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2, and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Future data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC and MTF options. Item writers can follow these recommendations to write better learning progression-based items.
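
    The first of the design steps above (defining level boundaries as the means of item thresholds across a set of good items) is easy to express directly. The sketch below uses hypothetical threshold parameters, not those estimated in the study, and then classifies example student estimates into levels.

    import numpy as np

    # Hypothetical threshold parameters (d1, d2, d3) for a handful of good items,
    # each threshold marking the transition to the next learning-progression level.
    thresholds = np.array([[-1.4, -0.2, 1.1],
                           [-1.1,  0.0, 1.3],
                           [-1.3, -0.1, 0.9],
                           [-1.2,  0.1, 1.2]])

    # Level boundaries on the IRT scale: the mean of each threshold across items.
    boundaries = thresholds.mean(axis=0)
    print("level boundaries:", boundaries.round(2))

    # Classify a student estimate into a level (Level 1 below the first boundary, etc.).
    def classify(theta):
        return int(np.searchsorted(boundaries, theta)) + 1

    print([classify(t) for t in (-2.0, -0.5, 0.5, 1.5)])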

  15. A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing.

    PubMed

    van Rijn, Peter W; Ali, Usama S

    2017-05-01

    We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures. © 2017 The British Psychological Society.
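
    As a concrete reference point for the second (hierarchical) framework, the sketch below writes down the joint log-likelihood of one person's accuracies and response times under a 2PL model for accuracy and a log-normal model for log response times, with person speed tau, item time intensity beta, and time discrimination alpha. All parameter values are hypothetical.

    import numpy as np
    from scipy.stats import norm

    def loglik_person(theta, tau, u, t, a, b, alpha, beta):
        # 2PL log-likelihood for the 0/1 accuracies u plus a log-normal
        # log-likelihood for the response times t (log-times ~ N(beta - tau, 1/alpha^2)).
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        ll_acc = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
        ll_time = np.sum(norm.logpdf(np.log(t), loc=beta - tau, scale=1.0 / alpha))
        return ll_acc + ll_time

    # Hypothetical parameters for 5 items.
    a, b = np.array([1.1, 0.9, 1.4, 0.7, 1.2]), np.array([-0.5, 0.0, 0.6, -1.0, 0.3])
    alpha, beta = np.array([1.5, 2.0, 1.2, 1.8, 1.6]), np.array([3.2, 3.0, 3.6, 2.8, 3.4])
    u = np.array([1, 1, 0, 1, 1]); t = np.array([22.0, 18.5, 41.0, 15.0, 30.0])
    print(loglik_person(theta=0.4, tau=0.1, u=u, t=t, a=a, b=b, alpha=alpha, beta=beta))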

  16. Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

    PubMed

    Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

    2006-11-01

    We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
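
    The nested-model logic described above can be sketched compactly. The published procedure uses ordinal logistic regression for polytomous MMSE items; the illustration below uses a simulated dichotomous item and statsmodels' binary logistic regression to show the same two checks, namely the significance of the ability-by-group interaction (nonuniform DIF) and the change in the ability coefficient when the group term is added (uniform DIF). All data are simulated.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 1000
    ability = rng.normal(size=n)            # IRT ability estimate (theta)
    group = rng.integers(0, 2, size=n)      # e.g. language of test administration
    # Hypothetical item with uniform DIF: harder for group 1 at equal ability.
    p = 1 / (1 + np.exp(-(1.2 * ability - 0.3 - 0.5 * group)))
    y = rng.binomial(1, p)

    const = np.ones(n)
    m1 = sm.Logit(y, np.column_stack([const, ability])).fit(disp=0)
    m2 = sm.Logit(y, np.column_stack([const, ability, group])).fit(disp=0)
    m3 = sm.Logit(y, np.column_stack([const, ability, group, ability * group])).fit(disp=0)

    # Nonuniform DIF: significance of the ability-by-group interaction (model 3).
    print("interaction p-value:", m3.pvalues[3])
    # Uniform DIF: marked change in the ability coefficient once the group term is added.
    print("ability coefficient without/with group:", m1.params[1], m2.params[1])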

  17. Development and refinement of the WAItE: a new obesity-specific quality of life measure for adolescents.

    PubMed

    Oluboyede, Yemi; Hulme, Claire; Hill, Andrew

    2017-08-01

    Few weight-specific outcome measures developed specifically for obese and overweight adolescents exist, and none are suitable for the elicitation of utility values used in the assessment of cost effectiveness. The objective was the development of a descriptive system for a new weight-specific measure. Qualitative interviews were conducted with 31 treatment-seeking (above normal weight status) and non-treatment-seeking (school sample) adolescents aged 11-18 years, to identify a draft item pool and associated response options. A total of 315 eligible, consenting adolescents, aged 11-18 years, enrolled in weight management services and recruited via an online panel, completed two versions of a long-list 29-item descriptive system (consisting of frequency and severity response scales). Psychometric assessments and Rasch analysis were applied to the draft 29-item instrument to identify a brief tool containing the best performing items and associated response options. Seven items were selected for the final item set; all displayed internal consistency, moderate floor effects, and the ability to discriminate between weight categories. The assessment of unidimensionality was supported (t test statistic of 0.024, less than the 0.05 threshold value). The Weight-specific Adolescent Instrument for Economic-evaluation focuses on aspects of life affected by weight that are important to adolescents. It has the potential for adding key information to the assessment of weight management interventions aimed at the younger population.

  18. Comparison promotes learning and transfer of relational categories.

    PubMed

    Kurtz, Kenneth J; Boukrina, Olga; Gentner, Dedre

    2013-07-01

    We investigated the effect of co-presenting training items during supervised classification learning of novel relational categories. Strong evidence exists that comparison induces a structural alignment process that renders common relational structure more salient. We hypothesized that comparisons between exemplars would facilitate learning and transfer of categories that cohere around a common relational property. The effect of comparison was investigated using learning trials that elicited a separate classification response for each item in presentation pairs that could be drawn from the same or different categories. This methodology ensures consideration of both items and invites comparison through an implicit same-different judgment inherent in making the two responses. In a test phase measuring learning and transfer, the comparison group significantly outperformed a control group receiving an equivalent training session of single-item classification learning. Comparison-based learners also outperformed the control group on a test of far transfer, that is, the ability to accurately classify items from a novel domain that was relationally alike, but surface-dissimilar, to the training materials. Theoretical and applied implications of this comparison advantage are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  19. Robust Measurement via A Fused Latent and Graphical Item Response Theory Model.

    PubMed

    Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang

    2018-03-12

    Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate item-level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

  20. Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory.

    PubMed

    Fajrianthi; Zein, Rizqy Amelia

    2017-01-01

    This study aimed to develop an emotional intelligence (EI) test that is suitable for the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. The TKEA consisted of 120 items, with 40 items in each subset. The TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (ability level = 0), 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (ability level = -2). It is concluded that the TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that the TKEA is currently at the development stage; therefore, in this study, we investigated the item analysis and dimensionality of each TKEA subset.
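
    The test information function summarises how precisely a set of items measures at each ability level (its reciprocal square root is the standard error of measurement). The sketch below computes a TIF for a handful of hypothetical dichotomous 2PL items that are easy, so information peaks at low ability, mirroring the pattern reported above for the TKEA subsets; the actual TKEA items and parameters are not reproduced here.

    import numpy as np

    def test_information(theta, a, b):
        # TIF for dichotomous 2PL items: I(theta) = sum_i a_i^2 * P_i * (1 - P_i).
        p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
        return (a[:, None] ** 2 * p * (1 - p)).sum(axis=0)

    # Hypothetical subset of easy items: most information at low ability levels.
    a = np.array([1.4, 1.1, 0.9, 1.6, 1.2])
    b = np.array([-2.1, -1.8, -2.4, -1.5, -2.0])
    theta = np.linspace(-4, 4, 9)
    for t, i in zip(theta, test_information(theta, a, b)):
        print(f"theta = {t:+.1f}  information = {i:.2f}")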

  1. Item response theory analysis applied to the Spanish version of the Personal Outcomes Scale.

    PubMed

    Guàrdia-Olmos, J; Carbó-Carreté, M; Peró-Cebollero, M; Giné, C

    2017-11-01

    The study of measurements of quality of life (QoL) is one of the great challenges of modern psychology and psychometric approaches. This issue has greater importance when examining QoL in populations that were historically treated on the basis of their deficiency, and recently, the focus has shifted to what each person values and desires in their life, as in cases of people with intellectual disability (ID). Many studies of QoL scales applied in this area have attempted to improve the validity and reliability of their components by incorporating various sources of information to achieve consistency in the data obtained. The adaptation of the Personal Outcomes Scale (POS) in Spanish has shown excellent psychometric attributes, and its administration has three sources of information: self-assessment, practitioner and family. The study of possible congruence or incongruence of the observed distributions of each item between sources is therefore essential to ensure a correct interpretation of the measure. The aim of this paper was to analyse the observed distribution of items and dimensions from the three Spanish POS information sources cited earlier, using item response theory. We studied a sample of 529 people with ID and their respective practitioners and family members, and in each case, we analysed items and factors using Samejima's model for polytomous ordinal scales. The results indicated a substantial number of items with differential effects across sources, and in some cases, significant differences in the distribution of items, factors and sources of information. As a result of this analysis, we must affirm that the administration of the POS with three sources of information was adequate overall, but a correct interpretation of the results requires considering much more information, as well as some specific items in specific dimensions; otherwise, the overall ratings could be biased. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.

  2. [Mokken scaling of the Cognitive Screening Test].

    PubMed

    Diesfeldt, H F A

    2009-10-01

    The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).

  3. Using Patient Health Questionnaire-9 item parameters of a common metric resulted in similar depression scores compared to independent item response theory model reestimation.

    PubMed

    Liegl, Gregor; Wahl, Inka; Berghöfer, Anne; Nolte, Sandra; Pieh, Christoph; Rose, Matthias; Fischer, Felix

    2016-03-01

    To investigate the validity of a common depression metric in independent samples. We applied a common metrics approach based on item-response theory for measuring depression to four German-speaking samples that completed the Patient Health Questionnaire (PHQ-9). We compared the PHQ item parameters reported for this common metric to reestimated item parameters that derived from fitting a generalized partial credit model solely to the PHQ-9 items. We calibrated the new model on the same scale as the common metric using two approaches (estimation with shifted prior and Stocking-Lord linking). By fitting a mixed-effects model and using Bland-Altman plots, we investigated the agreement between latent depression scores resulting from the different estimation models. We found different item parameters across samples and estimation methods. Although differences in latent depression scores between different estimation methods were statistically significant, these were clinically irrelevant. Our findings provide evidence that it is possible to estimate latent depression scores by using the item parameters from a common metric instead of reestimating and linking a model. The use of common metric parameters is simple, for example, using a Web application (http://www.common-metrics.org) and offers a long-term perspective to improve the comparability of patient-reported outcome measures. Copyright © 2016 Elsevier Inc. All rights reserved.
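
    The practical payoff of a common metric is that a new sample can be scored with fixed, anchored item parameters rather than refitting and linking a model. The sketch below illustrates EAP scoring under the generalized partial credit model with hypothetical anchored parameters for three four-category items; the study's actual parameters come from the common metric referenced above, not from this example.

    import numpy as np

    def gpcm_prob(theta, a, b, k):
        # GPCM probability of category k given fixed item parameters (a, step b's).
        cum = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
        e = np.exp(cum - cum.max())
        return (e / e.sum())[k]

    def eap_score(responses, item_params, grid=np.linspace(-4, 4, 121)):
        # EAP depression score using fixed ("anchored") item parameters, so no
        # model re-estimation or linking step is needed for a new sample.
        logpost = -0.5 * grid ** 2                     # standard normal prior
        for (a, b), k in zip(item_params, responses):
            logpost += np.log([gpcm_prob(t, a, b, k) for t in grid])
        post = np.exp(logpost - logpost.max()); post /= post.sum()
        return float((grid * post).sum())

    # Hypothetical common-metric parameters for three 4-category PHQ-style items.
    params = [(1.3, np.array([-0.8, 0.2, 1.1])),
              (1.0, np.array([-1.2, -0.1, 0.9])),
              (1.6, np.array([-0.5, 0.6, 1.4]))]
    print(eap_score(responses=[2, 1, 3], item_params=params))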

  4. A Mixed Effects Randomized Item Response Model

    ERIC Educational Resources Information Center

    Fox, J.-P.; Wyrick, Cheryl

    2008-01-01

    The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…

  5. Item Randomized-Response Models for Measuring Noncompliance: Risk-Return Perceptions, Social Influences, and Self-Protective Responses

    ERIC Educational Resources Information Center

    Bockenholt, Ulf; Van Der Heijden, Peter G. M.

    2007-01-01

    Randomized response (RR) is a well-known method for measuring sensitive behavior. Yet this method is not often applied because: (i) of its lower efficiency and the resulting need for larger sample sizes which make applications of RR costly; (ii) despite its privacy-protection mechanism the RR design may not be followed by every respondent; and…

  6. Extending LMS to Support IRT-Based Assessment Test Calibration

    NASA Astrophysics Data System (ADS)

    Fotaris, Panagiotis; Mastoras, Theodoros; Mavridis, Ioannis; Manitsaris, Athanasios

    Developing unambiguous and challenging assessment material for measuring educational attainment is a time-consuming, labor-intensive process. As a result Computer Aided Assessment (CAA) tools are becoming widely adopted in academic environments in an effort to improve the assessment quality and deliver reliable results of examinee performance. This paper introduces a methodological and architectural framework which embeds a CAA tool in a Learning Management System (LMS) so as to assist test developers in refining items to constitute assessment tests. An Item Response Theory (IRT) based analysis is applied to a dynamic assessment profile provided by the LMS. Test developers define a set of validity rules for the statistical indices given by the IRT analysis. By applying those rules, the LMS can detect items with various discrepancies which are then flagged for review of their content. Repeatedly executing the aforementioned procedure can improve the overall efficiency of the testing process.
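
    A minimal sketch of the rule-based flagging idea follows, with hypothetical thresholds and item statistics (in the framework described above, the validity rules are defined by the test developers themselves): items whose IRT indices fall outside the registered ranges are returned for content review.

    from dataclasses import dataclass

    @dataclass
    class ItemStats:
        # IRT indices returned by the calibration step (values are illustrative).
        item_id: str
        discrimination: float   # a parameter
        difficulty: float       # b parameter
        guessing: float         # c parameter

    # Hypothetical validity rules a test developer might register in the LMS.
    RULES = {
        "discrimination": lambda s: 0.5 <= s.discrimination <= 2.5,
        "difficulty":     lambda s: -3.0 <= s.difficulty <= 3.0,
        "guessing":       lambda s: s.guessing <= 0.35,
    }

    def flag_items(stats):
        # Return items violating any rule, so their content can be reviewed.
        return {s.item_id: [name for name, ok in RULES.items() if not ok(s)]
                for s in stats if not all(ok(s) for ok in RULES.values())}

    bank = [ItemStats("Q1", 1.2, 0.4, 0.18),
            ItemStats("Q2", 0.3, 1.1, 0.22),    # weak discrimination -> flagged
            ItemStats("Q3", 1.6, -3.8, 0.40)]   # too easy and guessable -> flagged
    print(flag_items(bank))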

  7. Evaluating The Influence of Postsession Reinforcement on Choice of Reinforcers

    PubMed Central

    Kodak, Tiffany; Lerman, Dorothea C; Call, Nathan

    2007-01-01

    Factors that influence reinforcer choice have been examined in a number of applied studies (e.g., Neef, Mace, Shea, & Shade, 1992; Shore, Iwata, DeLeon, Kahng, & Smith, 1997; Tustin, 1994). However, no applied studies have evaluated the effects of postsession reinforcement on choice between concurrently available reinforcers, even though basic findings indicate that this is an important factor to consider (Hursh, 1978; Zeiler, 1999). In this bridge investigation, we evaluated the influence of postsession reinforcement on choice of two food items when task responding was reinforced on progressive-ratio schedules. Participants were 3 children who had been diagnosed with developmental disabilities. Results indicated that response allocation shifted from one food item to the other food item under thinner schedules of reinforcement when no postsession reinforcement was provided. These findings suggest that the efficacy of instructional programs or treatments for problem behavior may be improved by restricting reinforcers outside treatment sessions. PMID:17970264

  8. Evaluating the influence of postsession reinforcement on choice of reinforcers.

    PubMed

    Kodak, Tiffany; Lerman, Dorothea C; Call, Nathan

    2007-01-01

    Factors that influence reinforcer choice have been examined in a number of applied studies (e.g., Neef, Mace, Shea, & Shade, 1992; Shore, Iwata, DeLeon, Kahng, & Smith, 1997; Tustin, 1994). However, no applied studies have evaluated the effects of postsession reinforcement on choice between concurrently available reinforcers, even though basic findings indicate that this is an important factor to consider (Hursh, 1978; Zeiler, 1999). In this bridge investigation, we evaluated the influence of postsession reinforcement on choice of two food items when task responding was reinforced on progressive-ratio schedules. Participants were 3 children who had been diagnosed with developmental disabilities. Results indicated that response allocation shifted from one food item to the other food item under thinner schedules of reinforcement when no postsession reinforcement was provided. These findings suggest that the efficacy of instructional programs or treatments for problem behavior may be improved by restricting reinforcers outside treatment sessions.

  9. Have a little faith: measuring the impact of illness on positive and negative aspects of faith.

    PubMed

    Salsman, John M; Garcia, Sofia F; Lai, Jin-Shei; Cella, David

    2012-12-01

    The importance of faith and its associations with health are well documented. As part of the Patient Reported Outcomes Measurement Information System, items tapping positive and negative impact of illness (PII and NII) were developed across four content domains: Coping/Stress Response, Self-Concept, Social Connection/Isolation, and Meaning and Spirituality. Faith items were included within the concept of meaning and spirituality. This measurement model was tested on a heterogeneous group of 509 cancer survivors. To evaluate dimensionality, we applied two bi-factor models, specifying a general factor (PII or NII) and four local factors: Coping/Stress Response, Self-Concept, Social Connection/Isolation, and Meaning and Spirituality. Bi-factor analysis supported sufficient unidimensionality within PII and NII item sets. The unidimensionality of both PII and NII item sets was enhanced by extraction of the faith items from the rest of the questions. Of the 10 faith items, nine demonstrated higher local than general factor loadings (range for local factor loadings = 0.402 to 0.876), suggesting utility as a separate but related 'faith' factor. The same was true for only two of the remaining 63 items across the PII and NII item sets. Although conceptually and to a degree empirically related to Meaning and Spirituality, Faith appears to be a distinct subdomain of PII and NII, better handled by distinct assessment. A 10-item measure of the impact of illness upon faith (II-Faith) was therefore assembled. Copyright © 2011 John Wiley & Sons, Ltd.

  10. An item response theory analysis of the Executive Interview and development of the EXIT8: A Project FRONTIER Study.

    PubMed

    Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E

    2015-01-01

    The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.
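
    To make the graded response model referenced above concrete, the sketch below (not the authors' code; the item parameters are invented for illustration) computes the category response probabilities of a single polytomous item, the building block of the graded response model applied to the EXIT25.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under Samejima's graded response model.

    theta : latent trait value (scalar)
    a     : item discrimination
    b     : ordered category-boundary difficulties, length m (for m+1 categories)
    """
    # Cumulative probabilities of responding in category k or higher, k = 1..m
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))
    # Pad with P*(>=0) = 1 and P*(>=m+1) = 0, then take differences
    bounds = np.concatenate(([1.0], p_star, [0.0]))
    return bounds[:-1] - bounds[1:]

# Hypothetical EXIT25-style item scored 0/1/2 with two boundary difficulties
probs = grm_category_probs(theta=0.5, a=1.8, b=[-0.4, 1.1])
print(probs, probs.sum())   # probabilities over categories 0, 1, 2; sums to 1
```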

  11. Lawton IADL scale in dementia: can item response theory make it more informative?

    PubMed

    McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M

    2014-07-01

    Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information at the item level. The aim was to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' the least 'difficult'. 'Shopping' was also the most discriminating item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. Non-ignorable missingness item response theory models for choice effects in examinee-selected items.

    PubMed

    Liu, Chen-Wei; Wang, Wen-Chung

    2017-11-01

    Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
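
    A minimal simulation sketch of the data-generating idea behind this design (not the authors' model specification; all parameter values are invented): ability and a selection propensity are drawn from a bivariate normal, the examinee's choice of item depends on the selection trait, and the response to the chosen item follows a two-parameter logistic model in the ability trait. When the two traits correlate, the resulting missingness is non-ignorable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
rho = 0.6                                   # correlation making missingness non-ignorable
cov = [[1.0, rho], [rho, 1.0]]
theta = rng.multivariate_normal([0.0, 0.0], cov, size=n)   # (ability, selection trait)

# Two examinee-selected items with hypothetical 2PL parameters
a = np.array([1.2, 1.2])
b = np.array([-0.5, 0.5])

# Examinees with a higher selection trait tend to pick the harder item
select_item = (theta[:, 1] + rng.normal(0, 1, n) > 0).astype(int)

p_correct = 1 / (1 + np.exp(-a[select_item] * (theta[:, 0] - b[select_item])))
response = rng.binomial(1, p_correct)

# Responses to the unselected item are missing; a naive analysis ignores the mechanism
for i in (0, 1):
    mask = select_item == i
    print(f"item {i}: chosen by {mask.mean():.2f}, observed p-value {response[mask].mean():.2f}")
```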

  13. A longitudinal evaluation of the Center for Epidemiologic Studies-Depression scale (CES-D) in a Rheumatoid Arthritis Population using Rasch Analysis

    PubMed Central

    Covic, Tanya; Pallant, Julie F; Conaghan, Philip G; Tennant, Alan

    2007-01-01

    Background The aim of this study was to test the internal validity of the total Center for Epidemiologic Studies-Depression (CES-D) scale using Rasch analysis in a rheumatoid arthritis (RA) population. Methods CES-D was administered to 157 patients with RA over three time points within a 12 month period. Rasch analysis was applied using RUMM2020 software to assess the overall fit of the model, the response scale used, individual item fit, differential item functioning (DIF) and person separation. Results Pooled data across three time points was shown to fit the Rasch model with removal of seven items from the original 20-item CES-D scale. It was necessary to rescore the response format from four to three categories in order to improve the scale's fit. Two items demonstrated some DIF for age and gender but were retained within the 13-item CES-D scale. A new cut point for depression score of 9 was found to correspond to the original cut point score of 16 in the full CES-D scale. Conclusion This Rasch analysis of the CES-D in a longstanding RA cohort resulted in the construction of a modified 13-item scale with good internal validity. Further validation of the modified scale is recommended particularly in relation to the new cut point for depression. PMID:17629902

  14. The modified Memorial Symptom Assessment Scale Short Form: a modified response format and rational scoring rules.

    PubMed

    Sharp, J L; Gough, K; Pascoe, M C; Drosdowsky, A; Chang, V T; Schofield, P

    2018-07-01

    The Memorial Symptom Assessment Scale Short Form (MSAS-SF) is a widely used symptom assessment instrument. Patients who self-complete the MSAS-SF have difficulty following the two-part response format, resulting in incorrectly completed responses. We describe modifications to the response format to improve usability, and rational scoring rules for incorrectly completed items. The modified MSAS-SF was completed by 311 women in our Peer and Nurse support Trial to Assist women in Gynaecological Oncology (the PeNTAGOn study). Descriptive statistics were used to summarise completion of the modified MSAS-SF and to provide symptom statistics before and after applying the rational scoring rules. Spearman's correlations with the Functional Assessment of Cancer Therapy-General (FACT-G) and Hospital Anxiety and Depression Scale (HADS) were assessed. Correct completion of the modified MSAS-SF items ranged from 91.5 to 98.7%. The rational scoring rules increased the percentage of usable responses by an average of 4% across all symptoms. MSAS-SF item statistics were similar with and without the scoring rules. The pattern of correlations with the FACT-G and HADS was compatible with prior research. The modified MSAS-SF was usable for self-completion and responses demonstrated validity. The rational scoring rules can minimise loss of data from incorrectly completed responses. Further investigation is recommended.

  15. Calibrating well-being, quality of life and common mental disorder items: psychometric epidemiology in public mental health research.

    PubMed

    Böhnke, Jan R; Croudace, Tim J

    2016-08-01

    The assessment of 'general health and well-being' in public mental health research stimulates debates around the relative merits of questionnaire instruments and their items. Little evidence regarding alignment or differential advantages of instruments or items has appeared to date. This was a population-based psychometric study of items employed in public mental health narratives. Multidimensional item response theory was applied to General Health Questionnaire (GHQ-12), Warwick-Edinburgh Mental Well-being Scale (WEMWBS) and EQ-5D items (Health Survey for England, 2010-2012; n = 19 290). A bifactor model provided the best account of the data and showed that the GHQ-12 and WEMWBS items assess mainly the same construct. Only one item of the EQ-5D showed relevant overlap with this dimension (anxiety/depression). Findings were corroborated by comparisons with alternative models and cross-validation analyses. The consequences of this lack of differentiation (GHQ-12 v. WEMWBS) for mental health and well-being narratives deserve discussion, to enrich debates on priorities in public mental health and its assessment. © The Royal College of Psychiatrists 2015.
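
    For reference, a generic bifactor decomposition of the kind evaluated here can be written as below (the notation is ours, not the paper's): each item loads on one general factor and on exactly one local content-domain factor, with all factors mutually orthogonal, so that general and local loadings can be compared item by item.

```latex
% Bifactor measurement model (generic form; notation ours):
% item i loads on the general factor G and only on its local factor S_{d(i)}.
x_i \;=\; \lambda^{G}_{i}\, G \;+\; \lambda^{S}_{i}\, S_{d(i)} \;+\; \varepsilon_i,
\qquad
\operatorname{Cov}(G, S_d) = 0, \quad \operatorname{Cov}(S_d, S_{d'}) = 0 \;\; (d \neq d').
```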

  16. Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory

    PubMed Central

    Fajrianthi; Zein, Rizqy Amelia

    2017-01-01

    This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items, with 40 items in each subset, and was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (ability level = 0), 12.183 for subset 2 (ability level = −2), and 2.398 for subset 3 (ability level = −2). It is concluded that the TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that the TKEA is currently at the development stage; in this study, we therefore investigated item analysis and a dimensionality test of each TKEA subset. PMID:29238234
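
    As an illustration of the test information function reported above, the sketch below computes a 2PL test information curve and the corresponding standard errors from a set of invented item parameters (the TKEA's actual item parameters and model are not reproduced here); information that peaks at low ability, as for subset 2, means the test measures low-EI respondents most precisely.

```python
import numpy as np

def test_information(theta, a, b):
    """Test information under a 2PL: I(theta) = sum_i a_i^2 * P_i * (1 - P_i)."""
    theta = np.atleast_1d(theta)[:, None]
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return (a**2 * p * (1 - p)).sum(axis=1)

# Hypothetical item parameters for a 40-item subset targeted at low ability
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, 40)
b = rng.normal(-1.0, 1.0, 40)          # easier items -> information peaks at low ability

grid = np.linspace(-3, 3, 7)
for t, info in zip(grid, test_information(grid, a, b)):
    print(f"theta = {t:+.1f}  information = {info:6.2f}  SE = {1/np.sqrt(info):.2f}")
```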

  17. Identifying the Source of Misfit in Item Response Theory Models.

    PubMed

    Liu, Yang; Maydeu-Olivares, Alberto

    2014-01-01

    When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X², (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M2 statistic applied to bivariate subtables. The unadjusted Pearson's X² with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X² is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.

  18. Three-dimensional structural representation of the sleep-wake adaptability.

    PubMed

    Putilov, Arcady A

    2016-01-01

    Various characteristics of the sleep-wake cycle can determine the success or failure of individual adjustment to certain temporal conditions of today's society. However, it remains to be explored how many such characteristics can be self-assessed and how they are interrelated. The aim of the present report was to apply a three-dimensional structural representation of the sleep-wake adaptability in the form of a "rugby cake" (scalene or triaxial ellipsoid) to explain the results of an analysis of the pattern of correlations of responses to the initial 320-item list of a new inventory with scores on six scales designed for multidimensional self-assessment of the sleep-wake adaptability (Morning and Evening Lateness, Anytime and Nighttime Sleepability, and Anytime and Daytime Wakeability). The results obtained for a sample of 149 respondents were confirmed by the results of a similar analysis of earlier collected responses of 139 respondents to the same list of 320 items and of 1213 respondents to the 72 items of one of the earlier established questionnaire tools. Empirical evidence was provided in support of the model-driven prediction that items can be identified that are linked to as many as 36 narrow (6 core and 30 mixed) adaptabilities of the sleep-wake cycle. The results enabled the selection of 168 items for self-assessment of all these adaptabilities predicted by the rugby cake model.

  19. Multidimensional student skills with collaborative filtering

    NASA Astrophysics Data System (ADS)

    Bergner, Yoav; Rayyan, Saif; Seaton, Daniel; Pritchard, David E.

    2013-01-01

    Despite the fact that a physics course typically culminates in one final grade for the student, many instructors and researchers believe that there are multiple skills that students acquire to achieve mastery. Assessment validation and data analysis in general may thus benefit from extension to multidimensional ability. This paper introduces an approach for model determination and dimensionality analysis using collaborative filtering (CF), which is related to factor analysis and item response theory (IRT). Model selection is guided by machine learning perspectives, seeking to maximize the accuracy in predicting which students will answer which items correctly. We apply the CF to response data for the Mechanics Baseline Test and combine the results with prior analysis using unidimensional IRT.
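
    A minimal sketch of the collaborative-filtering idea described above (not the authors' implementation; the data are simulated and the dimensionality, learning rate, and iteration count are arbitrary choices): binary response data are factorized into student skill vectors and item loading vectors by logistic matrix factorization, and the number of dimensions can then be chosen by how well the factorization predicts which students answer which items correctly (evaluated in-sample here for brevity; the paper's model selection uses prediction of held-out responses).

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items, k = 200, 30, 2        # k = assumed number of skill dimensions

# Simulate binary response data from a 2-dimensional compensatory model
skill = rng.normal(size=(n_students, k))
load  = rng.uniform(0.5, 1.5, size=(k, n_items))
diff  = rng.normal(size=n_items)
X = rng.binomial(1, 1 / (1 + np.exp(-(skill @ load - diff))))

# Collaborative filtering: learn student and item factors by logistic matrix factorization
S = rng.normal(scale=0.1, size=(n_students, k))
Q = rng.normal(scale=0.1, size=(k, n_items))
d = np.zeros(n_items)
lr = 0.05
for _ in range(500):
    p = 1 / (1 + np.exp(-(S @ Q - d)))
    err = X - p                             # gradient of the Bernoulli log-likelihood
    S += lr * (err @ Q.T) / n_items
    Q += lr * (S.T @ err) / n_students
    d -= lr * err.mean(axis=0)

p = 1 / (1 + np.exp(-(S @ Q - d)))
acc = ((p > 0.5) == X).mean()
print(f"in-sample prediction accuracy with {k} dimensions: {acc:.2f}")
```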

  20. A new computerized adaptive test advancing the measurement of health-related quality of life (HRQoL) in children: the Kids-CAT.

    PubMed

    Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Mülhan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U

    2015-04-01

    Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims at describing the development of a HRQoL CAT for children and adolescents, the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters, as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks demonstrated excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conforming IRCs. In CAT simulations, seven items were needed to achieve a measurement precision (reliability) between .8 and .9. It has a child-friendly design, is easily accessible online, and gives immediate feedback reports of scores. The Kids-CAT has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing patient-doctor communication.
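
    The sketch below illustrates the core of a CAT simulation of the kind mentioned above, not the Kids-CAT engine itself: a hypothetical 2PL item bank, maximum-information item selection, and EAP ability updating on a quadrature grid, stopped after seven items to echo the precision result reported in the abstract. All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.uniform(1.0, 2.5, 40)              # hypothetical 40-item bank (2PL)
b = rng.normal(0.0, 1.0, 40)
grid = np.linspace(-4, 4, 121)             # quadrature grid for EAP estimation
prior = np.exp(-0.5 * grid**2); prior /= prior.sum()

def p2pl(theta, a_i, b_i):
    return 1 / (1 + np.exp(-a_i * (theta - b_i)))

true_theta = 1.0                           # one simulated respondent
posterior = prior.copy()
used = []
for step in range(7):                      # the abstract reports ~7 items for reliability .8-.9
    theta_hat = (grid * posterior).sum()
    info = a**2 * p2pl(theta_hat, a, b) * (1 - p2pl(theta_hat, a, b))
    info[used] = -np.inf                   # do not reuse administered items
    j = int(np.argmax(info))               # maximum-information item selection
    used.append(j)
    u = rng.binomial(1, p2pl(true_theta, a[j], b[j]))
    like = p2pl(grid, a[j], b[j]) if u else 1 - p2pl(grid, a[j], b[j])
    posterior *= like; posterior /= posterior.sum()
    eap = (grid * posterior).sum()
    se = np.sqrt(((grid - eap)**2 * posterior).sum())
    print(f"item {j:2d}  response {u}  EAP = {eap:+.2f}  SE = {se:.2f}")
```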

  1. Analysis of Validity and Reliability of the Health Literacy Index for Female Marriage Immigrants (HLI-FMI).

    PubMed

    Yang, Sook Ja; Chee, Yeon Kyung; An, Jisook; Park, Min Hee; Jung, Sunok

    2016-05-01

    The purpose of this study was to obtain an independent evaluation of the factor structure of the 12-item Health Literacy Index for Female Marriage Immigrants (HLI-FMI), the first measure for assessing health literacy for FMIs in Korea. Participants were 250 Asian women who migrated from China, Vietnam, and the Philippines to marry. The HLI-FMI was originally developed and administered in Korean, and other questionnaires were translated into participants' native languages. The HLI-FMI consisted of 2 factors: (1) Access-Understand Health Literacy (7 items) and (2) Appraise-Apply Health Literacy (5 items); Cronbach's α = .73. Confirmatory factor analysis indicated adequate fit for the 2-factor model. HLI-FMI scores were positively associated with time since immigration and Korean proficiency. Based on classical test theory and item response theory, strong support was provided for item discrimination and item difficulty. Findings suggested that the HLI-FMI is an easily administered, reliable, and valid scale. © 2016 APJPH.

  2. Conditional Covariance-Based Nonparametric Multidimensionality Assessment.

    ERIC Educational Resources Information Center

    Stout, William; And Others

    1996-01-01

    Three nonparametric procedures that use estimates of covariances of item-pair responses conditioned on examinee trait level for assessing dimensionality of a test are described. The HCA/CCPROX, DIMTEST, and DETECT are applied to a dimensionality study of the Law School Admission Test. (SLD)

  3. Developmental performance of 5-year-old Bulgarian children-An example of translational neuroscience in practice.

    PubMed

    Yordanova, Ralitsa; Ivanov, Ivan

    2018-04-25

    Developmental testing is essential for the early recognition of various developmental impairments. The tools used should be composed of items that are age specific, adapted, and standardized for the population they are applied to. Findings from neuroscience, medicine, psychology, pedagogy, and related fields are applied in the elaboration of a comprehensive examination tool that should screen all major areas of development. The key age of 5 years permits identification of almost all major developmental disabilities, leaving time for therapeutic intervention before school entrance. The aim of the research was to evaluate the developmental performance of 5-year-old Bulgarian children using a translational neuroscience approach. A comprehensive test program was developed, composed of 89 items grouped in the following domains: fine and gross motor development, coordination and balance, central motor neuron disturbances, language development and articulation, perception, attention and behavior, visual acuity, and strabismus. The overall sample comprised 434 children of mean age 63.5 months (SD = 3.7), with a male to female ratio of 1:1.02. Of this group, 390 children were between 60 and 71 months of age. The children were examined in 51 kindergartens in 21 villages and 18 cities randomly chosen in southern Bulgaria. Eight children were excluded from the final analysis because they completed less than 50% of the test items (7 children did not cooperate and 1 child had autism spectrum disorder). Forty-three items showed abnormal responses in less than 5% of the children, 37 items in 6% to 35%, and only 9 items showed a high rate of abnormal responses (more than 35%). The test is an example of a translational approach in neuroscience. On one hand, it is based on the results of several sciences studying growth and development from different perspectives. On the other hand, the results from the present research may be implemented in other fields of child development: education, psychology, speech and language therapy, and intervention programs. © 2018 John Wiley & Sons, Ltd.

  4. Using structural equation modeling to detect response shifts and true change in discrete variables: an application to the items of the SF-36.

    PubMed

    Verdam, Mathilde G E; Oort, Frans J; Sprangers, Mirjam A G

    2016-06-01

    The structural equation modeling (SEM) approach for detection of response shift (Oort in Qual Life Res 14:587-598, 2005. doi: 10.1007/s11136-004-0830-y ) is especially suited for continuous data, e.g., questionnaire scales. The present objective is to explain how the SEM approach can be applied to discrete data and to illustrate response shift detection in items measuring health-related quality of life (HRQL) of cancer patients. The SEM approach for discrete data includes two stages: (1) establishing a model of underlying continuous variables that represent the observed discrete variables, (2) using these underlying continuous variables to establish a common factor model for the detection of response shift and to assess true change. The proposed SEM approach was illustrated with data of 485 cancer patients whose HRQL was measured with the SF-36, before and after start of antineoplastic treatment. Response shift effects were detected in items of the subscales mental health, physical functioning, role limitations due to physical health, and bodily pain. Recalibration response shifts indicated that patients experienced relatively fewer limitations with "bathing or dressing yourself" (effect size d = 0.51) and less "nervousness" (d = 0.30), but more "pain" (d = -0.23) and less "happiness" (d = -0.16) after antineoplastic treatment as compared to the other symptoms of the same subscale. Overall, patients' mental health improved, while their physical health, vitality, and social functioning deteriorated. No change was found for the other subscales of the SF-36. The proposed SEM approach to discrete data enables response shift detection at the item level. This will lead to a better understanding of the response shift phenomena at the item level and therefore enhances interpretation of change in the area of HRQL.

  5. What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

    PubMed

    Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme

    2014-01-01

    The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: the answer indicated by experts as the most correct was in several cases not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall, the results suggest that the MSCEIT is best suited to discriminating persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.

  6. Development of the Sexual Minority Adolescent Stress Inventory

    PubMed Central

    Schrager, Sheree M.; Goldbach, Jeremy T.; Mamey, Mary Rose

    2018-01-01

    Although construct measurement is critical to explanatory research and intervention efforts, rigorous measure development remains a notable challenge. For example, though the primary theoretical model for understanding health disparities among sexual minority (e.g., lesbian, gay, bisexual) adolescents is minority stress theory, nearly all published studies of this population rely on minority stress measures with poor psychometric properties and development procedures. In response, we developed the Sexual Minority Adolescent Stress Inventory (SMASI) with N = 346 diverse adolescents ages 14–17, using a comprehensive approach to de novo measure development designed to produce a measure with desirable psychometric properties. After exploratory factor analysis on 102 candidate items informed by a modified Delphi process, we applied item response theory techniques to the remaining 72 items. Discrimination and difficulty parameters and item characteristic curves were estimated overall, within each of 12 initially derived factors, and across demographic subgroups. Two items were removed for excessive discrimination and three were removed following reliability analysis. The measure demonstrated configural and scalar invariance for gender and age; a three-item factor was excluded for demonstrating substantial differences by sexual identity and race/ethnicity. The final 64-item measure comprised 11 subscales and demonstrated excellent overall (α = 0.98), subscale (α range 0.75–0.96), and test–retest (scale r > 0.99; subscale r range 0.89–0.99) reliabilities. Subscales represented a mix of proximal and distal stressors, including domains of internalized homonegativity, identity management, intersectionality, and negative expectancies (proximal) and social marginalization, family rejection, homonegative climate, homonegative communication, negative disclosure experiences, religion, and work domains (distal). Thus, the SMASI development process illustrates a method to incorporate information from multiple sources, including item response theory models, to guide item selection in building a psychometrically sound measure. We posit that similar methods can be used to improve construct measurement across all areas of psychological research, particularly in areas where a strong theoretical framework exists but existing measures are limited. PMID:29599737

  7. Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment.

    PubMed

    Fayers, Peter M

    2007-01-01

    We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs). IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research. Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transferring of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need of guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions. Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified.

  8. Comparison of composite measures of disease activity in an early seropositive rheumatoid arthritis cohort

    PubMed Central

    Ranganath, Veena K; Yoon, Jeonglim; Khanna, Dinesh; Park, Grace S; Furst, Daniel E; Elashoff, David A; Jawaheer, Damini; Sharp, John T; Gold, Richard H; Keystone, Edward C; Paulus, Harold E

    2007-01-01

    Objective To evaluate concordance and agreement of the original DAS44/ESR‐4 item composite disease activity status measure with nine simpler derivatives when classifying patient responses by European League of Associations for Rheumatology (EULAR) criteria, using an early rheumatoid factor positive (RF+) rheumatoid arthritis (RA) patient cohort. Methods Disease‐modifying anti‐rheumatic drug‐naïve RF+ patients (n = 223; mean duration of symptoms, 6 months) were categorised as ACR none/20/50/70 responders. One‐way analysis of variance and two‐sample t tests were used to investigate the relationship between the ACR response groups and each composite measure. EULAR reached/change cut‐point scores were calculated for each composite measure. EULAR (good/moderate/none) responses for each composite measure and the degree of agreement with the DAS44/ESR‐4 item were calculated for 203 patients. Results Patients were mostly female (78%) with moderate to high disease activity. A centile‐based nomogram compared equivalent composite measure scores. Changes from baseline in the composite measures in patients with ACRnone were significantly less than those of ACR20/50/70 responders, and those for ACR50 were significantly different from those for ACR70. EULAR reached/change cut‐point scores for our cohort were similar to published cut‐points. When compared with the DAS44/ESR‐4 item, EULAR (good/moderate/none) percentage agreements were 92 with the DAS44/ESR‐3 item, 74 with the Clinical Disease Activity Index, and 80 with the DAS28/ESR‐4 item, the DAS28/CRP‐4 item and the Simplified Disease Activity Index. Conclusion The relationships of nine different RA composite measures against the DAS44/ESR‐4 item when applied to a cohort of seropositive patients with early RA are described. Each of these simplified status and response measures could be useful in assessing patients with RA, but the specific measure selected should be pre‐specified and described for each study. PMID:17472996

  9. The lz(p)* Person-Fit Statistic in an Unfolding Model Context.

    PubMed

    Tendeiro, Jorge N

    2017-01-01

    Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.
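
    The statistic studied here is a generalization, for the generalized graded unfolding model, of the standardized log-likelihood person-fit statistic lz. The sketch below computes the standard dichotomous lz under a 2PL as an illustration of the statistic's general form (invented item parameters, not the paper's polytomous version): large negative values flag response patterns that are unlikely given the estimated ability.

```python
import numpy as np

def lz_statistic(u, theta, a, b):
    """Standardized log-likelihood person-fit statistic lz for dichotomous 2PL data."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    logit = np.log(p / (1 - p))
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))       # observed log-likelihood
    e = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))        # expected value
    v = np.sum(p * (1 - p) * logit**2)                          # variance
    return (l0 - e) / np.sqrt(v)

# Hypothetical 20-item test and two response patterns at theta = 0
rng = np.random.default_rng(0)
a, b = rng.uniform(0.8, 2.0, 20), np.sort(rng.normal(0, 1, 20))
normal = (rng.random(20) < 1 / (1 + np.exp(-a * (0.0 - b)))).astype(int)
aberrant = (b > 0).astype(int)     # answers hard items correctly, easy ones incorrectly
print(lz_statistic(normal, 0.0, a, b), lz_statistic(aberrant, 0.0, a, b))
# Large negative lz flags the misfitting (aberrant) response pattern
```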

  10. Confirmatory Factor Analysis of the Finnish Job Content Questionnaire (JCQ) in 590 Professional Musicians.

    PubMed

    Vastamäki, Heidi; Vastamäki, Martti; Laimi, Katri; Saltychev, Michail

    2017-07-01

    Poorly functioning work environments may lead to dissatisfaction for the employees and financial loss for the employers. The Job Content Questionnaire (JCQ) was designed to measure social and psychological characteristics of work environments. The aim was to investigate the factor construct of the Finnish 14-item version of the JCQ when applied to professional orchestra musicians. In a cross-sectional survey, the questionnaire was sent by mail to 1550 orchestra musicians and students, and 630 responses were received. Full data were available for 590 respondents (response rate 38%). The questionnaire also contained questions on demographics, job satisfaction, health status, health behaviors, and intensity of playing music. Confirmatory factor analysis of the 2-factor model of the JCQ was conducted. Of the five JCQ items in the "job demand" construct, "conflicting demands" (question 5) explained most of the total variance in this construct (79%), demonstrating an almost perfect correlation of 0.63. In the "job control" construct, "opinions influential" (question 10) demonstrated a perfect correlation index of 0.84, and the items "little decision freedom" (question 14) and "allows own decisions" (question 6) showed substantial correlations of 0.77 and 0.65. The 2-factor model of the Finnish 14-item version of the JCQ proposed in this study fitted the observed data well. The "conflicting demands," "opinions influential," "little decision freedom," and "allows own decisions" items demonstrated the strongest correlations with the latent factors, suggesting that in a population similar to the one studied, these items in particular should be taken into account when interpreting responses.

  11. Proposal for a unified selection to medical residency programs.

    PubMed

    Toffoli, Sônia Ferreira Lopes; Ferreira Filho, Olavo Franco; Andrade, Dalton Francisco de

    2013-01-01

    This paper proposes the unification of entrance exams to medical residency programs (MRP) in Brazil. Problems related to MRP and their interface with public health problems in Brazil are highlighted, along with how this proposal can help solve them. The proposal is to create an item database to be applied in unified MRP exams, and some advantages of using item response theory (IRT) for this database are outlined. MRP entrance exams are currently developed and administered in a decentralized fashion, with each school responsible for its own examination, and the quality of these exams is questionable: reviews of item quality, validity, and reliability are not commonly disclosed. Evaluation is important in every education system, driving required changes and supporting control of teaching and learning. Besides offering high-quality exams to participating institutions, the unification of MRP entrance exams could serve as an additional source for rating medical schools and prompting improvements, provide researchers with a database, and allow regional mobility. Copyright © 2013 Elsevier Editora Ltda. All rights reserved.

  12. Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning.

    PubMed

    Kim, Kyong-Jee; Hwang, Jee-Young

    2016-03-01

    Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of the assessment using multimedia items. This study explored medical students' experience with ubiquitous testing and its impact on student learning. A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in text version (the text group) and the other in multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students' experiences of ubiquitous testing. The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally showed positive responses on ubiquitous testing. Still, they felt that the lectures that they had taken in preclinical years did not prepare them enough for this type of assessment and clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings.

  13. integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory.

    PubMed

    Tong, Pan; Coombes, Kevin R

    2012-11-15

    Identifying genes altered in cancer plays a crucial role in both understanding the mechanism of carcinogenesis and developing novel therapeutics. It is known that various mechanisms of regulation can lead to gene dysfunction, including copy number change, methylation, abnormal expression, mutation, and so on. Nowadays, all these types of alterations can be simultaneously interrogated by different types of assays. Although many methods have been proposed to identify altered genes from a single assay, no method systematically handles multiple assays while accounting for the different alteration types. In this article, we propose a novel method, integration using item response theory (integIRTy), to identify altered genes by using item response theory, allowing integrated analysis of multiple high-throughput assays. When applied to a single assay, the proposed method is more robust and reliable than conventional methods such as Student's t-test or the Wilcoxon rank-sum test. When used to integrate multiple assays, integIRTy can identify novel altered genes that cannot be found by looking at individual assays separately. We applied integIRTy to three public cancer datasets (ovarian carcinoma, breast cancer, glioblastoma) for cross-assay-type integration, all of which showed encouraging results. The R package integIRTy is available at http://bioinformatics.mdanderson.org/main/OOMPA:Overview. Contact: kcoombes@mdanderson.org. Supplementary data are available at Bioinformatics online.

  14. Modeling the dynamics of recognition memory testing with an integrated model of retrieval and decision making.

    PubMed

    Osth, Adam F; Jansson, Anna; Dennis, Simon; Heathcote, Andrew

    2018-08-01

    A robust finding in recognition memory is that performance declines monotonically across test trials. Despite the prevalence of this decline, there is a lack of consensus on the mechanism responsible. Three hypotheses have been put forward: (1) interference is caused by learning of the test items, (2) the test items cause a shift in the context representation used to cue memory, and (3) participants change their speed-accuracy thresholds through the course of testing. We implemented all three possibilities in a combined model of recognition memory and decision making, which inherits the memory retrieval elements of the Osth and Dennis (2015) model and uses the diffusion decision model (DDM: Ratcliff, 1978) to generate choices and response times. We applied the model to four datasets that represent three challenging findings: (1) the number of test items plays a larger role in determining performance than the number of studied items, (2) performance decreases less for strong items than weak items in pure lists but not in mixed lists, and (3) lexical decision trials interspersed between recognition test trials do not increase the rate at which performance declines. Analysis of the model's parameter estimates suggests that item interference plays a weak role in explaining the effects of recognition testing, while context drift plays a very large role. These results are consistent with prior work showing a weak role for item noise in recognition memory and showing that retrieval is a strong cause of context change in episodic memory. Copyright © 2018 Elsevier Inc. All rights reserved.
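
    The decision component referred to above, the diffusion decision model, can be simulated as a simple random walk between two boundaries. The sketch below is a minimal illustration of that component only (invented parameter values; it is not the combined retrieval-plus-decision model of the paper): weaker evidence, as might arise late in a test list, produces slower and less accurate recognition decisions.

```python
import numpy as np

def simulate_ddm(drift, boundary, start, ndt, n_trials, dt=0.001, noise=1.0, seed=0):
    """Simulate choices and response times from a basic diffusion decision model."""
    rng = np.random.default_rng(seed)
    choices, rts = np.empty(n_trials, dtype=int), np.empty(n_trials)
    for i in range(n_trials):
        x, t = start, 0.0
        while 0.0 < x < boundary:
            x += drift * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        choices[i] = int(x >= boundary)     # 1 = upper boundary ("old"), 0 = lower ("new")
        rts[i] = t + ndt                    # add non-decision time
    return choices, rts

# Weaker drift late in the test list -> slower, less accurate recognition decisions
for label, v in (("early trials", 1.5), ("late trials", 0.8)):
    c, r = simulate_ddm(drift=v, boundary=1.0, start=0.5, ndt=0.3, n_trials=500)
    print(f"{label}: accuracy {c.mean():.2f}, mean RT {r.mean():.2f} s")
```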

  15. Scoring best-worst data in unbalanced many-item designs, with applications to crowdsourcing semantic judgments.

    PubMed

    Hollis, Geoff

    2018-04-01

    Best-worst scaling is a judgment format in which participants are presented with a set of items and have to choose the superior and inferior items in the set. Best-worst scaling generates a large quantity of information per judgment because each judgment allows for inferences about the rank value of all unjudged items. This property of best-worst scaling makes it a promising judgment format for research in psychology and natural language processing concerned with estimating the semantic properties of tens of thousands of words. A variety of different scoring algorithms have been devised in the previous literature on best-worst scaling. However, due to problems of computational efficiency, these scoring algorithms cannot be applied efficiently to cases in which thousands of items need to be scored. New algorithms are presented here for converting responses from best-worst scaling into item scores for thousands of items (many-item scoring problems). These scoring algorithms are validated through simulation and empirical experiments, and considerations related to noise, the underlying distribution of true values, and trial design are identified that can affect the relative quality of the derived item scores. The newly introduced scoring algorithms consistently outperformed scoring algorithms used in the previous literature on scoring many-item best-worst data.
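
    As a point of reference for the scoring problem described above, the sketch below implements the simplest traditional scoring rule for best-worst data, the best-minus-worst count divided by the number of times an item was shown; it is not one of the new many-item algorithms introduced in the paper, and the tiny data set is invented.

```python
from collections import defaultdict

def best_minus_worst(trials):
    """Score items from best-worst trials: (times best - times worst) / times shown.

    trials : iterable of (items_shown, best_item, worst_item) tuples
    """
    best = defaultdict(int); worst = defaultdict(int); shown = defaultdict(int)
    for items, b, w in trials:
        for it in items:
            shown[it] += 1
        best[b] += 1
        worst[w] += 1
    return {it: (best[it] - worst[it]) / shown[it] for it in shown}

# Tiny illustrative data set: 4-item sets, judging e.g. word concreteness
trials = [
    (("apple", "idea", "chair", "justice"), "apple", "justice"),
    (("idea", "chair", "truth", "apple"), "apple", "truth"),
    (("justice", "truth", "chair", "idea"), "chair", "justice"),
]
print(best_minus_worst(trials))
```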

  16. Does segmental overlap help or hurt? Evidence from blocked cyclic naming in spoken and written production.

    PubMed

    Breining, Bonnie; Nozari, Nazbanou; Rapp, Brenda

    2016-04-01

    Past research has demonstrated interference effects when words are named in the context of multiple items that share a meaning. This interference has been explained within various incremental learning accounts of word production, which propose that each attempt at mapping semantic features to lexical items induces slight but persistent changes that result in cumulative interference. We examined whether similar interference-generating mechanisms operate during the mapping of lexical items to segments by examining the production of words in the context of others that share segments. Previous research has shown that initial-segment overlap amongst a set of target words produces facilitation, not interference. However, this initial-segment facilitation is likely due to strategic preparation, an external factor that may mask underlying interference. In the present study, we applied a novel manipulation in which the segmental overlap across target items was distributed unpredictably across word positions, in order to reduce strategic response preparation. This manipulation led to interference in both spoken (Exp. 1) and written (Exp. 2) production. We suggest that these findings are consistent with a competitive learning mechanism that applies across stages and modalities of word production.

  17. An Investigation of Factors Affecting the Degree of Naïve Impetus Theory Application

    NASA Astrophysics Data System (ADS)

    Liu, Xiufeng; MacIsaac, Dan

    2005-03-01

    This study investigates factors affecting the degree of novice physics students' application of the naïve impetus theory. Six hundred and fourteen first-year university engineering physics students answered the Force Concept Inventory as a pre-test for their calculus-based course. We examined the degree to which students consistently applied the naïve impetus theory across different items. We used a 2-way repeated measures ANOVA and linear regression to analyze data coded from incorrect student responses. It was found that there were statistically significant main effects for item familiarity and item requirement for explanation vs. prediction on the measured degree of impetus theory application. Student course grades had no significant effect on impetus theory application. When faced with items that were unfamiliar and predictive, students appeared to rely on non-theoretical, knowledge-in-pieces reasoning. Reasoning characteristic of naïve theories was more frequently applied when students were completing familiar problem tasks that required explanation. When considering all the above factors simultaneously, we found that the degree of naïve impetus theory application by students is attributable to variables in the following order: familiarity, prediction, and explanation.

  18. 1999 Survey of Active Duty Personnel: Administration, Datasets, and Codebook. Appendix G: Frequency and Percentage Distributions for Variables in the Survey Analysis Files.

    DTIC Science & Technology

    2000-12-01

    A skip flag indicating the result of checking the response on the parent (screening) item against the response(s) on the items within the skip pattern. See Table D-5, Note 2, in Appendix D.

  19. Multiple Hypnotizabilities: Differentiating the Building Blocks of Hypnotic Response

    ERIC Educational Resources Information Center

    Woody, Erik Z.; Barnier, Amanda J.; McConkey, Kevin M.

    2005-01-01

    Although hypnotizability can be conceptualized as involving component subskills, standard measures do not differentiate them from a more general unitary trait, partly because the measures include limited sets of dichotomous items. To overcome this, the authors applied full-information factor analysis, a sophisticated analytic approach for…

  20. Structure and Measurement of Depression in Youth: Applying Item Response Theory to Clinical Data

    PubMed Central

    Cole, David A.; Cai, Li; Martin, Nina C.; Findling, Robert L; Youngstrom, Eric A.; Garber, Judy; Curry, John F.; Hyde, Janet S.; Essex, Marilyn J.; Compas, Bruce E.; Goodyer, Ian M.; Rohde, Paul; Stark, Kevin D.; Slattery, Marcia J.; Forehand, Rex

    2013-01-01

    Goals of the paper were to use item response theory (IRT) to assess the relation of depressive symptoms to the underlying dimension of depression and to demonstrate how IRT-based measurement strategies can yield more reliable data about depression severity than conventional symptom counts. Participants were 3403 clinic and nonclinic children and adolescents from 12 contributing samples, all of whom received the Kiddie Schedule of Affective Disorders and Schizophrenia for school-aged children. Results revealed that some symptoms reflected higher levels of depression and were more discriminating than others. Results further demonstrated that utilization of IRT-based information about symptom severity and discriminability in the measurement of depression severity can reduce measurement error and increase measurement fidelity. PMID:21534696

  1. Which kind of psychometrics is adequate for patient satisfaction questionnaires?

    PubMed

    Konerding, Uwe

    2016-01-01

    The construction and psychometric analysis of patient satisfaction questionnaires are discussed. The discussion is based upon the classification of multi-item questionnaires into scales or indices. Scales consist of items that describe the effects of the latent psychological variable to be measured, and indices consist of items that describe the causes of this variable. Whether patient satisfaction questionnaires should be constructed and analyzed as scales or as indices depends upon the purpose for which these questionnaires are required. If the final aim is improving care with regard to patients' preferences, then these questionnaires should be constructed and analyzed as indices. This implies two requirements: 1) items for patient satisfaction questionnaires should be selected in such a way that the universe of possible causes of patient satisfaction is covered optimally and 2) Cronbach's alpha, principal component analysis, exploratory factor analysis, confirmatory factor analysis, and analyses with models from item response theory, such as the Rasch Model, should not be applied for psychometric analyses. Instead, multivariate regression analyses with a direct rating of patient satisfaction as the dependent variable and the individual questionnaire items as independent variables should be performed. The coefficients produced by such an analysis can be applied for selecting the best items and for weighting the selected items when a sum score is determined. The lower boundaries of the validity of the unweighted and the weighted sum scores can be estimated by their correlations with the direct satisfaction rating. While the first requirement is fulfilled in the majority of the previous patient satisfaction questionnaires, the second one deviates from previous practice. Hence, if patient satisfaction is actually measured with the final aim of improving care with regard to patients' preferences, then future practice should be changed so that the second requirement is also fulfilled.
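
    A minimal sketch of the index-construction approach recommended above, using simulated data (the item set, weights, and sample size are invented): a direct overall satisfaction rating is regressed on the individual item responses, the regression coefficients serve as item weights, and the correlations of the weighted and unweighted sum scores with the direct rating give lower bounds on their validity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 400, 6
items = rng.integers(1, 6, size=(n, k)).astype(float)        # six cause-indicator items, rated 1-5
true_w = np.array([0.6, 0.3, 0.2, 0.1, 0.0, 0.0])             # only some items drive satisfaction
direct = items @ true_w + rng.normal(0, 1, n)                 # direct overall satisfaction rating

# Multivariate regression of the direct rating on the items (ordinary least squares)
X = np.column_stack([np.ones(n), items])
coef = np.linalg.lstsq(X, direct, rcond=None)[0]
weights = coef[1:]

weighted = items @ weights
unweighted = items.sum(axis=1)
print("estimated item weights:", np.round(weights, 2))
print("r(weighted sum, direct rating):  ", round(np.corrcoef(weighted, direct)[0, 1], 2))
print("r(unweighted sum, direct rating):", round(np.corrcoef(unweighted, direct)[0, 1], 2))
```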

  2. A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses.

    PubMed

    Massof, Robert W

    2014-10-01

    A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equals the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing. © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
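
    The sketch below simulates the response process described in this framework under simplifying assumptions (our notation and parameter values, not the author's): an ordinal item response is the number of category thresholds exceeded by the perceived difference between the patient variable and the item variable, with normally distributed person-specific perturbations of the item estimate and of the thresholds (response bias).

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_items = 300, 8
thresholds = np.array([-1.5, -0.5, 0.5, 1.5])      # fixed category thresholds (5 categories)

person = rng.normal(0, 1, n_patients)               # fixed patient variable
item = np.linspace(-1, 1, n_items)                  # fixed item variables

# Random components: each patient's personal estimate of the item variable and a
# personal shift of the category thresholds (response bias)
item_noise = rng.normal(0, 0.3, (n_patients, n_items))
bias = rng.normal(0, 0.3, (n_patients, 1))

perceived_diff = person[:, None] - (item + item_noise)
responses = (perceived_diff[..., None] > (thresholds + bias[..., None])).sum(axis=-1)
print(responses.shape, responses.min(), responses.max())   # ordinal scores 0-4
```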

  3. Student perceptions of gamified audience response system interactions in large group lectures and via lecture capture technology.

    PubMed

    Pettit, Robin K; McCoy, Lise; Kinney, Marjorie; Schwartz, Frederic N

    2015-05-22

    Higher education students have positive attitudes about the use of audience response systems (ARS), but even technology-enhanced lessons can become tiresome if the pedagogical approach is exactly the same with each implementation. Gamification is the notion that gaming mechanics can be applied to routine activities. In this study, TurningPoint (TP) ARS interactions were gamified and implemented in 22 large group medical microbiology lectures throughout an integrated year 1 osteopathic medical school curriculum. A 32-item questionnaire was used to measure students' perceptions of the gamified TP interactions at the end of their first year. The survey instrument generated both Likert scale and open-ended response data that addressed game design and variety, engagement and learning features, use of TP questions after class, and any value of lecture capture technology for reviewing these interactive presentations. The Chi Square Test was used to analyze grouped responses to Likert scale questions. Responses to open-ended prompts were categorized using open-coding. Ninety-one students out of 106 (86 %) responded to the survey. A significant majority of the respondents agreed or strongly agreed that the games were engaging, and an effective learning tool. The questionnaire investigated the degree to which specific features of these interactions were engaging (nine items) and promoted learning (seven items). The most highly ranked engagement aspects were peer competition and focus on the activity (tied for highest ranking), and the most highly ranked learning aspect was applying theoretical knowledge to clinical scenarios. Another notable item was the variety of interactions, which ranked in the top three in both the engagement and learning categories. Open-ended comments shed light on how students use TP questions for exam preparation, and revealed engaging and non-engaging attributes of these interactive sessions for students who review them via lecture capture. Students clearly valued the engagement and learning aspects of gamified TP interactions. The overwhelming majority of students surveyed in this study were engaged by the variety of TP games, and gained an interest in microbiology. The methods described in this study may be useful for other educators wishing to expand the utility of ARS in their classrooms.

  4. A Practical Guide to Check the Consistency of Item Response Patterns in Clinical Research Through Person-Fit Statistics: Examples and a Computer Program.

    PubMed

    Meijer, Rob R; Niessen, A Susan M; Tendeiro, Jorge N

    2016-02-01

    Although there are many studies devoted to person-fit statistics to detect inconsistent item score patterns, most studies are difficult to understand for nonspecialists. The aim of this tutorial is to explain the principles of these statistics for researchers and clinicians who are interested in applying these statistics. In particular, we first explain how invalid test scores can be detected using person-fit statistics; second, we provide the reader practical examples of existing studies that used person-fit statistics to detect and to interpret inconsistent item score patterns; and third, we discuss a new R-package that can be used to identify and interpret inconsistent score patterns. © The Author(s) 2015.

  5. Optimization of injection molding process parameters for a plastic cell phone housing component

    NASA Astrophysics Data System (ADS)

    Rajalingam, Sokkalingam; Vasant, Pandian; Khe, Cheng Seong; Merican, Zulkifli; Oo, Zeya

    2016-11-01

    The injection molding process is one of the most widely used methods for producing thin-walled plastic items. However, setting optimal process parameters is difficult, and poor settings can produce defects in the molded item, such as shrinkage. This study aims to determine optimum injection molding process parameters that reduce shrinkage defects in a plastic cell phone cover. The currently used machine settings produced shrinkage, with length and width dimensions below the specified limits. Further experiments were therefore needed to identify process parameters that keep length and width close to target with minimal variation. Mold temperature, injection pressure, and screw rotation speed were used as process parameters in this research, and response surface methodology (RSM) was applied to find their optimal settings. The major factors influencing the responses were identified using analysis of variance (ANOVA). Verification runs showed that the shrinkage defect can be minimized with the optimal settings found by RSM.
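
    A minimal sketch of the response-surface step described above, with simulated data (the design points, coefficients, and noise level are invented, and only the factor names are taken from the abstract): a second-order polynomial is fitted to shrinkage measurements by least squares and the fitted surface is searched for the coded settings that minimize predicted shrinkage.

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_features(X):
    """Second-order RSM model matrix: intercept, linear, interaction and squared terms."""
    cols = [np.ones(len(X))] + [X[:, i] for i in range(X.shape[1])]
    cols += [X[:, i] * X[:, j] for i, j in combinations_with_replacement(range(X.shape[1]), 2)]
    return np.column_stack(cols)

# Hypothetical coded settings (-1..1) for mold temperature, injection pressure, screw speed
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 3))
shrinkage = 0.5 + 0.2*X[:, 0] - 0.3*X[:, 1] + 0.4*X[:, 0]**2 + 0.3*X[:, 2]**2 \
            + rng.normal(0, 0.02, 20)                      # simulated response

beta = np.linalg.lstsq(quadratic_features(X), shrinkage, rcond=None)[0]

# Grid search over the fitted surface for the settings minimizing predicted shrinkage
g = np.linspace(-1, 1, 21)
grid = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T
pred = quadratic_features(grid) @ beta
best = grid[np.argmin(pred)]
print("optimal coded settings (temp, pressure, speed):", np.round(best, 2))
print("predicted minimum shrinkage:", round(pred.min(), 3))
```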

  6. Dimensions of Acculturation in Native American College Students

    ERIC Educational Resources Information Center

    Reynolds, Amy L.; Sodano, Sandro M.; Ecklund, Timothy R.; Guyker, Wendy

    2012-01-01

    Exploratory and confirmatory factor analyses were applied to the responses of two respective independent samples of Native American college students on the Native American Acculturation Scale (NAAS). Three correlated dimensions were found to underlie NAAS items and these dimensions may also comprise a broader higher order dimension of Native…

  7. Cross-cultural adaptation and construct validity of the Korean version of a physical activity measure for community-dwelling elderly.

    PubMed

    Choi, Bongsam

    2018-01-01

    [Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred thirty-eight community-dwelling elderly adults, 32 males and 106 females, participated in the study. All participants were asked to fill out a fifty-one item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. A one-parameter item response theory model (Rasch analysis) was applied to determine the construct validity and to inspect item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After misfit item deletion, 35 ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map showed that item difficulty was well matched to elderly respondents with moderate and low ability, except for a ceiling effect at the high end. [Conclusion] The cross-culturally adapted K-PAM showed sufficient construct validity and stable psychometric properties, as confirmed by person separation reliability and fit statistics.
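
    For reference, person separation reliability can be computed from person measures and their standard errors as sketched below; the values are hypothetical and unrelated to the K-PAM calibration.

```python
# Sketch of person separation reliability as used in Rasch analysis:
# R = (observed variance of person measures - mean squared SE) / observed variance.
import numpy as np

theta = np.array([-1.2, -0.4, 0.1, 0.6, 1.3, 2.0])   # person measures (logits)
se = np.array([0.35, 0.30, 0.28, 0.29, 0.33, 0.40])  # measurement SEs

observed_var = np.var(theta, ddof=1)
error_var = np.mean(se ** 2)
reliability = (observed_var - error_var) / observed_var
print(f"person separation reliability = {reliability:.2f}")
```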

  8. Inter-rater reliability of the PIPES tool: validation of a surgical capacity index for use in resource-limited settings.

    PubMed

    Markin, Abraham; Barbero, Roxana; Leow, Jeffrey J; Groen, Reinou S; Perlman, Greg; Habermann, Elizabeth B; Apelgren, Keith N; Kushner, Adam L; Nwomeh, Benedict C

    2014-09-01

    In response to the need for simple, rapid means of quantifying surgical capacity in low resource settings, Surgeons OverSeas (SOS) developed the personnel, infrastructure, procedures, equipment and supplies (PIPES) tool. The present investigation assessed the inter-rater reliability of the PIPES tool. As part of a government assessment of surgical services in Santa Cruz, Bolivia, the PIPES tool was translated into Spanish and applied in interviews with physicians at 31 public hospitals. An additional interview was conducted with nurses at a convenience sample of 25 of these hospitals. Physician and nurse responses were then compared to generate an estimate of reliability. For dichotomous survey items, inter-rater reliability between physicians and nurses was assessed using Cohen's kappa statistic and percent agreement. The Pearson correlation coefficient was used to assess agreement for continuous items. Cohen's kappa was 0.46 for the infrastructure, 0.43 for the procedures, 0.26 for the equipment, and 0 for the supplies sections. The median correlation coefficient was 0.91 for continuous items. Correlation was 0.79 for the PIPES index, and ranged from 0.32 to 0.98 for continuous response items. Reliability of the PIPES tool was moderate for the infrastructure and procedures sections, fair for the equipment section, and poor for the supplies section when comparing surgeons' responses with nurses' responses, an extremely rigorous test of reliability. These results indicate that the PIPES tool is an effective measure of surgical capacity but that the equipment and supplies sections may need to be revised.
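
    The agreement statistics named in this record can be illustrated with a short Python sketch using scikit-learn and SciPy; the physician and nurse responses below are fabricated.

```python
# Sketch of the inter-rater agreement statistics described above.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

# Dichotomous checklist items (1 = available, 0 = not available)
physician = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
nurse     = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print("kappa:", cohen_kappa_score(physician, nurse))

# Continuous items (e.g., counts of functioning equipment)
physician_counts = [4, 2, 7, 0, 5, 3]
nurse_counts     = [4, 3, 6, 0, 5, 2]
r, p = pearsonr(physician_counts, nurse_counts)
print("Pearson r:", r)
```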

  9. Validation of a single summary score for the Prolapse/Incontinence Sexual Questionnaire-IUGA revised (PISQ-IR).

    PubMed

    Constantine, Melissa L; Pauls, Rachel N; Rogers, Rebecca R; Rockwood, Todd H

    2017-12-01

    The Prolapse/Incontinence Sexual Questionnaire-International Urogynecology Association (IUGA) Revised (PISQ-IR) measures sexual function in women with pelvic floor disorders (PFDs) yet is unwieldy, with six individual subscale scores for sexually active women and four for women who are not. We hypothesized that a valid and responsive summary score could be created for the PISQ-IR. Item response data from participating women who completed a revised version of the PISQ-IR at three clinical sites were used to generate item weights using magnitude estimation (ME) and Q-sort (Q) approaches. Item weights were applied to data from the original PISQ-IR validation to generate summary scores. Correlation and factor analysis methods were used to evaluate validity and responsiveness of summary scores. Weighted and nonweighted summary scores for the sexually active PISQ-IR demonstrated good criterion validity with condition-specific measures: Incontinence Severity Index = 0.12, 0.11, 0.11; Pelvic Floor Distress Inventory-20 = 0.39, 0.39, 0.12; Epidemiology of Prolapse and Incontinence Questionnaire-Q35 = 0.26, 0.25, 0.40; Female Sexual Functioning Index subscale total score = 0.72, 0.75, 0.72 for nonweighted, ME, and Q summary scores, respectively. Responsiveness evaluation showed weighted and nonweighted summary scores detected moderate effect sizes (Cohen's d > 0.5). Weighted items for women who are not sexually active (NSA) demonstrated significant floor effects and did not meet criterion validity. A PISQ-IR summary score for use with sexually active women, nonweighted or calculated with ME or Q item weights, is a valid and reliable measure for clinical use. The summary scores provide value for assessing clinical treatment of pelvic floor disorders.

  10. Measures of emergency preparedness contributing to nursing home resilience.

    PubMed

    Lane, Sandi J; McGrady, Elizabeth

    2017-12-13

    Resilience approaches have been successfully applied in crisis management, disaster response, and high reliability organizations and have the potential to enhance existing systems of nursing home disaster preparedness. This study's purpose was to determine how the Center for Medicare and Medicaid Services (CMS) "Emergency Preparedness Checklist Recommended Tool for Effective Health Care Facility Planning" contributes to organizational resilience by identifying the benchmark resilience items addressed and not addressed by the CMS Emergency Preparedness Checklist, and to recommend tools and processes to improve resilience for nursing homes. The CMS Emergency Preparedness Checklist items were compared to the Resilience Benchmark Tool items; similar items were considered matches. Resilience Benchmark Tool items with no CMS Emergency Preparedness Checklist item matches were considered breaches in nursing home resilience. The findings suggest that the CMS Emergency Preparedness Checklist can be used to measure some aspects of resilience; however, many resilience factors were not addressed. For nursing homes to prepare and respond to crisis situations, organizations need to embrace a culture that promotes individual resilience-related competencies that, when aggregated, enable the organization to improve its resiliency. Social workers have the skills and experience to facilitate this change.

  11. Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning

    PubMed Central

    Kim, Kyong-Jee; Hwang, Jee-Young

    2016-01-01

    Purpose: Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of the assessment using multimedia items. This study explored medical students’ experience with ubiquitous testing and its impact on student learning. Methods: A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in a text version (the text group) and the other in a multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students’ experiences of ubiquitous testing. Results: The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally responded positively to ubiquitous testing. Still, they felt that the lectures they had taken in preclinical years did not prepare them well enough for this type of assessment and that clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Conclusion: Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings. PMID:26838569
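
    As background, the classical item statistics compared in this record, item difficulty (proportion correct) and discrimination (corrected item-total correlation), can be sketched as follows; the response matrix is fabricated.

```python
# Sketch of classical item difficulty and discrimination statistics.
import numpy as np

responses = np.array([[1, 1, 0, 1],
                      [1, 0, 0, 1],
                      [1, 1, 1, 1],
                      [0, 0, 0, 1],
                      [1, 1, 0, 0]])  # rows = students, cols = items

difficulty = responses.mean(axis=0)   # proportion answering each item correctly
total = responses.sum(axis=1)

discrimination = []
for j in range(responses.shape[1]):
    rest = total - responses[:, j]    # total score excluding the item itself
    discrimination.append(np.corrcoef(responses[:, j], rest)[0, 1])

print("difficulty:", difficulty)
print("discrimination:", np.round(discrimination, 2))
```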

  12. Partnering with patients using social media to develop a hypertension management instrument.

    PubMed

    Kear, Tamara; Harrington, Magdalena; Bhattacharya, Anand

    2015-09-01

    Hypertension is a lifelong condition; thus, long-term adherence to lifestyle modification, self-monitoring, and medication regimens remains a challenge for patients. The aim of this study was to develop a patient-reported hypertension instrument that measured attitudes, lifestyle behaviors, adherence, and barriers to hypertension management using patient-reported outcome data. The study was conducted using the Open Research Exchange software platform created by PatientsLikeMe. A total of 360 participants completed the psychometric phase of the study; incomplete responses were obtained from 147 patients, and 150 patients opted out. Principal component analysis with orthogonal (varimax) rotation was executed on a data set with all completed responses (N = 249) and applied to 43 items. Based on the review of the factor solution, eigenvalues, and item loadings, 16 items were eliminated and a model with 29 items was tested. The process was repeated two more times until a final model with 14 items was established. In interpreting the rotated factor pattern, an item was said to load on any given component if the factor loading was ≥0.40 for that component and was <0.40 for the other. In addition to the newly generated instrument, demographic and self-reported clinical characteristics of the study participants, such as the type of prescribed hypertension medications, frequency of blood pressure monitoring, and comorbid conditions, were examined. The Open Research Exchange platform allowed for ongoing input from patients through each stage of the 14-item instrument development. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
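
    A sketch of the loading-based item-retention step described above is given below; it uses the third-party factor_analyzer package and random data purely for illustration, and is not the authors' analysis code.

```python
# Sketch of a principal-components analysis with varimax rotation and a
# loading-based item-retention rule (>= .40 on one component, < .40 on others).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # third-party package, for illustration

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 6, size=(249, 43)),
                  columns=[f"item_{i + 1}" for i in range(43)])  # fabricated data

fa = FactorAnalyzer(n_factors=3, rotation="varimax", method="principal")
fa.fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns)

# Retain an item only if it loads >= .40 on exactly one component.
keep = loadings[loadings.abs().ge(0.40).sum(axis=1) == 1].index.tolist()
print(len(keep), "items retained")
```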

  13. Process-specific analysis in episodic memory retrieval using fast optical signals and hemodynamic signals in the right prefrontal cortex

    NASA Astrophysics Data System (ADS)

    Dong, Sunghee; Jeong, Jichai

    2018-02-01

    Objective. Memory is formed by the interaction of various brain functions at the item and task level. Revealing individual and combined effects of item- and task-related processes on retrieving episodic memory is an unsolved problem because of limitations in existing neuroimaging techniques. To investigate these issues, we analyze fast and slow optical signals measured from a custom-built continuous wave functional near-infrared spectroscopy (CW-fNIRS) system. Approach. In our work, we visually present words to the subjects for encoding and ask them to recall the words after a short rest. The hemodynamic responses evoked by the episodic memory are compared with those evoked by the semantic memory in retrieval blocks. In the fast optical signal, we compare the effects of old and new items (previously seen and not seen) to investigate the item-related process in episodic memory. The Kalman filter is applied simultaneously to slow and fast optical signals in different time windows. Main results. A significant task-related HbR decrease was observed in the episodic memory retrieval blocks. The mean amplitude and peak latency of the fast optical signal depend on item type and reaction time, respectively. Moreover, task-related hemodynamic and item-related fast optical responses are correlated in the right prefrontal cortex. Significance. We demonstrate that episodic memory is retrieved from the right frontal area through functional connectivity between the mental state maintained throughout retrieval and item-related transient activity. To the best of our knowledge, this is the first functional NIRS study to examine the relationship between item- and task-related memory processes in the prefrontal area using a single modality.

  14. Process-specific analysis in episodic memory retrieval using fast optical signals and hemodynamic signals in the right prefrontal cortex.

    PubMed

    Dong, Sunghee; Jeong, Jichai

    2018-02-01

    Memory is formed by the interaction of various brain functions at the item and task level. Revealing individual and combined effects of item- and task-related processes on retrieving episodic memory is an unsolved problem because of limitations in existing neuroimaging techniques. To investigate these issues, we analyze fast and slow optical signals measured from a custom-built continuous wave functional near-infrared spectroscopy (CW-fNIRS) system. In our work, we visually present words to the subjects for encoding and ask them to recall the words after a short rest. The hemodynamic responses evoked by the episodic memory are compared with those evoked by the semantic memory in retrieval blocks. In the fast optical signal, we compare the effects of old and new items (previously seen and not seen) to investigate the item-related process in episodic memory. The Kalman filter is applied simultaneously to slow and fast optical signals in different time windows. A significant task-related HbR decrease was observed in the episodic memory retrieval blocks. The mean amplitude and peak latency of the fast optical signal depend on item type and reaction time, respectively. Moreover, task-related hemodynamic and item-related fast optical responses are correlated in the right prefrontal cortex. We demonstrate that episodic memory is retrieved from the right frontal area through functional connectivity between the mental state maintained throughout retrieval and item-related transient activity. To the best of our knowledge, this is the first functional NIRS study to examine the relationship between item- and task-related memory processes in the prefrontal area using a single modality.

  15. Development of an item bank and computer adaptive test for role functioning.

    PubMed

    Anatchkova, Milena D; Rose, Matthias; Ware, John E; Bjorner, Jakob B

    2012-11-01

    Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning. A set of different instruments including 75 newly developed items asking about the impact of health on role functioning was completed by 2,500 participants. Established item response theory methods were used to develop an item bank based on the generalized partial credit model. Comparison of group mean bank scores of participants with different self-reported general health status and chronic conditions was used to test the external validity of the bank. After excluding items that did not meet established requirements, the final item bank consisted of a total of 64 items covering three areas of role functioning (family, social, and occupational). Slopes in the bank ranged between .93 and 4.37; the mean threshold range was -1.09 to -2.25. Item bank-based scores were significantly different for participants with and without chronic conditions and with different levels of self-reported general health. An item bank assessing health impact on RF across three content areas has been successfully developed. The bank can be used for development of short forms or computerized adaptive tests to be applied in the assessment of role functioning as one of the common denominators across applications of generic health assessment.
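
    For orientation, the generalized partial credit model underlying the item bank assigns category probabilities as sketched below; the slope and threshold values are hypothetical, not the published bank parameters.

```python
# Sketch of generalized partial credit model (GPCM) category probabilities
# for a single polytomous item.
import numpy as np

def gpcm_probabilities(theta, slope, thresholds):
    """P(X = k | theta) for k = 0..m under the GPCM."""
    # Cumulative sums of slope * (theta - b_j), with 0 for the first category.
    steps = np.concatenate(([0.0], np.cumsum(slope * (theta - np.asarray(thresholds)))))
    return np.exp(steps) / np.exp(steps).sum()

probs = gpcm_probabilities(theta=0.5, slope=1.8, thresholds=[-1.0, 0.0, 1.2])
print(np.round(probs, 3))  # probabilities over the four response categories
```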

  16. The Recovery Knowledge Inventory for Measurement of Nursing Student Views on Recovery-oriented Mental Health Services.

    PubMed

    Happell, Brenda; Byrne, Louise; Platania-Phung, Chris

    2015-01-01

    Recovery-oriented services are a goal for policy and practice in the Australian mental health service system. Evidence-based reform requires an instrument to measure knowledge of recovery concepts. The Recovery Knowledge Inventory (RKI) was designed for this purpose; however, its suitability and validity for student health professionals have not been evaluated. The purpose of the current article is to report the psychometric features of the RKI for measuring nursing students' views on recovery. The RKI, a self-report measure, consists of four scales: (I) Roles and Responsibilities, (II) Non-Linearity of the Recovery Process, (III) Roles of Self-Definition and Peers, and (IV) Expectations Regarding Recovery. Confirmatory and exploratory factor analyses of the baseline data (n = 167) were applied to assess validity and reliability. Exploratory factor analyses generally replicated the item structure suggested by the three main scales; however, more stringent analyses (confirmatory factor analysis) did not provide strong support for convergent validity. A refined RKI with 16 items had internal reliabilities of α = .75 for Roles and Responsibilities, α = .49 for Roles of Self-Definition and Peers, and α = .72 for Recovery as Non-Linear Process. If the RKI is to be applied to nursing student populations, the conceptual underpinning of the instrument needs to be reworked, and new items should be generated to evaluate and improve scale validity and reliability.

  17. An NCME Instructional Module on Polytomous Item Response Theory Models

    ERIC Educational Resources Information Center

    Penfield, Randall David

    2014-01-01

    A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…

  18. Measurement of multiple nicotine dependence domains among cigarette, non-cigarette and poly-tobacco users: Insights from item response theory.

    PubMed

    Strong, David R; Messer, Karen; Hartman, Sheri J; Conway, Kevin P; Hoffman, Allison C; Pharris-Ciurej, Nikolas; White, Martha; Green, Victoria R; Compton, Wilson M; Pierce, John

    2015-07-01

    Nicotine dependence (ND) is a key construct that organizes physiological and behavioral symptoms associated with persistent nicotine intake. Measurement of ND has focused primarily on cigarette smokers. Thus, validation of brief instruments that apply to a broad spectrum of tobacco product users is needed. We examined multiple domains of ND in a longitudinal national study of the United States population, the United States National Epidemiological Survey of Alcohol and Related Conditions (NESARC). We used methods based in item response theory to identify and validate increasingly brief measures of ND that included symptoms to assess ND similarly among cigarette, cigar, smokeless, and poly tobacco users. Confirmatory factor analytic models supported a single, primary dimension underlying symptoms of ND across tobacco use groups. Differential Item Functioning (DIF) analysis generated little support for systematic differences in response to symptoms of ND across tobacco use groups. We established significant concurrent and predictive validity of brief 3- and 5-symptom indices for measuring ND. Measuring ND across tobacco use groups with a common set of symptoms facilitates evaluation of tobacco use in an evolving marketplace of tobacco and nicotine products. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  19. Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…

  20. Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

    ERIC Educational Resources Information Center

    Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

    2011-01-01

    The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…

  1. Application of Three Cognitive Diagnosis Models to ESL Reading and Listening Assessments

    ERIC Educational Resources Information Center

    Lee, Yong-Won; Sawaki, Yasuyo

    2009-01-01

    The present study investigated the functioning of three psychometric models for cognitive diagnosis--the general diagnostic model, the fusion model, and latent class analysis--when applied to large-scale English as a second language listening and reading comprehension assessments. Data used in this study were scored item responses and incidence…

  2. Applying Item Response Theory Modeling in Educational Research

    ERIC Educational Resources Information Center

    Le, Dai-Trang

    2013-01-01

    In an effort to understand how school boards in America's K-12 school system function, a research collaboration was undertaken among four agencies: the National School Boards Association, the Thomas B. Fordham Institute, the Iowa Association of School Boards, and the Wallace Foundation. These groups joined efforts to conduct research on school…

  3. Applying Kaplan-Meier to Item Response Data

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2018-01-01

    Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…

  4. Assessing the Accuracy and Consistency of Language Proficiency Classification under Competing Measurement Models

    ERIC Educational Resources Information Center

    Zhang, Bo

    2010-01-01

    This article investigates how measurement models and statistical procedures can be applied to estimate the accuracy of proficiency classification in language testing. The paper starts with a concise introduction of four measurement models: the classical test theory (CTT) model, the dichotomous item response theory (IRT) model, the testlet response…

  5. Exploring Unidimensional Proficiency Classification Accuracy from Multidimensional Data in a Vertical Scaling Context

    ERIC Educational Resources Information Center

    Kroopnick, Marc Howard

    2010-01-01

    When Item Response Theory (IRT) is operationally applied for large scale assessments, unidimensionality is typically assumed. This assumption requires that the test measures a single latent trait. Furthermore, when tests are vertically scaled using IRT, the assumption of unidimensionality would require that the battery of tests across grades…

  6. Estimation of Latent Group Effects: Psychometric Technical Report No. 2.

    ERIC Educational Resources Information Center

    Mislevy, Robert J.

    Conventional methods of multivariate normal analysis do not apply when the variables of interest are not observed directly, but must be inferred from fallible or incomplete data. For example, responses to mental test items may depend upon latent aptitude variables, which are modeled in turn as functions of demographic effects in the population. A…

  7. Information Needs within a Multi-District Environment.

    ERIC Educational Resources Information Center

    Thomas, Gregory P.

    This paper argues that no single measurement strategy serves all purposes and that applying methods and techniques which allow a variety of data elements to be retrieved and juxtaposed may be an investment in the future. Item response theory, Rasch model, and latent trait theory are all approaches to a single conceptual topic. An abbreviated look…

  8. Exploring the impact of disability on self-determination measurement.

    PubMed

    Mumbardó-Adam, Cristina; Guàrdia-Olmos, Joan; Giné, Climent

    2018-07-01

    Self-determination is a psychological construct that applies both to the general population and to individuals with disabilities, who can be self-determined with adequate accommodations and opportunities. As the relevance of self-determination-related skills in life has been recently acknowledged, researchers have created a measure to assess self-determination in adolescents and young adults with and without disabilities. The Self-Determination Inventory: Student Report (Spanish interim version) is being empirically validated in Spanish. As this scale is the first assessment intended for all youth, further exploration of its psychometric properties is required to ensure the reliability of the self-determination measurement and gain further insight into the construct when applied to youth with and without disabilities. More than 600 participants were asked to complete the scale. The impact of disability on the item response distributions across the dimensions of self-determination was explored. Differential item functioning (DIF) was found in only 5 of the scale's 45 items. Differences primarily favored youth without disabilities. The weak presence of DIF across the items supports the instrument's psychometric robustness when measuring self-determination in youth with and without disabilities and provides further understanding of the self-determination construct. Implications and future research directions are also discussed. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2006-01-01

    This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…

  10. 48 CFR 12.103 - Commercially available off-the-shelf (COTS) items.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... ACQUISITION REGULATION ACQUISITION PLANNING ACQUISITION OF COMMERCIAL ITEMS Acquisition of Commercial Items... indicated otherwise, all of the policies that apply to commercial items also apply to COTS. Section 12.505 lists the laws that are not applicable to COTS (in addition to 12.503 and 12.504); the components test...

  11. Assessing Hopelessness in Terminally Ill Cancer Patients: Development of the Hopelessness Assessment in Illness Questionnaire

    PubMed Central

    Rosenfeld, Barry; Pessin, Hayley; Lewis, Charles; Abbey, Jennifer; Olden, Megan; Sachs, Emily; Amakawa, Lia; Kolva, Elissa; Brescia, Robert; Breitbart, William

    2013-01-01

    Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses demonstrated strong support for the Hopelessness Assessment in Illness Questionnaire providing greater explanatory power than existing measures of hopelessness and found little evidence that this assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts. PMID:21443366

  12. Three pedagogical approaches to introductory physics labs and their effects on student learning outcomes

    NASA Astrophysics Data System (ADS)

    Chambers, Timothy

    This dissertation presents the results of an experiment that measured the learning outcomes associated with three different pedagogical approaches to introductory physics labs. These three pedagogical approaches presented students with the same apparatus and covered the same physics content, but used different lab manuals to guide students through distinct cognitive processes in conducting their laboratory investigations. We administered post-tests containing multiple-choice conceptual questions and free-response quantitative problems one week after students completed these laboratory investigations. In addition, we collected data from the laboratory practical exam taken by students at the end of the semester. Using these data sets, we compared the learning outcomes for the three curricula in three dimensions of ability: conceptual understanding, quantitative problem-solving skill, and laboratory skills. Our three pedagogical approaches are as follows. Guided labs lead students through their investigations via a combination of Socratic-style questioning and direct instruction, while students record their data and answers to written questions in the manual during the experiment. Traditional labs provide detailed written instructions, which students follow to complete the lab objectives. Open labs provide students with a set of apparatus and a question to be answered, and leave students to devise and execute an experiment to answer the question. In general, we find that students performing Guided labs perform better on some conceptual assessment items, and that students performing Open labs perform significantly better on experimental tasks. Combining a classical test theory analysis of post-test results with in-lab classroom observations allows us to identify individual components of the laboratory manuals and investigations that are likely to have influenced the observed differences in learning outcomes associated with the different pedagogical approaches. Due to the novel nature of this research and the large number of item-level results we produced, we recommend additional research to determine the reproducibility of our results. Analyzing the data with item response theory yields additional information about the performance of our students on both conceptual questions and quantitative problems. We find that performing lab activities on a topic does lead to better-than-expected performance on some conceptual questions regardless of pedagogical approach, but that this acquired conceptual understanding is strongly context-dependent. The results also suggest that a single "Newtonian reasoning ability" is inadequate to explain student response patterns to items from the Force Concept Inventory. We develop a framework for applying polytomous item response theory to the analysis of quantitative free-response problems and for analyzing how features of student solutions are influenced by problem-solving ability. Patterns in how students at different abilities approach our post-test problems are revealed, and we find hints as to how features of a free-response problem influence its item parameters. The item-response theory framework we develop provides a foundation for future development of quantitative free-response research instruments. Chapter 1 of the dissertation presents a brief history of physics education research and motivates the present study. 
Chapter 2 describes our experimental methodology and discusses the treatments applied to students and the instruments used to measure their learning. Chapter 3 provides an introduction to the statistical and analytical methods used in our data analysis. Chapter 4 presents the full data set, analyzed using both classical test theory and item response theory. Chapter 5 contains a discussion of the implications of our results and a data-driven analysis of our experimental methods. Chapter 6 describes the importance of this work to the field and discusses the relevance of our research to curriculum development and to future work in physics education research.

  13. A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

    ERIC Educational Resources Information Center

    Fukuhara, Hirotaka; Kamata, Akihito

    2011-01-01

    A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…

  14. Item Response Models for Examinee-Selected Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

    2012-01-01

    In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…

  15. Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

    ERIC Educational Resources Information Center

    Lee, Woo-yeol; Cho, Sun-Joo

    2017-01-01

    Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…

  16. An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

    ERIC Educational Resources Information Center

    Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol

    2016-01-01

    The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…

  17. Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

    PubMed

    Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

    2018-06-01

    This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.

  18. A Quasi-Parametric Method for Fitting Flexible Item Response Functions

    ERIC Educational Resources Information Center

    Liang, Longjuan; Browne, Michael W.

    2015-01-01

    If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…

  19. Assessment of School-Based Quasi-Experimental Nutrition and Food Safety Health Education for Primary School Students in Two Poverty-Stricken Counties of West China.

    PubMed

    Shen, Minxue; Hu, Ming; Sun, Zhenqiu

    2015-01-01

    Few studies have reported on nutrition and food safety education interventions for students in remote areas of China. This study aimed to assess the questionnaire used to measure knowledge, attitude, and behavior with respect to nutrition and food safety, and to evaluate the effectiveness of a quasi-experimental nutrition and food safety education intervention among primary school students in poverty-stricken counties of west China. Twelve primary schools in west China were randomly selected from Zhen'an of Shaanxi province and Huize of Yunnan province. Six geographically dispersed schools were assigned to the intervention group in a nonrandom way. A knowledge, attitude, and behavior questionnaire was developed, assessed, and used for outcome measurement. Students were surveyed at baseline and at the end of the study, without individual follow-up. Students in the intervention group received targeted nutrition and food safety lectures for 0.5 hours per week over two semesters. Item response theory was applied to assess the questionnaire, and a two-level difference-in-differences model was applied to assess the effectiveness of the intervention. The Cronbach's alpha of the original questionnaire was 0.84. According to the item response model, 22 knowledge items, 6 attitude items, and 8 behavior items showed adequate discrimination parameters and were retained. A total of 378 and 478 valid questionnaires were collected at baseline and at the end point, respectively. Differences in demographic characteristics between the two groups were statistically insignificant. Two-level difference-in-differences models showed that the health education improved knowledge and behavior scores by 2.92 (95% CI: 2.06-3.78) and 2.92 (95% CI: 1.37-4.47) points, respectively, but had no effect on attitude. The questionnaire met psychometric standards and showed good internal consistency and discrimination power. The nutrition and food safety education was effective in improving the knowledge and behavior of primary school students in the two poverty-stricken counties of China.
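
    A simplified sketch of a difference-in-differences estimate with a school-level random intercept, in the spirit of the two-level model described above, is shown below using simulated data; it is not the authors' model specification.

```python
# Sketch of a difference-in-differences estimate with a school random intercept.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 800
df = pd.DataFrame({
    "school": rng.integers(0, 12, n),
    "time": rng.integers(0, 2, n),             # 0 = baseline, 1 = end point
})
df["group"] = (df["school"] < 6).astype(int)    # 1 = intervention school
df["score"] = (10 + 2.9 * df["group"] * df["time"] + 0.5 * df["time"]
               + rng.normal(0, 3, n))           # simulated DiD effect ~ 2.9 points

model = smf.mixedlm("score ~ group * time", df, groups=df["school"]).fit()
print(model.params["group:time"])               # difference-in-differences estimate
```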

  20. Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks

    PubMed Central

    Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.

    2013-01-01

    Objective To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results 2 pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904

  1. An Alternative Approach for the Analyses and Interpretation of Attachment Sort Items

    ERIC Educational Resources Information Center

    Kirkland, John; Bimler, David; Drawneek, Andrew; McKim, Margaret; Scholmerich, Axel

    2004-01-01

    Attachment Q-Sort (AQS) is a tool for quantifying observations about toddler/caregiver relationships. Previous studies have applied factor analysis to the full 90 AQS item set to explore the structure underlying them. Here we explore that structure by applying multidimensional scaling (MDS) to judgements of inter-item similarity. AQS items are…

  2. Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

    PubMed Central

    Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

    2004-01-01

    Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
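
    Two of the strategies discussed in this record can be sketched under a Rasch (1PL) model with known item difficulties, as below; the responses, difficulties, and donor pattern are fabricated.

```python
# Sketch of two ways to handle a 'not applicable' response under a 1PL model:
# (a) treat the item as never offered (skip it), (b) hot-deck imputation from
# a similar respondent.
import numpy as np
from scipy.optimize import minimize_scalar

difficulties = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])

def theta_mle(responses, b):
    """Maximum-likelihood ability estimate, skipping missing (NaN) responses."""
    mask = ~np.isnan(responses)
    u, b = responses[mask], b[mask]
    def neg_loglik(theta):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        return -np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    return minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x

patient = np.array([1.0, 1.0, np.nan, 0.0, 1.0])   # one 'not applicable' item
donor   = np.array([1.0, 1.0, 1.0, 0.0, 0.0])      # similar respondent

print("skip NA: ", theta_mle(patient, difficulties))
imputed = patient.copy(); imputed[2] = donor[2]     # hot-deck imputation
print("hot deck:", theta_mle(imputed, difficulties))
```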

  3. Relationship between Item Responses of Negative Affect Items and the Distribution of the Sum of the Item Scores in the General Population

    PubMed Central

    Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

    2016-01-01

    Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
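
    The kind of curve fitting described above can be sketched as follows: fit a log-linear (exponential) model to the upper part of the sum-score frequency table and compare its prediction with the observed counts at the lower end. The frequency table is fabricated.

```python
# Sketch: exponential (log-linear) fit to the upper part of a sum-score
# distribution, then comparison with observed counts at the lower end.
import numpy as np

scores = np.arange(0, 16)
counts = np.array([2600, 3300, 2900, 2100, 1500, 1100, 800, 580,
                   420, 300, 220, 160, 115, 85, 60, 45])

fit_range = scores >= 4                       # upper part assumed exponential
slope, intercept = np.polyfit(scores[fit_range], np.log(counts[fit_range]), 1)
predicted = np.exp(intercept + slope * scores)

for s in scores[:4]:                          # divergence at the lower end
    print(f"score {s}: observed {counts[s]}, predicted {predicted[s]:.0f}")
```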

  4. Stochastic Approximation Methods for Latent Regression Item Response Models

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  5. Screening for adolescents' internalizing symptoms in primary care: item response theory analysis of the behavior health screen depression, anxiety, and suicidal risk scales.

    PubMed

    Bevans, Katherine B; Diamond, Guy; Levy, Suzanne

    2012-05-01

    To apply a modern psychometric approach to validate the Behavioral Health Screen (BHS) Depression, Anxiety, and Suicidal Risk Scales among adolescents in primary care. Psychometric analyses were conducted using data collected from 426 adolescents aged 12 to 21 years (mean = 15.8, SD = 2.2). Rasch-Masters partial credit models were fit to the data to determine whether items supported the comprehensive measurement of internalizing symptoms with minimal gaps and redundancies. Scales were reduced to ensure that they measured singular dimensions of generalized anxiety, depressed affect, and suicidal risk both comprehensively and efficiently. Although gender bias was observed for some depression and anxiety items, differential item functioning did not impact overall subscale scores. Future revisions to the BHS should include additional items that assess low-level internalizing symptoms. The BHS is an accurate and efficient tool for identifying adolescents with internalizing symptoms in primary care settings. Access to psychometrically sound and cost-effective behavioral health screening tools is essential for meeting the increasing demands for adolescent behavioral health screening in primary/ambulatory care.

  6. Are life satisfaction and self-esteem distinct constructs? A black South African perspective.

    PubMed

    Westaway, Margaret S; Maluka, Constance S

    2005-10-01

    As part of a longitudinal project on Quality of Life, a study was undertaken to extend the applicability of the 5-item Satisfaction With Life Scale, developed in the USA, in South Africa. Data on basic sociodemographic characteristics, the scale, and the 10-item Rosenberg Self-esteem scale were available for 360 Black South Africans (151 men and 209 women), ages 21 to 83 years (M = 38.6 yr., SD = 10.3). Factor analysis applied to scale scores gave two factors, accounting for 71% of the variance. Factor I was loaded by 10 Self-esteem items and Factor II by four of the five Life Satisfaction items. Coefficient alpha was .77 for the Satisfaction With Life Scale and .97 for the Rosenberg Self-esteem Scale. Life Satisfaction was related to Self-esteem (r = .17, p < .01). It was concluded that Life Satisfaction and Self-esteem appear to be distinct, unitary constructs, but responses to Item 5 on the Satisfaction With Life Scale require cautious interpretation and may contribute to the weak r, although so may the collectivist culture of Black South Africans.

  7. Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

    ERIC Educational Resources Information Center

    Aybek, Eren Can; Demirtasli, R. Nukhet

    2017-01-01

    This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. It also aims to introduce simulation and live CAT software to interested researchers. The computerized adaptive test algorithm, assumptions of item response theory models, nominal response…

  8. An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

    ERIC Educational Resources Information Center

    Ito, Kyoko; Sykes, Robert C.

    This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…

  9. The Health Education Impact Questionnaire (heiQ): an outcomes and evaluation measure for patient education and self-management interventions for people with chronic conditions.

    PubMed

    Osborne, Richard H; Elsworth, Gerald R; Whitfield, Kathryn

    2007-05-01

    This paper describes the development and validation of the Health Education Impact Questionnaire (heiQ). The aim was to develop a user-friendly, relevant, and psychometrically sound instrument for the comprehensive evaluation of patient education programs, which can be applied across a broad range of chronic conditions. Item development for the heiQ was guided by a Program Logic Model, Concept Mapping, interviews with stakeholders and psychometric analyses. Construction (N=591) and confirmatory (N=598) samples were drawn from consumers of patient education programs and hospital outpatients. The properties of the heiQ were investigated using item response theory and structural equation modeling. Over 90 candidate items were generated, with 42 items selected for inclusion in the final scale. Eight independent dimensions were derived: Positive and Active Engagement in Life (five items, Cronbach's alpha (alpha)=0.86); Health Directed Behavior (four items, alpha=0.80); Skill and Technique Acquisition (five items, alpha=0.81); Constructive Attitudes and Approaches (five items, alpha=0.81); Self-Monitoring and Insight (seven items, alpha=0.70); Health Service Navigation (five items, alpha=0.82); Social Integration and Support (five items, alpha=0.86); and Emotional Wellbeing (six items, alpha=0.89). The heiQ has high construct validity and is a reliable measure of a broad range of patient education program benefits. The heiQ will provide valuable information to clinicians, researchers, policymakers and other stakeholders about the value of patient education programs in chronic disease management.

  10. On the validity of measuring change over time in routine clinical assessment: a close examination of item-level response shifts in psychosomatic inpatients.

    PubMed

    Nolte, S; Mierke, A; Fischer, H F; Rose, M

    2016-06-01

    Significant life events such as severe health status changes or intensive medical treatment often trigger response shifts in individuals that may hamper the comparison of measurements over time. Drawing from the Oort model, this study aims to detect response shift at the item level in psychosomatic inpatients and to evaluate its impact on the validity of comparing repeated measurements. Complete pretest and posttest data were available from 1188 patients who had filled out the ICD-10 Symptom Rating (ISR) scale at admission and discharge, on average 24 days after intake. Reconceptualization, reprioritization, and recalibration response shifts were explored by applying tests of measurement invariance. In the item-level approach, all model parameters were constrained to be equal between pretest and posttest. If non-invariance was detected, it was linked to the different types of response shift. When across-occasion model parameters were constrained, model fit worsened, as indicated by a significant Satorra-Bentler chi-square difference test, suggesting the potential presence of response shifts. A close examination revealed the presence of two types of response shift, i.e., (non)uniform recalibration and both higher- and lower-level reconceptualization, leading to four model adjustments. Our analyses suggest that psychosomatic inpatients experienced some response shifts during their hospital stay. According to the hierarchy of measurement invariance, however, only one of the detected non-invariances is critical for unbiased mean comparisons over time, and it did not have a substantial impact on estimating change. Hence, the use of the ISR can be recommended for outcomes assessment in clinical routine, as change score estimates do not seem hampered by response shift effects.
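
    For orientation, a chi-square difference test between a constrained (invariant) and an unconstrained model can be sketched as below; the fit values are hypothetical, and the plain difference shown stands in for the Satorra-Bentler scaled version used in the study.

```python
# Sketch of a chi-square difference test comparing a constrained model
# (parameters equal across occasions) with an unconstrained model.
from scipy.stats import chi2

chisq_constrained, df_constrained = 612.4, 230   # all parameters held equal
chisq_free, df_free = 548.9, 214                 # parameters free across occasions

delta_chisq = chisq_constrained - chisq_free
delta_df = df_constrained - df_free
p_value = chi2.sf(delta_chisq, delta_df)
print(f"delta chi2 = {delta_chisq:.1f}, delta df = {delta_df}, p = {p_value:.4f}")
```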

  11. The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics

    PubMed Central

    Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.

    2009-01-01

    Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two factor model—motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
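
    As a reference for the information functions mentioned above, the 2PL item information I_i(theta) = a_i^2 * P_i(theta) * (1 - P_i(theta)) and its sum over items can be computed as sketched below; the item parameters are hypothetical, not the published CEI-II estimates.

```python
# Sketch of item and test information functions under a 2PL model.
import numpy as np

a = np.array([1.6, 1.2, 2.0, 1.4, 1.1])     # discriminations
b = np.array([-0.8, -0.3, 0.0, 0.4, 0.9])   # difficulties
theta = np.linspace(-3, 3, 7)

p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
item_info = a[:, None] ** 2 * p * (1 - p)   # items x theta grid
test_info = item_info.sum(axis=0)

for t, info in zip(theta, test_info):
    print(f"theta = {t:+.1f}: test information = {info:.2f}")
```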

  12. A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods

    PubMed Central

    Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

    2017-01-01

    Background The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356

  13. The Performance of Local Dependence Measures with Psychological Data

    ERIC Educational Resources Information Center

    Houts, Carrie R.; Edwards, Michael C.

    2013-01-01

    The violation of the assumption of local independence when applying item response theory (IRT) models has been shown to have a negative impact on all estimates obtained from the given model. Numerous indices and statistics have been proposed to aid analysts in the detection of local dependence (LD). A Monte Carlo study was conducted to evaluate…

  14. The Rasch Rating Model and the Disordered Threshold Controversy

    ERIC Educational Resources Information Center

    Adams, Raymond J.; Wu, Margaret L.; Wilson, Mark

    2012-01-01

    The Rasch rating (or partial credit) model is a widely applied item response model that is used to model ordinal observed variables that are assumed to collectively reflect a common latent variable. In the application of the model there is considerable controversy surrounding the assessment of fit. This controversy is most notable when the set of…

  15. Explore the Usefulness of Person-Fit Analysis on Large-Scale Assessment

    ERIC Educational Resources Information Center

    Cui, Ying; Mousavi, Amin

    2015-01-01

    The current study applied the person-fit statistic, l[subscript z], to data from a Canadian provincial achievement test to explore the usefulness of conducting person-fit analysis on large-scale assessments. Item parameter estimates were compared before and after the misfitting student responses, as identified by l[subscript z], were removed. The…

  16. Standard Errors of Estimated Latent Variable Scores with Estimated Structural Parameters

    ERIC Educational Resources Information Center

    Hoshino, Takahiro; Shigemasu, Kazuo

    2008-01-01

    The authors propose a concise formula to evaluate the standard error of the estimated latent variable score when the true values of the structural parameters are not known and must be estimated. The formula can be applied to factor scores in factor analysis or ability parameters in item response theory, without bootstrap or Markov chain Monte…

  17. A Common Capacity Limitation for Response and Item Selection in Working Memory

    ERIC Educational Resources Information Center

    Janczyk, Markus

    2017-01-01

    Successful completion of any cognitive task requires selecting a particular action and the object the action is applied to. Oberauer (2009) suggested a working memory (WM) model comprising a declarative and a procedural part with analogous structures. One important assumption of this model is that both parts work independently of each other, and…

  18. Applying Systems Design and Item Response Theory to the Problem of Measuring Information Literacy Skills.

    ERIC Educational Resources Information Center

    O'Connor, Lisa G.; Radcliff, Carolyn J.; Gedeon, Julie A.

    2002-01-01

    Reports on the development of the Standardized Assessment of Information Literacy Skills (SAILS) at Kent State University (Ohio) for programmatic-level assessment of information literacy skills. Once validated, the instrument will be used to assess entry skills upon admission and longitudinally to ascertain whether there is significant change in…

  19. Development and validation of instrument for ergonomic evaluation of tablet arm chairs

    PubMed Central

    Tirloni, Adriana Seára; dos Reis, Diogo Cunha; Bornia, Antonio Cezar; de Andrade, Dalton Francisco; Borgatto, Adriano Ferreti; Moro, Antônio Renato Pereira

    2016-01-01

    The purpose of this study was to develop and validate an evaluation instrument for tablet arm chairs based on ergonomic requirements, focused on user perceptions and using Item Response Theory (IRT). This exploratory study involved 1,633 participants (university students and professors) in four steps: a pilot study (n=26), semantic validation (n=430), content validation (n=11) and construct validation (n=1,166). Samejima's graded response model was applied to validate the instrument. The results showed that all the steps (theoretical and practical) of the instrument's development and validation processes were successful and that the group of remaining items (n=45) had a high consistency (0.95). This instrument can be used in the furniture industry by engineers and product designers and in the purchasing process of tablet arm chairs for schools, universities and auditoriums. PMID:28337099
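
    As background for Samejima's graded response model named above, its standard form for an item j with ordered categories k = 0, …, m is sketched below; the notation is generic and not tied to this instrument's calibration.

    ```latex
    % Samejima's graded response model: cumulative category probabilities.
    P^{*}_{jk}(\theta) = P(X_j \ge k \mid \theta)
        = \frac{1}{1 + \exp\{-a_j(\theta - b_{jk})\}}, \quad k = 1, \dots, m,
    \qquad P^{*}_{j0}(\theta) = 1, \quad P^{*}_{j,m+1}(\theta) = 0,
    % so the probability of responding exactly in category k is
    P(X_j = k \mid \theta) = P^{*}_{jk}(\theta) - P^{*}_{j,k+1}(\theta).
    ```

    Here a_j is the item's discrimination and the b_{jk} are ordered category thresholds; theta is the latent trait (in this study, the user-perceived ergonomic quality of the chair).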

  20. A Comparison of Linking and Concurrent Calibration under the Graded Response Model.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho; Cohen, Allan S.

    Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…

  1. Writing, Evaluating and Assessing Data Response Items in Economics.

    ERIC Educational Resources Information Center

    Trotman-Dickenson, D. I.

    1989-01-01

    Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…

  2. Item Response Modeling with Sum Scores

    ERIC Educational Resources Information Center

    Johnson, Timothy R.

    2013-01-01

    One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…

  3. A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

    ERIC Educational Resources Information Center

    Polak, Marike; De Rooij, Mark; Heiser, Willem J.

    2012-01-01

    In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…

  4. Effects of mischievous responding on universal mental health screening: I love rum raisin ice cream, really I do!

    PubMed

    Furlong, Michael J; Fullchange, Aileen; Dowdy, Erin

    2017-09-01

    Student surveys are often used for school-based mental health screening; hence, it is critical to evaluate the authenticity of information obtained via the self-report format. The objective of this study was to examine the possible effects of mischievous response patterns on school-based screening results. The present study included 1,857 high school students who completed a schoolwide screening for complete mental health. Student responses were reviewed to detect possible mischievous responses and to examine their association with other survey results. Consistent with previous research, mischievous responding was evaluated by items that are legitimate to ask of all students (e.g., How much do you weigh? and How many siblings do you have?). Responses were considered "mischievous" when a student selected multiple extreme, unusual (less than 5% incidence) response options, such as weighing more than 225 pounds and having 10 or more siblings. Only 1.8% of the students responded in extreme ways to 2 or more of 7 mischievous response items. When compared with other students, the mischievous responders were less likely to declare that they answered items honestly, were more likely to finish the survey in less than 10 min, reported lower levels of life satisfaction and school connectedness, and reported higher levels of emotional and behavioral distress. When applying a dual-factor mental health screening framework to the responses, mischievous responders were less likely to be categorized as having complete mental health. Implications for school-based mental health screening are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. The emotion dysregulation inventory: Psychometric properties and item response theory calibration in an autism spectrum disorder sample.

    PubMed

    Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A

    2018-06-01

    Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD.

  6. Measuring Alexithymia via Trait Approach-I: A Alexithymia Scale Item Selection and Formation of Factor Structure

    PubMed Central

    TATAR, Arkun; SALTUKOĞLU, Gaye; ALİOĞLU, Seda; ÇİMEN, Sümeyye; GÜVEN, Hülya; AY, Çağla Ebru

    2017-01-01

    Introduction It is not clear in the literature whether available instruments are sufficient to measure alexithymia because of its theoretical structure. Moreover, it has been reported that several measuring instruments are needed to measure this construct, and all the instruments have different error sources. The old and the new forms of the Toronto Alexithymia Scale are the only instruments available in Turkish. Thus, the purpose of this study was to develop a new scale to measure alexithymia, selecting items and constructing the factor structure. Methods A total of 1117 patients aged from 19 to 82 years (mean = 35.05 years) were included. A 100-item pool was prepared and applied to 628 women and 489 men. Data were analyzed using Exploratory Factor Analysis, Confirmatory Factor Analysis, and Item Response Theory, and 28 items were selected. The new form of 28 items was applied to 415 university students, including 271 women and 144 men aged from 18 to 30 (mean=21.44). Results The results of Exploratory Factor Analysis revealed a five-factor construct of “Solving and Expressing Affective Experiences,” “External Locused Cognitive Style,” “Tendency to Somatize Affections,” “Imaginary Life and Visualization,” and “Acting Impulsively,” along with a two-factor construct representing the “Affective” and “Cognitive” components. All the components of the construct showed good model fit and high internal consistency. The new form was tested in terms of internal consistency, test-retest reliability, and concurrent validity using the Toronto Alexithymia Scale as criterion, and discriminative validity using the Five-Factor Personality Inventory Short Form. Conclusion The results showed that the new scale met the basic psychometric requirements. Results have been discussed in line with related studies. PMID:29033633

  7. Prevalence of responsible hospitality policies in licensed premises that are associated with alcohol-related harm.

    PubMed

    Daly, Justine B; Campbell, Elizabeth M; Wiggers, John H; Considine, Robyn J

    2002-06-01

    This study aimed to determine the prevalence of responsible hospitality policies in a group of licensed premises associated with alcohol-related harm. During March 1999, 108 licensed premises with one or more police-identified alcohol-related incidents in the previous 3 months received a visit from a police officer. A 30-item audit checklist was used to determine the responsible hospitality policies being undertaken by each premises within eight policy domains: display required signage (three items); responsible host practices to prevent intoxication and under-age drinking (five items); written policies and guidelines for responsible service (three items); discouraging inappropriate promotions (three items); safe transport (two items); responsible management issues (seven items); physical environment (three items) and entry conditions (four items). No premises were undertaking all 30 items. Eighty per cent of the premises were undertaking 20 of the 30 items. All premises were undertaking at least 17 of the items. The proportion of premises undertaking individual items ranged from 16% to 100%. Premises were less likely to report having and providing written responsible hospitality documentation to staff, using door charges and having entry/re-entry rules. Significant differences between rural and urban premises were evident for four policies. Clubs were significantly more likely than hotels to have a written responsible service of alcohol policy and to clearly display codes of dress and conditions of entry. This study provides an indication of the extent and nature of responsible hospitality policies in a sample of licensed premises that are associated with a broad range of alcohol related harms. The finding that a large majority of such premises appear to adopt responsible hospitality policies suggests a need to assess the validity and reliability of tools used in the routine assessment of such policies, and of the potential for harm from licensed premises.

  8. Why Japanese workers show low work engagement: An item response theory analysis of the Utrecht Work Engagement scale

    PubMed Central

    2010-01-01

    With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being constructs such as work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption) to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9) and the comparability of this scale between both countries. Item Response Theory (IRT) was applied to the data from Japan (N = 2,339) and the Netherlands (N = 13,406). Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement) based on the test information function (TIF) and the standard error of measurement (SEM). The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of the two versions was therefore not equivalent. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information) among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees. PMID:21054839
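
    The TIF and SEM referred to above are linked by a standard IRT identity, shown here in generic notation rather than the specific UWES-9 calibration.

    ```latex
    % Test information is the sum of item information; the conditional standard
    % error of measurement is its inverse square root.
    I(\theta) = \sum_{j} I_j(\theta), \qquad
    SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}
    ```

    Regions of the latent trait where the items supply little information are therefore measured with large standard errors, which is how the Japanese version's weakness at very low engagement and the Dutch version's weakness at very high engagement appear.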

  9. Why Japanese workers show low work engagement: An item response theory analysis of the Utrecht Work Engagement scale.

    PubMed

    Shimazu, Akihito; Schaufeli, Wilmar B; Miyanaka, Daisuke; Iwata, Noboru

    2010-11-05

    With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being constructs such as work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption) to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9) and the comparability of this scale between both countries. Item Response Theory (IRT) was applied to the data from Japan (N = 2,339) and the Netherlands (N = 13,406). Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement) based on the test information function (TIF) and the standard error of measurement (SEM). The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of the two versions was therefore not equivalent. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information) among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees.

  10. Item Response Data Analysis Using Stata Item Response Theory Package

    ERIC Educational Resources Information Center

    Yang, Ji Seung; Zheng, Xiaying

    2018-01-01

    The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that has been available since Stata version 14 (2015). Using a simulated data set and a publicly available item response data set extracted from the Programme for International Student Assessment, we review the IRT package from…

  11. Item Response Models for Local Dependence among Multiple Ratings

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan

    2014-01-01

    Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…

  12. Item response theory - A first approach

    NASA Astrophysics Data System (ADS)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also given rise to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
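
    To make the one-, two-, and three-parameter logistic models referred to above concrete, their standard forms are shown below in the usual notation; this is textbook material, not new content from the paper itself.

    ```latex
    % Three-parameter logistic (3PL) model for a correct response to item j:
    P(X_{ij} = 1 \mid \theta_i) = c_j + (1 - c_j)\,
        \frac{1}{1 + \exp\{-D a_j(\theta_i - b_j)\}}
    % b_j: difficulty, a_j: discrimination, c_j: lower asymptote ("guessing"),
    % D: optional scaling constant (often 1.7).
    % The 2PL model sets c_j = 0; the 1PL (Rasch) model additionally fixes a_j
    % to a common value.
    ```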

  13. Development and evaluation of the PI-G: a three-scale measure based on the German translation of the PROMIS® pain interference item bank.

    PubMed

    Farin, Erik; Nagl, Michaela; Gramm, Lukas; Heyduck, Katja; Glattacker, Manuela

    2014-05-01

    The study aim was to translate the PROMIS® pain interference (PI) item bank (41 items) into German, test its psychometric properties in patients with chronic low back pain, and develop static subforms. We surveyed N = 262 patients undergoing rehabilitation who were asked to fill out questionnaires at the beginning and 2 weeks after the end of rehabilitation, applying the Oswestry Disability Index (ODI) and Pain Disability Index (PDI) in addition to the PROMIS® PI items. For psychometric testing, a 1-parameter item response theory (IRT) model was used. Exploratory and confirmatory factor analyses as well as reliability and construct validity analyses were conducted. The assumptions regarding IRT scaling of the translated PROMIS® PI item bank as a whole were not confirmed. However, we succeeded in devising three static subforms (PI-G scales: PI mental, 13 items; PI functional, 11 items; PI physical, 4 items) with good psychometric properties. The PI-G scales in their static form can be recommended for use in German-speaking countries. Their strengths versus the ODI and PDI are that pain interference is assessed in a differentiated manner and that several psychometric values are somewhat better than those associated with the ODI and PDI (distribution properties, IRT model fit, reliability). To develop an IRT-scaled item bank of the German translations of the PROMIS® PI items, it would be useful to have additional studies (e.g., with larger sample sizes and using a 2-parameter IRT model).

  14. Point and Click, Carefully: Investigating Inconsistent Response Styles in Middle School and College Students Involved in Web-Based Longitudinal Substance Use Research

    PubMed Central

    Wardell, Jeffrey D.; Rogers, Michelle L.; Simms, Leonard J.; Jackson, Kristina M.; Read, Jennifer P.

    2014-01-01

    This study investigated inconsistent responding to survey items by participants involved in longitudinal, web-based substance use research. We also examined cross-sectional and prospective predictors of inconsistent responding. Middle school (N = 1,023) and college students (N = 995) from multiple sites in the United States responded to online surveys assessing substance use and related variables in three waves of data collection. We applied a procedure for creating an index of inconsistent responding at each wave that involved identifying pairs of items with considerable redundancy and calculating discrepancies in responses to these items. Inconsistent responding was generally low in the Middle School sample and moderate in the College sample, with individuals showing only modest stability in inconsistent responding over time. Multiple regression analyses identified several baseline variables—including demographic, personality, and behavioral variables—that were uniquely associated with inconsistent responding both cross-sectionally and prospectively. Alcohol and substance involvement showed some bivariate associations with inconsistent responding, but these associations largely were accounted for by other factors. The results suggest that high levels of carelessness or inconsistency do not appear to characterize participants’ responses to longitudinal web-based surveys of substance use and support the use of inconsistency indices as a tool for identifying potentially problematic responders. PMID:24092819
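
    A minimal sketch of the kind of inconsistency index described above, assuming pairs of near-redundant items scored on the same response scale; the column names, pairings, and cutoff are hypothetical illustrations, not the authors' actual specification.

    ```python
    import pandas as pd

    # Hypothetical pairs of near-redundant survey items (column names are made up).
    REDUNDANT_PAIRS = [
        ("alc_freq_30day_a", "alc_freq_30day_b"),
        ("peer_use_a", "peer_use_b"),
    ]

    def inconsistency_index(responses: pd.DataFrame,
                            pairs=REDUNDANT_PAIRS) -> pd.Series:
        """Sum of absolute discrepancies across redundant item pairs, per respondent."""
        total = pd.Series(0.0, index=responses.index)
        for a, b in pairs:
            total += (responses[a] - responses[b]).abs()
        return total

    # Example usage: flag respondents whose summed discrepancy exceeds a chosen cutoff.
    # flags = inconsistency_index(wave1_df) >= 4   # cutoff is illustrative only
    ```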

  15. A Multidimensional Ideal Point Item Response Theory Model for Binary Data

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Albert; Hernandez, Adolfo; McDonald, Roderick P.

    2006-01-01

    We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model…

  16. Measurement properties of the WOMAC LK 3.1 pain scale.

    PubMed

    Stratford, P W; Kennedy, D M; Woodhouse, L J; Spadoni, G F

    2007-03-01

    The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is applied extensively to patients with osteoarthritis of the hip or knee. Previous work has challenged the validity of its physical function scale; however, an extensive evaluation of its pain scale has not been reported. Our purpose was to estimate internal consistency, factorial validity, test-retest reliability, and the standard error of measurement (SEM) of the WOMAC LK 3.1 pain scale. Four hundred and seventy-four patients with osteoarthritis of the hip or knee awaiting arthroplasty were administered the WOMAC. Estimates of internal consistency (coefficient alpha), factorial validity (confirmatory factor analysis), and the SEM based on internal consistency (SEM(IC)) were obtained. Test-retest reliability [Type 2,1 intraclass correlation coefficients (ICC)] and a corresponding SEM(TRT) were estimated on a subsample of 36 patients. Our estimates were: internal consistency alpha=0.84; SEM(IC)=1.48; Type 2,1 ICC=0.77; SEM(TRT)=1.69. Confirmatory factor analysis failed to support a single factor structure of the pain scale with uncorrelated error terms. Two comparable models provided excellent fit: (1) a model with correlated error terms between the walking and stairs items, and between the night and sit items (chi2=0.18, P=0.98); (2) a two-factor model with walking and stairs items loading on one factor, night and sit items loading on a second factor, and the standing item loading on both factors (chi2=0.18, P=0.98). Our examination of the factorial structure of the WOMAC pain scale failed to support a single factor, and internal consistency analysis yielded a coefficient less than optimal for individual patient use. An alternative strategy to summing the five item responses when considering individual patient application would be to interpret item responses separately or to sum only those items that display homogeneity.
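
    For reference, both SEM estimates reported above follow the usual classical formula, written here in generic notation.

    ```latex
    % Standard error of measurement from a reliability coefficient r and the
    % baseline standard deviation of the scale scores:
    SEM = SD \sqrt{1 - r}
    % with r = coefficient alpha for SEM(IC) and r = the Type 2,1 ICC for SEM(TRT).
    ```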

  17. Person Heterogeneity of the BDI-II-C and Its Effects on Dimensionality and Construct Validity: Using Mixture Item Response Models

    ERIC Educational Resources Information Center

    Wu, Pei-Chen; Huang, Tsai-Wei

    2010-01-01

    The aim of this study was to apply the mixed Rasch model to investigate person heterogeneity of the Beck Depression Inventory-II-Chinese version (BDI-II-C) and its effects on dimensionality and construct validity. Person heterogeneity was reflected by two latent classes that differ qualitatively. Additionally, person heterogeneity adversely affected the…

  18. Variations in Primary Teachers' Responses and Development during Three Major Science In-Service Programmes

    ERIC Educational Resources Information Center

    Jarvis, Tina; Pell, Anthony; Hingley, Philip

    2011-01-01

    This paper reports on how different types of teachers responded to in-service aimed at developing investigative-based science education (IBSE) in primary schools, and the extent to which they applied their new skills in the classroom. Common items from evaluation questionnaires allowed data to be combined from three major in-service programmes.…

  19. The Relation between Test Formats and Kindergarteners' Expressions of Vocabulary Knowledge

    ERIC Educational Resources Information Center

    Christ, Tanya; Chiu, Ming Ming; Currie, Ashelin; Cipielewski, James

    2014-01-01

    This study tested how 53 kindergarteners' expressions of depth of vocabulary knowledge and use in novel contexts were related to in-context and out-of-context test formats for 16 target words. Applying multilevel, multi-categorical Logit to all 1,696 test item responses, the authors found that kindergarteners were more likely to express deep…

  20. Agreement Between Responses From Community-Dwelling Persons With Stroke and Their Proxies on the NIH Neurological Quality of Life (Neuro-QoL) Short Forms.

    PubMed

    Kozlowski, Allan J; Singh, Ritika; Victorson, David; Miskovic, Ana; Lai, Jin-Shei; Harvey, Richard L; Cella, David; Heinemann, Allen W

    2015-11-01

    To examine agreement between patient and proxy responses on the Quality of Life in Neurological Disorders (Neuro-QoL) instruments after stroke. Cross-sectional observational substudy of the longitudinal, multisite, multicondition Neuro-QoL validation study. In-person, interview-guided, patient-reported outcomes. Convenience sample of dyads (N=86) of community-dwelling persons with stroke and their proxy respondents. Not applicable. Dyads concurrently completed short forms of 8 or 9 items for the 13 Neuro-QoL adult domains using the patient-proxy perspective. Agreement was examined at the scale-level with difference scores, intraclass correlation coefficients (ICCs), effect size statistics, and Bland-Altman plots, and at the item-level with kappa coefficients. We found no mean differences between patients and proxies on the Applied Cognition-General Concerns, Depression, Satisfaction With Social Roles and Activities, Stigma, and Upper Extremity Function (Fine Motor, activities of daily living) short forms. Patients rated themselves more favorably on the Applied Cognition-Executive Function, Ability to Participate in Social Roles and Activities, Lower Extremity Function (Mobility), Positive Affect and Well-Being, Anxiety, Emotional and Behavioral Dyscontrol, and Fatigue short forms. The largest mean patient-proxy difference observed was 3 T-score points on the Lower Extremity Function (Mobility). ICCs ranged from .34 to .59. However, limits of agreement showed dyad differences exceeding ±20 T-score points, and item-level agreement ranged from not significant to weighted kappa=.34. Proxy responses on Neuro-QoL short forms can complement responses of moderate- to high-functioning community-dwelling persons with stroke and augment group-level analyses, but do not substitute for individual patient ratings. Validation is needed for other stroke populations. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
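
    For reference, the Type 2,1 ICC used above is conventionally computed from a two-way random-effects ANOVA (Shrout & Fleiss notation); this is the standard formula, not a detail reported by the study.

    ```latex
    % ICC(2,1): two-way random effects, absolute agreement, single ratings.
    ICC(2,1) = \frac{MS_R - MS_E}{MS_R + (k - 1) MS_E + \frac{k}{n}(MS_C - MS_E)}
    % MS_R: between-subjects mean square, MS_C: between-raters mean square,
    % MS_E: error mean square, k: raters per subject (here 2, patient and proxy),
    % n: number of dyads.
    ```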

  1. Linking Measures of Adult Nicotine Dependence to a Common Latent Continuum and a Comparison with Adolescent Patterns

    PubMed Central

    Strong, David R.; Schonbrun, Yael Chatav; Schaffran, Christine; Griesler, Pamela C.; Kandel, Denise

    2012-01-01

    Background An ongoing debate regarding the nature of Nicotine Dependence (ND) is whether the same instrument can be applied to measure ND among adults and adolescents. Using a hierarchical item response model (IRM), we examined evidence for a common continuum underlying ND symptoms among adults and adolescents. Method The analyses are based on two waves of interviews with subsamples of parents and adolescents from a multi-ethnic longitudinal cohort of 1,039 6th–10th graders from the Chicago Public Schools (CPS). Adults and adolescents who reported smoking cigarettes the last 30 days prior to waves 3 and 5 completed three common instruments measuring ND symptoms and one item measuring loss of autonomy. Results A stable continuum of ND, first identified among adolescents, was replicated among adults. However, some symptoms, such as tolerance and withdrawal, differed markedly across adults and adolescents. The majority of mFTQ items were observed within the highest levels of ND, the NDSS items within the lowest levels, and the DSM-IV items were arrayed in the middle and upper third of the continuum of dependence severity. Loss of Autonomy was positioned at the lower end of the continuum. We propose a ten-symptom measure of ND for adolescents and adults. Conclusions Despite marked differences in the relative severity of specific ND symptoms in each group, common instrumentation of ND can apply to adults and adolescents. The results increase confidence in the ability to describe phenotypic heterogeneity in ND across important developmental periods. PMID:21855236

  2. A Two-Decision Model for Responses to Likert-Type Items

    ERIC Educational Resources Information Center

    Thissen-Roe, Anne; Thissen, David

    2013-01-01

    Extreme response set, the tendency to prefer the lowest or highest response option when confronted with a Likert-type response scale, can lead to misfit of item response models such as the generalized partial credit model. Recently, a series of intrinsically multidimensional item response models have been hypothesized, wherein tendency toward…

  3. Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

    PubMed

    Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

    2016-04-01

    We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.

  4. On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

    PubMed

    Raykov, Tenko; Marcoulides, George A

    2016-04-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
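
    One standard way to see the kind of equivalence described above is to threshold a classical (linear) latent response; the sketch below is generic and does not reproduce the article's own derivation.

    ```latex
    % Linear CTT/factor model for a continuous latent response, then dichotomize:
    X_j^{*} = \lambda_j \theta + \varepsilon_j, \quad \varepsilon_j \sim N(0, \psi_j),
    \qquad X_j = 1 \iff X_j^{*} \ge \tau_j
    % which yields a two-parameter normal-ogive item response function:
    P(X_j = 1 \mid \theta)
      = \Phi\!\left(\frac{\lambda_j \theta - \tau_j}{\sqrt{\psi_j}}\right)
      = \Phi\big(a_j(\theta - b_j)\big),
      \quad a_j = \lambda_j/\sqrt{\psi_j},\; b_j = \tau_j/\lambda_j.
    ```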

  5. The development of an integrated assessment instrument for measuring analytical thinking and science process skills

    NASA Astrophysics Data System (ADS)

    Irwanto; Rohaeti, Eli; LFX, Endang Widjajanti; Suyanta

    2017-05-01

    This research aims to develop an instrument and determine the characteristics of an integrated assessment instrument. The research uses the 4-D model, which includes define, design, develop, and disseminate stages. The primary product was validated by expert judgment, tested for readability by students, and assessed for feasibility by chemistry teachers. This research involved 246 students of grade XI of four senior high schools in Yogyakarta, Indonesia. Data collection techniques included interviews, questionnaires, and tests. Data collection instruments included an interview guideline, an item validation sheet, a users' response questionnaire, an instrument readability questionnaire, and an essay test. The results show that the integrated assessment instrument has an Aiken validity value of 0.95. Item reliability was 0.99 and person reliability was 0.69. Teachers' responses to the integrated assessment instrument were very good. Therefore, the integrated assessment instrument is feasible for measuring students' analytical thinking and science process skills.

  6. [Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

    PubMed

    Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

    2013-06-01

    To analyze, by means of Item Response Theory, an instrument to measure adherence to treatment for hypertension. This was an analytical study of 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, in 2011, using Item Response Theory. The stages were: testing dimensionality, calibrating the items, processing the data, and creating a scale, analyzed using the graded response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence, while those relating to non-drug therapy need to be reworked because they carry less psychometric information and have low discrimination. The independence of items, the small number of levels in the scale, and the low explained variance in the adjustment of the models are the main weaknesses of the instrument analyzed. Item Response Theory proved to be a relevant analysis technique because it evaluated respondents' adherence to treatment for hypertension, the level of difficulty of the items, and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. The item-level analysis shows that the instrument is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.

  7. Applying Computerized Adaptive Testing to the Negative Acts Questionnaire-Revised: Rasch Analysis of Workplace Bullying

    PubMed Central

    Ma, Shu-Ching; Li, Yu-Chi; Yui, Mei-Shu

    2014-01-01

    Background Workplace bullying is a prevalent problem in contemporary work places that has adverse effects on both the victims of bullying and organizations. With the rapid development of computer technology in recent years, there is an urgent need to prove whether item response theory–based computerized adaptive testing (CAT) can be applied to measure exposure to workplace bullying. Objective The purpose of this study was to evaluate the relative efficiency and measurement precision of a CAT-based test for hospital nurses compared to traditional nonadaptive testing (NAT). Under the preliminary conditions of a single domain derived from the scale, a CAT module bullying scale model with polytomously scored items is provided as an example for evaluation purposes. Methods A total of 300 nurses were recruited and responded to the 22-item Negative Acts Questionnaire-Revised (NAQ-R). All NAT (or CAT-selected) items were calibrated with the Rasch rating scale model and all respondents were randomly selected for a comparison of the advantages of CAT and NAT in efficiency and precision by paired t tests and the area under the receiver operating characteristic curve (AUROC). Results The NAQ-R is a unidimensional construct that can be applied to measure exposure to workplace bullying through CAT-based administration. Nursing measures derived from both tests (CAT and NAT) were highly correlated (r=.97) and their measurement precisions were not statistically different (P=.49) as expected. CAT required fewer items than NAT (an efficiency gain of 32%), suggesting a reduced burden for respondents. There were significant differences in work tenure between the 2 groups (bullied and nonbullied) at a cutoff point of 6 years at 1 worksite. An AUROC of 0.75 (95% CI 0.68-0.79) with logits greater than –4.2 (or >30 in summation) was defined as being highly likely bullied in a workplace. Conclusions With CAT-based administration of the NAQ-R for nurses, their burden was substantially reduced without compromising measurement precision. PMID:24534113
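
    A minimal sketch of the maximum-information item-selection loop that CAT administration such as the one above relies on, written here for dichotomous 2PL items for simplicity; the NAQ-R items are polytomous and Rasch-scaled, so this is illustrative only, and all function and parameter names are hypothetical.

    ```python
    import numpy as np

    def p_2pl(theta, a, b):
        """2PL probability of endorsing an item."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def item_information(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = p_2pl(theta, a, b)
        return a ** 2 * p * (1.0 - p)

    def update_theta(theta, item_bank, administered, responses, steps=20, lr=0.1):
        """Crude gradient ascent on the 2PL log-likelihood (illustrative only)."""
        for _ in range(steps):
            grad = sum(item_bank[i][0] * (r - p_2pl(theta, *item_bank[i]))
                       for i, r in zip(administered, responses))
            theta += lr * grad
        return theta

    def run_cat(item_bank, answer_fn, se_target=0.4, max_items=15):
        """item_bank: list of (a, b) tuples; answer_fn(item_idx) -> 0/1 response."""
        theta, administered, responses = 0.0, [], []
        while len(administered) < max_items:
            # Pick the unused item with maximum information at the current theta.
            infos = [item_information(theta, a, b) if i not in administered else -np.inf
                     for i, (a, b) in enumerate(item_bank)]
            nxt = int(np.argmax(infos))
            administered.append(nxt)
            responses.append(answer_fn(nxt))
            theta = update_theta(theta, item_bank, administered, responses)
            se = 1.0 / np.sqrt(sum(item_information(theta, *item_bank[i])
                                   for i in administered))
            if se <= se_target:  # stop once measurement precision is sufficient
                break
        return theta, administered
    ```

    The stopping rule (standard error below a target, or a maximum test length) is what produces the efficiency gain over fixed-length administration reported in the abstract.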

  8. The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

    ERIC Educational Resources Information Center

    Lee, Wooyeol; Cho, Sun-Joo

    2017-01-01

    Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…

  9. Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

    ERIC Educational Resources Information Center

    Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

    2013-01-01

    We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…

  10. On Multidimensional Item Response Theory: A Coordinate-Free Approach. Research Report. ETS RR-07-30

    ERIC Educational Resources Information Center

    Antal, Tamás

    2007-01-01

    A coordinate-free definition of complex-structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the classical unidimensional item response theory models. The main theorem of the…

  11. Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…

  12. Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory

    ERIC Educational Resources Information Center

    Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.

    2016-01-01

    Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…

  13. The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

    ERIC Educational Resources Information Center

    Bennett, Randy Elliot; And Others

    1990-01-01

    The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)

  14. Emotional vitality in caregivers: application of Rasch Measurement Theory with secondary data to development and test a new measure.

    PubMed

    Barbic, Skye P; Bartlett, Susan J; Mayo, Nancy E

    2015-07-01

    To describe the practical steps in identifying items and evaluating scoring strategies for a new measure of emotional vitality in informal caregivers of individuals who have experienced a significant health event. The psychometric properties of responses to selected items from validated health-related quality of life and other psychosocial questionnaires administered four times over a one-year period were evaluated using Rasch Measurement Theory. Community. A total of 409 individuals providing informal care at home to older adults who had experienced a recent stroke. Rasch Measurement Theory was used to test the ordering of response option thresholds, fit, spread of the item locations, residual correlations, person separation index, and stability across time. Based on a theoretical framework developed in earlier work, we identified 22 candidate items from a pool of relevant psychosocial measures available. Of these, additional evaluation resulted in 19 items that could be used to assess the five core domains. The overall model fit was reasonable (χ² = 202.26, df = 117, p = 0.06), stable across time, with borderline evidence of multidimensionality (10%). Items and people covered a continuum ranging from -3.7 to +2.7 logits, reflecting coverage of the measurement continuum, with a person separation index of 0.85. Mean fit of caregivers was lower than expected (-1.31 ±1.10 logits). Established methods from Rasch Measurement Theory were applied to develop a prototype measure of emotional vitality that is acceptable, reliable, and can be used to obtain an interval-level score for use in future research and clinical settings. © The Author(s) 2014.

  15. A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

    PubMed

    Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

    2017-04-11

    The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.

  16. Development of a Computer-Adaptive Physical Function Instrument for Social Security Administration Disability Determination

    PubMed Central

    Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton

    2014-01-01

    Objectives To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and internet survey. Participants Two samples: 1,017 SSA claimants, and 999 adults from the US general population. Interventions None. Main Outcome Measures Model fit statistics, correlation and reliability coefficients. Results IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. Comparing the simulated CATs to the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared to those of a sample of US adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594

  17. The Generic Short Patient Experiences Questionnaire (GS-PEQ): identification of core items from a survey in Norway

    PubMed Central

    2011-01-01

    Background Questionnaires are commonly used to collect patient, or user, experiences with health care encounters; however, their adaption to specific target groups limits comparison between groups. We present the construction of a generic questionnaire (maximum of ten questions) for user evaluation across a range of health care services. Methods Based on previous testing of six group-specific questionnaires, we first constructed a generic questionnaire with 23 items related to user experiences. All questions included a "not applicable" response option, as well as a follow-up question about the item's importance. Nine user groups from one health trust were surveyed. Seven groups received questionnaires by mail and two by personal distribution. Selection of core questions was based on three criteria: applicability (proportion "not applicable"), importance (mean scores on follow-up questions), and comprehensiveness (content coverage, maximum two items per dimension). Results 1324 questionnaires were returned providing subsample sizes ranging from 52 to 323. Ten questions were excluded because the proportion of "not applicable" responses exceeded 20% in at least one user group. The number of remaining items was reduced to ten by applying the two other criteria. The final short questionnaire included items on outcome (2), clinician services (2), user involvement (2), incorrect treatment (1), information (1), organisation (1), and accessibility (1). Conclusion The Generic Short Patient Experiences Questionnaire (GS-PEQ) is a short, generic set of questions on user experiences with specialist health care that covers important topics for a range of groups. It can be used alone or with other instruments in quality assessment or in research. The psychometric properties and the relevance of the GS-PEQ in other health care settings and countries need further evaluation. PMID:21510871

  18. Development of a computer-adaptive physical function instrument for Social Security Administration disability determination.

    PubMed

    Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton

    2013-09-01

    To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Cross-sectional data collection; IRT analyses; CAT simulation. Telephone and Internet survey. Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). None. Model fit statistics, correlation, and reliability coefficients. IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
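
    The CAT-versus-full-bank comparison summarized above can be conveyed with a minimal simulation sketch; this is not the authors' implementation. It assumes a hypothetical calibrated dichotomous 2PL bank, maximum-information item selection, expected a posteriori (EAP) scoring, and an illustrative 10-item stopping rule; all parameter values are made up.

      import numpy as np

      rng = np.random.default_rng(0)

      # Hypothetical calibrated 2PL item bank (all parameters are illustrative).
      n_items = 102
      a = rng.uniform(0.8, 2.5, n_items)       # discriminations
      b = rng.normal(0.0, 1.2, n_items)        # difficulties

      quad = np.linspace(-4, 4, 81)            # quadrature grid for EAP scoring
      prior = np.exp(-0.5 * quad ** 2)         # standard-normal prior (unnormalized)

      def prob(theta, idx):
          """2PL probability of a positive response to item(s) idx at ability theta."""
          return 1.0 / (1.0 + np.exp(-a[idx] * (theta - b[idx])))

      def eap(items, responses):
          """Expected a posteriori ability estimate from a set of scored items."""
          like = np.ones_like(quad)
          for i, u in zip(items, responses):
              p = prob(quad, i)
              like *= p if u else 1.0 - p
          post = like * prior
          return float(np.sum(quad * post) / np.sum(post))

      def cat_score(full_responses, max_items=10):
          """Re-score one examinee with a simulated CAT drawn from the same bank."""
          administered, theta_hat = [], 0.0
          for _ in range(max_items):
              p = prob(theta_hat, np.arange(n_items))
              info = a ** 2 * p * (1.0 - p)            # 2PL item information
              for i in administered:                   # never readminister an item
                  info[i] = -np.inf
              nxt = int(np.argmax(info))
              administered.append(nxt)
              theta_hat = eap(administered, full_responses[administered])
          return theta_hat

      # Simulate full-bank response strings, then score each examinee both ways.
      true_theta = rng.normal(0, 1, 300)
      full = rng.random((300, n_items)) < prob(true_theta[:, None], np.arange(n_items))
      full_scores = [eap(range(n_items), full[j]) for j in range(300)]
      cat_scores = [cat_score(full[j]) for j in range(300)]
      print("CAT vs. full-bank correlation:", np.corrcoef(full_scores, cat_scores)[0, 1])

    Because the study's banks are polytomous and its scoring engine differs, this sketch is intended only to show the logic of correlating simulated CAT scores with full-bank scores.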

  19. Validation of Catquest-9SF-A Visual Disability Instrument to Evaluate Patient Function After Corneal Transplantation.

    PubMed

    Claesson, Margareta; Armitage, W John; Byström, Berit; Montan, Per; Samolov, Branka; Stenvi, Ulf; Lundström, Mats

    2017-09-01

    Catquest-9SF is a 9-item visual disability questionnaire developed for evaluating patient-reported outcome measures after cataract surgery. The aim of this study was to use Rasch analysis to determine the responsiveness of Catquest-9SF for corneal transplant patients. Patients who underwent corneal transplantation primarily to improve vision were included. One group (n = 199) completed the Catquest-9SF questionnaire before corneal transplantation and a second independent group (n = 199) completed the questionnaire 2 years after surgery. All patients were recorded in the Swedish Cornea Registry, which provided clinical and demographic data for the study. Winsteps software v.3.91.0 (Winsteps.com, Beaverton, OR) was used to assess the fit of the Catquest-9SF data to the Rasch model. Rasch analysis showed that Catquest-9SF applied to corneal transplant patients was unidimensional (infit range, 0.73-1.32; outfit range, 0.81-1.35), and therefore, measured a single underlying construct (visual disability). The Rasch model explained 68.5% of raw variance. The response categories of the 9-item questionnaire were ordered, and the category thresholds were well defined. Item difficulty matched the level of patients' ability (0.36 logit difference between the means). Precision in terms of person separation (3.09) and person reliability (0.91) was good. Differential item functioning was notable for only 1 item (satisfaction with vision), which had a differential item functioning contrast of 1.08 logit. Rasch analysis showed that Catquest-9SF is a valid instrument for measuring visual disability in patients who have undergone corneal transplantation primarily to improve vision.
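
    For readers unfamiliar with the fit statistics quoted above, the infit and outfit mean squares used in Rasch analysis are conventionally defined as follows (standard notation; Winsteps' internal computations may differ in detail):

      \text{Outfit}_i = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^{2}, \qquad
      \text{Infit}_i = \frac{\sum_{n} W_{ni}\, z_{ni}^{2}}{\sum_{n} W_{ni}}, \qquad
      z_{ni} = \frac{x_{ni} - E[x_{ni}]}{\sqrt{W_{ni}}},

    where x_{ni} is person n's observed response to item i and W_{ni} = \mathrm{Var}(x_{ni}) under the Rasch model; values near 1 indicate responses consistent with the model.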

  20. Perceived freedom-responsibility covariation among Cypriot adolescents.

    PubMed

    Frangou, Georgia; Wilkerson, Keith; McGahan, Joseph R

    2008-04-01

    Participants were 67 Cypriot adolescents who responded to propositions regarding positive, negative, and noncontingent relations between freedom and responsibility. The authors framed items so that half dealt with freedom given responsibility, and the other half dealt with responsibility given freedom. Results indicated participants were more likely to endorse positive-contingency items than negative and noncontingency items when items were framed around freedom given responsibility. However, when items were framed around responsibility given freedom, no such differences emerged. The authors discuss results relative to cultural and sociopolitical differences and similarities between children in Cyprus and participants in the United States, and implications concerning the present study and previous studies regarding these constructs.

  1. Measuring quality of life in patients with stress urinary incontinence: is the ICIQ-UI-SF adequate?

    PubMed

    Kurzawa, Zuzanna; Sutherland, Jason M; Crump, Trafford; Liu, Guiping

    2018-05-08

    The International Consultation on Incontinence Questionnaire Short Form (ICIQ-UI-SF) is a widely used four-item patient-reported outcome (PRO) measure. Evaluations of this instrument are limited, restraining users' confidence in the instrument. This study conducts a comprehensive evaluation of the ICIQ-UI-SF on a sample of urological surgery patients in Canada. One hundred and seventy-seven surgical patients with stress urinary incontinence completed the ICIQ-UI-SF pre-operatively. Methods drawing from confirmatory factor analysis (CFA), measures of reliability, item response theory (IRT), and differential item functioning were applied. Ceiling effects were examined. Ceiling effects were identified. In the CFA, the factor loadings of items one and two differed significantly (p < 0.001) from item three, indicating possible multidimensionality. The first two items reflect symptom severity, not quality of life. Reliability was moderate as measured by Cronbach's alpha (0.63) and McDonald's coefficient (0.65). The IRT analysis found the instrument does not discriminate between individuals with low incontinence-related quality of life. Due to low/moderate reliability, the ICIQ-UI-SF can be used as a complement to other data or used to report aggregated surgical outcomes among surgical patients. If the primary objective is to measure quality of life, other PROs should be considered.
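
    As a concrete illustration of the reliability coefficients reported above, Cronbach's alpha can be computed directly from an item-response matrix. The sketch below uses simulated placeholder data (a hypothetical four-item scale scored 0-5), not the study's sample; McDonald's coefficient additionally requires a factor model and is not shown.

      import numpy as np

      def cronbach_alpha(items):
          """items: 2-D array with rows = respondents and columns = scale items."""
          items = np.asarray(items, dtype=float)
          k = items.shape[1]
          item_vars = items.var(axis=0, ddof=1)       # per-item variances
          total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
          return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

      # Simulated 4-item questionnaire scored 0-5 (placeholder data only).
      rng = np.random.default_rng(1)
      latent = rng.normal(size=200)
      data = np.clip(np.round(latent[:, None] + rng.normal(scale=1.2, size=(200, 4)) + 2.5), 0, 5)
      print(round(cronbach_alpha(data), 2))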

  2. Development of the functional vision questionnaire for children and young people with visual impairment: the FVQ_CYP.

    PubMed

    Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S

    2013-12-01

    To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Questionnaire development. A representative patient sample of VI children and young people aged 10 to 15 years with visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by the presence of missing data and individual item response patterns. Exploratory factor analysis (FA), parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for a high percentage of missing data, 4 for skewness, 1 for inadequate item infit and outfit values in RA, 3 for differential item functioning across age groups in RA, and 1 for differential item functioning across gender. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. We have developed a novel, psychometrically robust self-report questionnaire for children and young people-the FVQ_CYP-that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  3. Examining Player Anger in World of Warcraft

    NASA Astrophysics Data System (ADS)

    Barnett, Jane; Coulson, Mark; Foreman, Nigel

    This questionnaire study of the sources of anger in World of Warcraft applies classical quantitative measurement scale construction to a new problem, generating a host of questionnaire items that could find use in future studies, and identifying four major categories of events that cause negative affect among players. First, 33 players provided examples of in-game scenarios that had made them angry, and their responses were culled to create a 93-item battery rated by hundreds of player respondents in terms of anger intensity and anger frequency. An iterative process of factor analysis and scale reliability assessment led to a 28-item instrument measuring four anger-provoking factors: Raids/Instances, Griefers, Perceived Time Wasting, and Anti-social Players. These anger-causing scenarios were then illustrated by concrete examples from player and researcher experiences in World of Warcraft. One striking finding is that players become angry at other players' negative behavior, regardless of whether that behavior was intended to harm.

  4. Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

    ERIC Educational Resources Information Center

    Pohl, Steffi; Gräfe, Linda; Rose, Norman

    2014-01-01

    Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…

  5. Development and Preliminary Validation of Refugee Trauma History Checklist (RTHC)—A Brief Checklist for Survey Studies

    PubMed Central

    Gottvall, Maria; Vaez, Marjan

    2017-01-01

    A high proportion of refugees have been subjected to potentially traumatic experiences (PTEs), including torture. PTEs, and torture in particular, are powerful predictors of mental ill health. This paper reports the development and preliminary validation of a brief refugee trauma checklist applicable for survey studies. Methods: A pool of 232 items was generated based on pre-existing instruments. Conceptualization, item selection and item refinement were conducted based on existing literature and in collaboration with experts. Ten cognitive interviews using a Think Aloud Protocol (TAP) were performed in a clinical setting, and field testing of the proposed checklist was performed in a total sample of n = 137 asylum seekers from Syria. Results: The proposed refugee trauma history checklist (RTHC) consists of 2 × 8 items, concerning PTEs that occurred before and during the respondents’ flight, respectively. Results show low item non-response and adequate psychometric properties. Conclusions: RTHC is a usable tool for providing self-report data on refugee trauma history in surveys of community samples. The core set of included events can be augmented, and slight modifications can be applied to RTHC for use also in other refugee populations and settings. PMID:28976937

  6. Development of measures from the theory of planned behavior applied to leisure-time physical activity.

    PubMed

    Kerner, Matthew S

    2005-06-01

    Using the theory of planned behavior as a conceptual framework, scales assessing Attitude to Leisure-time Physical Activity, Expectations of Others, Perceived Control, and Intention to Engage in Leisure-time Physical Activity were developed for use among middle-school students. The study sample included 349 boys and 400 girls, 10 to 14 years of age (M=11.9 yr., SD=.9). Unipolar and bipolar scales with seven response choices were developed, with each scale item phrased in a Likert-type format. Following revisions, 22 items were retained in the Attitude to Leisure-time Physical Activity Scale, 10 items in the Expectations of Others Scale, 3 items in the Perceived Control Scale, and 17 items in the Intention to Engage in Leisure-time Physical Activity Scale. Adequate internal consistency was indicated by standardized coefficients alpha ranging from .75 to .89. Current results must be extended by assessing discriminant and predictive validities and checking reliabilities with new samples before intervention techniques for promoting positive attitudes toward leisure-time physical activity, including perceived control and intentions to engage in leisure-time physical activity, can be evaluated.

  7. The psychometric properties of the "Reading the Mind in the Eyes" Test: an item response theory (IRT) analysis.

    PubMed

    Preti, Antonio; Vellante, Marcello; Petretto, Donatella R

    2017-05-01

    The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-takingthat is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants to the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.

  8. A Survey on Distributed Mobile Database and Data Mining

    NASA Astrophysics Data System (ADS)

    Goel, Ajay Mohan; Mangla, Neeraj; Patel, R. B.

    2010-11-01

    The anticipated increase in popular use of the Internet has created more opportunities in information dissemination, e-commerce, and multimedia communication. It has also created more challenges in organizing information and facilitating its efficient retrieval. In response to this, new techniques have evolved which facilitate the creation of such applications. Certainly the most promising among the new paradigms is the use of mobile agents. In this paper, mobile agent and distributed database technologies are applied in the banking system. Many approaches have been proposed to schedule data items for broadcasting in a mobile environment. In this paper, we propose an efficient strategy for accessing multiple data items in mobile environments and address the bottleneck of current banking systems.

  9. An item response theory analysis of DSM-IV criteria for hallucinogen abuse and dependence in adolescents

    PubMed Central

    Wu, Li-Tzy; Pan, Jeng-Jong; Yang, Chongming; Reeve, Bryce B.; Blazer, Dan G.

    2009-01-01

    Aim This study applied both item response theory (IRT) and multiple indicators–multiple causes (MIMIC) methods to evaluate item-level psychometric properties of diagnostic questions for hallucinogen use disorders (HUDs), differential item functioning (DIF), and predictors of latent HUD. Methods Data were drawn from 2004–2006 National Surveys on Drug Use and Health. Analyses were based on 1548 past-year hallucinogen users aged 12–17 years. Substance use and symptoms were assessed by audio computer-assisted self-interviewing methods. Results Abuse and dependence criteria empirically were arrayed along a single continuum of severity. All abuse criteria indicated middle-to-high severity on the IRT-defined HUD continuum, while dependence criteria captured a wider range from the lowest (tolerance and time spent) to the highest (taking larger amounts and inability to cut down) severity levels. There was indication of DIF by hallucinogen users’ age, gender, race/ethnicity, and ecstasy use status. Adjusting for DIF, ecstasy users (vs. non-ecstasy hallucinogen users), females (vs. males), and whites (vs. Hispanics) exhibited increased odds of HUD. Conclusions Symptoms of hallucinogen abuse and dependence empirically do not reflect two discrete conditions in adolescents. Trends and problems related to hallucinogen use among girls and whites should be examined further to inform the designs of effective gender-appropriate and culturally sensitive prevention programs. PMID:19896773

  10. The perceptual chunking of speech: a demonstration using ERPs.

    PubMed

    Gilbert, Annie C; Boucher, Victor J; Jemel, Boutheina

    2015-04-07

    In tasks involving the learning of verbal or non-verbal sequences, groupings are spontaneously produced. These groupings are generally marked by a lengthening of final elements and have been attributed to a domain-general perceptual chunking linked to working memory. Yet, no study has shown how this domain-general chunking applies to speech processing, partly because of the traditional view that chunking involves a conceptual recoding of meaningful verbal items like words (Miller, 1956). The present study provides a demonstration of the perceptual chunking of speech by way of two experiments using evoked Positive Shifts (PSs), which capture on-line neural responses to marks of various groups. We observed listeners' response to utterances (Experiment 1) and meaningless series of syllables (Experiment 2) containing changing intonation and temporal marks, while also examining how these marks affect the recognition of heard items. The results show that, across conditions - and irrespective of the presence of meaningful items - PSs are specifically evoked by groups marked by lengthening. Moreover, this on-line detection of marks corresponds to characteristic grouping effects on listeners' immediate recognition of heard items, which suggests chunking effects linked to working memory. These findings bear out a perceptual chunking of speech input in terms of groups marked by lengthening, which constitute the defining marks of a domain-general chunking. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Analysis of sensitive questions across cultures: an application of multigroup item randomized response theory to sexual attitudes and behavior.

    PubMed

    de Jong, Martijn G; Pieters, Rik; Stremersch, Stefan

    2012-09-01

    Answers to sensitive questions are prone to social desirability bias. If not properly addressed, the validity of the research can be suspect. This article presents multigroup item randomized response theory (MIRRT) to measure self-reported sensitive topics across cultures. The method was specifically developed to reduce social desirability bias by making an a priori change in the design of the survey. The change involves the use of a randomization device (e.g., a die) that preserves participants' privacy at the item level. In cases where multiple items measure a higher level theoretical construct, the researcher could still make inferences at the individual level. The method can correct for under- and overreporting, even if both occur in a sample of individuals or across nations. We present and illustrate MIRRT in a nontechnical manner, provide WinBugs software code so that researchers can directly implement it, and present 2 cross-national studies in which it was applied. The first study compared nonstudent samples from 2 countries (total n = 927) on permissive sexual attitudes and risky sexual behavior and related these to individual-level characteristics such as the Big Five personality traits. The second study compared nonstudent samples from 17 countries (total n = 6,195) on risky sexual behavior and related these to individual-level characteristics, such as gender and age, and to country-level characteristics, such as sex ratio.
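
    The item-level randomization behind such designs can be conveyed with the classic forced-response scheme, sketched below under hypothetical probabilities and simulated data: one die roll per item determines whether the respondent answers truthfully or gives a forced answer, and a simple moment estimator recovers the population prevalence. The full MIRRT model embeds this mechanism within an item response model, which is not shown here.

      import numpy as np

      # Hypothetical forced-response design driven by one die roll per item:
      # rolls 1-4 -> answer truthfully, 5 -> forced "yes", 6 -> forced "no".
      P_TRUTH, P_FORCED_YES = 4 / 6, 1 / 6

      def estimate_prevalence(observed_yes_rate):
          """Moment estimator of the true 'yes' prevalence under the design above."""
          pi = (observed_yes_rate - P_FORCED_YES) / P_TRUTH
          return float(np.clip(pi, 0.0, 1.0))

      # Simulate respondents whose true sensitive-behaviour prevalence is 0.30.
      rng = np.random.default_rng(2)
      true_status = rng.random(5000) < 0.30
      die = rng.integers(1, 7, size=5000)                    # rolls 1..6
      answers = np.where(die <= 4, true_status, die == 5)    # forced yes on 5, no on 6
      print("observed:", answers.mean(), "estimated:", estimate_prevalence(answers.mean()))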

  12. A Comparison of Limited-Information and Full-Information Methods in M"plus" for Estimating Item Response Theory Parameters for Nonnormal Populations

    ERIC Educational Resources Information Center

    DeMars, Christine E.

    2012-01-01

    In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…

  13. Estimation of Item Response Theory Parameters in the Presence of Missing Data

    ERIC Educational Resources Information Center

    Finch, Holmes

    2008-01-01

    Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…

  14. Examination of Different Item Response Theory Models on Tests Composed of Testlets

    ERIC Educational Resources Information Center

    Kogar, Esin Yilmaz; Kelecioglu, Hülya

    2017-01-01

    The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…

  15. A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing

    ERIC Educational Resources Information Center

    Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.

    2013-01-01

    The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…

  16. Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…

  17. Bi-dimensional acculturation and cultural response set in CES-D among Korean immigrants

    PubMed Central

    Kim, Eunjung; Seo, Kumin; Cain, Kevin C.

    2017-01-01

    This study examined a cultural response set to positive affect items and depressive symptom items in the CES-D among 172 Korean immigrants. A bi-dimensional acculturation approach, which considers maintenance of Korean Orientation and adoption of American Orientation, was utilized. As Korean immigrants increased American Orientation, they tended to score higher on positive affect items, while no changes occurred in depressive symptom items. Korean Orientation was not related to either positive affect items or depressive symptom items. Korean immigrants have a response bias toward positive affect items in the CES-D, which decreases as they adopt more American Orientation. The CES-D lacks cultural equivalence for Korean immigrants. PMID:20701420

  18. Vegetable parenting practices scale. Item response modeling analyses

    PubMed Central

    Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

    2015-01-01

    Objective To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694

  19. Extended Producer Responsibility and Product Stewardship for Tobacco Product Waste

    PubMed Central

    Curtis, Clifton; Collins, Susan; Cunningham, Shea; Stigler, Paula; Novotny, Thomas E

    2015-01-01

    This paper reviews several environmental principles, including Extended Producer Responsibility (EPR), Product Stewardship (PS), the Polluter Pays Principle (PPP), and the Precautionary Principle, as they may apply to tobacco product waste (TPW). The review addresses specific criteria that apply in deciding whether a particular toxic product should adhere to these principles; presents three case studies of similar approaches to other toxic and/or environmentally harmful products; and describes 10 possible interventions or policy actions that may help prevent, reduce, and mitigate the effects of TPW. EPR promotes total lifecycle environmental improvements, placing economic, physical, and informational responsibilities onto the tobacco industry, while PS complements EPR, but with responsibility shared by all parties involved in the tobacco product lifecycle. Both principles focus on toxic source reduction, post-consumer take-back, and final disposal of consumer products. These principles when applied to TPW have the potential to substantially decrease the environmental and public health harms of cigarette butts and other TPW throughout the world. TPW is the most commonly littered item picked up during environmental, urban, and coastal cleanups globally. PMID:26457262

  20. Item response theory analysis of the Pain Self-Efficacy Questionnaire.

    PubMed

    Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

    2017-01-01

    The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain. Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
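
    The two-parameter graded response model referred to above follows Samejima's standard form (notation assumed rather than taken from the paper): for item i with ordered categories k = 0, ..., m,

      P^{*}_{ik}(\theta) = \Pr(X_i \ge k \mid \theta) = \frac{1}{1 + \exp[-a_i(\theta - b_{ik})]}, \qquad
      \Pr(X_i = k \mid \theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),

    with P^{*}_{i0}(\theta) = 1 and P^{*}_{i,m+1}(\theta) = 0, a_i the item discrimination, and b_{ik} the ordered thresholds at which \Pr(X_i \ge k \mid \theta) = 0.5. The category probability curves and location parameters discussed above are these quantities.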

  1. On the Complexity of Item Response Theory Models.

    PubMed

    Bonifay, Wes; Cai, Li

    2017-01-01

    Complexity in item response theory (IRT) has traditionally been quantified by simply counting the number of freely estimated parameters in the model. However, complexity is also contingent upon the functional form of the model. We examined four popular IRT models-exploratory factor analytic, bifactor, DINA, and DINO-with different functional forms but the same number of free parameters. In comparison, a simpler (unidimensional 3PL) model was specified such that it had 1 more parameter than the previous models. All models were then evaluated according to the minimum description length principle. Specifically, each model was fit to 1,000 data sets that were randomly and uniformly sampled from the complete data space and then assessed using global and item-level fit and diagnostic measures. The findings revealed that the factor analytic and bifactor models possess a strong tendency to fit any possible data. The unidimensional 3PL model displayed minimal fitting propensity, despite the fact that it included an additional free parameter. The DINA and DINO models did not demonstrate a proclivity to fit any possible data, but they did fit well to distinct data patterns. Applied researchers and psychometricians should therefore consider functional form-and not goodness-of-fit alone-when selecting an IRT model.

  2. Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

    PubMed Central

    Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

    2014-01-01

    Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
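
    Several of the classical-test-theory checks listed above are simple to compute from raw data. The sketch below uses simulated placeholder data and a hypothetical 0-4 item scoring range; it shows floor and ceiling percentages and corrected item-total correlations for a multi-item scale.

      import numpy as np

      def scale_diagnostics(items, item_min=0, item_max=4):
          """items: respondents x items matrix of ordinal scores on [item_min, item_max]."""
          items = np.asarray(items, dtype=float)
          k = items.shape[1]
          total = items.sum(axis=1)
          floor = np.mean(total == item_min * k) * 100     # % at the lowest possible total
          ceiling = np.mean(total == item_max * k) * 100   # % at the highest possible total
          # Corrected item-total correlation: each item against the sum of the others.
          corrected = [np.corrcoef(items[:, j], total - items[:, j])[0, 1] for j in range(k)]
          return floor, ceiling, corrected

      rng = np.random.default_rng(3)
      trait = rng.normal(size=300)
      data = np.clip(np.round(trait[:, None] + rng.normal(scale=1.0, size=(300, 5)) + 2), 0, 4)
      floor, ceiling, corrected = scale_diagnostics(data)
      print(f"floor {floor:.1f}%  ceiling {ceiling:.1f}%  item-total r {np.round(corrected, 2)}")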

  3. Item Response Theory Using Hierarchical Generalized Linear Models

    ERIC Educational Resources Information Center

    Ravand, Hamdollah

    2015-01-01

    Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…

  4. Item Response Theory Equating Using Bayesian Informative Priors.

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Patz, Richard J.

    This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…

  5. Instrument Formatting with Computer Data Entry in Mind.

    ERIC Educational Resources Information Center

    Boser, Judith A.; And Others

    Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response items; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…

  6. Longitudinal tests of competing factor structures for the Rosenberg Self-Esteem Scale: traits, ephemeral artifacts, and stable response styles.

    PubMed

    Marsh, Herbert W; Scalas, L Francesca; Nagengast, Benjamin

    2010-06-01

    Self-esteem, typically measured by the Rosenberg Self-Esteem Scale (RSE), is one of the most widely studied constructs in psychology. Nevertheless, there is broad agreement that a simple unidimensional factor model, consistent with the original design and typical application in applied research, does not provide an adequate explanation of RSE responses. However, there is no clear agreement about what alternative model is most appropriate-or even a clear rationale for how to test competing interpretations. Three alternative interpretations exist: (a) 2 substantively important trait factors (positive and negative self-esteem), (b) 1 trait factor and ephemeral method artifacts associated with positively or negatively worded items, or (c) 1 trait factor and stable response-style method factors associated with item wording. We have posited 8 alternative models and structural equation model tests based on longitudinal data (4 waves of data across 8 years with a large, representative sample of adolescents). Longitudinal models provide no support for the unidimensional model, undermine support for the 2-factor model, and clearly refute claims that wording effects are ephemeral, but they provide good support for models positing 1 substantive (self-esteem) factor and response-style method factors that are stable over time. This longitudinal methodological approach has not only resolved these long-standing issues in self-esteem research but also has broad applicability to most psychological assessments based on self-reports with a mix of positively and negatively worded items.

  7. An examination of gender bias on the eighth-grade MEAP science test as it relates to the Hunter Gatherer Theory of Spatial Sex Differences

    NASA Astrophysics Data System (ADS)

    Armstrong-Hall, Judy Gail

    The purpose of this study was to apply the Hunter-Gatherer Theory of sex spatial skills to responses to individual questions by eighth grade students on the Science component of the Michigan Educational Assessment Program (MEAP) to determine if sex bias was inherent in the test. The Hunter-Gatherer Theory of Spatial Sex Differences, an original theory, suggests a spatial dimorphism in which female spatial skill centers on the pattern recall of unconnected items and male spatial skill requires mental movement. This is the first attempt to apply the Hunter-Gatherer Theory of Spatial Sex Differences to a standardized test. An overall hypothesis suggested that the Hunter-Gatherer Theory of Spatial Sex Differences could predict that males would perform better on problems involving mental movement and females would do better on problems involving the pattern recall of unconnected items. Responses to questions on the 1994-95 MEAP requiring the use of male spatial skills and female spatial skills were analyzed for 5,155 eighth grade students. A panel composed of five educators and a theory developer determined which test items involved the use of male and female spatial skills. A MANOVA, using a random sample of 20% of the 5,155 students to compare male and female correct scores, was statistically significant, with males having higher scores on male spatial skills items and females having higher scores on female spatial skills items. Pearson product moment correlation analyses produced a positive correlation for both male and female performance on both types of spatial skills. The Hunter-Gatherer Theory of Spatial Sex Differences appears to be able to predict that males could perform better on problems involving mental movement and females could perform better on problems involving the pattern recall of unconnected items. Recommendations for further research included: examination of male/female spatial skill differences at early elementary and high school levels to determine the impact of gender on difficulties in solving spatial problems; investigation of the relationship between dominant female spatial skills and a diagnosis of ADHD; and study of the effects of teaching male spatial skills to female students, starting in early elementary school, on standardized testing.

  8. Applying Subject Matter Expertise (SME) Elicitation Techniques to TRAC Studies

    DTIC Science & Technology

    2014-09-30

  9. Assessor Decision Making While Marking a Note-Taking Listening Test: The Case of the OET

    ERIC Educational Resources Information Center

    Harding, Luke; Pill, John; Ryan, Kerry

    2011-01-01

    This article investigates assessor decision making when using and applying a marking guide for a note-taking task in a specific purpose English language listening test. In contexts where note-taking items are used, a marking guide is intended to stipulate what kind of response should be accepted as evidence of the ability under test. However,…

  10. An Application of the Rasch Measurement Theory to an Assessment of Geometric Thinking Levels

    ERIC Educational Resources Information Center

    Stols, Gerrit; Long, Caroline; Dunne, Tim

    2015-01-01

    The purpose of this study is to apply the Rasch model to investigate both the Van Hiele theory for geometric development and an associated test. In terms of the test, the objective is to investigate the functioning of a classic 25-item instrument designed to identify levels of geometric proficiency. The dataset of responses by 244 students (106…

  11. An Evaluation of the Precision of Measurement of Ryff's Psychological Well-Being Scales in a Population Sample

    ERIC Educational Resources Information Center

    Abbott, Rosemary A.; Ploubidis, George B.; Huppert, Felicia A.; Kuh, Diana; Croudace, Tim J.

    2010-01-01

    The aim of this study is to assess the effective measurement range of Ryff's Psychological Well-being scales (PWB). It applies normal ogive item response theory (IRT) methodology using factor analysis procedures for ordinal data based on a limited information estimation approach. The data come from a sample of 1,179 women participating in a…

  12. Prediction of true test scores from observed item scores and ancillary data.

    PubMed

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
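
    In the single-predictor case, the best linear predictor referred to above reduces to Kelley's classical formula; with ancillary scores it takes the usual linear-regression form. The notation below is standard and not necessarily the authors':

      \hat{T} = \rho_{XX'}\,X + (1 - \rho_{XX'})\,\mu_X, \qquad
      \hat{T} = \mu_T + \Sigma_{T\mathbf{Y}}\,\Sigma_{\mathbf{Y}\mathbf{Y}}^{-1}\,(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}}),

    where \rho_{XX'} is the reliability of the observed score X, \mu_X its mean, and \mathbf{Y} stacks the traditional score with e-rater scores and related-test scores; estimating \Sigma_{T\mathbf{Y}} is where the measurement-error variances and covariances discussed above enter.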

  13. Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…

  14. Asymptotic Properties of Induced Maximum Likelihood Estimates of Nonlinear Models for Item Response Variables: The Finite-Generic-Item-Pool Case.

    ERIC Educational Resources Information Center

    Jones, Douglas H.

    The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…

  15. Limits on Log Cross-Product Ratios for Item Response Models. Research Report. ETS RR-06-10

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Holland, Paul W.; Sinharay, Sandip

    2006-01-01

    Bounds are established for log cross-product ratios (log odds ratios) involving pairs of items for item response models. First, expressions for bounds on log cross-product ratios are provided for unidimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model.…
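
    For a pair of dichotomous items i and j, the log cross-product ratio being bounded has the standard definition

      \lambda_{ij} = \log\frac{\Pr(X_i = 1, X_j = 1)\,\Pr(X_i = 0, X_j = 0)}{\Pr(X_i = 1, X_j = 0)\,\Pr(X_i = 0, X_j = 1)},

    and the report derives how large this quantity can be under unidimensional item response models, with explicit results for the Rasch and 2PL cases; the bounds themselves are not reproduced here.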

  16. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary.

    PubMed

    Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.

  17. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    PubMed Central

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2016-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568

  18. Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

    ERIC Educational Resources Information Center

    Hewitt, Margaret A.; Homan, Susan P.

    2004-01-01

    Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…

  19. Evaluation of the Parent-Report Inventory of Callous-Unemotional Traits in a Sample of Children Recruited from Intimate Partner Violence Services: A Multidimensional Rasch Analysis.

    PubMed

    McDonald, Shelby Elaine; Ma, Lin; Green, Kathy E; Hitti, Stephanie A; Cody, Anna M; Donovan, Courtney; Williams, James Herbert; Ascione, Frank R

    2018-03-01

    Our study applied multidimensional item response theory (MIRT) to compare structural models of the parent-report version of the Inventory of Callous and Unemotional Traits (ICU; English and North American Spanish translations). A total of 291 maternal caregivers were recruited from community-based domestic violence services and reported on their children (77.9% ethnic minority; 47% female), who ranged in age from 7 to 12 years (mean = 9.07, standard deviation = 1.64). We compared 9 models that were based on prior psychometric evaluations of the ICU. MIRT analyses indicated that a revised 18-item version comprising 2 factors (callous-unemotional and empathic-prosocial) was more suitable for our sample. Differential item functioning was found for several items across ethnic and language groups, but not for child gender or age. Evidence of construct validity was found. We recommend continued research and revisions to the ICU to better assess the presence of callous-unemotional traits in community samples of school-age children. © 2017 Wiley Periodicals, Inc.

  20. The PU-PROM: A patient-reported outcome measure for peptic ulcer disease.

    PubMed

    Liu, Na; Lv, Jing; Liu, Jinchun; Zhang, Yanbo

    2017-12-01

    Patient-reported outcome measures (PROMs), conceived to enable description of treatment-related effects from the patient perspective, have the potential to improve clinical research and to provide patients with accurate information. Therefore, the aim of this study was to develop a patient-centred peptic ulcer patient-reported outcome measure (PU-PROM) and evaluate its reliability, validity, differential item functioning (DIF) and feasibility. To develop a conceptual framework and item pool for the PU-PROM, we performed a literature review and consulted other measures created in China and other countries. Beyond that, we interviewed 10 patients with peptic ulcers and consulted six key experts to ensure that all germane parameters were included. In the first item selection phase, classical test theory and item response theory were used to select and adjust items to shape the preliminary measure completed by 130 patients and 50 controls. In the next phase, the measure was evaluated using the same methods with 492 patients and 124 controls. Finally, we used the same population in the second item reselection to assess the reliability, validity, DIF and feasibility of the final measure. The final peptic ulcer PRO measure comprised four domains (physiology, psychology, society and treatment), with 11 subdomains, and 54 items. The Cronbach's α coefficient of each subdomain for the measure was >0.800. Confirmatory factor analysis indicated that the construct validity fulfilled expectations. Model fit indices, such as RMR, RMSEA, NFI, NNFI, CFI and IFI, showed acceptable fit. The measure showed a good response rate. The peptic ulcer PRO measure had good reliability, validity and feasibility, with acceptable DIF results, and can be used as a clinical research evaluation instrument with patients with peptic ulcers to assess their condition, with a focus on treatment. This measure may also be applied in other health areas, especially in clinical trials of new drugs, and may be helpful in clinical decision making. © 2017 The Authors Health Expectations Published by John Wiley & Sons Ltd.

  1. Informed consent for phase I studies: evaluation of quantity and quality of information provided to patients.

    PubMed

    Tomamichel, M; Sessa, C; Herzig, S; de Jong, J; Pagani, O; Willems, Y; Cavalli, F

    1995-04-01

    The process by which patients are informed and their consent is obtained in phase I trials has thus far been only marginally studied. Since 1986 we have followed an oral procedure, consisting of three consecutive conversations in which the investigator responsible for phase I studies, the research nurse and the patients' relatives and/or friends also participate, followed by the patients' signing of a written consent form. It is required that six items of information considered essential by our staff be conveyed to patients by the responsible investigator. Meerwein's model, which defines three main dimensions of the informing process (the information itself, the emotional and interactive aspects), has been studied to ascertain whether it can be applied to evaluate the quality of the information proffered. Thirty-two conversations were taped, transcribed and evaluated by one psychiatrist and one psychologist. A quantitative analysis of information was performed by calculating the number of patients to whom the essential items of information had been conveyed. The qualitative analysis was performed by rating on a five-point scoring system, from 1 (very bad) to 5 (excellent), the three dimensions of the informing process for each patient and by calculating for each dimension the mean score of the constituent items. Complete information about the characteristics of the phase I drug and the modalities of the treatment and follow up was given to almost 80% of the patients. All but one of the items of the information dimension scored 3.5 or higher, with the one related to the assessment by the doctor of the patient's understanding at the end of the consultation scoring less than 3 in 53% of the patients. All items of the emotional dimension scored higher than 3.5. Greater difficulty was encountered by the physician with the interactive dimension, the lowest mean scores being reported on the items related to the doctor's awareness of the indirectly expressed anxieties of the patients. In 71% of the consultations the three dimensions of information scored more than 3 and balanced one another, indicating a successful consultation by the Meerwein model. The informed consent procedure applied was satisfactory from a quantitative point of view, and the main items of information were acceptable to the patients. Meerwein's model proved to be applicable and useful for identifying pitfalls in communication. Greater attention should be paid to the indirect messages and implied criticisms of the patients to improve their participation in decision making. Physicians should become more skillful in providing adequate information and improve their methods of communication.

  2. The ABC’s of Suicide Risk Assessment: Applying a Tripartite Approach to Individual Evaluations

    PubMed Central

    Harris, Keith M.; Syu, Jia-Jia; Lello, Owen D.; Chew, Y. L. Eileen; Willcox, Christopher H.; Ho, Roger H. M.

    2015-01-01

    There is considerable need for accurate suicide risk assessment for clinical, screening, and research purposes. This study applied the tripartite affect-behavior-cognition theory, the suicidal barometer model, classical test theory, and item response theory (IRT), to develop a brief self-report measure of suicide risk that is theoretically-grounded, reliable and valid. An initial survey (n = 359) applied an iterative process to an item pool, resulting in the six-item Suicidal Affect-Behavior-Cognition Scale (SABCS). Three additional studies tested the SABCS and a highly endorsed comparison measure. Studies included two online surveys (Ns = 1007 and 713) and one prospective clinical survey (n = 72; Time 2, n = 54). Factor analyses demonstrated SABCS construct validity through unidimensionality. Internal reliability was high (α = .86-.93, split-half = .90-.94). The scale was predictive of future suicidal behaviors and suicidality (r = .68, .73, respectively), showed convergent validity, and the SABCS-4 demonstrated clinically relevant sensitivity to change. IRT analyses revealed the SABCS captured more information than the comparison measure, and better defined participants at low, moderate, and high risk. The SABCS is the first suicide risk measure to demonstrate no differential item functioning by sex, age, or ethnicity. In all comparisons, the SABCS showed incremental improvements over a highly endorsed scale through stronger predictive ability, reliability, and other properties. The SABCS is in the public domain, with this publication, and is suitable for clinical evaluations, public screening, and research. PMID:26030590

  3. The Effect of Response Format on the Psychometric Properties of the Narcissistic Personality Inventory: Consequences for Item Meaning and Factor Structure.

    PubMed

    Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris

    2016-04-01

    The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. © The Author(s) 2015.

  4. Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items

    ERIC Educational Resources Information Center

    Cher Wong, Cheow

    2015-01-01

    Building on previous work by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…

  5. Examination of Polytomous Items' Psychometric Properties According to Nonparametric Item Response Theory Models in Different Test Conditions

    ERIC Educational Resources Information Center

    Sengul Avsar, Asiye; Tavsancil, Ezel

    2017-01-01

    This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…

  6. Rasch Measurement and Item Banking: Theory and Practice.

    ERIC Educational Resources Information Center

    Nakamura, Yuji

    The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response to a test item is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…

  7. Item Response Theory Models for Wording Effects in Mixed-Format Scales

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu

    2015-01-01

    Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…

  8. Rasch-built Overall Disability Scale for Multifocal motor neuropathy (MMN-RODS©).

    PubMed

    Vanhoutte, Els K; Faber, Catharina G; van Nes, Sonja I; Cats, Elisabeth A; Van der Pol, W-Ludo; Gorson, Kenneth C; van Doorn, Pieter A; Cornblath, David R; van den Berg, Leonard H; Merkies, Ingemar S J

    2015-09-01

    Clinical trials in multifocal motor neuropathy (MMN) have often used ordinal-based measures that may not accurately capture changes. We aimed to construct a disability interval outcome measure specifically for MMN using the Rasch model and to examine its clinimetric properties. A total of 146 preliminary activity and participation items were assessed twice (reliability studies) in 96 clinically stable MMN patients. These patients also completed the ordinal-based overall disability sum score (construct, sample-dependent validity). The final Rasch-built overall disability scale for MMN (MMN-RODS©) was serially applied in 26 patients with newly diagnosed or relapsing MMN, treated with intravenous immunoglobulin (IVIg) (1-year follow-up; responsiveness study). The magnitude of change for each patient was calculated using the minimum clinically important difference technique related to the individually obtained standard errors. A total of 121 items not fulfilling Rasch requirements were removed. The final 25-item MMN-RODS© fulfilled all the Rasch model's expectations and showed acceptable reliability and validity, including good discriminatory capacity. Most serially examined patients improved, but the magnitude of improvement was low, reflecting poor responsiveness. The constructed MMN-RODS© is a disease-specific, interval measure to detect activity limitations in patients with MMN and overcomes the shortcomings of ordinal scales. However, future clinimetric studies are needed to improve the responsiveness of the MMN-RODS© through longer observations and/or more rigorous treatment regimens. © 2015 Peripheral Nerve Society.
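
    The abstract refers to a minimum clinically important difference anchored to each patient's individually obtained standard errors. The exact rule is not stated in the abstract; one common formalization of such an SE-based significant-change criterion (offered here only as an illustrative assumption) is:

```latex
% Illustrative SE-based change criterion; the study's precise rule may differ.
% \hat{\theta} are the Rasch person estimates, SE their standard errors.
\[
  \left| \hat{\theta}_{\text{follow-up}} - \hat{\theta}_{\text{baseline}} \right|
  \;>\; 1.96 \sqrt{SE_{\text{baseline}}^{2} + SE_{\text{follow-up}}^{2}}
\]
```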

  9. Vegetable parenting practices scale: Item response modeling analyses

    USDA-ARS?s Scientific Manuscript database

    Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...

  10. Assessing Psycho-social Barriers to Rehabilitation in Injured Workers with Chronic Musculoskeletal Pain: Development and Item Properties of the Yellow Flag Questionnaire (YFQ).

    PubMed

    Salathé, Cornelia Rolli; Trippolini, Maurizio Alen; Terribilini, Livio Claudio; Oliveri, Michael; Elfering, Achim

    2018-06-01

    Purpose To develop a multidimensional scale to assess psychosocial beliefs-the Yellow Flag Questionnaire (YFQ)-aimed at guiding interventions for workers with chronic musculoskeletal (MSK) pain. Methods Phase 1 consisted of item selection based on a literature search, item development and expert consensus rounds. In phase 2, items were reduced by calculating a quality score per item and by applying structural equation modeling and confirmatory factor analysis to data from 666 workers. In phase 3, Cronbach's α and Pearson correlation coefficients were computed to compare the YFQ with disability, anxiety, depression and self-efficacy, based on data from 253 injured workers. Regressions of the YFQ total score on disability, anxiety, depression and self-efficacy were calculated. Results After phase 1, the YFQ included 116 items and 15 domains. Further reduction of items in phase 2 by applying the item quality criteria reduced the total to 48 items. Factor analysis with structural equation modeling confirmed 32 items in seven domains: activity, work, emotions, harm & blame, diagnosis beliefs, co-morbidity and control. Cronbach's α was 0.91 for the total score and between 0.49 and 0.81 for the seven domain scores. Correlations between the YFQ total score and disability, anxiety, depression and self-efficacy were .58, .66, .73 and -.51, respectively. After controlling for age and gender, the YFQ total score explained between 27% and 53% of the variance in disability, anxiety, depression and self-efficacy. Conclusions The YFQ, a multidimensional screening scale, is recommended for assessing the psychosocial beliefs of workers with chronic MSK pain. Further evaluation of measurement properties such as test-retest reliability, responsiveness and prognostic validity is warranted.
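
    The incremental variance reported here (YFQ total score explaining 27-53% beyond age and gender) is the output of a hierarchical regression. The sketch below shows that general pattern only, under assumed, hypothetical variable names and synthetic data; it is not the study's code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data frame; column names are placeholders, not the YFQ study's variables.
rng = np.random.default_rng(1)
n = 253
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "gender": rng.integers(0, 2, n),
    "yfq_total": rng.normal(50, 10, n),
})
df["disability"] = 0.04 * df["yfq_total"] + 0.01 * df["age"] + rng.normal(size=n)

def r_squared(outcome, predictors):
    X = sm.add_constant(df[predictors])
    return sm.OLS(df[outcome], X).fit().rsquared

base = r_squared("disability", ["age", "gender"])
full = r_squared("disability", ["age", "gender", "yfq_total"])
print(f"R2 added by the total score beyond age and gender: {full - base:.2f}")
```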

  11. A HO-IRT Based Diagnostic Assessment System with Constructed Response Items

    ERIC Educational Resources Information Center

    Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei

    2011-01-01

    The aim of the present study was to develop an on-line assessment system with constructed response items in the context of elementary mathematics curriculum. The system recorded the problem-solving process of constructed response items and transferred the process to response codes for further analyses. An inference mechanism based on artificial…

  12. Structural Equation Model Approach to the Use of Response Times for Improving Estimation in Item Response Models

    ERIC Educational Resources Information Center

    Sen, Rohini

    2012-01-01

    In the last five decades, research on the uses of response time has extended into the field of psychometrics (Schnipke & Scrams, 1999; van der Linden, 2006; van der Linden, 2007), where interest has centered around the usefulness of response time information in item calibration and person measurement within an item response theory framework.…

  13. A Primer on the 2- and 3-Parameter Item Response Theory Models.

    ERIC Educational Resources Information Center

    Thornton, Artist

    Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…

  14. The EORTC information questionnaire, EORTC QLQ-INFO25. Validation study for Spanish patients.

    PubMed

    Arraras, Juan Ignacio; Manterola, Ana; Hernández, Berta; Arias de la Vega, Fernando; Martínez, Maite; Vila, Meritxell; Eito, Clara; Vera, Ruth; Domínguez, Miguel Ángel

    2011-06-01

    The EORTC QLQ-INFO25 evaluates the information received by cancer patients. This study assesses the psychometric properties of the QLQ-INFO25 when applied to a sample of Spanish patients. A total of 169 patients with different cancers and stages of disease completed the EORTC QLQ-INFO25, the EORTC QLQ-C30 and the information scales of the inpatient satisfaction module EORTC IN-PATSAT32 on two occasions during the patients' treatment and follow-up period. Psychometric evaluation of the structure, reliability, validity and responsiveness to changes was conducted. Patient acceptability was assessed with a debriefing questionnaire. Multi-trait scaling confirmed the 4 multi-item scales (information about disease, medical tests, treatment and other services) and eight single items. All items met the standards for convergent validity and all except one met the standards of item discriminant validity. Internal consistency for all scales (α>0.70) and the whole questionnaire (α>0.90) was adequate in the three measurements, except information about the disease (0.67) and other services (0.68) in the first measurement, as was test-retest reliability (intraclass correlations >0.70). Correlations with related areas of IN-PATSAT32 (r>0.40) supported convergent validity. Divergent validity was confirmed through low correlations with EORTC QLQ-C30 scales (r<0.30). The EORTC QLQ-INFO25 discriminated among groups based on gender, age, education, levels of anxiety and depression, treatment line, wish for information and satisfaction. One scale and one item showed changes over time. The EORTC QLQ-INFO25 is a reliable and valid instrument when applied to a sample of Spanish cancer patients. These results are in line with those of the EORTC validation study.

  15. The development of automaticity in short-term memory search: Item-response learning and category learning.

    PubMed

    Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M

    2017-05-01

    In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Building a computer program to support children, parents, and distraction during healthcare procedures.

    PubMed

    Hanrahan, Kirsten; McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W Nick; Zimmerman, M Bridget; Ersig, Anne L

    2012-10-01

    This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children's responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, titled Children, Parents and Distraction, is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure.
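
    The two-stage approach described here (automatic feature selection from a large item pool feeding a support vector machine regression, checked by cross-validation) can be sketched roughly as below. The data, the number of features kept, and the model settings are assumptions for illustration, not the Children, Parents and Distraction application's actual pipeline.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 255 candidate predictors, one continuous distress outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 255))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=300)

# Keep the 44 strongest predictors, then fit an SVM regression,
# mirroring the two-stage approach described in the abstract.
model = make_pipeline(StandardScaler(),
                      SelectKBest(score_func=f_regression, k=44),
                      SVR(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R2:", scores.mean().round(2))
```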

  17. Developing a short version of the Toronto Structured Interview for Alexithymia using item response theory.

    PubMed

    Sekely, Angela; Taylor, Graeme J; Bagby, R Michael

    2018-03-17

    The Toronto Structured Interview for Alexithymia (TSIA) was developed to provide a structured interview method for assessing alexithymia. One drawback of this instrument is the amount of time it takes to administer and score. The current study used item response theory (IRT) methods to analyze data from a large heterogeneous multi-language sample (N = 842) to investigate whether a subset of items could be selected to create a short version of the instrument. Samejima's (1969) graded response model was used to fit the item responses. Items providing maximum information were retained in the short version, resulting in the elimination of 12 items from the original 24 items. Despite the 50% reduction in the number of items, 65.22% of the information was retained. Further studies are needed to validate the short version. A short version of the TSIA is potentially of practical value to clinicians and researchers with time constraints. Copyright © 2018. Published by Elsevier B.V.
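
    The short form is built by keeping the items that carry the most information. The study fit Samejima's graded response model; the sketch below illustrates the same retention logic with the simpler 2PL information function and invented item parameters, purely as an editorial example.

```python
import numpy as np

def twopl_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

# Hypothetical discrimination (a) and difficulty (b) parameters for 24 items.
rng = np.random.default_rng(7)
a = rng.uniform(0.8, 2.5, 24)
b = rng.normal(0.0, 1.0, 24)

theta_grid = np.linspace(-3, 3, 61)
# Total information each item contributes across the ability range.
item_info = np.array([np.trapz(twopl_information(theta_grid, a[i], b[i]), theta_grid)
                      for i in range(24)])

keep = np.argsort(item_info)[::-1][:12]          # retain the 12 most informative items
retained_share = item_info[keep].sum() / item_info.sum()
print("retained information share:", round(retained_share, 3))
```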

  18. Contextual behavior and neural circuits

    PubMed Central

    Lee, Inah; Lee, Choong-Hee

    2013-01-01

    Animals including humans engage in goal-directed behavior flexibly in response to items and their background, which is called contextual behavior in this review. Although the concept of context has long been studied, there are differences among researchers in defining and experimenting with the concept. The current review aims to provide a categorical framework within which not only the neural mechanisms of contextual information processing but also the contextual behavior can be studied in more concrete ways. For this purpose, we categorize contextual behavior into three subcategories as follows by considering the types of interactions among context, item, and response: contextual response selection, contextual item selection, and contextual item–response selection. Contextual response selection refers to the animal emitting different types of responses to the same item depending on the context in the background. Contextual item selection occurs when there are multiple items that need to be chosen in a contextual manner. Finally, when multiple items and multiple contexts are involved, contextual item–response selection takes place whereby the animal either chooses an item or inhibits such a response depending on item–context paired association. The literature suggests that the rhinal cortical regions and the hippocampal formation play key roles in mnemonically categorizing and recognizing contextual representations and the associated items. In addition, it appears that the fronto-striatal cortical loops in connection with the contextual information-processing areas critically control the flexible deployment of adaptive action sets and motor responses for maximizing goals. We suggest that contextual information processing should be investigated in experimental settings where contextual stimuli and resulting behaviors are clearly defined and measurable, considering the dynamic top-down and bottom-up interactions among the neural systems for contextual behavior. PMID:23675321

  19. Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.

    PubMed

    Eichenbaum, Alexander E; Marcus, David K; French, Brian F

    2017-06-01

    This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.

  20. Comparison of response patterns in different survey designs: a longitudinal panel with mixed-mode and online-only design.

    PubMed

    Rübsamen, Nicole; Akmatov, Manas K; Castell, Stefanie; Karch, André; Mikolajczyk, Rafael T

    2017-01-01

    Increasing availability of the Internet allows using only online data collection for more epidemiological studies. We compare response patterns in a population-based health survey using two survey designs: mixed-mode (choice between paper-and-pencil and online questionnaires) and online-only design (without choice). We used data from a longitudinal panel, the Hygiene and Behaviour Infectious Diseases Study (HaBIDS), conducted in 2014/2015 in four regions in Lower Saxony, Germany. Individuals were recruited using address-based probability sampling. In two regions, individuals could choose between paper-and-pencil and online questionnaires. In the other two regions, individuals were offered online-only participation. We compared sociodemographic characteristics of respondents who filled in all panel questionnaires between the mixed-mode group (n = 1110) and the online-only group (n = 482). Using 134 items, we performed multinomial logistic regression to compare responses between survey designs in terms of type (missing, "do not know" or valid response) and ordinal regression to compare responses in terms of content. We applied the false discovery rate (FDR) to control for multiple testing and investigated the effects of adjusting for sociodemographic characteristics. For validation of the differential response patterns between mixed-mode and online-only, we compared the response patterns between paper and online mode among the respondents in the mixed-mode group in one region (n = 786). Respondents in the online-only group were older than those in the mixed-mode group, but the groups did not differ regarding sex or education. Type of response did not differ between the online-only and the mixed-mode group. Survey design was associated with different content of response in 18 of the 134 investigated items, which decreased to 11 after adjusting for sociodemographic variables. In the validation within the mixed-mode group, only two of those were among the 11 significantly different items. The probability of observing by chance the same two or more significant differences in this setting was 22%. We found similar response patterns in both survey designs, with only a few items being answered differently, likely attributable to chance. Our study supports the equivalence of the compared survey designs and suggests that, in the studied setting, using an online-only design does not cause strong distortion of the results.
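
    The multiple-testing step described here, controlling the false discovery rate across 134 per-item comparisons, is commonly done with the Benjamini-Hochberg procedure. The sketch below applies that correction to a synthetic vector of p-values; the p-values are invented and the study's actual regression models are not reproduced.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Synthetic p-values standing in for 134 per-item design comparisons.
rng = np.random.default_rng(3)
pvals = np.concatenate([rng.uniform(0, 0.005, 18),    # items that truly differ
                        rng.uniform(0, 1, 116)])      # null items

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("items flagged after Benjamini-Hochberg correction:", reject.sum())
```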

  1. Applying Hierarchical Model Calibration to Automatically Generated Items.

    ERIC Educational Resources Information Center

    Williamson, David M.; Johnson, Matthew S.; Sinharay, Sandip; Bejar, Isaac I.

    This study explored the application of hierarchical model calibration as a means of reducing, if not eliminating, the need for pretesting of automatically generated items from a common item model prior to operational use. Ultimately the successful development of automatic item generation (AIG) systems capable of producing items with highly similar…

  2. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    ERIC Educational Resources Information Center

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…

  3. Investigating Separate and Concurrent Approaches for Item Parameter Drift in 3PL Item Response Theory Equating

    ERIC Educational Resources Information Center

    Arce-Ferrer, Alvaro J.; Bulut, Okan

    2017-01-01

    This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…

  4. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  5. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    ERIC Educational Resources Information Center

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  6. Generalizability in Item Response Modeling

    ERIC Educational Resources Information Center

    Briggs, Derek C.; Wilson, Mark

    2007-01-01

    An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random…

  7. Quantifying Local, Response Dependence between Two Polytomous Items Using the Rasch Model

    ERIC Educational Resources Information Center

    Andrich, David; Humphry, Stephen M.; Marais, Ida

    2012-01-01

    Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…

  8. Using Response Times for Item Selection in Adaptive Testing

    ERIC Educational Resources Information Center

    van der Linden, Wim J.

    2008-01-01

    Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the…

  9. The Influence of Item Response Indecision on the Self-Directed Search

    ERIC Educational Resources Information Center

    Sampson, James P., Jr.; Shy, Jonathan D.; Hartley, Sarah Lucas; Reardon, Robert C.; Peterson, Gary W.

    2009-01-01

    Students (N = 247) responded to Self-Directed Search (SDS) per the standard response format and were also instructed to record a question mark (?) for items about which they were uncertain (item response indecision [IRI]). The initial responses of the 114 participants with a (?) were then reversed and a second SDS summary code was obtained and…

  10. Improving measurement of injection drug risk behavior using item response theory.

    PubMed

    Janulis, Patrick

    2014-03-01

    Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
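
    The differential item functioning check described here pairs an IRT scaling with logistic regression. A common logistic-regression DIF screen regresses each item on a matching score, group membership, and their interaction; a significant group term suggests uniform DIF and a significant interaction suggests non-uniform DIF. The sketch below is a synthetic illustration of that screen, not the study's code, and its variable names and coding are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic example of logistic-regression DIF screening for one binary risk item.
rng = np.random.default_rng(5)
n = 600
gender = rng.integers(0, 2, n)                  # 0 = male, 1 = female (hypothetical coding)
theta = rng.normal(size=n)                      # underlying risk propensity
total = theta + rng.normal(scale=0.5, size=n)   # matching score (e.g., rest score)
# Build in uniform DIF: the item is easier to endorse for one group at equal risk.
p = 1 / (1 + np.exp(-(theta + 0.6 * gender)))
item = rng.binomial(1, p)

df = pd.DataFrame({"item": item, "total": total, "gender": gender})
uniform = smf.logit("item ~ total + gender", data=df).fit(disp=0)
nonuniform = smf.logit("item ~ total * gender", data=df).fit(disp=0)
print("group effect:", round(uniform.params["gender"], 2),
      "interaction:", round(nonuniform.params["total:gender"], 2))
```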

  11. Measuring sexual orientation in adolescent health surveys: evaluation of eight school-based surveys.

    PubMed

    Saewyc, Elizabeth M; Bauer, Greta R; Skay, Carol L; Bearinger, Linda H; Resnick, Michael D; Reis, Elizabeth; Murphy, Aileen

    2004-10-01

    To examine the performance of various items measuring sexual orientation within 8 school-based adolescent health surveys in the United States and Canada from 1986 through 1999. Analyses examined nonresponse and unsure responses to sexual orientation items compared with other survey items, demographic differences in responses, tests for response set bias, and congruence of responses to multiple orientation items; analytical methods included frequencies, contingency tables with chi-square, and ANOVA with least significant differences (LSD) post hoc tests; all analyses were conducted separately by gender. In all surveys, nonresponse rates for orientation questions were similar to other sexual questions, but not higher; younger students, immigrants, and students with learning disabilities were more likely to skip items or select "unsure." Sexual behavior items had the lowest nonresponse, but fewer than half of all students reported sexual behavior, limiting its usefulness for indicating orientation. Item placement in the survey, wording, and response set bias all appeared to influence nonresponse and unsure rates. Specific recommendations include standardizing wording across future surveys, and pilot testing items with diverse ages and ethnic groups of teens before use. All three dimensions of orientation should be assessed where possible; when limited to single items, sexual attraction may be the best choice. Specific wording suggestions are offered for future surveys.

  12. Rasch Analysis of the Edmonton Symptom Assessment System.

    PubMed

    Sprague, Emma; Siegert, Richard J; Medvedev, Oleg; Roberts, Margaret H

    2018-05-01

    The Edmonton Symptom Assessment System (ESAS) is a widely used multisymptom assessment tool in cancer and palliative care settings, but its psychometric properties have not been widely tested using modern psychometric methods such as Rasch analysis. To apply Rasch analysis to the ESAS in a community palliative care setting and determine its suitability for assessing symptom burden in this group. ESAS data collected from 229 patients enrolled in a community hospice service were evaluated using a partial credit Rasch model with RUMM2030 software (RUMM Laboratory Pty, Ltd., Duncraig, WA). Where disordered thresholds were discovered, item rescoring was undertaken. Rasch model fit and differential item functioning were evaluated after each iterative phase. Uniform rescoring was necessary for all 12 items to display ordered thresholds. The best model fit was achieved after item rescoring and combining three pairs of locally dependent items into three superitems (χ²(27) = 29.56; P = 0.33) that permitted ordinal-to-interval conversion. The ESAS satisfied unidimensional Rasch model expectations in a 12-item format after minor modifications. This included uniform rescoring of the disordered response categories and creating superitems to improve model fit and clinical utility. The accuracy of the ESAS scores can be improved by using ordinal-to-interval conversion tables published in the article. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
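
    For reference, the partial credit Rasch model named in the abstract is usually written as below; the notation is standard rather than taken from the article itself.

```latex
% Partial credit model: probability that person n scores x (0..m_i) on item i,
% with person location \theta_n and item-step parameters \delta_{ik} (\delta_{i0} \equiv 0).
\[
P(X_{ni} = x) \;=\;
  \frac{\exp\!\Big(\sum_{k=0}^{x} (\theta_n - \delta_{ik})\Big)}
       {\sum_{j=0}^{m_i} \exp\!\Big(\sum_{k=0}^{j} (\theta_n - \delta_{ik})\Big)}
\]
```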

  13. Improving Assessment of Work Related Mental Health Function Using the Work Disability Functional Assessment Battery (WD-FAB).

    PubMed

    Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M

    2018-03-01

    Purpose To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and expand the content coverage to include aspects of cognition & communication function. Methods Data were collected from a random, stratified sample of 1695 claimants applying for the SSA work disability benefits, and a general population sample of 2025 working age adults. 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
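
    The computer adaptive test (CAT) simulations mentioned here rest on a standard loop: estimate the respondent's location, administer the most informative remaining item, update the estimate, repeat. The sketch below is a minimal version of that loop using a 2PL bank, maximum-information selection, and EAP updating on a quadrature grid; the item parameters are invented and none of this is WD-FAB content.

```python
import numpy as np

rng = np.random.default_rng(11)
n_items = 68
a = rng.uniform(1.0, 2.5, n_items)          # hypothetical discriminations
b = rng.normal(0.0, 1.0, n_items)           # hypothetical difficulties

grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * grid**2)
prior /= prior.sum()

def p_correct(theta, a_i, b_i):
    return 1.0 / (1.0 + np.exp(-a_i * (theta - b_i)))

def simulate_cat(true_theta, test_length=10):
    posterior = prior.copy()
    available = list(range(n_items))
    for _ in range(test_length):
        theta_hat = np.dot(grid, posterior)              # EAP estimate so far
        p = p_correct(theta_hat, a[available], b[available])
        info = a[available]**2 * p * (1 - p)             # 2PL item information
        item = available.pop(int(np.argmax(info)))       # max-information selection
        resp = rng.random() < p_correct(true_theta, a[item], b[item])
        like = p_correct(grid, a[item], b[item])
        posterior *= like if resp else (1 - like)
        posterior /= posterior.sum()
    return np.dot(grid, posterior)

print(round(simulate_cat(true_theta=1.0), 2))
```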

  14. Distinctive emotional responses of clinicians to suicide-attempting patients - a comparative study

    PubMed Central

    2013-01-01

    Background Clinician responses to patients have been recognized as an important factor in treatment outcome. Clinician responses to suicidal patients have received little attention in the literature however, and no quantitative studies have been published. Further, although patients with high versus low lethality suicidal behaviors have been speculated to represent two distinct populations, clinicians’ emotional responses to them have not been examined. Methods Clinicians’ responses to their patients when last seeing them prior to patients’ suicide attempt or death were assessed retrospectively with the Therapist Response/Countertransference Questionnaire, administered anonymously via an Internet survey service. Scores on individual items and subscale scores were compared between groups, and linear discriminant analysis was applied to determine the combination of items that best discriminated between groups. Results Clinicians reported on patients who completed suicide, made high-lethality attempts, low-lethality attempts, or died unexpected non-suicidal deaths in a total of 82 cases. We found that clinicians treating imminently suicidal patients had less positive feelings towards these patients than for non-suicidal patients, but had higher hopes for their treatment, while finding themselves notably more overwhelmed, distressed by, and to some degree avoidant of them. Further, we found that the specific paradoxical combination of hopefulness and distress/avoidance was a significant discriminator between suicidal patients and those who died unexpected non-suicidal deaths with 90% sensitivity and 56% specificity. In addition, we identified one questionnaire item that discriminated significantly between high- and low-lethality suicide patients. Conclusions Clinicians’ emotional responses to patients at risk versus not at risk for imminent suicide attempt may be distinct in ways consistent with responses theorized by Maltsberger and Buie in 1974. Prospective replication is needed to confirm these results, however. Our findings demonstrate the feasibility of using quantitative self-report methodologies for investigation of the relationship between clinicians’ emotional responses to suicidal patients and suicide risk. PMID:24053664
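
    The discrimination analysis reported here (a linear combination of questionnaire items separating groups, summarized by sensitivity and specificity) follows the usual linear discriminant workflow. The sketch below shows that workflow on synthetic two-group data; the group sizes echo the abstract only loosely and nothing here is the study's data or item content.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for questionnaire subscale scores from two clinician-report groups.
rng = np.random.default_rng(13)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 6)),
               rng.normal(0.7, 1.0, size=(32, 6))])
y = np.array([0] * 50 + [1] * 32)

lda = LinearDiscriminantAnalysis().fit(X, y)
pred = lda.predict(X)
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print("sensitivity:", round(tp / (tp + fn), 2), "specificity:", round(tn / (tn + fp), 2))
```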

  15. 10 CFR 32.14 - Certain items containing byproduct material; requirements for license to apply or initially...

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... for license to apply or initially transfer. 32.14 Section 32.14 Energy NUCLEAR REGULATORY COMMISSION SPECIFIC DOMESTIC LICENSES TO MANUFACTURE OR TRANSFER CERTAIN ITEMS CONTAINING BYPRODUCT MATERIAL Exempt... or initially transfer. An application for a specific license to apply byproduct material to, or to...

  16. The effects of 'does not apply' on measurement of temperament with the Infant Behavior Questionnaire-Revised: A cautionary tale for very young infants.

    PubMed

    Giesbrecht, Gerald F; Dewey, Deborah

    2014-10-01

    The Infant Behavior Questionnaire-Revised (IBQ-R) is a widely used parent report measure of infant temperament. Items marked 'does not apply' (NA) are treated as missing data when calculating scale scores, but the effect of this practice on assessment of infant temperament has not been reported. To determine the effect of NA responses on assessment of infant temperament and to evaluate the remedy offered by several missing data strategies. A prospective, community-based longitudinal cohort study. 401 infants who were born >37 weeks of gestation. Mothers completed the short form of the IBQ-R when infants were 3-months and 6-months of age. The rate of NA responses at the 3-month assessment was three times as high (22%) as the rate at six months (7%). Internal consistency was appreciably reduced and scale means were inflated in the presence of NA responses, especially at 3-months. The total number of NA items endorsed by individual parents was associated with infant age and parity. None of the missing data strategies completely eliminated problems related to NA responses but the Expectation Maximization algorithm greatly reduced these problems. The findings suggest that researchers should exercise caution when interpreting results obtained from infants at 3 months of age. Careful selection of scales, selecting a full length version of the IBQ-R, and use of a modern missing data technique may help to maintain the quality of data obtained from very young infants. Copyright © 2014 Elsevier Ltd. All rights reserved.
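
    The study compares missing-data strategies, including the Expectation Maximization algorithm, for NA-coded items. The sketch below illustrates the general idea with a related model-based approach (scikit-learn's IterativeImputer) on synthetic ratings with 20% values set missing; it is not the EM routine used in the paper, and the scale, item count, and missingness rate are assumptions.

```python
import numpy as np
# IterativeImputer is still flagged experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic 1-7 ratings on a 6-item subscale with ~20% "does not apply" set to NaN.
rng = np.random.default_rng(17)
ratings = rng.integers(1, 8, size=(400, 6)).astype(float)
ratings[rng.random(ratings.shape) < 0.20] = np.nan

mean_ignoring_na = np.nanmean(ratings, axis=1).mean()      # per-person mean, NAs dropped
imputed = IterativeImputer(random_state=0).fit_transform(ratings)
mean_imputed = imputed.mean(axis=1).mean()
print(round(mean_ignoring_na, 2), round(mean_imputed, 2))
```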

  17. Application of a General Polytomous Testlet Model to the Reading Section of a Large-Scale English Language Assessment. Research Report. ETS RR-10-21

    ERIC Educational Resources Information Center

    Li, Yanmei; Li, Shuhong; Wang, Lin

    2010-01-01

    Many standardized educational tests include groups of items based on a common stimulus, known as "testlets". Standard unidimensional item response theory (IRT) models are commonly used to model examinees' responses to testlet items. However, it is known that local dependence among testlet items can lead to biased item parameter estimates…

  18. Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd

    The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…

  19. A Comparison of the One-, the Modified Three-, and the Three-Parameter Item Response Theory Models in the Test Development Item Selection Process.

    ERIC Educational Resources Information Center

    Eignor, Daniel R.; Douglass, James B.

    This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…

  20. The positive mental health instrument: development and validation of a culturally relevant scale in a multi-ethnic Asian population.

    PubMed

    Vaingankar, Janhavi Ajit; Subramaniam, Mythily; Chong, Siow Ann; Abdin, Edimansyah; Orlando Edelen, Maria; Picco, Louisa; Lim, Yee Wei; Phua, Mei Yen; Chua, Boon Yiang; Tee, Joseph Y S; Sherbourne, Cathy

    2011-10-31

    Instruments to measure mental health and well-being are largely developed and often used within Western populations, and this compromises their validity in other cultures. A previous qualitative study in Singapore demonstrated the relevance of spiritual and religious practices to mental health, a dimension currently not included in existing multi-dimensional measures. The objective of this study was to develop a self-administered measure that covers all key and culturally appropriate domains of mental health, which can be applied to compare levels of mental health across different age, gender and ethnic groups. We present the item reduction and validation of the Positive Mental Health (PMH) instrument in a community-based adult sample in Singapore. Surveys were conducted among adult (21-65 years) residents belonging to Chinese, Malay and Indian ethnicities. Exploratory and confirmatory factor analysis (EFA, CFA) were conducted and items were reduced using item response theory (IRT) analyses. The final version of the PMH instrument was tested for internal consistency and criterion validity. Items were tested for differential item functioning (DIF) to check if items functioned in the same way across all subgroups. EFA and CFA identified a structure of six first-order factors (General coping, Personal growth and autonomy, Spirituality, Interpersonal skills, Emotional support, and Global affect) under one higher-order dimension of Positive Mental Health (RMSEA=0.05, CFI=0.96, TLI=0.96). A 47-item self-administered multi-dimensional instrument with a six-point Likert response scale was constructed. The slope estimates, reflecting the strength of the relation to theta, were high for all items in each of the six PMH subscales (range: 1.39 to 5.69), suggesting good discrimination properties. The threshold estimates for the instrument ranged from -3.45 to 1.61, indicating that the instrument covers the entire spectrum of each of the six dimensions. The instrument demonstrated high internal consistency and had significant and expected correlations with other well-being measures. Results confirmed the absence of DIF. The PMH instrument is a reliable and valid instrument that can be used to measure and compare levels of mental health across different age, gender and ethnic groups in Singapore.

  1. Item Response Modeling of Multivariate Count Data with Zero Inflation, Maximum Inflation, and Heaping

    ERIC Educational Resources Information Center

    Magnus, Brooke E.; Thissen, David

    2017-01-01

    Questionnaires that include items eliciting count responses are becoming increasingly common in psychology. This study proposes methodological techniques to overcome some of the challenges associated with analyzing multivariate item response data that exhibit zero inflation, maximum inflation, and heaping at preferred digits. The modeling…

  2. Nested Logit Models for Multiple-Choice Item Response Data

    ERIC Educational Resources Information Center

    Suh, Youngsuk; Bolt, Daniel M.

    2010-01-01

    Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all…

  3. The Dutch Identity: A New Tool for the Study of Item Response Models.

    ERIC Educational Resources Information Center

    Holland, Paul W.

    1990-01-01

    The Dutch Identity is presented as a useful tool for expressing the basic equations of item response models that relate the manifest probabilities to the item response functions and the latent trait distribution. Ways in which the identity may be exploited are suggested and illustrated. (SLD)

  4. Item response theory analysis of the mechanics baseline test

    NASA Astrophysics Data System (ADS)

    Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

    2012-02-01

    Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.

  5. Sample Invariance of the Structural Equation Model and the Item Response Model: A Case Study.

    ERIC Educational Resources Information Center

    Breithaupt, Krista; Zumbo, Bruno D.

    2002-01-01

    Evaluated the sample invariance of item discrimination statistics in a case study using real data, responses of 10 random samples of 500 people to a depression scale. Results lend some support to the hypothesized superiority of a two-parameter item response model over the common form of structural equation modeling, at least when responses are…

  6. A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments

    ERIC Educational Resources Information Center

    Wolkowitz, Amanda A.; Skorupski, William P.

    2013-01-01

    When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…

  7. Refining a self-assessment of informatics competency scale using Mokken scaling analysis.

    PubMed

    Yoon, Sunmoo; Shaffer, Jonathan A; Bakken, Suzanne

    2015-01-01

    Healthcare environments are increasingly implementing health information technology (HIT) and those from various professions must be competent to use HIT in meaningful ways. In addition, HIT has been shown to enable interprofessional approaches to health care. The purpose of this article is to describe the refinement of the Self-Assessment of Nursing Informatics Competencies Scale (SANICS) using analytic techniques based upon item response theory (IRT) and discuss its relevance to interprofessional education and practice. In a sample of 604 nursing students, the 93-item version of SANICS was examined using non-parametric IRT. The iterative modeling procedure included 31 steps comprising: (1) assessing scalability, (2) assessing monotonicity, (3) assessing invariant item ordering, and (4) expert input. SANICS was reduced to an 18-item hierarchical scale with excellent reliability. Fundamental skills for team functioning and shared decision making among team members (e.g. "using monitoring systems appropriately," "describing general systems to support clinical care") had the highest level of difficulty, and "demonstrating basic technology skills" had the lowest difficulty level. Most items reflect informatics competencies relevant to all health professionals. Further, the approaches can be applied to construct a new hierarchical scale or refine an existing scale related to informatics attitudes or competencies for various health professions.
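
    Mokken (nonparametric IRT) scaling of the kind used to refine SANICS centers on scalability coefficients. As an editorial illustration only, the sketch below computes Loevinger's H for a dichotomous item matrix by comparing observed Guttman errors with those expected under independence; the SANICS items are polytomous and the study used a multi-step procedure, so this is a deliberately simplified, synthetic example.

```python
import numpy as np
from itertools import combinations

def scale_H(X):
    """Loevinger's scalability coefficient H for a binary (persons x items) matrix."""
    X = np.asarray(X)
    n, k = X.shape
    p = X.mean(axis=0)
    F_sum, E_sum = 0.0, 0.0
    for i, j in combinations(range(k), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: endorsing the harder item while failing the easier one.
        F_sum += np.sum((X[:, hard] == 1) & (X[:, easy] == 0))
        E_sum += n * p[hard] * (1 - p[easy])
    return 1.0 - F_sum / E_sum

# Synthetic monotone data (not the SANICS items).
rng = np.random.default_rng(19)
theta = rng.normal(size=500)
difficulty = np.linspace(-1.5, 1.5, 10)
X = (rng.random((500, 10)) < 1 / (1 + np.exp(-(theta[:, None] - difficulty)))).astype(int)
print(round(scale_H(X), 2))   # values above roughly 0.3 are conventionally taken as scalable
```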

  8. Dopamine Alters the Fidelity of Working Memory Representations according to Attentional Demands

    PubMed Central

    Fallon, Sean James; Zokaei, Nahid; Norbury, Agnes; Manohar, Sanjay G.; Husain, Masud

    2018-01-01

    Capacity limitations in working memory (WM) necessitate the need to effectively control its contents. Here, we examined the effect of cabergoline, a dopamine D2 receptor agonist, on WM using a continuous report paradigm that allowed us to assess the fidelity with which items are stored. We assessed recall performance under three different gating conditions: remembering only one item, being cued to remember one target among distractors, and having to remember all items. Cabergoline had differential effects on recall performance according to whether distractors had to be ignored and whether mnemonic resources could be deployed exclusively to the target. Compared with placebo, cabergoline improved mnemonic performance when there were no distractors but significantly reduced performance when distractors were presented in a precue condition. No significant difference in performance was observed under cabergoline when all items had to be remembered. By applying a stochastic model of response selection, we established that the causes of drug-induced changes in performance were due to changes in the precision with which items were stored in WM. However, there was no change in the extent to which distractors were mistaken for targets. Thus, D2 agonism causes changes in the fidelity of mnemonic representations without altering interference between memoranda. PMID:27897674

  9. CLUSTERING SOUTH AFRICAN HOUSEHOLDS BASED ON THEIR ASSET STATUS USING LATENT VARIABLE MODELS

    PubMed Central

    McParland, Damien; Gormley, Isobel Claire; McCormick, Tyler H.; Clark, Samuel J.; Kabudula, Chodziwadziwa Whiteson; Collinson, Mark A.

    2014-01-01

    The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region. PMID:25485026

  10. Development and validation of a new condition-specific instrument for evaluation of smile esthetics-related quality of life.

    PubMed

    Saltovic, Ema; Lajnert, Vlatka; Saltovic, Sabina; Kovacevic Pavicic, Daniela; Pavlic, Andrej; Spalj, Stjepan

    2018-03-01

    Orofacial esthetics raises psychosocial issues. The purpose was to create and validate a new short instrument measuring the psychosocial impact of altered smile esthetics. A team consisting of an orthodontist, two prosthodontists, a psychologist, and a dental student generated items that could draw up specific hypothetical psychosocial dimensions (69 items initially, 39 in the final analysis). The sample consisted of 261 Caucasian subjects attending local high schools and university (26% male), aged 14 to 28 years, who self-administered the designed questionnaire. Factorial analysis, Cronbach's alpha, Pearson correlation, paired samples t-test and analysis of variance were used for analyses of internal consistency, construct validity, responsiveness, and test-retest reliability. Three dimensions of psychosocial impacts of altered smile esthetics were identified: dental self-consciousness, dental self-confidence and social contacts, which can be best fitted by 12 items, 4 items in each dimension. Internal consistency was good (α in range 0.85-0.89). Good stability in test-retest was confirmed. In responsiveness testing, tooth whitening induced an increase in dental self-confidence (P = 0.002), but no significant changes in the other dimensions. The new instrument, Smile Esthetics-Related Quality of Life (SERQoL), is short and has proven to be a good indicator of psychosocial dimensions related to perception of smile esthetics. The Smile Esthetics-Related Quality of Life questionnaire might have practical value when applied in esthetic dental clinical procedures. © 2017 Wiley Periodicals, Inc.

  11. Gender differences in national assessment of educational progress science items: What does i don't know really mean?

    NASA Astrophysics Data System (ADS)

    Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth

    The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose I don't know rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the I don't know response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the I don't know response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.

  12. The failing measurement of attitudes: How semantic determinants of individual survey responses come to replace measures of attitude strength.

    PubMed

    Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Egeland, Thore

    2018-01-12

    The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Applying a recently proposed semantic theory of survey response, we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the survey. Exploring the degree to which individual responses are influenced by semantics, we hypothesized that in many cases, information about attitude strength is actually filtered out as noise in the commonly used correlation matrix. We developed a procedure to separate the semantic influence from attitude strength in individual response patterns, and compared these results to, respectively, the observed sample correlation matrices and the semantic similarity structures arising from text analysis algorithms. This was done with four datasets, comprising a total of 7,787 subjects and 27,461,502 observed item pair responses. As we argued, attitude strength seemed to account for much information about the individual respondents. However, this information did not seem to carry over into the observed sample correlation matrices, which instead converged around the semantic structures offered by the survey items. This is potentially disturbing for the traditional understanding of what survey data represent. We argue that this approach contributes to a better understanding of the cognitive processes involved in survey responses. In turn, this could help us make better use of the data that such methods provide.

  13. Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)

    ERIC Educational Resources Information Center

    Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn

    2018-01-01

    The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…

  14. Higher-Order Item Response Models for Hierarchical Latent Traits

    ERIC Educational Resources Information Center

    Huang, Hung-Yu; Wang, Wen-Chung; Chen, Po-Hsi; Su, Chi-Ming

    2013-01-01

    Many latent traits in the human sciences have a hierarchical structure. This study aimed to develop a new class of higher order item response theory models for hierarchical latent traits that are flexible in accommodating both dichotomous and polytomous items, to estimate both item and person parameters jointly, to allow users to specify…

  15. Evaluating Item Fit for Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Zhang, Bo; Stone, Clement A.

    2008-01-01

    This research examines the utility of the S-X² statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…

  16. An Item Response Theory Model for Test Bias.

    ERIC Educational Resources Information Center

    Shealy, Robin; Stout, William

    This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…

  17. Validation of the Kohnen Restless Legs Syndrome-Quality of Life instrument.

    PubMed

    Kohnen, Ralf; Martinez-Martin, Pablo; Benes, Heike; Trenkwalder, Claudia; Högl, Birgit; Dunkl, Elmar; Walters, Arthur S

    2016-08-01

    Due to the symptoms and the sleep disturbances it causes, Restless Legs Syndrome (RLS) has a negative impact on quality of life. Measurement of such impact can be performed by means of questionnaires, such as the Kohnen Restless Legs Syndrome-Quality of Life questionnaire (KRLS-QoL), a specific 12-item instrument that is self-applied by patients. The present study is aimed at performing a first formal validation study of this instrument. Eight hundred ninety-one patients were included for analysis. RLS severity was assessed by the International Restless Legs Scale (IRLS), Restless Legs Syndrome-6 scales (RLS-6), and Clinical Global Impression of Severity. In addition, the Epworth Sleepiness Scale (ESS) was assessed. Acceptability, dimensionality, scaling assumptions, reliability, precision, hypotheses-related validity, and responsiveness were tested. There were missing data for 3.58% of patients. Floor and ceiling effects were low for the subscales, global evaluation, and summary index derived from items 1 to 11, after checking that scaling assumptions were met. Exploratory parallel factor analysis showed that the KRLS-QoL may be deemed unidimensional, i.e., that all components of the scale are part of one overall general quality-of-life factor. Indexes of internal consistency (alpha = 0.88), item-total correlation (r_s = 0.32-0.71), item homogeneity coefficient (0.41), and scale stability (ICC = 0.73) demonstrated a satisfactory reliability of the KRLS-QoL. Moderate or high correlations were obtained between KRLS-QoL scores and the IRLS, some components of the RLS-6, inter-KRLS-QoL domains, and global evaluations. Known-groups validity for severity-level grouping and responsiveness analysis results were satisfactory, the latter showing higher magnitudes of response for treated than for placebo arms. The KRLS-QoL was proven an acceptable, reliable, valid, and responsive measure to assess the impact of RLS on quality of life. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  18. Judgment and judgment latency for freedom and responsibility relatedness as a function of subtle linguistic variations.

    PubMed

    Wilkerson, Keith; McGahan, Joseph R; Stevens, Rick; Williamson, David; Low, Jean

    2009-12-01

    The goal of this study was to determine whether differential response formats to covariation problems influence corresponding response latencies. The authors provided participants with 3 trials of 16 statements addressing positive and negative relations between freedom and responsibility. The authors framed half of the items around responsibility given freedom and the other half around freedom given responsibility. Response formats comprised true-false, agree-disagree, and yes-no answers as a between-participants factor. Results indicated that the manipulation of response format did not affect latencies. However, latencies differed according to the framing of the items. For items framed around freedom given responsibility, latencies were shorter. In addition, participants were more likely to report a positive relation between freedom and responsibility when items were framed around freedom given responsibility. The authors discuss implications relative to previous research in this area and give recommendations for future research.

  19. Psychometric properties of the Chinese version of resilience scale specific to cancer: an item response theory analysis.

    PubMed

    Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong

    2018-06-01

    Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items, tested with an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings in order to reduce the burden on patients. However, the generalizability of these findings warrants further investigation.
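
    The two-parameter graded response model mentioned here gives each polytomous item a discrimination parameter and a set of ordered thresholds, from which the probability of each response category follows. A minimal, hedged Python sketch of those category probabilities (illustrative parameter values only, not the authors' estimation code) is:

      import numpy as np

      def grm_category_probs(theta, a, thresholds):
          """Graded response model: probability of each response category.

          theta      : latent trait value
          a          : item discrimination
          thresholds : ordered category thresholds b_1 < ... < b_(K-1)
          """
          # Cumulative probabilities P(X >= k), with boundary values 1 and 0.
          p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(thresholds))))
          cum = np.concatenate(([1.0], p_star, [0.0]))
          return cum[:-1] - cum[1:]            # P(X = k) for k = 0..K-1

      # Hypothetical 5-category item: a = 1.4, thresholds at -1.5, -0.3, 0.6, 1.8.
      print(np.round(grm_category_probs(0.5, 1.4, [-1.5, -0.3, 0.6, 1.8]), 3))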

  20. Automatic Scoring of Paper-and-Pencil Figural Responses. Research Report.

    ERIC Educational Resources Information Center

    Martinez, Michael E.; And Others

    Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…

  1. Introduction to Multilevel Item Response Theory Analysis: Descriptive and Explanatory Models

    ERIC Educational Resources Information Center

    Sulis, Isabella; Toland, Michael D.

    2017-01-01

    Item response theory (IRT) models are the main psychometric approach for the development, evaluation, and refinement of multi-item instruments and scaling of latent traits, whereas multilevel models are the primary statistical method when considering the dependence between person responses when primary units (e.g., students) are nested within…

  2. An Extension of IRT-Based Equating to the Dichotomous Testlet Response Theory Model

    ERIC Educational Resources Information Center

    Tao, Wei; Cao, Yi

    2016-01-01

    Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…

  3. Estimating Ordinal Reliability for Likert-Type and Ordinal Item Response Data: A Conceptual, Empirical, and Practical Guide

    ERIC Educational Resources Information Center

    Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D.

    2012-01-01

    This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…

  4. IRTPRO 2.1 for Windows (Item Response Theory for Patient-Reported Outcomes)

    ERIC Educational Resources Information Center

    Paek, Insu; Han, Kyung T.

    2013-01-01

    This article reviews a new item response theory (IRT) model estimation program, IRTPRO 2.1, for Windows that is capable of unidimensional and multidimensional IRT model estimation for existing and user-specified constrained IRT models for dichotomously and polytomously scored item response data. (Contains 1 figure and 2 notes.)

  5. Measurement equivalence and differential item functioning in family psychology.

    PubMed

    Bingenheimer, Jeffrey B; Raudenbush, Stephen W; Leventhal, Tama; Brooks-Gunn, Jeanne

    2005-09-01

    Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui and H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (c) 2005 APA, all rights reserved
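
    A simple way to see what differential item functioning means in practice is the Mantel-Haenszel procedure, which compares the odds of a keyed response between a reference and a focal group after stratifying on total score. The sketch below (Python, numpy, with hypothetical stratified counts) computes the common odds ratio; it illustrates the general DIF idea and is deliberately a classical, non-IRT technique, not the IRT-based methods introduced in this article.

      import numpy as np

      def mantel_haenszel_odds_ratio(tables):
          """Common odds ratio across 2x2 tables stratified by total score.

          Each table is [[ref_correct, ref_incorrect],
                         [focal_correct, focal_incorrect]].
          """
          num, den = 0.0, 0.0
          for t in tables:
              t = np.asarray(t, dtype=float)
              n = t.sum()
              num += t[0, 0] * t[1, 1] / n     # ref correct * focal incorrect
              den += t[0, 1] * t[1, 0] / n     # ref incorrect * focal correct
          return num / den

      # Hypothetical counts for one item at three total-score strata.
      strata = [[[30, 10], [25, 15]],
                [[40, 20], [35, 25]],
                [[20, 30], [15, 35]]]
      print(round(mantel_haenszel_odds_ratio(strata), 2))   # values near 1 suggest little DIF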

  6. The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence.

    ERIC Educational Resources Information Center

    Ackerman, Terry A.

    One of the important underlying assumptions of all item response theory (IRT) models is that of local independence. This assumption requires that the response to an item on a test not be influenced by the response to any other items. This assumption is often taken for granted, with little or no scrutiny of the response process required to answer…

  7. The construction of categorization judgments: using subjective confidence and response latency to test a distributed model.

    PubMed

    Koriat, Asher; Sorka, Hila

    2015-01-01

    The classification of objects to natural categories exhibits cross-person consensus and within-person consistency, but also some degree of between-person variability and within-person instability. What is more, the variability in categorization is also not entirely random but discloses systematic patterns. In this study, we applied the Self-Consistency Model (SCM, Koriat, 2012) to category membership decisions, examining the possibility that confidence judgments and decision latency track the stable and variable components of categorization responses. The model assumes that category membership decisions are constructed on the fly depending on a small set of clues that are sampled from a commonly shared population of pertinent clues. The decision and confidence are based on the balance of evidence in favor of a positive or a negative response. The results confirmed several predictions derived from SCM. For each participant, consensual responses to items were more confident than non-consensual responses, and for each item, participants who made the consensual response tended to be more confident than those who made the nonconsensual response. The difference in confidence between consensual and nonconsensual responses increased with the proportion of participants who made the majority response for the item. A similar pattern was observed for response speed. The pattern of results obtained for cross-person consensus was replicated by the results for response consistency when the responses were classified in terms of within-person agreement across repeated presentations. These results accord with the sampling assumption of SCM, that confidence and response speed should be higher when the decision is consistent with what follows from the entire population of clues than when it deviates from it. Results also suggested that the context for classification can bias the sample of clues underlying the decision, and that confidence judgments mirror the effects of context on categorization decisions. The model and results offer a principled account of the stable and variable contributions to categorization behavior within a decision-making framework. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Validation of the Community Integration Questionnaire in the adult burn injury population.

    PubMed

    Gerrard, Paul; Kazis, Lewis E; Ryan, Colleen M; Shie, Vivian L; Holavanahalli, Radha; Lee, Austin; Jette, Alan; Fauerbach, James A; Esselman, Peter; Herndon, David; Schneider, Jeffrey C

    2015-11-01

    With improved survival, long-term effects of burn injuries on quality of life, particularly community integration, are important outcomes. This study aims to assess the Community Integration Questionnaire's psychometric properties in the adult burn population. Data were obtained from a multicenter longitudinal data set of burn survivors. The psychometric properties of the Community Integration Questionnaire (n = 492) were examined. The questionnaire items were evaluated for clinical and substantive relevance; validation procedures were conducted on different samples of the population; construct validity was assessed using exploratory factor analysis; internal consistency reliability was examined using Cronbach's α statistics; and item response theory was applied to the final models. The CIQ-15 was reduced by two questions to form the CIQ-13, with a two-factor structure, interpreted as self/family care and social integration. Item response theory testing suggests that Factor 2 captures a wider range of community integration levels. Cronbach's α was 0.80 for Factor 1, 0.77 for Factor 2, and 0.79 for the test as a whole. The CIQ-13 demonstrates validity and reliability in the adult burn survivor population addressing issues of self/family care and social integration. This instrument is useful in future research of community reintegration outcomes in the burn population.

  9. Using R and WinBUGS to fit a Generalized Partial Credit Model for developing and evaluating patient-reported outcomes assessments

    PubMed Central

    Li, Yuelin; Baser, Ray

    2013-01-01

    The US Food and Drug Administration recently announced the final guidelines on the development and validation of Patient-Reported Outcomes (PROs) assessments in drug labeling and clinical trials. This guidance paper may boost the demand for new PRO survey questionnaires. Henceforth biostatisticians may encounter psychometric methods more frequently, particularly Item Response Theory (IRT) models to guide the shortening of a PRO assessment instrument. This article aims to provide an introduction on the theory and practical analytic skills in fitting a Generalized Partial Credit Model (GPCM) in IRT. GPCM theory is explained first, with special attention to a clearer exposition of the formal mathematics than what is typically available in the psychometric literature. Then a worked example is presented, using self-reported responses taken from the International Personality Item Pool. The worked example contains step-by-step guides on using the statistical languages R and WinBUGS in fitting the GPCM. Finally, the Fisher information function of the GPCM model is derived and used to evaluate, as an illustrative example, the usefulness of assessment items by their information contents. This article aims to encourage biostatisticians to apply IRT models in the re-analysis of existing data and in future research. PMID:22362655

  10. Using R and WinBUGS to fit a generalized partial credit model for developing and evaluating patient-reported outcomes assessments.

    PubMed

    Li, Yuelin; Baser, Ray

    2012-08-15

    The US Food and Drug Administration recently announced the final guidelines on the development and validation of patient-reported outcomes (PROs) assessments in drug labeling and clinical trials. This guidance paper may boost the demand for new PRO survey questionnaires. Henceforth, biostatisticians may encounter psychometric methods more frequently, particularly item response theory (IRT) models to guide the shortening of a PRO assessment instrument. This article aims to provide an introduction on the theory and practical analytic skills in fitting a generalized partial credit model (GPCM) in IRT. GPCM theory is explained first, with special attention to a clearer exposition of the formal mathematics than what is typically available in the psychometric literature. Then, a worked example is presented, using self-reported responses taken from the International Personality Item Pool. The worked example contains step-by-step guides on using the statistical languages R and WinBUGS in fitting the GPCM. Finally, the Fisher information function of the GPCM model is derived and used to evaluate, as an illustrative example, the usefulness of assessment items by their information contents. This article aims to encourage biostatisticians to apply IRT models in the re-analysis of existing data and in future research. Copyright © 2012 John Wiley & Sons, Ltd.
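
    The GPCM described in these two records defines category probabilities through adjacent-category logits built from a discrimination parameter and step parameters, in contrast to the cumulative logits of the graded response model sketched earlier. A minimal, hedged Python version of those probabilities (illustrative parameters, not the R/WinBUGS code the authors present) is:

      import numpy as np

      def gpcm_category_probs(theta, a, deltas):
          """Generalized partial credit model category probabilities.

          theta  : latent trait value
          a      : item discrimination (slope)
          deltas : step parameters d_1..d_(K-1) for categories 1..K-1
          """
          steps = a * (theta - np.asarray(deltas))
          logits = np.concatenate(([0.0], np.cumsum(steps)))   # category 0 is the baseline
          expz = np.exp(logits - logits.max())                 # numerically stabilised
          return expz / expz.sum()

      # Hypothetical 4-category item: a = 1.2, steps at -1.0, 0.2, 1.4.
      print(np.round(gpcm_category_probs(0.3, 1.2, [-1.0, 0.2, 1.4]), 3))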

  11. Item response theory scoring and the detection of curvilinear relationships.

    PubMed

    Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A

    2017-03-01

    Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process, wherein respondents agree only to items that reflect their own standing on the measured variable, as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
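
    The contrast drawn here between dominance and ideal point response processes can be made concrete with two toy item response functions: under a dominance process the probability of agreement rises monotonically with the trait, whereas under an ideal point process it peaks where the respondent's standing matches the item's location and falls off on both sides. The following sketch uses illustrative functional forms and parameters, not the specific models from the simulation studies.

      import numpy as np

      def dominance_agree(theta, a, b):
          """2PL-style dominance process: agreement rises monotonically with theta."""
          return 1.0 / (1.0 + np.exp(-a * (theta - b)))

      def ideal_point_agree(theta, b, width=1.0):
          """Simple squared-distance unfolding: agreement peaks at theta == b."""
          return np.exp(-((theta - b) ** 2) / (2.0 * width ** 2))

      theta = np.linspace(-3, 3, 7)
      print(np.round(dominance_agree(theta, a=1.5, b=0.0), 2))   # monotone increasing
      print(np.round(ideal_point_agree(theta, b=0.0), 2))        # single-peaked at b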

  12. [Validation and reliability study of the parent concerns about surgery questionnaire: What worries parents?].

    PubMed

    Gironés Muriel, Alberto; Campos Segovia, Ana; Ríos Gómez, Patricia

    2018-01-01

    The study of mediating variables and psychological responses to child surgery involves the evaluation of both the patient and the parents with regard to different stressors. The aim was to have a reliable, reproducible, and valid evaluation tool that assesses the level of parental involvement in relation to different stressors in the setting of surgery. A self-report questionnaire study was completed by 123 subjects of both sexes, subdivided into 2 populations according to their relationship with the hospital setting. The items were determined by a group of experts and analysed using the Lawshe validity index to establish initial content validity. Subsequently, the reliability of the tool was determined by an item-by-item analysis of the 2 sub-populations. A factor analysis with maximum likelihood extraction and varimax rotation was performed to analyse construct validity. A parental concern questionnaire consisting of 21 items was obtained, with a Cronbach's alpha coefficient of 0.97, indicating good precision and stability. The subsequent factor analysis confirms adequate validity for the questionnaire, identifying 10 common stressors that account for 74.08% of the common and non-common variance of the questionnaire. The proposed questionnaire is reliable, valid, and easy to apply, and was developed to assess the level of parental concern about a child's surgery so that measures and programs can be applied based on prior assessment of these elements. Copyright © 2016 Asociación Española de Pediatría. Publicado por Elsevier España, S.L.U. All rights reserved.

  13. Assessing Construct Validity Using Multidimensional Item Response Theory.

    ERIC Educational Resources Information Center

    Ackerman, Terry A.

    The concept of a user-specified validity sector is discussed. The idea of the validity sector combines the work of M. D. Reckase (1986) and R. Shealy and W. Stout (1991). Reckase developed a methodology to represent an item in a multidimensional latent space as a vector. Item vectors are computed using multidimensional item response theory item…

  14. Least Squares Distance Method of Cognitive Validation and Analysis for Binary Items Using Their Item Response Theory Parameters

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    2007-01-01

    The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…

  15. Mixture Item Response Theory-MIMIC Model: Simultaneous Estimation of Differential Item Functioning for Manifest Groups and Latent Classes

    ERIC Educational Resources Information Center

    Bilir, Mustafa Kuzey

    2009-01-01

    This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…

  16. Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.

    PubMed

    Peyre, Hugo; Leplège, Alain; Coste, Joël

    2011-03-01

    Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias <2%) in all studied situations. Whereas multiple imputation and full information maximum likelihood are confirmed as reference methods, the personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
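
    The personal mean score technique evaluated here replaces a respondent's missing items on a scale with the mean of that respondent's observed items on the same scale, usually only when enough items have been answered. A minimal, hedged Python sketch of that rule follows; the half-of-items threshold is an assumption chosen for illustration, not a detail taken from this article.

      import numpy as np

      def personal_mean_score(responses: np.ndarray, min_answered: float = 0.5) -> np.ndarray:
          """Impute missing items (NaN) with the respondent's own mean on the scale.

          Rows with fewer than `min_answered` (as a proportion) observed items are
          left untouched, which is a common convention rather than a fixed rule.
          """
          filled = responses.copy()
          for i, row in enumerate(responses):
              observed = ~np.isnan(row)
              if observed.mean() >= min_answered and not observed.all():
                  filled[i, ~observed] = row[observed].mean()
          return filled

      # Hypothetical 3-respondent, 4-item scale with missing entries.
      data = np.array([[3.0, np.nan, 4.0, 5.0],
                       [np.nan, np.nan, np.nan, 2.0],
                       [1.0, 2.0, 2.0, 3.0]])
      print(personal_mean_score(data))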

  17. Confirming the cognition of rising scores: Fox and Mitchum (2013) predicts violations of measurement invariance in series completion between age-matched cohorts.

    PubMed

    Fox, Mark C; Mitchum, Ainsley L

    2014-01-01

    The trend of rising scores on intelligence tests raises important questions about the comparability of variation within and between time periods. Descriptions of the processes that mediate selection of item responses provide meaningful psychological criteria upon which to base such comparisons. In a recent paper, Fox and Mitchum presented and tested a cognitive theory of rising scores on analogical and inductive reasoning tests that is specific enough to make novel predictions about cohort differences in patterns of item responses for tests such as the Raven's Matrices. In this paper we extend the same proposal in two important ways by (1) testing it against a dataset that enables the effects of cohort to be isolated from those of age, and (2) applying it to two other inductive reasoning tests that exhibit large Flynn effects: Letter Series and Word Series. Following specification and testing of a confirmatory item response model, predicted violations of measurement invariance are observed between two age-matched cohorts that are separated by only 20 years, as members of the later cohort are found to map objects at higher levels of abstraction than members of the earlier cohort who possess the same overall level of ability. Results have implications for the Flynn effect and cognitive aging while underscoring the value of establishing psychological criteria for equating members of distinct groups who achieve the same scores.

  18. Identifying training needs of logging truck drivers using a skill inventory.

    PubMed

    Carnahan, B J

    2004-11-01

    The purpose of this research was to determine if the Driver Skill Inventory (DSI) could be used to characterize the self-assessed driving performance of commercial logging truck drivers. The DSI requires respondents to subjectively evaluate their own ability in regard to 15 different driving skills. The DSI responses of 1000 logging truck drivers were collected across three southeastern states. The underlying hypothesis in the current study was that the DSI responses of these drivers would have reliability and factor structure similar to those of DSI responses collected from non-commercial drivers in previous studies. Factor analysis of the data confirmed this hypothesis. Statistical analysis revealed that low self-ratings on various safety skill items within the DSI inventory were associated with: (1) inconsistency in using seat belts, (2) inconsistency in performing pre-trip inspections on logging trucks, and (3) committing moving violations. Conversely, high self-ratings on various perceptual-motor skill items were associated with these same at-risk behaviors. The perceptual-motor skill items were also positively associated with negative attitudes toward driving regulations and the number of moving violations incurred over a three-year period. Non-parametric statistical analysis revealed that self-assessments were lowest for DSI skills pertaining to controlling one's anger while driving and managing the truck through a skid or slide. Results of the study confirmed that the DSI can be successfully applied to commercial logging truck drivers as part of an overall comprehensive training needs assessment.

  19. The effect of response modality on immediate serial recall in dementia of the Alzheimer type.

    PubMed

    Macé, Anne-Laure; Ergis, Anne-Marie; Caza, Nicole

    2012-09-01

    Contrary to traditional models of verbal short-term memory (STM), psycholinguistic accounts assume that temporary retention of verbal materials is an intrinsic property of word processing. Therefore, memory performance will depend on the nature of the STM tasks, which vary according to the linguistic representations they engage. The aim of this study was to explore the effect of response modality on verbal STM performance in individuals with dementia of the Alzheimer Type (DAT), and its relationship with the patients' word-processing deficits. Twenty individuals with mild DAT and 20 controls were tested on an immediate serial recall (ISR) task using the same items across two response modalities (oral and picture pointing) and completed a detailed language assessment. When scoring of ISR performance was based on item memory regardless of item order, a response modality effect was found for all participants, indicating that they recalled more items with picture pointing than with oral response. However, this effect was less marked in patients than in controls, resulting in an interaction. Interestingly, when recall of both item and order was considered, results indicated similar performance between response modalities in controls, whereas performance was worse for pointing than for oral response in patients. Picture-naming performance was also reduced in patients relative to controls. However, in the word-to-picture matching task, a similar pattern of responses was found between groups for incorrectly named pictures of the same items. The finding of a response modality effect in item memory for all participants is compatible with the assumption that semantic influences are greater in picture pointing than in oral response, as predicted by psycholinguistic models. Furthermore, patients' performance was modulated by their word-processing deficits, showing a reduced advantage relative to controls. Overall, the response modality effect observed in this study for item memory suggests that verbal STM performance is intrinsically linked with word processing capacities in both healthy controls and individuals with mild DAT, supporting psycholinguistic models of STM.

  20. A Study of General Education Astronomy Students' Understandings of Cosmology. Part III. Evaluating Four Conceptual Cosmology Surveys: An Item Response Theory Approach

    ERIC Educational Resources Information Center

    Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.

    2012-01-01

    This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…

  1. A Comparison of Measurement Equivalence Methods Based on Confirmatory Factor Analysis and Item Response Theory.

    ERIC Educational Resources Information Center

    Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.

    Current interest in the assessment of measurement equivalence emphasizes two methods of analysis: linear and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item functioning or IRT-based…

  2. A Polytomous Item Response Theory Analysis of Social Physique Anxiety Scale

    ERIC Educational Resources Information Center

    Fletcher, Richard B.; Crocker, Peter

    2014-01-01

    The present study investigated the social physique anxiety scale's factor structure and item properties using confirmatory factor analysis and item response theory. An additional aim was to identify differences in response patterns between groups (gender). A large sample of high school students aged 11-15 years (N = 1,529) consisting of n =…

  3. Item Response Theory at Subject- and Group-Level. Research Report 90-1.

    ERIC Educational Resources Information Center

    Tobi, Hilde

    This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California Assessment Program. In the Netherlands, these…

  4. The Role of Psychometric Modeling in Test Validation: An Application of Multidimensional Item Response Theory

    ERIC Educational Resources Information Center

    Schilling, Stephen G.

    2007-01-01

    In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…

  5. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  6. Exploring the Robustness of a Unidimensional Item Response Theory Model with Empirically Multidimensional Data

    ERIC Educational Resources Information Center

    Anderson, Daniel; Kahn, Joshua D.; Tindal, Gerald

    2017-01-01

    Unidimensionality and local independence are two common assumptions of item response theory. The former implies that all items measure a common latent trait, while the latter implies that responses are independent, conditional on respondents' location on the latent trait. Yet, few tests are truly unidimensional. Unmodeled dimensions may result in…

  7. The Random Response Technique as an Indicator of Questionnaire Item Social Desirability/Personal Sensitivity.

    ERIC Educational Resources Information Center

    Crino, Michael D.; And Others

    1985-01-01

    The random response technique was compared to a direct questionnaire, administered to college students, to investigate whether or not the responses predicted the social desirability of the item. Results suggest support for the hypothesis. The 33-item version of the Marlowe-Crowne Social Desirability Scale that was used is included. (GDC)

  8. Evaluation of Internal Construct Validity and Unidimensionality of the Brachial Assessment Tool, A Patient-Reported Outcome Measure for Brachial Plexus Injury.

    PubMed

    Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea

    2016-12-01

    To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model, with no local dependency, minimal disordered thresholds, unidimensionality, and no differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional, targeted, 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  9. Applying a Mixed Methods Framework to Differential Item Function Analyses

    ERIC Educational Resources Information Center

    Hitchcock, John H.; Johanson, George A.

    2015-01-01

    Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…

  10. The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.

    PubMed

    Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi

    2018-04-17

    The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.

  11. Development of the health literacy on social determinants of health questionnaire in Japanese adults.

    PubMed

    Matsumoto, Masayoshi; Nakayama, Kazuhiro

    2017-01-06

    Health inequities are increasing worldwide, with mounting evidence showing that social determinants of health are their greatest cause. To reduce inequities, many citizens need to be able to access, understand, appraise, and apply information on the social determinants; that is, they need to improve their health literacy on social determinants of health. However, only a limited number of scales focus on these considerations; hence, we developed the Health Literacy on Social Determinants of Health Questionnaire (HL-SDHQ) and examined its psychometric properties. We extracted domains of the social determinants of health from "The Solid Facts" and related articles, operationalizing the following ten domains: "the social gradient," "early life," "social exclusion," "work," "unemployment," "social support," "social capital," "addiction," "food," and "transport." Next, we developed the scale items in the ten extracted domains based on the literature and included four aspects of health literacy (the ability to access, understand, appraise, and apply information related to the social determinants of health) in the items. We also evaluated the ease of response and content validity. The self-administered questionnaire consisted of 33 items. The reliability and construct validity were verified among 831 Japanese adults in an internet survey. The scale items had high reliability with a Cronbach's alpha of 0.92, and adequate results were also obtained for the internal consistency of the information-processing dimensions (Cronbach's alpha values were 0.82, 0.91, 0.84, and 0.92 for accessing, understanding, appraising, and applying, respectively). The goodness of fit by confirmatory factor analysis based on the four dimensions was acceptable (comparative fit index = 0.901; root mean square error of approximation = 0.058). Furthermore, the bivariate relationship between HL-SDHQ scores and the frequency of participation in citizens' activities was similar to the theoretical expectation. The HL-SDHQ clarifies the relationship between the ten domains of the social determinants of health and health in each domain and is able to measure whether respondents can access, understand, appraise, and apply related information. The reliability and validity of the scale were adequate.

  12. Development of a subjective cognitive decline questionnaire using item response theory: a pilot study.

    PubMed

    Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L

    2015-12-01

    Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC; 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items with extreme response distributions or poor fit to the latent trait were removed. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of an SCD screener.
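
    Computerized adaptive test simulations of the kind described here typically administer, at each step, the item that is most informative at the examinee's current trait estimate. As a hedged, minimal sketch of that selection rule for dichotomous 2PL items (illustrative item bank, not the simulation code used in this pilot study):

      import numpy as np

      def item_information_2pl(theta, a, b):
          """Fisher information of 2PL items at trait value theta."""
          p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
          return a ** 2 * p * (1.0 - p)

      def next_item(theta_hat, a, b, administered):
          """Pick the not-yet-administered item with maximum information."""
          info = item_information_2pl(theta_hat, a, b)
          info[list(administered)] = -np.inf
          return int(np.argmax(info))

      # Hypothetical 5-item bank (discriminations a, difficulties b).
      a = np.array([0.8, 1.5, 1.2, 2.0, 1.0])
      b = np.array([-1.0, 0.0, 0.5, 1.2, -0.3])
      print(next_item(theta_hat=0.4, a=a, b=b, administered={1}))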

  13. Analyzing force concept inventory with item response theory

    NASA Astrophysics Data System (ADS)

    Wang, Jing; Bao, Lei

    2010-10-01

    Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
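
    The difficulty, discrimination, and guessing parameters mentioned here correspond to the three-parameter logistic (3PL) form commonly used for multiple-choice items, in which the probability of a correct answer is a lower asymptote (the guessing parameter) plus a logistic function of ability. A minimal sketch of that item characteristic curve, with made-up parameter values rather than estimates from the FCI analysis, is:

      import numpy as np

      def p_correct_3pl(theta, a, b, c):
          """3PL item characteristic curve.

          theta : ability
          a     : discrimination, b : difficulty, c : pseudo-guessing (lower asymptote)
          """
          return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

      # Hypothetical multiple-choice item: moderately discriminating, fairly hard,
      # with a 0.2 chance of guessing correctly.
      theta = np.linspace(-3, 3, 7)
      print(np.round(p_correct_3pl(theta, a=1.3, b=0.8, c=0.2), 2))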

  14. Serial recall of colors: Two models of memory for serial order applied to continuous visual stimuli.

    PubMed

    Peteranderl, Sonja; Oberauer, Klaus

    2018-01-01

    This study investigated the effects of serial position and temporal distinctiveness on serial recall of simple visual stimuli. Participants observed lists of five colors presented at varying, unpredictably ordered interitem intervals, and their task was to reproduce the colors in their order of presentation by selecting colors on a continuous-response scale. To control for the possibility of verbal labeling, articulatory suppression was required in one of two experimental sessions. The predictions were derived through simulation from two computational models of serial recall: SIMPLE represents the class of temporal-distinctiveness models, whereas SOB-CS represents event-based models. According to temporal-distinctiveness models, items that are temporally isolated within a list are recalled more accurately than items that are temporally crowded. In contrast, event-based models assume that the time intervals between items do not affect recall performance per se, although free time following an item can improve memory for that item because of extended time for the encoding. The experimental and the simulated data were fit to an interference measurement model to measure the tendency to confuse items with other items nearby on the list-the locality constraint-in people as well as in the models. The continuous-reproduction performance showed a pronounced primacy effect with no recency, as well as some evidence for transpositions obeying the locality constraint. Though not entirely conclusive, this evidence favors event-based models over a role for temporal distinctiveness. There was also a strong detrimental effect of articulatory suppression, suggesting that verbal codes can be used to support serial-order memory of simple visual stimuli.

  15. Mayo-Portland adaptability inventory: comparing psychometrics in cerebrovascular accident to traumatic brain injury.

    PubMed

    Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon

    2012-12-01

    (1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single-parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Retrospective psychometric analysis of rating scale data. Home- and community-based brain injury rehabilitation program. Individuals post-CVA (n=861) and individuals with TBI (n=603). Not applicable. MPAI-4. Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients, whereas self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R=.92; R²=.85) and, at most, a 3.7-point difference in raw scores. The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  16. Reliability and validity of the Dutch version of the Consultation and Relational Empathy Measure in primary care.

    PubMed

    van Dijk, Inge; Scholten Meilink Lenferink, Nick; Lucassen, Peter L B J; Mercer, Stewart W; van Weel, Chris; Olde Hartman, Tim C; Speckens, Anne E M

    2017-02-01

    Empathy is an essential skill in doctor-patient communication with positive effects on compliance, patient satisfaction and symptom duration. There are no validated patient-rated empathy measures available in Dutch. To investigate the validity and reliability of a Dutch version of the Consultation and Relational Empathy (CARE) Measure, a widely used 10-item patient-rated questionnaire of physician empathy. After translation and back translation, the Dutch CARE Measure was distributed among patients from 19 general practitioners in 5 primary care centers. Tests of internal reliability and validity included Cronbach's alpha, item total correlations and factor analysis. Seven items of the QUality Of care Through the patient's Eyes (QUOTE) questionnaire assessing 'affective performance' of the physician were included in factor analysis and used to investigate convergent validity. Of the 800 distributed questionnaires, 655 (82%) were returned. Acceptability and face validity were supported by a low number of 'does not apply' responses (range 0.2%-11.9%). Internal reliability was high (Cronbach's alpha 0.974). Corrected item total correlations were at a minimum of 0.837. Factor analysis on the 10 items of the CARE Measure and 7 QUOTE items resulted in two factors (Eigenvalue > 1), the first containing the CARE Measure items and the second containing the QUOTE items. Convergent construct validity between the CARE Measure and QUOTE was confirmed with a modest positive correlation (r = 0.34, n = 654, P < 0.001). The findings support the preliminary validity and reliability of the Dutch CARE Measure. Future research is required to investigate divergent validity and discriminant ability between doctors. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Item Response Theory Models for Performance Decline during Testing

    ERIC Educational Resources Information Center

    Jin, Kuan-Yu; Wang, Wen-Chung

    2014-01-01

    Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…

  18. The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.

    ERIC Educational Resources Information Center

    Kaskowitz, Gary S.; De Ayala, R. J.

    2001-01-01

    Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…

  19. Standard Errors and Confidence Intervals from Bootstrapping for Ramsay-Curve Item Response Theory Model Item Parameters

    ERIC Educational Resources Information Center

    Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.

    2011-01-01

    Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…

  20. Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory.

    PubMed

    Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd

    2017-01-01

    The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.

  1. Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory

    PubMed Central

    Jordan, Pascal; Shedden-Mora, Meike C.; Löwe, Bernd

    2017-01-01

    Objective: The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods: Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. Results: The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion: The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
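
    The item information functions referred to in these two records quantify how precisely an item measures the latent trait at each trait level. Building on the graded response model category probabilities sketched earlier, the following hedged Python snippet computes Fisher information as the sum over categories of the squared derivative of each category probability divided by that probability, using a numerical derivative; the item parameters are hypothetical, GAD-7-like values, not the estimates from this study.

      import numpy as np

      def grm_probs(theta, a, thresholds):
          """Graded response model category probabilities."""
          p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(thresholds))))
          cum = np.concatenate(([1.0], p_star, [0.0]))
          return cum[:-1] - cum[1:]

      def grm_item_information(theta, a, thresholds, eps=1e-5):
          """Fisher information: sum over categories of P_k'(theta)**2 / P_k(theta),
          with the derivative approximated by central differences."""
          p = grm_probs(theta, a, thresholds)
          dp = (grm_probs(theta + eps, a, thresholds)
                - grm_probs(theta - eps, a, thresholds)) / (2 * eps)
          return float(np.sum(dp ** 2 / p))

      # Hypothetical 4-category (0-3) anxiety item.
      a, thresholds = 1.8, [-0.5, 0.6, 1.7]
      for theta in (-1.0, 0.0, 1.0, 2.0):
          print(theta, round(grm_item_information(theta, a, thresholds), 3))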

  2. Accounting for Local Dependence with the Rasch Model: The Paradox of Information Increase.

    PubMed

    Andrich, David

    Test theories imply statistical, local independence. Where local independence is violated, models of modern test theory that account for it have been proposed. One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation between two items in the dichotomous Rasch model, this paper derives three related implications. First, it formalises how the polytomous Rasch model for an item constituted by summing the scores of the dependent items absorbs the dependence in its threshold structure. Second, it shows that, as a consequence, the unit when the dependence is accounted for is not the same as if the items had no response dependence. Third, it explains the paradox, known but not explained in the literature, that the greater the dependence of the constituent items, the greater the apparent information in the constituted polytomous item when it should provide less information.
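
    The construction discussed in this record, summing two response-dependent dichotomous items into a single polytomous item whose threshold structure absorbs the dependence, can be illustrated with simulated data. The sketch below is a toy demonstration under assumed values, not the paper's derivation; the dependence mechanism and the use of the mirt package are choices made here only for illustration.

    ```r
    library(mirt)
    set.seed(1)

    n     <- 2000
    theta <- rnorm(n)
    prob  <- function(th, b) 1 / (1 + exp(-(th - b)))

    # Three independent anchor items plus two items with response dependence:
    # the difficulty of x2 shifts depending on the response given to x1.
    anchors <- sapply(c(-1, 0, 1), function(b) rbinom(n, 1, prob(theta, b)))
    x1 <- rbinom(n, 1, prob(theta, -0.5))
    x2 <- rbinom(n, 1, prob(theta, ifelse(x1 == 1, 0.0, 1.5)))

    # (a) Ignore the dependence: dichotomous Rasch model on all five items
    fit_dich <- mirt(data.frame(anchors, x1, x2), 1, itemtype = "Rasch")

    # (b) Absorb it: sum x1 and x2 into one 3-category item (scores 0, 1, 2) and fit a
    #     Rasch-family (partial credit) model; the dependence shows up in the
    #     threshold structure of the summed item.
    fit_poly <- mirt(data.frame(anchors, sum12 = x1 + x2), 1, itemtype = "Rasch")
    coef(fit_poly, simplify = TRUE)
    ```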

  3. Conducting preference assessments for youth with disorders of consciousness during rehabilitation.

    PubMed

    Amari, Adrianna; Suskauer, Stacy J; Paasch, Valerie; Grodin, Lauren K; Slomine, Beth S

    2017-08-01

    Care and rehabilitation for individuals with disorders of consciousness (DOC) can be challenging; the use of observational data collection, individualized treatment programs, and incorporation of preferred, personally meaningful and salient items may be helpful in addressing such challenges during assessment and intervention. In this article, we extend the predominantly adult literature on use of salient items to promote differential responding by describing our methodology to identify preferred items across sensory domains for application during inpatient rehabilitation with children with DOC. Details on the indirect and direct preference assessment procedures rooted in applied behavior analysis that we have tailored for this population are provided. We describe steps of the procedures, including structured caregiver interview, staff survey, item inclusion, in vivo single-item stimulus preference assessment, and treatment. Clinical case examples further illustrate implementation of our methodology, observed response topographies, individually identified preferred items, and their application for 3 children in a minimally conscious state. In addition, we introduce a new structured caregiver interview, the Preference Assessment for Youth with Disorders of Consciousness (PAYDOC), modeled on the Reinforcer Assessment for Individuals with Severe Disabilities (RAISD; Fisher, Piazza, Bowman, & Amari, 1996) and modified to be appropriate for future use as a clinical tool to enhance assessment of preferences with this pediatric brain injury population. This methodology can be used to identify highly idiosyncratic stimuli that can be incorporated in multiple ways throughout rehabilitation to optimize care for youth with DOC. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  4. Item Order, Response Format, and Examinee Sex and Handedness and Performance on a Multiple-Choice Test.

    ERIC Educational Resources Information Center

    Kleinke, David J.

    Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…

  5. Person Response Functions and the Definition of Units in the Social Sciences

    ERIC Educational Resources Information Center

    Engelhard, George, Jr.; Perkins, Aminah F.

    2011-01-01

    Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…

  6. On the Relationship between Classical Test Theory and Item Response Theory: From One to the Other and Back

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2016-01-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…

  7. Applications of Multidimensional Item Response Theory Models with Covariates to Longitudinal Test Data. Research Report. ETS RR-16-21

    ERIC Educational Resources Information Center

    Fu, Jianbin

    2016-01-01

    The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…

  8. Bayesian Analysis of Item Response Curves. Research Report 84-1. Mathematical Sciences Technical Report No. 132.

    ERIC Educational Resources Information Center

    Tsutakawa, Robert K.; Lin, Hsin Ying

    Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…

  9. Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model

    ERIC Educational Resources Information Center

    Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim

    2017-01-01

    We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…

  10. Item response theory analysis of Working Alliance Inventory, revised response format, and new Brief Alliance Inventory.

    PubMed

    Mallinckrodt, Brent; Tekie, Yacob T

    2016-11-01

    The Working Alliance Inventory (WAI) has made great contributions to psychotherapy research. However, studies suggest the 7-point response format and 3-factor structure of the client version may have psychometric problems. This study used Rasch item response theory (IRT) to (a) improve the WAI response format, (b) compare two brief 12-item versions (WAI-sr; WAI-s), and (c) develop a new 16-item Brief Alliance Inventory (BAI). Archival data from 1786 counseling center and community clients were analyzed. IRT findings suggested problems with crossed category thresholds. A rescoring scheme that combines neighboring responses to create 5- and 4-point scales sharply reduced these problems. Although subscale variance was reduced by 11-26%, rescoring yielded improved reliability and generally higher correlations with therapy process (session depth and smoothness) and outcome measures (residual gain symptom improvement). The 16-item BAI was designed to maximize "bandwidth" of item difficulty and preserve a broader range of WAI sensitivity than WAI-s or WAI-sr. Comparisons suggest that the BAI performed better than the WAI-s or WAI-sr in several respects and equivalently to the full WAI on several performance indicators.
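
    The rescoring step reported in this record, collapsing neighbouring categories of the 7-point WAI response format, amounts to a simple recode followed by a refit. The sketch below is a hedged illustration only: the data frame `wai`, the 7-to-5 mapping, and the use of a partial-credit-type model in mirt are assumptions and do not reproduce the study's actual rescoring scheme or Rasch software.

    ```r
    library(mirt)

    # Hypothetical data frame `wai` of items scored 1-7; illustrative mapping merges 1-2 and 6-7
    map_7_to_5 <- c(1, 1, 2, 3, 4, 5, 5)
    wai5 <- as.data.frame(lapply(wai, function(x) map_7_to_5[x]))

    # Refit before and after rescoring and compare the category/threshold parameters;
    # disordered (crossed) thresholds should be reduced after collapsing categories.
    fit7 <- mirt(wai,  1, itemtype = "Rasch")
    fit5 <- mirt(wai5, 1, itemtype = "Rasch")
    coef(fit7, simplify = TRUE)$items
    coef(fit5, simplify = TRUE)$items
    ```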

  11. Measuring Response Styles Across the Big Five: A Multiscale Extension of an Approach Using Multinomial Processing Trees.

    PubMed

    Khorramdel, Lale; von Davier, Matthias

    2014-01-01

    This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test if RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences could be found in giving extreme responses. Moreover, it is shown how to score rating data to correct for RS after being proved to exist in the data.
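
    The decomposition of a rating into separate response processes (a Böckenholt-style processing tree) can be sketched as a recode of each observed 5-point response into three binary pseudo-items: midpoint responding, direction, and extremity. The function below is a generic illustration of that recode; the variable and column names are assumptions, and the subsequent multidimensional IRT fit is only indicated in a comment.

    ```r
    # Split a 5-point rating x (1 = strongly disagree ... 5 = strongly agree) into
    # three binary pseudo-items reflecting separate response processes:
    #   mid: midpoint responding (1 if x == 3)
    #   dir: direction of the response (1 = agree side), defined only when x != 3
    #   ext: extreme responding (1 if x is 1 or 5), defined only when x != 3
    decompose_rating <- function(x) {
      data.frame(
        mid = as.integer(x == 3),
        dir = ifelse(x == 3, NA, as.integer(x > 3)),
        ext = ifelse(x == 3, NA, as.integer(x %in% c(1, 5)))
      )
    }

    decompose_rating(c(1, 2, 3, 4, 5))
    # The pseudo-items for all questionnaire items can then be fitted with a multidimensional
    # IRT model (one dimension per response process) to separate traits from response styles.
    ```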

  12. An item response theory evaluation of the young mania rating scale and the montgomery-asberg depression rating scale in the systematic treatment enhancement program for bipolar disorder (STEP-BD).

    PubMed

    Prisciandaro, James J; Tolliver, Bryan K

    2016-11-15

    The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limited to outpatients; DIF analysis only sensitive to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Better assessment of physical function: item improvement is neglected but essential

    PubMed Central

    2009-01-01

    Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354

  14. Better assessment of physical function: item improvement is neglected but essential.

    PubMed

    Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

    2009-01-01

    Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.

  15. Optimal Test Design with Rule-Based Item Generation

    ERIC Educational Resources Information Center

    Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.

    2013-01-01

    Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…

  16. A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

    ERIC Educational Resources Information Center

    Livingston, Samuel A.; Dorans, Neil J.

    2004-01-01

    This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…

  17. Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior

    ERIC Educational Resources Information Center

    Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia

    2016-01-01

    The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…

  18. Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice versus Constructed-Response Training Items

    ERIC Educational Resources Information Center

    Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.

    2016-01-01

    Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…

  19. The Relationship of Item-Level Response Times with Test-Taker and Item Variables in an Operational CAT Environment. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    Swygert, Kimberly A.

    In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…

  20. Item response theory in personality assessment: a demonstration using the MMPI-2 depression scale.

    PubMed

    Childs, R A; Dahlstrom, W G; Kemp, S M; Panter, A T

    2000-03-01

    Item response theory (IRT) analyses have, over the past 3 decades, added much to our understanding of the relationships among and characteristics of test items, as revealed in examinees' response patterns. Assessment instruments used outside the educational context have only infrequently been analyzed using IRT, however. This study demonstrates the relevance of IRT to personality data through analyses of Scale 2 (the Depression Scale) on the revised Minnesota Multiphasic Personality Inventory (MMPI-2). A rich set of hypotheses regarding the items on this scale, including contrasts among the Harris-Lingoes and Wiener-Harmon subscales and differences in the items' measurement characteristics for men and women, is investigated through the IRT analyses.

  1. Measuring pain phenomena after spinal cord injury: Development and psychometric properties of the SCI-QOL Pain Interference and Pain Behavior assessment tools.

    PubMed

    Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S

    2018-05-01

    To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.

  2. Building a Computer Program to Support Children, Parents, and Distraction during Healthcare Procedures

    PubMed Central

    McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W. Nick; Zimmerman, M. Bridget; Ersig, Anne L.

    2012-01-01

    This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children’s responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, the Children, Parents and Distraction (CPaD), is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure. PMID:22805121

  3. Reliability and validity of a short form household food security scale in a Caribbean community.

    PubMed

    Gulliford, Martin C; Mahabir, Deepak; Rocke, Brian

    2004-06-16

    We evaluated the reliability and validity of the short form household food security scale in a different setting from the one in which it was developed. The scale was interview-administered to 531 subjects from 286 households in north central Trinidad in Trinidad and Tobago, West Indies. We evaluated the six items by fitting item response theory models to estimate item thresholds, estimating agreement among respondents in the same households and estimating the slope index of income-related inequality (SII) after adjusting for age, sex and ethnicity. Item-score correlations ranged from 0.52 to 0.79 and Cronbach's alpha was 0.87. Item responses gave within-household correlation coefficients ranging from 0.70 to 0.78. Estimated item thresholds (standard errors) from the Rasch model ranged from -2.027 (0.063) for the 'balanced meal' item to 2.251 (0.116) for the 'hungry' item. The 'balanced meal' item had the lowest threshold in each ethnic group even though there was evidence of differential functioning for this item by ethnicity. Relative thresholds of other items were generally consistent with US data. Estimation of the SII, comparing those at the bottom with those at the top of the income scale, gave relative odds for an affirmative response of 3.77 (95% confidence interval 1.40 to 10.2) for the lowest severity item, and 20.8 (2.67 to 162.5) for the highest severity item. Food insecurity was associated with reduced consumption of green vegetables after additionally adjusting for income and education (0.52, 0.28 to 0.96). The household food security scale gives reliable and valid responses in this setting. Differing relative item thresholds compared with US data do not require alteration to the cut-points for classification of 'food insecurity without hunger' or 'food insecurity with hunger'. The data provide further evidence that re-evaluation of the 'balanced meal' item is required.
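
    The item-threshold estimation summarised in this record can be sketched with a one-parameter (Rasch) fit to the six dichotomous items. The data frame name and the use of the mirt package are assumptions made for illustration; they are not the software or code used in the study.

    ```r
    library(mirt)

    # `hfss`: hypothetical data frame with the six dichotomous (0/1) short-form items
    rasch_fit <- mirt(hfss, 1, itemtype = "Rasch", SE = TRUE)
    coef(rasch_fit, IRTpars = TRUE, simplify = TRUE)$items   # column b = item threshold (difficulty)
    ```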

  4. Computerized Adaptive Testing with Item Clones. Research Report.

    ERIC Educational Resources Information Center

    Glas, Cees A. W.; van der Linden, Wim J.

    To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…

  5. Slower is not always better: Response-time evidence clarifies the limited role of miserly information processing in the Cognitive Reflection Test

    PubMed Central

    Pitchford, Melanie; Ball, Linden J.; Hunt, Thomas E.; Steel, Richard

    2017-01-01

    We report a study examining the role of ‘cognitive miserliness’ as a determinant of poor performance on the standard three-item Cognitive Reflection Test (CRT). The cognitive miserliness hypothesis proposes that people often respond incorrectly on CRT items because of an unwillingness to go beyond default, heuristic processing and invest time and effort in analytic, reflective processing. Our analysis (N = 391) focused on people’s response times to CRT items to determine whether predicted associations are evident between miserly thinking and the generation of incorrect, intuitive answers. Evidence indicated only a weak correlation between CRT response times and accuracy. Item-level analyses also failed to demonstrate predicted response-time differences between correct analytic and incorrect intuitive answers for two of the three CRT items. We question whether participants who give incorrect intuitive answers on the CRT can legitimately be termed cognitive misers and whether the three CRT items measure the same general construct. PMID:29099840

  6. Development of the Contact Lens User Experience: CLUE Scales

    PubMed Central

    Wirth, R. J.; Edwards, Michael C.; Henderson, Michael; Henderson, Terri; Olivares, Giovanna; Houts, Carrie R.

    2016-01-01

    Purpose The field of optometry has become increasingly interested in patient-reported outcomes, reflecting a common trend occurring across the spectrum of healthcare. This article reviews the development of the Contact Lens User Experience: CLUE system, designed to assess patient evaluations of contact lenses. CLUE was built using modern psychometric methods such as factor analysis and item response theory. Methods The qualitative process through which relevant domains were identified is outlined, as well as the process of creating initial item banks. Psychometric analyses were conducted on the initial item banks and refinements were made to the domains and items. Following this data-driven refinement phase, a second round of data was collected to further refine the items and obtain final item response theory item parameter estimates. Results Extensive qualitative work identified three key areas patients consider important when describing their experience with contact lenses. Based on item content and psychometric dimensionality assessments, the developing CLUE instruments were ultimately focused around four domains: comfort, vision, handling, and packaging. Item response theory parameters were estimated for the CLUE item banks (377 items), and the resulting scales were found to provide precise and reliable assignment of scores detailing users’ subjective experiences with contact lenses. Conclusions The CLUE family of instruments, as it currently exists, exhibits excellent psychometric properties. PMID:27383257

  7. The influence of item order on intentional response distortion in the assessment of high potentials: assessing pilot applicants.

    PubMed

    Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander

    2014-04-01

    An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.

  8. Concreteness effects in short-term memory: a test of the item-order hypothesis.

    PubMed

    Roche, Jaclynn; Tolan, G Anne; Tehan, Gerald

    2011-12-01

    The following experiments explore word length and concreteness effects in short-term memory within an item-order processing framework. This framework asserts that order memory is better for those items that are relatively easy to process at the item level. However, words that are difficult to process benefit at the item level from the increased attention/resources applied to them. The prediction of the model is that differential item and order processing can be detected in episodic tasks that differ in the degree to which item or order memory is required by the task. The item-order account has been applied to the word length effect such that there is a short word advantage in serial recall but a long word advantage in item recognition. The current experiment considered the possibility that concreteness effects might be explained within the same framework. In two experiments, word length (Experiment 1) and concreteness (Experiment 2) are examined using forward serial recall, backward serial recall, and item recognition. These results for word length replicate previous studies showing the dissociation in item and order tasks. The same was not true for the concreteness effect. In all three tasks, concrete words were better remembered than abstract words. The concreteness effect cannot be explained in terms of an item-order trade-off. PsycINFO Database Record (c) 2011 APA, all rights reserved.

  9. Pattern analysis of total item score and item response of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative sample of US adults

    PubMed Central

    Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Yutaka, Ono; Furukawa, Toshiaki A.

    2017-01-01

    Background Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D). To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative study. Methods Data were drawn from the National Survey of Midlife Development in the United States (MIDUS), which comprises four subsamples: (1) a national random digit dialing (RDD) sample, (2) oversamples from five metropolitan areas, (3) siblings of individuals from the RDD sample, and (4) a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: “none of the time,” “a little of the time,” “some of the time,” “most of the time,” and “all of the time.” The pattern of total score distribution and item responses were analyzed using graphical analysis and exponential regression model. Results The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from “a little of the time” to “all of the time” on log-normal scales, while “none of the time” response was not related to this exponential pattern. Discussion The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales. PMID:28289560
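
    The exponential pattern of the total-score distribution described in this record can be checked with a simple log-linear fit to the score frequencies. The sketch below is a generic illustration, not the MIDUS analysis code; the score vector `k6_total` and the decision to drop the lowest score are assumptions.

    ```r
    # `k6_total`: hypothetical vector of K6 total scores (0-24)
    tab <- table(factor(k6_total, levels = 0:24))
    dat <- data.frame(score = 0:24, count = as.numeric(tab))
    dat <- subset(dat, score >= 1)   # exclude the lower end of the distribution, as above

    # Log-linear (Poisson) regression of frequency on score: an approximately straight
    # line on the log scale corresponds to the exponential pattern described above.
    fit <- glm(count ~ score, family = poisson, data = dat)
    exp(coef(fit)["score"])          # multiplicative decay per additional score point

    plot(dat$score, log(dat$count + 0.5), xlab = "K6 total score", ylab = "log frequency")
    abline(coef(fit)[1], coef(fit)[2])
    ```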

  10. The e-MSWS-12: improving the multiple sclerosis walking scale using item response theory.

    PubMed

    Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D

    2016-12-01

    The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS)-related walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87% of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ² = 19.02, df = 5, p < 0.01), and Item 11 showed DIF based on sex (χ² = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12 .
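
    A DIF check of the kind reported in this record (item-level DIF by sex or age group under a graded response model) can be sketched with multiple-group IRT machinery. The sketch below uses the mirt package with a hypothetical data frame `msws` and grouping factor `sex`, and assumes 5-category items; it is an illustration of the general technique, not the study's analysis.

    ```r
    library(mirt)

    # Baseline model: all item parameters constrained equal across groups, with the
    # latent mean and variance of the focal group freely estimated.
    mg <- multipleGroup(msws, 1, group = sex, itemtype = "graded",
                        invariance = c("free_means", "free_var", colnames(msws)))

    # Free each item's slope and intercepts in turn and test the improvement in fit
    dif <- DIF(mg, which.par = c("a1", paste0("d", 1:4)), scheme = "drop")
    dif
    ```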

  11. Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program

    PubMed Central

    2013-01-01

    Background Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056

  12. Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program.

    PubMed

    Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M

    2013-03-04

    Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.

  13. Measuring the quality of life in hypertension according to Item Response Theory

    PubMed Central

    Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

    2017-01-01

    OBJECTIVE To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using Item Response Theory. METHODS This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the Item Response Theory analysis were: evaluation of dimensionality, estimation of item parameters, and construction of the scale. The study of dimensionality was carried out using the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used the Graded Response Model of Samejima. The analyses were conducted using the free software R with the aid of the psych and mirt packages. RESULTS The analysis allowed visualization of the item parameters and their individual contributions to the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state performed well, as they presented better power to discriminate individuals with worse quality of life. The items related to the mental state contributed the least psychometric information in the MINICHAL. CONCLUSIONS We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using Item Response Theory allowed us to identify new aspects of this instrument that have not yet been addressed in previous studies. PMID:28492764

  14. Development and Application of Methods for Estimating Operating Characteristics of Discrete Test Item Responses without Assuming any Mathematical Form.

    ERIC Educational Resources Information Center

    Samejima, Fumiko

    In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…

  15. Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

    PubMed Central

    Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

    2011-01-01

    We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
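
    The bifactor structure described in this record (one general cognitive-impairment factor plus specific factors for item clusters) can be sketched with a bifactor IRT fit. The item-to-specific-factor assignment below is invented for illustration and does not reproduce the CGI-CogS/SCoRS structure; the data frame `cog` is hypothetical.

    ```r
    library(mirt)

    # `cog`: hypothetical data frame of 21 ordinal interviewer ratings; `specific` assigns
    # each item to one specific factor (the grouping is illustrative only).
    specific <- c(rep(1, 7), rep(2, 7), rep(3, 7))
    bf <- bfactor(cog, model = specific, itemtype = "graded")

    summary(bf)   # standardized loadings on the general factor and on each specific factor
    # A dominant general factor (large general loadings, small specific loadings) supports
    # scoring with a single overall severity estimate, as argued in the abstract above.
    ```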

  16. Validation of a clinical critical thinking skills test in nursing.

    PubMed

    Shin, Sujin; Jung, Dukyoo; Kim, Sungeun

    2015-01-27

    The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measure of critical thinking ability.

  17. Validation of a clinical critical thinking skills test in nursing

    PubMed Central

    2015-01-01

    Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measure of critical thinking ability. PMID:25622716

  18. Cross-informant and cross-national equivalence using item-response theory (IRT) linking: A case study using the behavioral assessment for children of African heritage in the United States and Jamaica.

    PubMed

    Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T

    2016-03-01

    Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant, cross-national assessment of youth. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  19. Translation, adaptation and validation of the American short form Patient Activation Measure (PAM13) in a Danish version.

    PubMed

    Maindal, Helle Terkildsen; Sokolowski, Ineta; Vedsted, Peter

    2009-06-29

    The Patient Activation Measure (PAM) is a measure that assesses patient knowledge, skill, and confidence for self-management. This study validates the Danish translation of the 13-item Patient Activation Measure (PAM13) in a Danish population with dysglycaemia. 358 people with screen-detected dysglycaemia participating in a primary care health education study responded to PAM13. The PAM13 was translated into Danish by a standardised forward-backward translation. Data quality was assessed by mean, median, item response, missing values, floor and ceiling effects, internal consistency (Cronbach's alpha and average inter-item correlation) and item-rest correlations. Scale properties were assessed by Rasch Rating Scale models. The item response was high, with a small number of missing values (0.8-4.2%). Floor effect was small (range 0.6-3.6%), but the ceiling effect was above 15% for all items (range 18.6-62.7%). The alpha-coefficient was 0.89 and the average inter-item correlation 0.38. The Danish version formed a unidimensional, probabilistic Guttman-like scale explaining 43.2% of the variance. We did, however, find a different item sequence compared with the original scale. A Danish version of PAM13 with acceptable validity and reliability is now available. Further development should focus on single items, response categories in relation to ceiling effects and further validation of reproducibility and responsiveness.

  20. Detection of Differential Item Functioning Using the Lasso Approach

    ERIC Educational Resources Information Center

    Magis, David; Tuerlinckx, Francis; De Boeck, Paul

    2015-01-01

    This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
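
    The "LR lasso DIF" idea summarised in this record, a single logistic regression over all item responses with item-specific intercepts and penalised group-by-item terms, can be sketched in simplified form with a lasso-penalised logistic regression. This is a generic illustration, not the authors' implementation: the input objects, the use of the observed total score as the matching variable, and the glmnet penalty setup are assumptions.

    ```r
    library(glmnet)

    # `resp`: n x J matrix of 0/1 item responses; `grp`: 0/1 group indicator (both hypothetical)
    J    <- ncol(resp)
    long <- data.frame(
      y     = as.vector(resp),
      item  = factor(rep(seq_len(J), each = nrow(resp))),
      total = rep(rowSums(resp), times = J),    # observed total score as matching variable
      grp   = rep(grp, times = J)
    )

    # One logistic regression for all responses: item-specific intercepts, a matching-variable
    # effect, and group-by-item interactions; only the interactions receive the lasso penalty.
    X  <- model.matrix(~ item + total + item:grp, data = long)[, -1]
    pf <- as.numeric(grepl(":grp", colnames(X)))
    cv <- cv.glmnet(X, long$y, family = "binomial", alpha = 1, penalty.factor = pf)

    coef(cv, s = "lambda.min")   # nonzero item:grp terms flag items with potential DIF
    ```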

  1. Item response theory analysis of Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis.

    PubMed

    Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C

    2016-03-12

    Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using a unidimensional IRT model. The graded response model was used for the IRT analyses, and the CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis, but the existing CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health-related HRQOL.

  2. Sequential Computerized Mastery Tests--Three Simulation Studies

    ERIC Educational Resources Information Center

    Wiberg, Marie

    2006-01-01

    A simulation study of a sequential computerized mastery test is carried out with items modeled with the 3 parameter logistic item response theory model. The examinees' responses are either identically distributed, not identically distributed, or not identically distributed together with estimation errors in the item characteristics. The…
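
    Simulating examinee responses under the three-parameter logistic model, as in the simulation study above, takes only a few lines. The sketch below is generic: parameter ranges, sample sizes, and the random seed are invented for illustration, and the sequential mastery rule itself is only indicated in a comment.

    ```r
    set.seed(123)

    n_persons <- 500
    n_items   <- 40
    theta <- rnorm(n_persons)            # identically distributed examinee abilities
    a <- runif(n_items, 0.8, 2.0)        # discrimination
    b <- rnorm(n_items)                  # difficulty
    g <- runif(n_items, 0.10, 0.25)      # pseudo-guessing (lower asymptote)

    # 3PL response probability: P(X = 1 | theta) = g + (1 - g) / (1 + exp(-a * (theta - b)))
    p3pl <- function(theta, a, b, g) g + (1 - g) / (1 + exp(-a * (theta - b)))

    prob <- outer(theta, seq_len(n_items), function(th, j) p3pl(th, a[j], b[j], g[j]))
    resp <- matrix(rbinom(length(prob), 1, prob), nrow = n_persons)

    # A sequential mastery rule can then be applied to each simulated examinee's cumulative
    # responses, item by item, to study its classification error rates under the 3PL model.
    ```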

  3. Distinguishing Fast and Slow Processes in Accuracy - Response Time Data.

    PubMed

    Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L J; Maris, Gunter

    2016-01-01

    We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two 'one-process' models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a 'two-process' model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses.

  4. What can we learn from PISA?: Investigating PISA's approach to scientific literacy

    NASA Astrophysics Data System (ADS)

    Schwab, Cheryl Jean

    This dissertation is an investigation of the relationship between the multidimensional conception of scientific literacy and its assessment. The Programme for International Student Assessment (PISA), developed under the auspices of the Organization for Economic Cooperation and Development (OECD), offers a unique opportunity to evaluate the assessment of scientific literacy. PISA developed a continuum of performance for scientific literacy across three competencies (i.e., process, content, and situation). Foundational to the interpretation of PISA science assessment is PISA's definition of scientific literacy, which I argue incorporates three themes drawn from history: (a) scientific way of thinking, (b) everyday relevance of science, and (c) scientific literacy for all students. Three coordinated studies were conducted to investigate the validity of PISA science assessment and offer insight into the development of items to assess scientific literacy. Multidimensional models of the internal structure of the PISA 2003 science items were found not to reflect the complex character of PISA's definition of scientific literacy. Although the multidimensional models across the three competencies significantly decreased the G² statistic from the unidimensional model, high correlations between the dimensions suggest that the dimensions are similar. A cognitive analysis of student verbal responses to PISA science items revealed that students were using competencies of scientific literacy, but the competencies were not elicited by the PISA science items at the depth required by PISA's definition of scientific literacy. Although student responses contained only knowledge of scientific facts and simple scientific concepts, students were using more complex skills to interpret and communicate their responses. Finally, the investigation of different scoring approaches and item response models illustrated different ways to interpret student responses to assessment items. These analyses highlighted the complexities of students' responses to the PISA science items and the use of the ordered partition model to accommodate different but equal item responses. The results of the three investigations are used to discuss ways to improve the development and interpretation of PISA's science items.

  5. Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

    PubMed

    Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

    2018-02-01

    The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
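
    The person and item reliability and separation indices mentioned above are commonly derived from the spread of the estimated measures relative to their standard errors. A rough sketch using the standard Rasch-style definitions (hypothetical person measures, not the SWAL-QOL data):

      import numpy as np

      def separation_stats(estimates, standard_errors):
          # Rasch-style separation statistics for person (or item) measures.
          # True variance = observed variance minus mean squared standard error;
          # reliability = true / observed variance; separation = sqrt(R / (1 - R)).
          obs_var = np.var(estimates, ddof=1)
          mse = np.mean(np.square(standard_errors))
          true_var = max(obs_var - mse, 0.0)
          reliability = true_var / obs_var if obs_var > 0 else 0.0
          separation = np.sqrt(reliability / (1.0 - reliability)) if reliability < 1.0 else np.inf
          return reliability, separation

      # Hypothetical person measures (logits) and their standard errors.
      rng = np.random.default_rng(1)
      theta = rng.normal(0.0, 1.2, size=500)
      se = np.full(500, 0.45)

      rel, sep = separation_stats(theta, se)
      print(f"person reliability ~ {rel:.2f}, person separation ~ {sep:.2f}")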

  6. Optimization of Contrast Detection Power with Probabilistic Behavioral Information

    PubMed Central

    Cordes, Dietmar; Herzmann, Grit; Nandy, Rajesh; Curran, Tim

    2012-01-01

    Recent progress in the experimental design for event-related fMRI experiments made it possible to find the optimal stimulus sequence for maximum contrast detection power using a genetic algorithm. In this study, a novel algorithm is proposed for optimization of contrast detection power by including probabilistic behavioral information, based on pilot data, in the genetic algorithm. As a particular application, a recognition memory task is studied and the design matrix optimized for contrasts involving the familiarity of individual items (pictures of objects) and the recollection of qualitative information associated with the items (left/right orientation). Optimization of contrast efficiency is a complicated issue whenever subjects’ responses are not deterministic but probabilistic. Contrast efficiencies are not predictable unless behavioral responses are included in the design optimization. However, available software for design optimization does not include options for probabilistic behavioral constraints. If the anticipated behavioral responses are included in the optimization algorithm, the design is optimal for the assumed behavioral responses, and the resulting contrast efficiency is greater than what either a block design or a random design can achieve. Furthermore, improvements of contrast detection power depend strongly on the behavioral probabilities, the perceived randomness, and the contrast of interest. The present genetic algorithm can be applied to any case in which fMRI contrasts are dependent on probabilistic responses that can be estimated from pilot data. PMID:22326984
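
    The optimization described above is built around the standard general-linear-model notion of contrast detection efficiency. A toy sketch (unconvolved indicator regressors and made-up event labels, not the authors' algorithm) of the quantity a genetic algorithm would try to maximize:

      import numpy as np

      def contrast_efficiency(X, c):
          # Detection efficiency of contrast c for design matrix X: 1 / trace(c (X'X)^-1 c').
          xtx_inv = np.linalg.pinv(X.T @ X)
          c = np.atleast_2d(c)
          return 1.0 / np.trace(c @ xtx_inv @ c.T)

      rng = np.random.default_rng(2)

      # Toy design: 200 scans, two event-type regressors plus an intercept.
      n_scans = 200
      X = np.column_stack([
          rng.integers(0, 2, n_scans),   # e.g. "familiar item" events
          rng.integers(0, 2, n_scans),   # e.g. "recollected orientation" events
          np.ones(n_scans),
      ])
      c = np.array([1.0, -1.0, 0.0])     # contrast between the two event types

      print("efficiency:", contrast_efficiency(X, c))
      # A genetic algorithm would mutate and recombine candidate stimulus orders and,
      # as proposed above, average this efficiency over behavioral outcomes sampled
      # from pilot-estimated response probabilities before selecting designs.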

  7. Threats to Validity When Using Open-Ended Items in International Achievement Studies: Coding Responses to the PISA 2012 Problem-Solving Test in Finland

    ERIC Educational Resources Information Center

    Arffman, Inga

    2016-01-01

    Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…

  8. Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

    ERIC Educational Resources Information Center

    Cao, Yi; Lu, Ru; Tao, Wei

    2014-01-01

    The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…

  9. Kernel-Smoothing Estimation of Item Characteristic Functions for Continuous Personality Items: An Empirical Comparison with the Linear and the Continuous-Response Models

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2004-01-01

    This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…

  10. Innovative Application of a Multidimensional Item Response Model in Assessing the Influence of Social Desirability on the Pseudo-Relationship between Self-Efficacy and Behavior

    ERIC Educational Resources Information Center

    Watson, Kathy; Baranowski, Tom; Thompson, Debbe; Jago, Russell; Baranowski, Janice; Klesges, Lisa M.

    2006-01-01

    This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10-14 years). SocD (nine items), PASE (19 items) and FVSE (21…

  11. The Structure of the Narcissistic Personality Inventory With Binary and Rating Scale Items.

    PubMed

    Boldero, Jennifer M; Bell, Richard C; Davies, Richard C

    2015-01-01

    Narcissistic Personality Inventory (NPI) items typically have a forced-choice format, comprising a narcissistic and a nonnarcissistic statement. Recently, some have presented the narcissistic statements and asked individuals to either indicate whether they agree or disagree that the statements are self-descriptive (i.e., a binary response format) or to rate the extent to which they agree or disagree that these statements are self-descriptive on a Likert scale (i.e., a rating response format). The current research demonstrates that when NPI items have a binary or a rating response format, the scale has a bifactor structure (i.e., the items load on a general factor and on 6 specific group factors). Indexes of factor strength suggest that the data are unidimensional enough for the NPI's general factor to be considered a measure of a narcissism latent trait. However, the rating item general factor assessed more narcissism components than the binary item one. The positive correlations of the NPI's general factor, assessed when items have a rating response format, were moderate with self-esteem, strong with a measure of narcissistic grandiosity, and weak with 2 measures of narcissistic vulnerability. Together, the results suggest that using a rating format for items enhances the information provided by the NPI.

  12. Military medical graduates' perceptions of organizational culture in Turkish military medical school.

    PubMed

    Ozer, Mustafa; Bakir, Bilal; Teke, Abdulkadir; Ucar, Muharrem; Bas, Turker; Atac, Adnan

    2008-08-01

    Organizational culture is the term used to describe the shared beliefs, perceptions, and expectations of individuals in organizations. In the healthcare environment, organizational culture has been associated with several elements of organizational experience that contribute to quality, such as nursing care, job satisfaction, and patient safety. A range of tools has been designed to measure organizational culture and applied in industrial, educational, and health care settings. This study was conducted to investigate the perceptions of military medical graduates of the organizational culture at Gülhane Military Medical School (GMMS). A measure of organizational culture developed by researchers at Akdeniz University was administered to all military medical graduates in 2004. This was a Likert-type scale of 31 items, which its designers had grouped into five main dimensions in their previous study. The items were scored on a five-point scale anchored by 1: strongly agree and 5: strongly disagree. Study participants included all military physicians who were in their clerkship training period at Gülhane Military Medical Academy in 2004. A total of 106 graduates agreed to complete the questionnaire. The mean age of participants was 25.2 +/- 1.1 years. At the time of the study only 8 (7.5%) graduates were married. The results showed that the 31-item measurement tool had sufficient reliability, with a Cronbach's alpha value of 0.91. Factor analysis resulted in a final measurement tool of 24 items loading on five factors. The total score and the scores of the five subdimensions were estimated and compared between groups based on city of residence and marital status. The study showed that the symbol dimension received positive perceptions, while the organizational structure and efficiency dimension received the most negative perceptions. GMMS has a unique organizational culture with both weak and strong aspects. Studies of this kind contribute to improving organizational culture and thereby to increasing educational and research capability.

  13. Use of Item Parceling in Structural Equation Modeling with Missing Data

    ERIC Educational Resources Information Center

    Orcan, Fatih

    2013-01-01

    Parceling is referred to as a procedure for computing sums or average scores across multiple items. Parcels instead of individual items are then used as indicators of latent factors in the structural equation modeling analysis (Bandalos 2002, 2008; Little et al., 2002; Yang, Nay, & Hoyle, 2010). Item parceling may be applied to alleviate some…

  14. Identifying Differential Item Functioning of Rating Scale Items with the Rasch Model: An Introduction and an Application

    ERIC Educational Resources Information Center

    Myers, Nicholas D.; Wolfe, Edward W.; Feltz, Deborah L.; Penfield, Randall D.

    2006-01-01

    This study (a) provided a conceptual introduction to differential item functioning (DIF), (b) introduced the multifaceted Rasch rating scale model (MRSM) and an associated statistical procedure for identifying DIF in rating scale items, and (c) applied this procedure to previously collected data from American coaches who responded to the coaching…

  15. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

    Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…

  16. An Approach to Biased Item Identification Using Latent Trait Measurement Theory.

    ERIC Educational Resources Information Center

    Rudner, Lawrence M.

    Because it is a true score model employing item parameters which are independent of the examined sample, item characteristic curve theory (ICC) offers several advantages over classical measurement theory. In this paper an approach to biased item identification using ICC theory is described and applied. The ICC theory approach is attractive in that…

  17. Item Banks for Substance Use from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Severity of Use and Positive Appeal of Use*

    PubMed Central

    Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis

    2015-01-01

    Background Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364

  18. Practical Guide to Conducting an Item Response Theory Analysis

    ERIC Educational Resources Information Center

    Toland, Michael D.

    2014-01-01

    Item response theory (IRT) is a psychometric technique used in the development, evaluation, improvement, and scoring of multi-item scales. This pedagogical article provides the necessary information needed to understand how to conduct, interpret, and report results from two commonly used ordered polytomous IRT models (Samejima's graded…

  19. Item Construction and Psychometric Models Appropriate for Constructed Responses

    DTIC Science & Technology

    1991-08-01

    which involve only one attribute per item. This is especially true when we are dealing with constructed-response items, we have to measure much more…

  20. Different Approaches to Covariate Inclusion in the Mixture Rasch Model

    ERIC Educational Resources Information Center

    Li, Tongyun; Jiao, Hong; Macready, George B.

    2016-01-01

    The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…

  1. Robust Estimation of Latent Ability in Item Response Models

    ERIC Educational Resources Information Center

    Schuster, Christof; Yuan, Ke-Hai

    2011-01-01

    Because of response disturbances such as guessing, cheating, or carelessness, item response models often can only approximate the "true" individual response probabilities. As a consequence, maximum-likelihood estimates of ability will be biased. Typically, the nature and extent to which response disturbances are present is unknown, and, therefore,…

  2. Theoretical and Empirical Comparisons between Two Models for Continuous Item Responses.

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2002-01-01

    Analyzed the relations between two continuous response models intended for typical response items: the linear congeneric model and Samejima's continuous response model (CRM). Illustrated the relations described using an empirical example and assessed the relations through a simulation study. (SLD)

  3. Measuring subjective response to aircraft noise: the effects of survey context.

    PubMed

    Kroesen, Maarten; Molin, Eric J E; van Wee, Bert

    2013-01-01

    In applied research, noise annoyance is often used as indicator of subjective reaction to aircraft noise in residential areas. The present study aims to show that the meaning which respondents attach to the concept of aircraft noise annoyance is partly a function of survey context. To this purpose a survey is conducted among residents living near Schiphol Airport, the largest airport in the Netherlands. In line with the formulated hypotheses it is shown that different sets of preceding questionnaire items influence the response distribution of aircraft noise annoyance as well as the correlational patterns between aircraft noise annoyance and other relevant scales.

  4. Response to Germann's "Comment on 'theory for source-responsive and free-surface film modeling of unsaturated flow'"

    USGS Publications Warehouse

    Nimmo, J.R.

    2010-01-01

    Germann's (2010) comment helpfully presents supporting evidence that I have missed, notes items that need clarification or correction, and stimulates discussion of what is needed for improved theory of unsaturated flow. Several points from this comment relate not only to specific features of the content of my paper (Nimmo, 2010), but also to the broader question of what methodology is appropriate for developing an applied earth science. Accordingly, before addressing specific points that Germann identified, I present here some considerations of purpose and background relevant to evaluation of the unsaturated flow model of Nimmo (2010).

  5. On the dynamic nature of response criterion in recognition memory: effects of base rate, awareness, and feedback.

    PubMed

    Rhodes, Matthew G; Jacoby, Larry L

    2007-03-01

    The authors examined whether participants can shift their criterion for recognition decisions in response to the probability that an item was previously studied. Participants in 3 experiments were given recognition tests in which the probability that an item was studied was correlated with its location during the test. Results from all 3 experiments indicated that participants' response criteria were sensitive to the probability that an item was previously studied and that shifts in criterion were robust. In addition, awareness of the bases for criterion shifts and feedback on performance were key factors contributing to the observed shifts in decision criteria. These data suggest that decision processes can operate in a dynamic fashion, shifting from item to item.

  6. Item Banking. ERIC/AE Digest.

    ERIC Educational Resources Information Center

    Rudner, Lawrence

    This digest discusses the advantages and disadvantages of using item banks, and it provides useful information for those who are considering implementing an item banking project in their school districts. The primary advantage of item banking is in test development. Using an item response theory method, such as the Rasch model, items from multiple…

  7. An item-response theory approach to safety climate measurement: The Liberty Mutual Safety Climate Short Scales.

    PubMed

    Huang, Yueng-Hsiang; Lee, Jin; Chen, Zhuo; Perry, MacKenna; Cheung, Janelle H; Wang, Mo

    2017-06-01

    Zohar and Luria's (2005) safety climate (SC) scale, measuring organization- and group-level SC each with 16 items, is widely used in research and practice. To improve the utility of the SC scale, we shortened the original full-length SC scales. Item response theory (IRT) analysis was conducted using a sample of 29,179 frontline workers from various industries. Based on graded response models, we shortened the original scales in two ways: (1) selecting items with above-average discriminating ability (i.e. offering more than 6.25% of the original total scale information), resulting in 8-item organization-level and 11-item group-level SC scales; and (2) selecting the most informative items that together retain at least 30% of original scale information, resulting in 4-item organization-level and 4-item group-level SC scales. All four shortened scales had acceptable reliability (≥0.89) and high correlations (≥0.95) with the original scale scores. The shortened scales will be valuable for academic research and practical survey implementation in improving occupational safety. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
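
    The two shortening rules described above can be expressed in terms of each item's share of the total test information under a graded response model. The sketch below uses hypothetical item parameters (not the calibrated safety climate values) and the standard GRM information formula; the 6.25% cut-off corresponds to an average share (1/16) for a 16-item scale:

      import numpy as np

      def grm_item_information(theta, a, b):
          # Item information for a graded response model item with slope a and
          # ordered thresholds b (length K-1), evaluated on a grid of theta values.
          theta = np.asarray(theta)
          # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
          p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(b)[None, :])))
          p_star = np.hstack([np.ones((len(theta), 1)), p_star, np.zeros((len(theta), 1))])
          p_cat = p_star[:, :-1] - p_star[:, 1:]          # category probabilities
          d_star = a * p_star * (1.0 - p_star)            # d P* / d theta
          d_cat = d_star[:, :-1] - d_star[:, 1:]          # d P_k / d theta
          return np.sum(d_cat ** 2 / np.clip(p_cat, 1e-12, None), axis=1)

      # Hypothetical calibrated parameters for a 16-item, 5-category scale.
      rng = np.random.default_rng(3)
      slopes = rng.uniform(0.8, 2.5, size=16)
      thresholds = np.sort(rng.normal(0, 1, size=(16, 4)), axis=1)

      grid = np.linspace(-4, 4, 81)
      info = np.array([grm_item_information(grid, a, b) for a, b in zip(slopes, thresholds)])
      share = np.trapz(info, grid, axis=1) / np.trapz(info.sum(axis=0), grid)

      # Rule 1: keep items contributing more than an average share of information.
      rule1 = np.where(share > 1.0 / len(slopes))[0]
      # Rule 2: greedily keep the most informative items until >= 30% is retained.
      order = np.argsort(share)[::-1]
      rule2 = order[: np.searchsorted(np.cumsum(share[order]), 0.30) + 1]

      print("rule 1 keeps items:", rule1)
      print("rule 2 keeps items:", sorted(rule2))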

  8. Factors that influence search termination decisions in free recall: an examination of response type and confidence.

    PubMed

    Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J

    2011-09-01

    In three experiments search termination decisions were examined as a function of response type (correct vs. incorrect) and confidence. It was found that the time between the last retrieved item and the decision to terminate search (exit latency) was related to the type of response and confidence in the last item retrieved. Participants were willing to search longer when the last retrieved item was a correct item vs. an incorrect item and when the confidence was high in the last retrieved item. It was also found that the number of errors retrieved during the recall period was related to search termination decisions such that the more errors retrieved, the more likely participants were to terminate the search. Finally, it was found that knowledge of overall search set size influenced the time needed to search for items, but did not influence search termination decisions. Copyright © 2011 Elsevier B.V. All rights reserved.

  9. Influence of Skip Patterns on Item Non-Response in a Substance Use Survey of 7th to 12th Grade Students

    ERIC Educational Resources Information Center

    Ding, Kele; Olds, R. Scott; Thombs, Dennis L.

    2009-01-01

    This retrospective case study assessed the influence of item non-response error on subsequent response to questionnaire items assessing adolescent alcohol and marijuana use. Post-hoc analyses were conducted on survey results obtained from 4,371 7th to 12th grade students in Ohio in 2005. A skip pattern design in a conventional questionnaire…

  10. Using a Multivariate Multilevel Polytomous Item Response Theory Model to Study Parallel Processes of Change: The Dynamic Association between Adolescents' Social Isolation and Engagement with Delinquent Peers in the National Youth Survey

    ERIC Educational Resources Information Center

    Hsieh, Chueh-An; von Eye, Alexander A.; Maier, Kimberly S.

    2010-01-01

    The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the…

  11. Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.

    2003-01-01

    Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…

  12. Validity test of the IPD-Work consortium approach for creating comparable job strain groups between Job Content Questionnaire and Demand-Control Questionnaire.

    PubMed

    Choi, Bongkyoo; Ko, Sangbaek; Ostergren, Per-Olof

    2015-01-01

    This study aims to test the validity of the IPD-Work Consortium approach for creating comparable job strain groups between the Job Content Questionnaire (JCQ) and the Demand-Control Questionnaire (DCQ). A random population sample (N = 682) of middle-aged Malmö males and females was given a questionnaire containing the 14-item JCQ and the 11-item DCQ measures of job control and job demands. The JCQ job control and job demands scores were calculated in 3 different ways: using the 14-item JCQ standard scale formulas (method 1); dropping 3 job control items and using the 11-item JCQ standard scale formulas with additional scale weights (method 2); and the approach of the IPD Group (method 3), dropping 3 job control items but using simple 11-item summation-based scale formulas. High job strain was defined as a combination of high demands and low control. Between the 2 questionnaires, false negatives for high job strain were much more frequent than false positives (37-49% vs. 7-13%). When method 3 was applied, the sensitivity of the JCQ for high job strain against the DCQ was lowest (0.51 vs. 0.60-0.63 for methods 1 and 2), although the specificity was highest (0.93 vs. 0.87-0.89 for methods 1 and 2). The prevalence of high job strain with the JCQ under method 3 was considerably lower (4-7%) than with the JCQ under methods 1 and 2 or with the DCQ. The number of congruent high job strain cases between the 2 questionnaires was smallest when method 3 was applied. Compared to the standard JCQ methods, the IPD-Work Consortium approach showed 2 major weaknesses for epidemiological studies of high job strain and health outcomes: greater misclassification of high job strain and lower prevalence of high job strain. This work is available in Open Access model and licensed under a CC BY-NC 3.0 PL license.
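
    The sensitivity and specificity figures quoted above compare binary high-job-strain classifications from the two questionnaires, treating one as the reference. A minimal sketch with simulated classifications (illustrative only, not the Malmö data):

      import numpy as np

      def agreement_stats(jcq_strain, dcq_strain):
          # Sensitivity, specificity, and prevalence of 'high job strain' on one
          # questionnaire, treating the other questionnaire's classification as the reference.
          jcq = np.asarray(jcq_strain, dtype=bool)
          dcq = np.asarray(dcq_strain, dtype=bool)
          sensitivity = (jcq & dcq).sum() / dcq.sum()
          specificity = (~jcq & ~dcq).sum() / (~dcq).sum()
          return sensitivity, specificity, jcq.mean()

      # Hypothetical binary classifications for illustration only.
      rng = np.random.default_rng(4)
      dcq = rng.random(682) < 0.15                                          # reference classification
      jcq = (dcq & (rng.random(682) < 0.55)) | (~dcq & (rng.random(682) < 0.05))

      sens, spec, prev = agreement_stats(jcq, dcq)
      print(f"sensitivity={sens:.2f} specificity={spec:.2f} prevalence={prev:.2f}")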

  13. Converging evidence for control of color-word Stroop interference at the item level.

    PubMed

    Bugg, Julie M; Hutchison, Keith A

    2013-04-01

    Prior studies have shown that cognitive control is implemented at the list and context levels in the color-word Stroop task. At first blush, the finding that Stroop interference is reduced for mostly incongruent items as compared with mostly congruent items (i.e., the item-specific proportion congruence [ISPC] effect) appears to provide evidence for yet a third level of control, which modulates word reading at the item level. However, evidence to date favors the view that ISPC effects reflect the rapid prediction of high-contingency responses and not item-specific control. In Experiment 1, we first show that an ISPC effect is obtained when the relevant dimension (i.e., color) signals proportion congruency, a problematic pattern for theories based on differential response contingencies. In Experiment 2, we replicate and extend this pattern by showing that item-specific control settings transfer to new stimuli, ruling out alternative frequency-based accounts. In Experiment 3, we revert to the traditional design in which the irrelevant dimension (i.e., word) signals proportion congruency. Evidence for item-specific control, including transfer of the ISPC effect to new stimuli, is apparent when 4-item sets are employed but not when 2-item sets are employed. We attribute this pattern to the absence of high-contingency responses on incongruent trials in the 4-item set. These novel findings provide converging evidence for reactive control of color-word Stroop interference at the item level, reveal theoretically important factors that modulate reliance on item-specific control versus contingency learning, and suggest an update to the item-specific control account (Bugg, Jacoby, & Chanani, 2011).

  14. A semi-parametric within-subject mixture approach to the analyses of responses and response times.

    PubMed

    Molenaar, Dylan; Bolsinova, Maria; Vermunt, Jeroen K

    2018-05-01

    In item response theory, modelling the item response times in addition to the item responses may improve the detection of possible between- and within-subject differences in the process that resulted in the responses. For instance, if respondents rely on rapid guessing on some items but not on all, the joint distribution of the responses and response times will be a multivariate within-subject mixture distribution. Suitable parametric methods to detect these within-subject differences have been proposed. In these approaches, a distribution needs to be assumed for the within-class response times. In this paper, it is demonstrated that these parametric within-subject approaches may produce false positives and biased parameter estimates if the assumption concerning the response time distribution is violated. A semi-parametric approach is proposed which resorts to categorized response times. This approach is shown to hardly produce false positives and parameter bias. In addition, the semi-parametric approach results in approximately the same power as the parametric approach. © 2017 The British Psychological Society.

  15. Differential item functioning magnitude and impact measures from item response theory models.

    PubMed

    Kleinman, Marjorie; Teresi, Jeanne A

    2016-01-01

    Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item-level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
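
    One common form of the magnitude measures discussed above is the difference between groups' expected item scores, averaged over the focal group's ability distribution. A small sketch for a dichotomous 2PL item with hypothetical parameters:

      import numpy as np

      def expected_score_2pl(theta, a, b):
          # Expected item score (probability correct) for a 2PL item.
          return 1.0 / (1.0 + np.exp(-a * (theta - b)))

      def dif_magnitude(a_ref, b_ref, a_foc, b_foc, mu_foc=0.0, sd_foc=1.0):
          # Signed and unsigned differences between the groups' expected item scores,
          # averaged over a normal focal-group ability distribution (one common
          # family of IRT-based DIF effect sizes).
          theta = np.linspace(-6, 6, 241)
          w = np.exp(-0.5 * ((theta - mu_foc) / sd_foc) ** 2)
          w /= np.trapz(w, theta)
          diff = expected_score_2pl(theta, a_foc, b_foc) - expected_score_2pl(theta, a_ref, b_ref)
          signed = np.trapz(diff * w, theta)
          unsigned = np.trapz(np.abs(diff) * w, theta)
          return signed, unsigned

      # Hypothetical item parameters for the reference and focal groups.
      print(dif_magnitude(a_ref=1.2, b_ref=0.0, a_foc=1.2, b_foc=0.4))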

  16. Modeling motivated misreports to sensitive survey questions.

    PubMed

    Böckenholt, Ulf

    2014-07-01

    Asking sensitive or personal questions in surveys or experimental studies can both lower response rates and increase item non-response and misreports. Although non-response is easily diagnosed, misreports are not. However, misreports cannot be ignored because they give rise to systematic bias. The purpose of this paper is to present a modeling approach that identifies misreports and corrects for them. Misreports are conceptualized as a motivated process under which respondents edit their answers before they report them. For example, systematic bias introduced by overreports of socially desirable behaviors or underreports of less socially desirable ones can be modeled, leading to more-valid inferences. The proposed approach is applied to a large-scale experimental study and shows that respondents who feel powerful tend to overclaim their knowledge.

  17. Using Response-Time Constraints in Item Selection To Control for Differential Speededness in Computerized Adaptive Testing. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.

    This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…

  18. The Spanish version of the Self-Determination Inventory Student Report: application of item response theory to self-determination measurement.

    PubMed

    Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A

    2018-04-01

    A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in the Spanish language. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Through an item response theory approach, this study examined observed item response distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.

  19. Detecting When “Quality of Life” Has Been “Enhanced”: Estimating Change in Quality of Life Ratings

    PubMed Central

    Tractenberg, Rochelle E.; Yumoto, Futoshi; Aisen, Paul S.

    2015-01-01

    Objective To demonstrate challenges in the estimation of change in quality of life (QOL). Methods Data were taken from a completed clinical trial with negative results. Responses to 13 QOL items were obtained 12 months apart from 258 persons with Alzheimer’s disease (AD) participating in a randomized, placebo-controlled clinical trial with two treatment arms. Two analyses to estimate whether “change” in QOL occurred over 12 months are described. A simple difference (later - earlier) was calculated from total scores (standard approach). A Qualified Change algorithm (novel approach) was applied to each item: differences in ratings were classified as either: improved, worsened, stayed poor, or stayed “positive” (fair, good, excellent). The strengths of evidence supporting a claim that “QOL changed”, derived from the two analyses, were compared by considering plausible alternative explanations for, and interpretations of, results obtained under each approach. Results Total score approach: QOL total scores decreased, on average, in the two treatment (both −1.0, p < 0.05), but not the placebo (=−0.59, p > 0.3) groups. Qualified change approach: Roughly 60% of all change in QOL items was worsening in every arm; 17% - 42% of all subjects experienced change in each item. Conclusions Totalling the subjective QOL item ratings collapses over items, and suggests a potentially misleading “overall” level of change (or no change, as in the placebo arm). Leaving the items as individual components of “quality” of life they were intended to capture, and qualifying the direction and amount of change in each, suggests that at least 17% of any group experienced change on every item, with 60% of all observed change being worsening. Discussion Summarizing QOL item ratings as a total “score” collapses over the face-valid, multi-dimensional components of the construct “quality of life”. Qualified Change provides robust evidence of changes to QOL or “enhancements of” life quality. PMID:26213645
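
    A minimal sketch of the item-level classification step of the Qualified Change approach described above; the numeric coding (1 = poor to 4 = excellent, with "positive" meaning fair or better) is an illustrative assumption, not taken from the study:

      from collections import Counter

      def qualified_change(baseline, follow_up, positive_threshold=2):
          # Classify change in a single QOL item rating.
          # Assumes ratings coded 1 = poor ... 4 = excellent and that 'positive'
          # means fair or better (rating >= positive_threshold); both are
          # illustrative assumptions, not the paper's coding.
          if follow_up > baseline:
              return "improved"
          if follow_up < baseline:
              return "worsened"
          return "stayed positive" if baseline >= positive_threshold else "stayed poor"

      # Classify each of 13 items for one participant, then tabulate the result.
      baseline  = [1, 2, 3, 4, 2, 2, 3, 1, 4, 3, 2, 1, 3]
      follow_up = [1, 1, 3, 4, 3, 2, 2, 1, 4, 4, 2, 2, 3]
      print(Counter(qualified_change(b, f) for b, f in zip(baseline, follow_up)))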

  20. Scale refinement and initial evaluation of a behavioral health function measurement tool for work disability evaluation.

    PubMed

    Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M

    2013-09-01

    To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design: Cross-sectional survey followed by IRT calibration data simulations. Setting: Community. Participants: Sample of individuals applying for Social Security Administration disability benefits: claimants (n=1015) and a normative comparative sample of U.S. adults (n=1000). Interventions: None. Main outcome measures: SSA-BH measurement instrument. IRT analyses supported the unidimensionality of 4 SSA-BH scales: mood and emotions (35 items), self-efficacy (23 items), social interactions (6 items), and behavioral control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item computer adaptive tests with the full item bank indicated robust ability of the computer adaptive testing approach to comprehensively characterize behavioral health function along 4 distinct dimensions. Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all 4 scales. Behavioral function profiles of Social Security Administration claimants were generated and compared with age- and sex-matched norms along 4 scales: mood and emotions, behavioral control, social interactions, and self-efficacy. Using the computer adaptive test-based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the Social Security Administration's work disability programs. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  1. Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.

    PubMed

    Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj

    2016-12-01

    The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit the GRM. Reliability was 0.93. Items 8 and 9 (of the 11- and 12-item versions, respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. © The Author(s), 2016.

  2. A dimensional approach to understanding severity estimates and risk correlates of marijuana abuse and dependence in adults

    PubMed Central

    WU, LI-TZY; WOODY, GEORGE E.; YANG, CHONGMING; PAN, JENG-JONG; REEVE, BRYCE B.; BLAZER, DAN G.

    2012-01-01

    While item response theory (IRT) research shows a latent severity trait underlying response patterns of substance abuse and dependence symptoms, little is known about IRT-based severity estimates in relation to clinically relevant measures. In response to increased prevalences of marijuana-related treatment admissions, an elevated level of marijuana potency, and the debate on medical marijuana use, we applied dimensional approaches to understand IRT-based severity estimates for marijuana use disorders (MUDs) and their correlates while simultaneously considering gender- and race/ethnicity-related differential item functioning (DIF). Using adult data from the 2008 National Survey on Drug Use and Health (N=37,897), DSM-IV criteria for MUDs among past-year marijuana users were examined by IRT, logistic regression, and multiple indicators–multiple causes (MIMIC) approaches. Among 6,917 marijuana users, 15% met criteria for a MUD; another 24% exhibited subthreshold dependence. Abuse criteria were highly correlated with dependence criteria (correlation=0.90), indicating unidimensionality; item information curves revealed redundancy in multiple criteria. MIMIC analyses showed that MUD criteria were positively associated with weekly marijuana use, early marijuana use, other substance use disorders, substance abuse treatment, and serious psychological distress. African Americans and Hispanics showed higher levels of MUDs than whites, even after adjusting for race/ethnicity-related DIF. The redundancy in multiple criteria suggests an opportunity to improve efficiency in measuring symptom-level manifestations by removing low-informative criteria. Elevated rates of MUDs among African Americans and Hispanics require research to elucidate risk factors and improve assessments of MUDs for different racial/ethnic groups. PMID:22351489

  3. Reliability, Validity, and Predictive Utility of the 25-Item Criminogenic Cognitions Scale (CCS).

    PubMed

    Tangney, June Price; Stuewig, Jeffrey; Furukawa, Emi; Kopelovich, Sarah; Meyer, Patrick; Cosby, Brandon

    2012-10-01

    Theory, research, and clinical reports suggest that moral cognitions play a role in initiating and sustaining criminal behavior. The 25-item Criminogenic Cognitions Scale (CCS) was designed to tap 5 dimensions: Notions of Entitlement; Failure to Accept Responsibility; Short-Term Orientation; Insensitivity to Impact of Crime; and Negative Attitudes Toward Authority. Results from 552 jail inmates support the reliability, validity, and predictive utility of the measure. The CCS was linked to criminal justice system involvement, self-report measures of aggression, impulsivity, and lack of empathy. Additionally, the CCS was associated with violent criminal history, antisocial personality, and clinicians' ratings of risk for future violence and psychopathy (PCL:SV). Furthermore, criminogenic thinking upon incarceration predicted subsequent official reports of inmate misconduct during incarceration. CCS scores varied somewhat by gender and race. Research and applied uses of the CCS are discussed.

  4. A new IRT-based standard setting method: application to eCat-listening.

    PubMed

    García, Pablo Eduardo; Abad, Francisco José; Olea, Julio; Aguado, David

    2013-01-01

    Criterion-referenced interpretations of tests are highly necessary, which usually involves the difficult task of establishing cut scores. Contrasting with other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. eCat-Listening, a computerized adaptive test for the evaluation of English Listening, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). The results showed a classification closely related to relevant external measures of the English language domain, according to the CEFR. It is concluded that the proposed method is a practical and valid standard setting alternative for IRT-based tests interpretations.

  5. An outbreak of Cyclospora infection on a cruise ship.

    PubMed

    Gibbs, R A; Nanyonjo, R; Pingault, N M; Combs, B G; Mazzucchelli, T; Armstrong, P; Tarling, G; Dowse, G K

    2013-03-01

    In 2010, an outbreak of cyclosporiasis affected passengers and crew on two successive voyages of a cruise ship that departed from and returned to Fremantle, Australia. There were 73 laboratory-confirmed and 241 suspected cases of Cyclospora infection reported in passengers and crew from the combined cruises. A case-control study performed in crew members found that illness was associated with eating items of fresh produce served onboard the ship, but the study was unable conclusively to identify the responsible food(s). It is likely that one or more of the fresh produce items taken onboard at a south-east Asian port during the first cruise was contaminated. If fresh produce supplied to cruise ships is sourced from countries or regions where Cyclospora is endemic, robust standards of food production and hygiene should be applied to the supply chain.

  6. Using Data Augmentation and Markov Chain Monte Carlo for the Estimation of Unfolding Response Models

    ERIC Educational Resources Information Center

    Johnson, Matthew S.; Junker, Brian W.

    2003-01-01

    Unfolding response models, a class of item response theory (IRT) models that assume a unimodal item response function (IRF), are often used for the measurement of attitudes. Verhelst and Verstralen (1993) and Andrich and Luo (1993) independently developed unfolding response models by relating the observed responses to a more common monotone IRT…

  7. A Study of Bayesian Estimation and Comparison of Response Time Models in Item Response Theory

    ERIC Educational Resources Information Center

    Suh, Hongwook

    2010-01-01

    Response time has been regarded as an important source for investigating the relationship between human performance and response speed. It is important to examine the relationship between response time and item characteristics, especially in the perspective of the relationship between response time and various factors that affect examinee's…

  8. Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

    PubMed

    Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

    2017-01-01

    The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19; it was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test for the 2-parameter item response model, over the range of 2.0 to 0.5, revealed that 12 items had an ideal correct answer rate. We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
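
    The classical test theory quantities reported above (item difficulty as the proportion correct, and a discrimination index contrasting high and low scorers) can be computed directly from a scored response matrix. A sketch with simulated 0/1 data (not the actual examination responses):

      import numpy as np

      def classical_item_stats(responses):
          # Classical test theory statistics for a 0/1 response matrix
          # (rows = examinees, columns = items): difficulty = proportion correct,
          # discrimination = upper 27% minus lower 27% proportion correct by total score.
          responses = np.asarray(responses, dtype=float)
          total = responses.sum(axis=1)
          difficulty = responses.mean(axis=0)

          cut = max(int(round(0.27 * len(total))), 1)
          order = np.argsort(total)
          lower, upper = order[:cut], order[-cut:]
          discrimination = responses[upper].mean(axis=0) - responses[lower].mean(axis=0)
          return difficulty, discrimination

      # Hypothetical 0/1 data: 741 examinees by 50 items.
      rng = np.random.default_rng(5)
      ability = rng.normal(size=(741, 1))
      easiness = rng.uniform(-1.5, 1.5, size=(1, 50))
      data = (rng.random((741, 50)) < 1.0 / (1.0 + np.exp(-(ability + easiness)))).astype(int)

      p, d = classical_item_stats(data)
      print("mean difficulty:", p.mean().round(3), " mean discrimination:", d.mean().round(3))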

  9. Fitting measurement models to vocational interest data: are dominance models ideal?

    PubMed

    Tay, Louis; Drasgow, Fritz; Rounds, James; Williams, Bruce A

    2009-09-01

    In this study, the authors examined the item response process underlying 3 vocational interest inventories: the Occupational Preference Inventory (C.-P. Deng, P. I. Armstrong, & J. Rounds, 2007), the Interest Profiler (J. Rounds, T. Smith, L. Hubert, P. Lewis, & D. Rivkin, 1999; J. Rounds, C. M. Walker, et al., 1999), and the Interest Finder (J. E. Wall & H. E. Baker, 1997; J. E. Wall, L. L. Wise, & H. E. Baker, 1996). Item response theory (IRT) dominance models, such as the 2-parameter and 3-parameter logistic models, assume that item response functions (IRFs) are monotonically increasing as the latent trait increases. In contrast, IRT ideal point models, such as the generalized graded unfolding model, have IRFs that peak where the latent trait matches the item. Ideal point models are expected to fit better because vocational interest inventories ask about typical behavior, as opposed to requiring maximal performance. Results show that across all 3 interest inventories, the ideal point model provided better descriptions of the response process. The importance of specifying the correct item response model for precise measurement is discussed. In particular, scores computed by a dominance model were shown to be sometimes illogical: individuals endorsing mostly realistic or mostly social items were given similar scores, whereas scores based on an ideal point model were sensitive to which type of items respondents endorsed.
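
    The contrast drawn above between dominance and ideal point response processes comes down to the shape of the item response function: monotone versus single-peaked. A toy illustration (a 2PL curve versus a simple Gaussian-shaped ideal point curve; the latter is a simplified stand-in for the generalized graded unfolding model, not its actual form):

      import numpy as np

      def dominance_irf(theta, a=1.5, b=0.0):
          # Monotone 2PL item response function (dominance model).
          return 1.0 / (1.0 + np.exp(-a * (theta - b)))

      def ideal_point_irf(theta, delta=0.0, width=1.0):
          # Toy single-peaked item response function: endorsement is highest when
          # the latent trait matches the item location delta and falls off on both sides.
          return np.exp(-0.5 * ((theta - delta) / width) ** 2)

      theta = np.linspace(-3, 3, 13)
      print("theta        :", np.round(theta, 1))
      print("dominance IRF:", np.round(dominance_irf(theta, b=0.5), 2))
      print("ideal point  :", np.round(ideal_point_irf(theta, delta=0.5), 2))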

  10. CROSS-NATIONAL APPLICABILITY OF A PARSIMONIOUS MEASURE OF ACCULTURATION TO GLOBAL CONSUMER CULTURE.

    PubMed

    Durvasula, Srinivas; Lysonski, Steven

    2015-06-01

    Cleveland and Laroche presented a scale to measure Acculturation to Global Consumer Culture. This measure was the first attempt to gauge consumer mindsets regarding their adaptation to global consumerism. Because this scale consisted of 57 scale items, applying such a lengthy scale can lead to response fatigue. Past research has also suggested that as more items are added to a scale, the informational value of each additional item is marginal. As an alternative, a shorter version of the Acculturation to Global Consumer Culture Scale is presented. The psychometric properties of this scale were verified via multiple group confirmatory factor analysis. A four-country investigation of young adults in China (n = 126; M age = 22.24 yr., SD = 3.63), New Zealand (n = 196; M age = 20.12 yr., SD = 4.12), Nigeria (n = 146; M age = 23.09 yr., SD = 3.80), and the United States (n = 120; M age = 21.67 yr., SD = 4.26) provides support for the cross-national applicability of the proposed parsimonious measure. Limitations and extensions are discussed.

  11. Developmental growth in students' concept of energy: Analysis of selected items from the TIMSS database

    NASA Astrophysics Data System (ADS)

    Liu, Xiufeng; McKeough, Anne

    2005-05-01

    The aim of this study was to develop a model of students' energy concept development. Applying Case's (1985, 1992) structural theory of cognitive development, we hypothesized that students' concept of energy undergoes a series of transitions, corresponding to systematic increases in working memory capacity. The US national sample from the Third International Mathematics and Science Study (TIMSS) database was used to test our hypothesis. Items relevant to the energy concept in the TIMSS test booklets for three populations were identified. Item difficulty from Rasch modeling was used to test the hypothesized developmental sequence, and percentage of students' correct responses was used to test the correspondence between students' age/grade level and level of the energy concepts. The analysis supported our hypothesized sequence of energy concept development and suggested mixed effects of maturation and schooling on energy concept development. Further, the results suggest that curriculum and instruction design take into consideration the developmental progression of students' concept of energy.

  12. 26 CFR 1.1312-1 - Double inclusion of an item of gross income.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 26 Internal Revenue 11 2014-04-01 2014-04-01 false Double inclusion of an item of gross income. 1... Limitations § 1.1312-1 Double inclusion of an item of gross income. (a) Paragraph (1) of section 1312 applies if the determination requires the inclusion in a taxpayer's gross income of an item which was...

  13. 26 CFR 1.1312-1 - Double inclusion of an item of gross income.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 26 Internal Revenue 11 2013-04-01 2013-04-01 false Double inclusion of an item of gross income. 1... Limitations § 1.1312-1 Double inclusion of an item of gross income. (a) Paragraph (1) of section 1312 applies if the determination requires the inclusion in a taxpayer's gross income of an item which was...

  14. 26 CFR 1.1312-1 - Double inclusion of an item of gross income.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 26 Internal Revenue 11 2012-04-01 2012-04-01 false Double inclusion of an item of gross income. 1... Limitations § 1.1312-1 Double inclusion of an item of gross income. (a) Paragraph (1) of section 1312 applies if the determination requires the inclusion in a taxpayer's gross income of an item which was...

  15. A Comparison of Methods of Vertical Equating.

    ERIC Educational Resources Information Center

    Loyd, Brenda H.; Hoover, H. D.

    Rasch model vertical equating procedures were applied to three mathematics computation tests for grades six, seven, and eight. Each level of the test was composed of 45 items in three sets of 15 items, arranged in such a way that tests for adjacent grades had two sets (30 items) in common, and the sixth and eighth grades had 15 items in common. In…

  16. 26 CFR 1.1312-1 - Double inclusion of an item of gross income.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 26 Internal Revenue 11 2010-04-01 2010-04-01 true Double inclusion of an item of gross income. 1....1312-1 Double inclusion of an item of gross income. (a) Paragraph (1) of section 1312 applies if the determination requires the inclusion in a taxpayer's gross income of an item which was erroneously included in...

  17. 26 CFR 1.1312-1 - Double inclusion of an item of gross income.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 26 Internal Revenue 11 2011-04-01 2011-04-01 false Double inclusion of an item of gross income. 1... Limitations § 1.1312-1 Double inclusion of an item of gross income. (a) Paragraph (1) of section 1312 applies if the determination requires the inclusion in a taxpayer's gross income of an item which was...

  18. Update on the Child's Challenging Behaviour Scale following evaluation using Rasch analysis.

    PubMed

    Bourke-Taylor, H M; Pallant, J F; Law, M

    2014-03-01

    The Child's Challenging Behaviour Scale (CCBS) was designed to measure a mother's rating of her child's challenging behaviours. The CCBS was initially developed for mothers of school-aged children with developmental disability and has previously been shown to have good psychometric properties using classical test theory techniques. The aim of this study was to use Rasch analysis to fully evaluate all aspects of the scale, including response format, item fit, dimensionality and targeting. The sample consisted of 152 mothers of a school-aged child (aged 5-18 years) with a disability. Mothers were recruited via websites and mail-out newsletters through not-for-profit organizations that supported families with disabilities. Respondents completed a survey which included the 11 items of the CCBS. Rasch analysis was conducted on these responses using the RUMM2030 package. Rasch analysis of the CCBS revealed serious threshold disordering for nine of the 11 items, suggesting problems with the 5-point response format used for the scale. The neutral midpoint of the response format was subsequently removed to create a 4-point scale. High levels of local dependency were detected among two pairs of items, resulting in the removal of two items (item 7 and item 1). The final nine-item version of the scale (CCBS Version 2) was unidimensional, well targeted, showed good fit to the Rasch model, and strong internal consistency. To achieve fit to the Rasch model it was necessary to make two modifications to the CCBS scale. The resulting nine-item scale with a 4-point response format showed excellent psychometric properties, supporting its internal validity. © 2013 John Wiley & Sons Ltd.

  19. A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

    ERIC Educational Resources Information Center

    Guo, Rui; Zheng, Yi; Chang, Hua-Hua

    2015-01-01

    An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…

  20. Optimal Item Selection with Credentialing Examinations.

    ERIC Educational Resources Information Center

    Hambleton, Ronald K.; And Others

    The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
