Sample records for item parameter estimation

  1. Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets.

    PubMed

    Waller, Niels G; Feuerstahler, Leah

    2017-01-01

    In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated R code that shows how to estimate 4PM item and person parameters in mirt (Chalmers, 2012).
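
    The 4PM extends the three-parameter logistic function with an upper asymptote. As an illustrative sketch (the parameter values below are hypothetical, not taken from the study), the item response function can be computed as:

```python
import math

def irf_4pm(theta, a, b, c, d):
    """Four-parameter logistic item response function.

    a = discrimination, b = difficulty, c = lower asymptote ("guessing"),
    d = upper asymptote ("slipping"); c = 0 and d = 1 recover the 2PL.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Probability of a keyed response at theta = 0.5 for a hypothetical item
p = irf_4pm(0.5, a=1.2, b=0.0, c=0.05, d=0.95)
```

    The upper asymptote d < 1 is what distinguishes the 4PM: even very high-ability examinees respond in the keyed direction with probability at most d.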

  2. Bootstrap Standard Errors for Maximum Likelihood Ability Estimates When Item Parameters Are Unknown

    ERIC Educational Resources Information Center

    Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi

    2014-01-01

    When item parameter estimates are used to estimate the ability parameter in item response models, the standard error (SE) of the ability estimate must be corrected to reflect the error carried over from item calibration. For maximum likelihood (ML) ability estimates, a corrected asymptotic SE is available, but it requires a long test and the…
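
    The baseline the correction applies to is the usual asymptotic SE, computed from the test information while treating item parameters as fixed and known. A minimal sketch of that baseline under the 2PL model (item values hypothetical):

```python
import math

def p2pl(theta, a, b):
    """2PL item response function with known parameters a, b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ml_theta(responses, a, b, iters=50):
    """Newton-Raphson ML ability estimate under the 2PL model, treating
    item parameters as fixed and known. The returned SE comes from the
    test information and thus ignores item-calibration error, which is
    exactly the error the corrected SE accounts for."""
    theta = 0.0
    info = 0.0
    for _ in range(iters):
        grad = sum(ai * (u - p2pl(theta, ai, bi))
                   for u, ai, bi in zip(responses, a, b))
        info = sum(ai * ai * p2pl(theta, ai, bi) * (1.0 - p2pl(theta, ai, bi))
                   for ai, bi in zip(a, b))
        theta += grad / info
    return theta, 1.0 / math.sqrt(info)

# Hypothetical 5-item test: discriminations a, difficulties b
a = [1.0, 1.4, 0.8, 1.2, 1.1]
b = [-1.0, -0.5, 0.0, 0.5, 1.0]
theta_hat, se = ml_theta([1, 1, 1, 0, 0], a, b)
```

    The bootstrap approach in the article would instead resample the calibration data, re-estimate (a, b) each time, and take the spread of the resulting ability estimates as the SE.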

  3. Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

    2013-01-01

    Item response theory parameters have to be estimated, and the estimation process leaves uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…

  4. Developing an Interpretation of Item Parameters for Personality Items: Content Correlates of Parameter Estimates.

    ERIC Educational Resources Information Center

    Zickar, Michael J.; Ury, Karen L.

    2002-01-01

    Attempted to relate content features of personality items to item parameter estimates from the partial credit model of E. Muraki (1990) by administering the Adjective Checklist (L. Goldberg, 1992) to 329 undergraduates. As predicted, the discrimination parameter was related to the item subtlety ratings of personality items but the level of word…

  5. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating

    PubMed Central

    Michaelides, Michalis P.

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230
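
    To make the equating issue concrete: a simple linking method (mean-sigma, sketched here with made-up difficulty values, not data from the review) derives the scale transformation from the common items' difficulty estimates, so a single drifting item can visibly shift the coefficients. Note that `statistics.stdev` is the sample (n-1) version; conventions vary.

```python
import statistics

def mean_sigma(b_old, b_new):
    """Mean-sigma linking coefficients placing new-form difficulties on the
    old scale: b_old ≈ A * b_new + B."""
    A = statistics.stdev(b_old) / statistics.stdev(b_new)
    B = statistics.mean(b_old) - A * statistics.mean(b_new)
    return A, B

b_old = [-1.2, -0.4, 0.1, 0.8, 1.5]
b_new = [-1.0, -0.2, 0.3, 1.0, 2.6]   # last common item drifted (aberrant)
A_all, B_all = mean_sigma(b_old, b_new)
A_trim, B_trim = mean_sigma(b_old[:-1], b_new[:-1])  # drop the aberrant item
```

    Keeping or discarding the misbehaving item changes (A, B) and therefore every equated score, which is why the review stresses the judgmental nature of that decision.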

  7. Investigating the Impact of Uncertainty about Item Parameters on Ability Estimation

    ERIC Educational Resources Information Center

    Zhang, Jinming; Xie, Minge; Song, Xiaolan; Lu, Ting

    2011-01-01

    Asymptotic expansions of the maximum likelihood estimator (MLE) and weighted likelihood estimator (WLE) of an examinee's ability are derived while item parameter estimators are treated as covariates measured with error. The asymptotic formulae present the amount of bias of the ability estimators due to the uncertainty of item parameter estimators.…

  8. Sample Size and Item Parameter Estimation Precision When Utilizing the One-Parameter "Rasch" Model

    ERIC Educational Resources Information Center

    Custer, Michael

    2015-01-01

    This study examines the relationship between sample size and item parameter estimation precision when utilizing the one-parameter model. Item parameter estimates are examined relative to "true" values by evaluating the decline in root mean squared deviation (RMSD) and the number of outliers as sample size increases. This occurs across…
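
    The RMSD criterion used in the study is straightforward to compute; a minimal sketch with hypothetical true and estimated Rasch difficulties (all numbers invented for illustration):

```python
import math

def rmsd(est, true):
    """Root mean squared deviation between estimated and true item parameters."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(true))

true_b = [-1.5, -0.5, 0.0, 0.5, 1.5]
est_small_n = [-1.9, -0.2, 0.3, 0.1, 2.0]      # hypothetical estimates, small N
est_large_n = [-1.6, -0.45, 0.05, 0.45, 1.55]  # hypothetical estimates, large N
```

    The study's finding is the expected pattern: RMSD shrinks and outliers become rarer as the calibration sample grows.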

  9. Rasch Model Parameter Estimation in the Presence of a Nonnormal Latent Trait Using a Nonparametric Bayesian Approach

    ERIC Educational Resources Information Center

    Finch, Holmes; Edwards, Julianne M.

    2016-01-01

    Standard approaches for estimating item response theory (IRT) model parameters generally work under the assumption that the latent trait being measured by a set of items follows the normal distribution. Estimation of IRT parameters in the presence of nonnormal latent traits has been shown to generate biased person and item parameter estimates. A…

  10. Item Parameter Estimation for the MIRT Model: Bias and Precision of Confirmatory Factor Analysis-Based Models

    ERIC Educational Resources Information Center

    Finch, Holmes

    2010-01-01

    The accuracy of item parameter estimates in the multidimensional item response theory (MIRT) model context is one that has not been researched in great detail. This study examines the ability of two confirmatory factor analysis models specifically for dichotomous data to properly estimate item parameters using common formulae for converting factor…

  11. Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items

    ERIC Educational Resources Information Center

    Chen, Cheng-Te; Wang, Wen-Chung

    2007-01-01

    This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q3 and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring…

  12. The Impact of Three Factors on the Recovery of Item Parameters for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Kim, Kyung Yong; Lee, Won-Chan

    2017-01-01

    This article provides a detailed description of three factors (specification of the ability distribution, numerical integration, and frame of reference for the item parameter estimates) that might affect the item parameter estimation of the three-parameter logistic model, and compares five item calibration methods, which are combinations of the…

  13. The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.

    ERIC Educational Resources Information Center

    Kaskowitz, Gary S.; De Ayala, R. J.

    2001-01-01

    Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…

  14. Item Response Theory Equating Using Bayesian Informative Priors.

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Patz, Richard J.

    This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…

  15. Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…

  16. Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation.

    ERIC Educational Resources Information Center

    Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun

    2002-01-01

    Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)

  17. The Asymptotic Distribution of Ability Estimates: Beyond Dichotomous Items and Unidimensional IRT Models

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2015-01-01

    The maximum likelihood estimate (MLE) of the ability parameter of an item response theory model with known item parameters was proved to be asymptotically normally distributed under a set of regularity conditions for tests involving dichotomous items and a unidimensional ability parameter (Klauer, 1990; Lord, 1983). This article first considers…

  18. ASCAL: A Microcomputer Program for Estimating Logistic IRT Item Parameters.

    ERIC Educational Resources Information Center

    Vale, C. David; Gialluca, Kathleen A.

    ASCAL is a microcomputer-based program for calibrating items according to the three-parameter logistic model of item response theory. It uses a modified multivariate Newton-Raphson procedure for estimating item parameters. This study evaluated this procedure using Monte Carlo simulation techniques. The current version of ASCAL was then compared to…

  19. Evaluation of Linking Methods for Placing Three-Parameter Logistic Item Parameter Estimates onto a One-Parameter Scale

    ERIC Educational Resources Information Center

    Karkee, Thakur B.; Wright, Karen R.

    2004-01-01

    Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…

  20. A Comparison of Limited-Information and Full-Information Methods in Mplus for Estimating Item Response Theory Parameters for Nonnormal Populations

    ERIC Educational Resources Information Center

    DeMars, Christine E.

    2012-01-01

    In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…

  1. Modeling the Hyperdistribution of Item Parameters To Improve the Accuracy of Recovery in Estimation Procedures.

    ERIC Educational Resources Information Center

    Matthews-Lopez, Joy L.; Hombo, Catherine M.

    The purpose of this study was to examine the recovery of item parameters in simulated Automatic Item Generation (AIG) conditions, using Markov chain Monte Carlo (MCMC) estimation methods to attempt to recover the generating distributions. To do this, variability in item and ability parameters was manipulated. Realistic AIG conditions were…

  2. Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions.

    PubMed

    Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan

    2016-01-01

    This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of the Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD under the mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and the distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need for caution and for evaluating IPD within a mixture IRT framework to understand its effects on item parameters and examinee ability.

  4. Estimation of Two-Parameter Logistic Item Response Curves. Research Report 83-1. Mathematical Sciences Technical Report No. 130.

    ERIC Educational Resources Information Center

    Tsutakawa, Robert K.

    This paper presents a method for estimating certain characteristics of test items which are designed to measure ability, or knowledge, in a particular area. Under the assumption that ability parameters are sampled from a normal distribution, the EM algorithm is used to derive maximum likelihood estimates of item parameters of the two-parameter…

  5. Refinement of a Bias-Correction Procedure for the Weighted Likelihood Estimator of Ability. Research Report. ETS RR-07-23

    ERIC Educational Resources Information Center

    Zhang, Jinming; Lu, Ting

    2007-01-01

    In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter…

  6. Accuracy and Variability of Item Parameter Estimates from Marginal Maximum a Posteriori Estimation and Bayesian Inference via Gibbs Samplers

    ERIC Educational Resources Information Center

    Wu, Yi-Fang

    2015-01-01

    Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…

  7. Comparing Different Approaches of Bias Correction for Ability Estimation in IRT Models. Research Report. ETS RR-08-13

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2008-01-01

    The method of maximum-likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…

  8. An Evaluation of Hierarchical Bayes Estimation for the Two- Parameter Logistic Model.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho

    Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item parameters. Simulated data sets were analyzed using two different Bayes estimation procedures, the two-stage hierarchical Bayes estimation (HB2) and the marginal Bayesian with known hyperparameters (MB), and marginal maximum…

  9. IRT Item Parameter Recovery with Marginal Maximum Likelihood Estimation Using Loglinear Smoothing Models

    ERIC Educational Resources Information Center

    Casabianca, Jodi M.; Lewis, Charles

    2015-01-01

    Loglinear smoothing (LLS) estimates the latent trait distribution while making fewer assumptions about its form and maintaining parsimony, thus leading to more precise item response theory (IRT) item parameter estimates than standard marginal maximum likelihood (MML). This article provides the expectation-maximization algorithm for MML estimation…

  10. Variability in Parameter Estimates and Model Fit across Repeated Allocations of Items to Parcels

    ERIC Educational Resources Information Center

    Sterba, Sonya K.; MacCallum, Robert C.

    2010-01-01

    Different random or purposive allocations of items to parcels within a single sample are thought not to alter structural parameter estimates as long as items are unidimensional and congeneric. If, additionally, numbers of items per parcel and parcels per factor are held fixed across allocations, different allocations of items to parcels within a…

  11. Careful with Those Priors: A Note on Bayesian Estimation in Two-Parameter Logistic Item Response Theory Models

    ERIC Educational Resources Information Center

    Marcoulides, Katerina M.

    2018-01-01

    This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods were examined. Overall results showed that…

  12. Using Patient Health Questionnaire-9 item parameters of a common metric resulted in similar depression scores compared to independent item response theory model reestimation.

    PubMed

    Liegl, Gregor; Wahl, Inka; Berghöfer, Anne; Nolte, Sandra; Pieh, Christoph; Rose, Matthias; Fischer, Felix

    2016-03-01

    To investigate the validity of a common depression metric in independent samples. We applied a common metrics approach based on item response theory for measuring depression to four German-speaking samples that completed the Patient Health Questionnaire (PHQ-9). We compared the PHQ item parameters reported for this common metric to reestimated item parameters derived from fitting a generalized partial credit model solely to the PHQ-9 items. We calibrated the new model on the same scale as the common metric using two approaches (estimation with shifted prior and Stocking-Lord linking). By fitting a mixed-effects model and using Bland-Altman plots, we investigated the agreement between latent depression scores resulting from the different estimation models. We found different item parameters across samples and estimation methods. Although differences in latent depression scores between different estimation methods were statistically significant, these were clinically irrelevant. Our findings provide evidence that it is possible to estimate latent depression scores by using the item parameters from a common metric instead of reestimating and linking a model. The use of common metric parameters is simple, for example, using a Web application (http://www.common-metrics.org), and offers a long-term perspective to improve the comparability of patient-reported outcome measures.
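
    The Stocking-Lord linking mentioned above chooses transformation coefficients (A, B) so that the two test characteristic curves (TCCs) agree as closely as possible. A brute-force grid-search sketch for the 2PL case, with toy item parameters (not the PHQ-9 values):

```python
import math

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected score at theta."""
    return sum(p2pl(theta, a, b) for a, b in items)

def stocking_lord(old_items, new_items):
    """Brute-force Stocking-Lord linking: search (A, B) minimizing the squared
    distance between TCCs, transforming new-form parameters as a/A, A*b + B."""
    grid = [x / 10.0 for x in range(-40, 41)]               # theta points
    best = None
    for A in (x / 100.0 for x in range(50, 151, 2)):        # slope candidates
        for B in (x / 100.0 for x in range(-100, 101, 5)):  # intercepts
            moved = [(a / A, A * b + B) for a, b in new_items]
            loss = sum((tcc(t, old_items) - tcc(t, moved)) ** 2 for t in grid)
            if best is None or loss < best[0]:
                best = (loss, A, B)
    return best[1], best[2]

old_items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]  # toy (a, b) pairs
# The same items expressed on a new scale generated with A = 1.1, B = 0.3
new_items = [(a * 1.1, (b - 0.3) / 1.1) for a, b in old_items]
A_hat, B_hat = stocking_lord(old_items, new_items)
```

    Production implementations minimize the same criterion with a proper optimizer rather than a grid, but the loss function is identical.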

  13. Item Response Theory Modeling of the Philadelphia Naming Test.

    PubMed

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

    2015-06-01

    In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.

  14. Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?

    ERIC Educational Resources Information Center

    DeMars, Christine

    Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…

  15. Optimal Linking Design for Response Model Parameters

    ERIC Educational Resources Information Center

    Barrett, Michelle D.; van der Linden, Wim J.

    2017-01-01

    Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…

  16. Estimation of Item Response Theory Parameters in the Presence of Missing Data

    ERIC Educational Resources Information Center

    Finch, Holmes

    2008-01-01

    Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…

  17. Item response theory analysis of the Utrecht Work Engagement Scale for Students (UWES-S) using a sample of Japanese university and college students majoring medical science, nursing, and natural science.

    PubMed

    Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo

    2017-10-30

    The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because their parameters were outliers. Final parameter estimation was conducted using the remaining 11 items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale assesses higher engagement better than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scaling should be modified to a structure with fewer response categories.
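
    A test information curve like the one the authors describe peaks where the item difficulties cluster. A sketch with hypothetical 2PL parameters (not the UWES-S estimates) for items all located above the trait mean:

```python
import math

def info_2pl(theta, a, b):
    """Fisher information contributed by a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical (a, b) values for a short scale whose items all sit above
# the trait mean, mirroring the pattern the abstract reports
items = [(1.5, 0.8), (1.2, 1.0), (1.8, 1.4)]
test_info = {t: sum(info_2pl(t, a, b) for a, b in items)
             for t in (-2.0, 0.0, 1.0)}
```

    With difficulties clustered around b ≈ 1, the summed information is largest near theta = 1 and drops off at average and low trait levels, which is the pattern behind the authors' conclusion.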

  18. The Sensitivity of Parameter Estimates to the Latent Ability Distribution. Research Report. ETS RR-11-40

    ERIC Educational Resources Information Center

    Xu, Xueli; Jia, Yue

    2011-01-01

    Estimation of item response model parameters and ability distribution parameters has been, and will remain, an important topic in the educational testing field. Much research has been dedicated to addressing this task. Some studies have focused on item parameter estimation when the latent ability was assumed to follow a normal distribution,…

  19. Statistical Indexes for Monitoring Item Behavior under Computer Adaptive Testing Environment.

    ERIC Educational Resources Information Center

    Zhu, Renbang; Yu, Feng; Liu, Su

    A computerized adaptive test (CAT) administration usually requires a large supply of items with accurately estimated psychometric properties, such as item response theory (IRT) parameter estimates, to ensure the precision of examinee ability estimation. However, an estimated IRT model of a given item in any given pool does not always correctly…

  20. The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model

    ERIC Educational Resources Information Center

    Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet

    2012-01-01

    Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance, whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…

  1. Standard Errors and Confidence Intervals from Bootstrapping for Ramsay-Curve Item Response Theory Model Item Parameters

    ERIC Educational Resources Information Center

    Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.

    2011-01-01

    Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…

  2. Investigating Separate and Concurrent Approaches for Item Parameter Drift in 3PL Item Response Theory Equating

    ERIC Educational Resources Information Center

    Arce-Ferrer, Alvaro J.; Bulut, Okan

    2017-01-01

    This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…

  3. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    ERIC Educational Resources Information Center

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  4. Profile-likelihood Confidence Intervals in Item Response Theory Models.

    PubMed

    Chalmers, R Philip; Pek, Jolynn; Liu, Yang

    2017-01-01

    Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
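    To make the PL CI construction concrete, here is a minimal sketch for a single Rasch difficulty parameter, where the profile likelihood coincides with the full likelihood. The simulated data, sample size, and cutoff choice are illustrative assumptions, not taken from the article.

```python
# Hedged sketch: profile-likelihood CI for one Rasch difficulty parameter,
# assuming known person abilities (a deliberate simplification).
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import chi2

rng = np.random.default_rng(7)
theta = rng.normal(0.0, 1.0, size=2000)   # known person abilities
b_true = 0.5                              # true item difficulty
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(theta - b_true))))  # 0/1 responses

def loglik(b):
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# ML estimate of the difficulty
res = minimize_scalar(lambda b: -loglik(b), bounds=(-4, 4), method="bounded")
b_hat, ll_max = res.x, -res.fun

# PL CI: values of b where the likelihood-ratio statistic hits the chi2 cutoff
cut = chi2.ppf(0.95, df=1) / 2.0
g = lambda b: loglik(b) - (ll_max - cut)
lo = brentq(g, -4.0, b_hat)
hi = brentq(g, b_hat, 4.0)
print(b_hat, lo, hi)
```

    The interval collects every value of b whose likelihood-ratio statistic against the ML estimate stays below the chi-squared critical value, so it requires no standard error and need not be symmetric around the estimate.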

  5. Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

    ERIC Educational Resources Information Center

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-01-01

    This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…

  6. An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.

    ERIC Educational Resources Information Center

    Marco, Gary L.; And Others

    Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…

  7. Unidimensional Interpretations for Multidimensional Test Items

    ERIC Educational Resources Information Center

    Kahraman, Nilufer

    2013-01-01

    This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…

  8. Influence of Fallible Item Parameters on Test Information During Adaptive Testing.

    ERIC Educational Resources Information Center

    Wetzel, C. Douglas; McBride, James R.

    Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…

  9. Construction of a Computerized Adaptive Testing Version of the Quebec Adaptive Behavior Scale.

    ERIC Educational Resources Information Center

    Tasse, Marc J.; And Others

    Multilog (Thissen, 1991) was used to estimate parameters of 225 items from the Quebec Adaptive Behavior Scale (QABS). A database containing actual data from 2,439 subjects was used for the parameterization procedures. The two-parameter logistic model was used in estimating item parameters and in the testing strategy. MicroCAT (Assessment Systems…

  10. Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    Roberts, James S.; Thompson, Vanessa M.

    2011-01-01

    A marginal maximum a posteriori (MMAP) procedure was implemented to estimate item parameters in the generalized graded unfolding model (GGUM). Estimates from the MMAP method were compared with those derived from marginal maximum likelihood (MML) and Markov chain Monte Carlo (MCMC) procedures in a recovery simulation that varied sample size,…

  11. Interactions Between Item Content And Group Membership on Achievement Test Items.

    ERIC Educational Resources Information Center

    Linn, Robert L.; Harnisch, Delwyn L.

    The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…

  12. Linking Parameter Estimates Derived from an Item Response Model through Separate Calibrations. Research Report. ETS RR-09-40

    ERIC Educational Resources Information Center

    Haberman, Shelby J.

    2009-01-01

    A regression procedure is developed to link simultaneously a very large number of item response theory (IRT) parameter estimates obtained from a large number of test forms, where each form has been separately calibrated and where forms can be linked on a pairwise basis by means of common items. An application is made to forms in which a…

  13. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    ERIC Educational Resources Information Center

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprising 50 items was administered to 6,288 students. Data from this test were used to obtain data sets of…

  14. Estimation of a Ramsay-Curve Item Response Theory Model by the Metropolis-Hastings Robbins-Monro Algorithm. CRESST Report 834

    ERIC Educational Resources Information Center

    Monroe, Scott; Cai, Li

    2013-01-01

    In Ramsay curve item response theory (RC-IRT, Woods & Thissen, 2006) modeling, the shape of the latent trait distribution is estimated simultaneously with the item parameters. In its original implementation, RC-IRT is estimated via Bock and Aitkin's (1981) EM algorithm, which yields maximum marginal likelihood estimates. This method, however,…

  15. Estimation of a Ramsay-Curve Item Response Theory Model by the Metropolis-Hastings Robbins-Monro Algorithm

    ERIC Educational Resources Information Center

    Monroe, Scott; Cai, Li

    2014-01-01

    In Ramsay curve item response theory (RC-IRT) modeling, the shape of the latent trait distribution is estimated simultaneously with the item parameters. In its original implementation, RC-IRT is estimated via Bock and Aitkin's EM algorithm, which yields maximum marginal likelihood estimates. This method, however, does not produce the…

  16. Equal Area Logistic Estimation for Item Response Theory

    NASA Astrophysics Data System (ADS)

    Lo, Shih-Ching; Wang, Kuo-Chang; Chang, Hsin-Li

    2009-08-01

    Item response theory (IRT) models use logistic functions exclusively as item response functions (IRFs). Applications of IRT models require obtaining the set of values for logistic function parameters that best fit an empirical data set. However, success in obtaining such a set of values does not guarantee that the constructs they represent actually exist, for the adequacy of a model is not sustained by the possibility of estimating parameters. In this study, an equal-area-based two-parameter logistic model estimation algorithm is proposed. Two theorems are given to prove that the results of the algorithm are equivalent to those obtained by fitting the data with the logistic model. Numerical results are presented to show the stability and accuracy of the algorithm.
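    For reference, a minimal sketch of the standard two-parameter logistic IRF that the equal-area algorithm targets (the equal-area procedure itself is not reproduced here; the parameter values are illustrative):

```python
# Hedged sketch: the two-parameter logistic (2PL) item response function.
import numpy as np

def irf_2pl(theta, a, b):
    """P(correct | theta) under the 2PL model with discrimination a
    and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(irf_2pl(theta, a=1.2, b=0.0))  # monotone in theta, 0.5 at theta == b
```

    Any estimation algorithm for this model, equal-area or likelihood-based, is searching for the (a, b) pair whose curve best matches the observed proportions correct across the ability range.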

  17. A Comparison of the One-, the Modified Three-, and the Three-Parameter Item Response Theory Models in the Test Development Item Selection Process.

    ERIC Educational Resources Information Center

    Eignor, Daniel R.; Douglass, James B.

    This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…

  18. Investigation of IRT-Based Equating Methods in the Presence of Outlier Common Items

    ERIC Educational Resources Information Center

    Hu, Huiqin; Rogers, W. Todd; Vukmirovic, Zarko

    2008-01-01

    Common items with inconsistent b-parameter estimates may have a serious impact on item response theory (IRT)--based equating results. To find a better way to deal with the outlier common items with inconsistent b-parameters, the current study investigated the comparability of 10 variations of four IRT-based equating methods (i.e., concurrent…

  19. Examination of Different Item Response Theory Models on Tests Composed of Testlets

    ERIC Educational Resources Information Center

    Kogar, Esin Yilmaz; Kelecioglu, Hülya

    2017-01-01

    The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF), and Testlet Response Theory (TRT) models in tests that include testlets, when the number of testlets, number of independent items, and…

  20. The Effect of Including or Excluding Students with Testing Accommodations on IRT Calibrations.

    ERIC Educational Resources Information Center

    Karkee, Thakur; Lewis, Dan M.; Barton, Karen; Haug, Carolyn

    This study aimed to determine the degree to which the inclusion of accommodated students with disabilities in the calibration sample affects the characteristics of item parameters and the test results. Investigated were effects on test reliability, item fit to the applicable item response theory (IRT) model, item parameter estimates, and students'…

  1. A Note on the Reliability Coefficients for Item Response Model-Based Ability Estimates

    ERIC Educational Resources Information Center

    Kim, Seonghoon

    2012-01-01

    Assuming item parameters on a test are known constants, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true…

  2. Parameter Estimation for Thurstone Choice Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vojnovic, Milan; Yun, Seyoung

    We consider the estimation accuracy of individual strength parameters of a Thurstone choice model when each input observation consists of a choice of one item from a set of two or more items (so-called top-1 lists). This model accommodates well-known choice models such as the Luce choice model for comparison sets of two or more items and the Bradley-Terry model for pair comparisons. We provide a tight characterization of the mean squared error of the maximum likelihood parameter estimator. We also provide similar characterizations for parameter estimators defined by a rank-breaking method, which amounts to deducing one or more pair comparisons from a comparison of two or more items, assuming independence of these pair comparisons, and maximizing a likelihood function derived under these assumptions. We also consider a related binary classification problem where each individual parameter takes value from a set of two possible values and the goal is to correctly classify all items within a prescribed classification error. The results of this paper shed light on how the parameter estimation accuracy depends on the given Thurstone choice model and the structure of comparison sets. In particular, we found that for unbiased input comparison sets of a given cardinality (i.e., when each comparison set of that cardinality occurs the same number of times in expectation), for a broad class of Thurstone choice models, the mean squared error decreases with the cardinality of comparison sets, but only marginally, according to a diminishing-returns relation. On the other hand, we found that there exist Thurstone choice models for which the mean squared error of the maximum likelihood parameter estimator can decrease much faster with the cardinality of comparison sets. We report an empirical evaluation of some claims and key parameters revealed by the theory, using both synthetic and real-world input data from popular sport competitions and online labor platforms.
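    As a concrete instance of the pair-comparison special case, here is a hedged sketch of maximum-likelihood strength estimation for the Bradley-Terry model via Hunter's MM algorithm; the win-count matrix is invented for illustration and is not from the paper's data.

```python
# Hedged sketch: Bradley-Terry MLE by Hunter's MM (minorize-maximize) updates.
import numpy as np

def bradley_terry_mle(wins, n_iter=200):
    """wins[i, j] = number of times item i beat item j."""
    n = wins.shape[0]
    w = np.ones(n)                              # initial strengths
    total = wins + wins.T                       # comparisons per pair
    for _ in range(n_iter):
        # MM update: w_i <- (wins of i) / sum_j n_ij / (w_i + w_j)
        denom = np.array([
            sum(total[i, j] / (w[i] + w[j]) for j in range(n) if j != i)
            for i in range(n)
        ])
        w = wins.sum(axis=1) / denom
        w /= w.sum()                            # fix the arbitrary scale
    return w

wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]])
strengths = bradley_terry_mle(wins)
print(strengths)  # item 0 wins most, so it gets the largest strength
```

    The update is scale-invariant, which is why one normalization constraint (here, strengths summing to 1) must be imposed to identify the parameters.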

  3. A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis

    ERIC Educational Resources Information Center

    Edwards, Michael C.

    2010-01-01

    Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of various Markov chain Monte Carlo (MCMC) estimation routines to estimate parameters of a wide variety of confirmatory item factor analysis models. Further, I show…

  4. The Impact of Multidirectional Item Parameter Drift on IRT Scaling Coefficients and Proficiency Estimates

    ERIC Educational Resources Information Center

    Han, Kyung T.; Wells, Craig S.; Sireci, Stephen G.

    2012-01-01

    Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some…

  5. Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model

    PubMed Central

    Seo, Dong Gi; Weiss, David J.

    2015-01-01

    Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates and lower observed standard errors than the EAP (expected a posteriori) method under most conditions, except for a general factor condition using Ds-optimality item selection. PMID:29795848

  6. Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Hong, Yuan; Deng, Weiling

    2010-01-01

    To better understand the statistical properties of the deterministic inputs, noisy "and" gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the…

  7. Estimating Non-Normal Latent Trait Distributions within Item Response Theory Using True and Estimated Item Parameters

    ERIC Educational Resources Information Center

    Sass, D. A.; Schmitt, T. A.; Walker, C. M.

    2008-01-01

    Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal…

  8. Implementation of the EM Algorithm in the Estimation of Item Parameters: The BILOG Computer Program.

    ERIC Educational Resources Information Center

    Mislevy, Robert J.; Bock, R. Darrell

    This paper reviews the basic elements of the EM approach to estimating item parameters and illustrates its use with one simulated and one real data set. In order to illustrate the use of the BILOG computer program, runs for 1-, 2-, and 3-parameter models are presented for the two sets of data. First is a set of responses from 1,000 persons to five…

  9. Invariance Properties for General Diagnostic Classification Models

    ERIC Educational Resources Information Center

    Bradshaw, Laine P.; Madison, Matthew J.

    2016-01-01

    In item response theory (IRT), the invariance property states that item parameter estimates are independent of the examinee sample, and examinee ability estimates are independent of the test items. While this property has long been established and understood by the measurement community for IRT models, the same cannot be said for diagnostic…

  10. Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…

  11. Semiparametric Item Response Functions in the Context of Guessing

    ERIC Educational Resources Information Center

    Falk, Carl F.; Cai, Li

    2016-01-01

    We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…

  12. Cognitive diagnosis modelling incorporating item response times.

    PubMed

    Zhan, Peida; Jiao, Hong; Liao, Dandan

    2018-05-01

    To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters. © 2017 The British Psychological Society.
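    For readers unfamiliar with the base model, a minimal sketch of the core DINA item response probability that the extended response-time model builds on; the Q-matrix row, attribute profile, and parameter values below are illustrative assumptions.

```python
# Hedged sketch: the DINA (deterministic inputs, noisy 'and' gate) item
# response probability.
import numpy as np

def dina_prob(alpha, q, guess, slip):
    """P(correct) for attribute profile alpha on an item whose Q-matrix row
    is q, with guessing parameter g and slip parameter s."""
    eta = int(np.all(alpha >= q))   # 1 iff all required attributes mastered
    return (1 - slip) if eta else guess

alpha = np.array([1, 1, 0])   # examinee masters attributes 1 and 2
q = np.array([1, 1, 0])       # item requires attributes 1 and 2
print(dina_prob(alpha, q, guess=0.2, slip=0.1))  # masters all -> 1 - slip
```

    The "and gate" is the latent indicator eta: missing even one required attribute drops the success probability to the guessing floor, which is what makes the item parameters diagnostically interpretable.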

  13. Selection of Common Items as an Unrecognized Source of Variability in Test Equating: A Bootstrap Approximation Assuming Random Sampling of Common Items

    ERIC Educational Resources Information Center

    Michaelides, Michalis P.; Haertel, Edward H.

    2014-01-01

    The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…

  14. Investigating the Stability of Four Methods for Estimating Item Bias.

    ERIC Educational Resources Information Center

    Perlman, Carole L.; And Others

    The reliability of item bias estimates was studied for four methods: (1) the transformed delta method; (2) Shepard's modified delta method; (3) Rasch's one-parameter residual analysis; and (4) the Mantel-Haenszel procedure. Bias statistics were computed for each sample using all methods. Data were from administration of multiple-choice items from…

  15. A Test-Length Correction to the Estimation of Extreme Proficiency Levels

    ERIC Educational Resources Information Center

    Magis, David; Beland, Sebastien; Raiche, Gilles

    2011-01-01

    In this study, the estimation of extremely large or extremely small proficiency levels, given the item parameters of a logistic item response model, is investigated. On one hand, the estimation of proficiency levels by maximum likelihood (ML), despite being asymptotically unbiased, may yield infinite estimates. On the other hand, with an…

  16. Semi-Parametric Item Response Functions in the Context of Guessing. CRESST Report 844

    ERIC Educational Resources Information Center

    Falk, Carl F.; Cai, Li

    2015-01-01

    We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…

  17. Sources of interference in item and associative recognition memory.

    PubMed

    Osth, Adam F; Dennis, Simon

    2015-04-01

    A powerful theoretical framework for exploring recognition memory is the global matching framework, in which a cue's memory strength reflects the similarity of the retrieval cues being matched against the contents of memory simultaneously. Contributions at retrieval can be categorized as matches and mismatches to the item and context cues, including the self match (match on item and context), item noise (match on context, mismatch on item), context noise (match on item, mismatch on context), and background noise (mismatch on item and context). We present a model that directly parameterizes the matches and mismatches to the item and context cues, which enables estimation of the magnitude of each interference contribution (item noise, context noise, and background noise). The model was fit within a hierarchical Bayesian framework to 10 recognition memory datasets that use manipulations of strength, list length, list strength, word frequency, study-test delay, and stimulus class in item and associative recognition. Estimates of the model parameters revealed at most a small contribution of item noise that varies by stimulus class, with virtually no item noise for single words and scenes. Despite the unpopularity of background noise in recognition memory models, background noise estimates dominated at retrieval across nearly all stimulus classes with the exception of high frequency words, which exhibited equivalent levels of context noise and background noise. These parameter estimates suggest that the majority of interference in recognition memory stems from experiences acquired before the learning episode. (c) 2015 APA, all rights reserved.

  18. Estimating a Noncompensatory IRT Model Using Metropolis within Gibbs Sampling

    ERIC Educational Resources Information Center

    Babcock, Ben

    2011-01-01

    Relatively little research has been conducted with the noncompensatory class of multidimensional item response theory (MIRT) models. A Monte Carlo simulation study was conducted exploring the estimation of a two-parameter noncompensatory item response theory (IRT) model. The estimation method used was a Metropolis-Hastings within Gibbs algorithm…

  19. Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2017-01-01

    Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…

  20. How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation

    ERIC Educational Resources Information Center

    Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard

    2006-01-01

    Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…

  1. Observed Score and True Score Equating Procedures for Multidimensional Item Response Theory

    ERIC Educational Resources Information Center

    Brossman, Bradley Grant

    2010-01-01

    The purpose of this research was to develop observed score and true score equating procedures to be used in conjunction with the Multidimensional Item Response Theory (MIRT) framework. Currently, MIRT scale linking procedures exist to place item parameter estimates and ability estimates on the same scale after separate calibrations are conducted.…

  2. Comparing Three Estimation Methods for the Three-Parameter Logistic IRT Model

    ERIC Educational Resources Information Center

    Lamsal, Sunil

    2015-01-01

    Different estimation procedures have been developed for the unidimensional three-parameter item response theory (IRT) model. These techniques include marginal maximum likelihood estimation, fully Bayesian estimation using Markov chain Monte Carlo simulation techniques, and Metropolis-Hastings Robbins-Monro estimation. With each…

  3. Potential application of item-response theory to interpretation of medical codes in electronic patient records

    PubMed Central

    2011-01-01

    Background Electronic patient records are generally coded using extensive sets of codes but the significance of the utilisation of individual codes may be unclear. Item response theory (IRT) models are used to characterise the psychometric properties of items included in tests and questionnaires. This study asked whether the properties of medical codes in electronic patient records may be characterised through the application of item response theory models. Methods Data were provided by a cohort of 47,845 participants from 414 family practices in the UK General Practice Research Database (GPRD) with a first stroke between 1997 and 2006. Each eligible stroke code, out of a set of 202 OXMIS and Read codes, was coded as either recorded or not recorded for each participant. A two-parameter IRT model was fitted using marginal maximum likelihood estimation. Estimated parameters from the model were considered to characterise each code with respect to the latent trait of stroke diagnosis. The location parameter is referred to as a calibration parameter, while the slope parameter is referred to as a discrimination parameter. Results There were 79,874 stroke code occurrences available for analysis. Utilisation of codes varied between family practices with intraclass correlation coefficients of up to 0.25 for the most frequently used codes. IRT analyses were restricted to 110 Read codes. Calibration and discrimination parameters were estimated for 77 (70%) codes that were endorsed for 1,942 stroke patients. Parameters were not estimated for the remaining more frequently used codes. Discrimination parameter values ranged from 0.67 to 2.78, while calibration parameter values ranged from 4.47 to 11.58. The two-parameter model gave a better fit to the data than either the one- or three-parameter models. However, high chi-square values for about a fifth of the stroke codes were suggestive of poor item fit. Conclusion The application of item response theory models to coded electronic patient records might contribute to identifying medical codes that offer poor discrimination or low calibration. This might indicate the need for improved coding sets or a requirement for improved clinical coding practice. However, in this study estimates were only obtained for a small proportion of participants and there was some evidence of poor model fit. There was also evidence of variation in the utilisation of codes between family practices, raising the possibility that, in practice, properties of codes may vary for different coders. PMID:22176509

  4. Non-ignorable missingness item response theory models for choice effects in examinee-selected items.

    PubMed

    Liu, Chen-Wei; Wang, Wen-Chung

    2017-11-01

    Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.

  5. Bayesian Analysis of Item Response Curves. Research Report 84-1. Mathematical Sciences Technical Report No. 132.

    ERIC Educational Resources Information Center

    Tsutakawa, Robert K.; Lin, Hsin Ying

    Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…

  6. Standard Errors of Estimated Latent Variable Scores with Estimated Structural Parameters

    ERIC Educational Resources Information Center

    Hoshino, Takahiro; Shigemasu, Kazuo

    2008-01-01

    The authors propose a concise formula to evaluate the standard error of the estimated latent variable score when the true values of the structural parameters are not known and must be estimated. The formula can be applied to factor scores in factor analysis or ability parameters in item response theory, without bootstrap or Markov chain Monte…

  7. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    NASA Astrophysics Data System (ADS)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fitted with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is the latent ability parameter, estimated here under the assumptions of unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC), a simplified IRT approach. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test, since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both the IRT and IRC approaches reveal test characteristics beyond those revealed by classical test analysis methods. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
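
    The three-parameter logistic function used in such analyses has a simple closed form. A minimal Python sketch (illustrative parameter values, not the PARSCALE run from the study):

```python
import math

def p_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response
    for ability theta, discrimination a, difficulty b, guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the logistic part equals 0.5, so the probability sits
# halfway between the guessing floor c and 1.0.
print(round(p_3pl(theta=0.0, a=1.2, b=0.0, c=0.2), 2))  # 0.6
```

    The guessing parameter c gives the curve a nonzero lower asymptote, which is why low-ability examinees still answer multiple-choice items correctly at better-than-zero rates.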

  8. Effects of Calibration Sample Size and Item Bank Size on Ability Estimation in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Sahin, Alper; Weiss, David J.

    2015-01-01

    This study aimed to investigate the effects of calibration sample size and item bank size on examinee ability estimation in computerized adaptive testing (CAT). For this purpose, a 500-item bank pre-calibrated using the three-parameter logistic model with 10,000 examinees was simulated. Calibration samples of varying sizes (150, 250, 350, 500,…

  9. Upper-extremity and mobility subdomains from the Patient-Reported Outcomes Measurement Information System (PROMIS) adult physical functioning item bank.

    PubMed

    Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar

    2013-11-01

    To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, compare separate to combined calibrations, and produce subdomain scores. After eliminating items because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). The upper-extremity and mobility subdomains shared about 35% of their variance in common and produced comparable scores whether calibrated separately or together. Identifying the subset of items that tap these two aspects of physical functioning, scored using the existing PROMIS parameters, provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  10. Parameter Estimation in Rasch Models for Examinee-Selected Items

    ERIC Educational Resources Information Center

    Liu, Chen-Wei; Wang, Wen-Chung

    2017-01-01

    The examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set (e.g., choosing one item from a pair to answer), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using…

  11. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  12. Robust Scale Transformation Methods in IRT True Score Equating under Common-Item Nonequivalent Groups Design

    ERIC Educational Resources Information Center

    He, Yong

    2013-01-01

    Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…

  13. A new item response theory model to adjust data allowing examinee choice

    PubMed Central

    Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

    2018-01-01

    In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996

  14. A Normalized Direct Approach for Estimating the Parameters of the Normal Ogive Three-Parameter Model for Ability Tests.

    ERIC Educational Resources Information Center

    Gugel, John F.

    A new method for estimating the parameters of the normal ogive three-parameter model for multiple-choice test items--the normalized direct (NDIR) procedure--is examined. The procedure is compared to a more commonly used estimation procedure, Lord's LOGIST, using computer simulations. The NDIR procedure uses the normalized (mid-percentile)…

  15. Consistency of Rasch Model Parameter Estimation: A Simulation Study.

    ERIC Educational Resources Information Center

    van den Wollenberg, Arnold L.; And Others

    1988-01-01

    The unconditional (simultaneous) maximum likelihood (UML) estimation procedure for the one-parameter logistic model produces biased estimators. The UML method is inconsistent and is not a good alternative to the conditional maximum likelihood method, at least with small numbers of items. The minimum chi-square estimation procedure produces unbiased…

  16. Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

    ERIC Educational Resources Information Center

    Matlock, Ki Lynn; Turner, Ronna

    2016-01-01

    When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…

  17. Stochastic Approximation Methods for Latent Regression Item Response Models

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  18. Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index*

    PubMed Central

    Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

    2011-01-01

    Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212

  19. Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

    ERIC Educational Resources Information Center

    Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

    2011-01-01

    The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…

  20. Rasch Measurement and Item Banking: Theory and Practice.

    ERIC Educational Resources Information Center

    Nakamura, Yuji

    The Rasch model is a one-parameter item response theory model which states that the probability of a correct response on a test item is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch model provides estimates of item difficulties that are meaningful,…
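
    As a concrete illustration of the model described above, here is a minimal Python sketch of the Rasch response function together with a Newton-Raphson maximum-likelihood ability estimate (hypothetical difficulties and responses, not data from any item bank):

```python
import math

def rasch_p(theta, b):
    """Rasch model: P(correct) depends only on ability minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_ability(responses, difficulties, iters=50):
    """Maximum-likelihood ability estimate for a 0/1 response pattern,
    found by Newton-Raphson on the log-likelihood."""
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_p(theta, b) for b in difficulties]
        score = sum(x - p for x, p in zip(responses, ps))  # gradient
        info = sum(p * (1.0 - p) for p in ps)              # Fisher information
        theta += score / info
    return theta

# Two correct answers out of three items of increasing difficulty.
theta_hat = ml_ability([1, 1, 0], [-1.0, 0.0, 1.0])
```

    At the ML solution the model-expected number correct equals the observed number correct, which is why the Rasch raw score is a sufficient statistic for ability.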

  1. A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

    PubMed

    Park, Jong Cook; Kim, Kwang Sig

    2012-03-01

    The reliability of a test is determined by the characteristics of its items. Item analysis can be performed with classical test theory or item response theory. The purpose of this study was to compare discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point-biserial correlation coefficient (C(pbs)) was compared with the extreme-group method (D), the biserial correlation coefficient (C(bs)), the item-total correlation coefficient (C(it)), and the corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r2) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of the difficulty logit and its standard error were -0.82 to 0.80 and 0.37 to 0.76; those of the ability logit and its standard error were -3.69 to 3.19 and 0.45 to 1.03. Items 9 and 23 had outfit > or = 1.3. Students 1, 5, 7, 18, 26, 30, and 32 had fit > or = 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee's ability parameter with standard errors. The fit statistics can identify bad items and unpredictable examinee responses.

  2. Effects of Misbehaving Common Items on Aggregate Scores and an Application of the Mantel-Haenszel Statistic in Test Equating. CSE Report 688

    ERIC Educational Resources Information Center

    Michaelides, Michalis P.

    2006-01-01

    Consistent behavior is a desirable characteristic that common items are expected to have when administered to different groups. Findings from the literature have established that items do not always behave in consistent ways; item indices and IRT item parameter estimates of the same items differ when obtained from different administrations.…

  3. Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS(®)) in acute coronary syndrome patients: differential functioning of items and test.

    PubMed

    Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E

    2015-08-01

    The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed for comparison with the "standardized" IRT item parameters. Differences in IRT model-expected item, scale, and theta scores were examined. The DFIT results were also compared with a standard logistic regression differential item functioning analysis. Items calibrated in the ACS sample showed lower discrimination parameters than the standardized ones, but only small differences were found for the threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) than chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of the QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.

  4. A Comparison of Linking and Concurrent Calibration under the Graded Response Model.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho; Cohen, Allan S.

    Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…

  5. Higher-Order Item Response Models for Hierarchical Latent Traits

    ERIC Educational Resources Information Center

    Huang, Hung-Yu; Wang, Wen-Chung; Chen, Po-Hsi; Su, Chi-Ming

    2013-01-01

    Many latent traits in the human sciences have a hierarchical structure. This study aimed to develop a new class of higher order item response theory models for hierarchical latent traits that are flexible in accommodating both dichotomous and polytomous items, to estimate both item and person parameters jointly, to allow users to specify…

  6. The Rasch Model and Missing Data, with an Emphasis on Tailoring Test Items.

    ERIC Educational Resources Information Center

    de Gruijter, Dato N. M.

    Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the…

  7. A large-scale, long-term study of scale drift: The micro view and the macro view

    NASA Astrophysics Data System (ADS)

    He, W.; Li, S.; Kingsbury, G. G.

    2016-11-01

    The development of measurement scales for use across years and grades in educational settings provides unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level by analysing changes in item parameter estimates that have occurred since the items were first calibrated. It was also examined at the macro level, involving groups of items and overall test scores for students. Results indicated that individual items had changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.

  8. Exploring Alternative Characteristic Curve Approaches to Linking Parameter Estimates from the Generalized Partial Credit Model.

    ERIC Educational Resources Information Center

    Roberts, James S.; Bao, Han; Huang, Chun-Wei; Gagne, Phill

    Characteristic curve approaches for linking parameters from the generalized partial credit model were examined for cases in which common (anchor) items are calibrated separately in two groups. Three of these approaches are simple extensions of the test characteristic curve (TCC), item characteristic curve (ICC), and operating characteristic curve…

  9. Measuring the quality of life in hypertension according to Item Response Theory

    PubMed Central

    Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

    2017-01-01

    ABSTRACT OBJECTIVE To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using Item Response Theory. METHODS This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by Item Response Theory were: evaluation of dimensionality, estimation of the item parameters, and construction of the scale. The study of dimensionality was carried out on the polychoric correlation matrix and by confirmatory factor analysis. To estimate the item parameters, we used the Graded Response Model of Samejima. The analyses were conducted using the free software R with the aid of the psych and mirt packages. RESULTS The analysis allowed the visualization of the item parameters and their individual contributions to the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of quality of life in five levels. Regarding the item parameters, the items related to the somatic state performed well, as they presented better power to discriminate individuals with worse quality of life. The items related to the mental state contributed the least psychometric information to the MINICHAL. CONCLUSIONS We conclude that the instrument is suitable for identifying the worsening of quality of life in hypertension. The analysis of the MINICHAL using Item Response Theory has allowed us to identify new facets of this instrument that have not been addressed in previous studies. PMID:28492764
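
    Samejima's graded response model, used above to estimate the item parameters, builds category probabilities from differences between adjacent cumulative 2PL boundary curves. A minimal Python sketch with made-up thresholds (the study itself used the R mirt package):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model: each category probability is the
    difference between adjacent cumulative 2PL boundary curves."""
    def p_star(b):  # P(responding in this boundary's category or higher)
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# A 4-category item with ordered, hypothetical thresholds.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

    Because the boundary curves share one discrimination a and have ordered thresholds, the category probabilities are guaranteed to be positive and to sum to one.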

  10. Application of latent variable model in Rosenberg self-esteem scale.

    PubMed

    Leung, Shing-On; Wu, Hui-Ping

    2013-01-01

    Latent variable models (LVM) are applied to the Rosenberg Self-Esteem Scale (RSES). Parameter estimates automatically take negative signs, so no recoding is necessary for negatively scored items. Bad items can be located through parameter estimates, item characteristic curves, and other measures. Two factors are extracted, one on self-esteem and the other on the tendency to take moderate views, the latter not often being covered in previous studies. A goodness-of-fit measure based on two-way margins is used, but more work is needed. Results show that the scaling provided by models with a more formal statistical grounding correlates highly with the conventional method, which may provide justification for the usual practice.

  11. Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

    ERIC Educational Resources Information Center

    Baghaei, Purya; Ravand, Hamdollah

    2016-01-01

    In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…

  12. Sequential Computerized Mastery Tests--Three Simulation Studies

    ERIC Educational Resources Information Center

    Wiberg, Marie

    2006-01-01

    A simulation study of a sequential computerized mastery test is carried out with items modeled with the 3 parameter logistic item response theory model. The examinees' responses are either identically distributed, not identically distributed, or not identically distributed together with estimation errors in the item characteristics. The…

  13. A model for incomplete longitudinal multivariate ordinal data.

    PubMed

    Liu, Li C

    2008-12-30

    In studies where multiple outcome items are repeatedly measured over time, missing data often occur. A longitudinal item response theory model is proposed for the analysis of multivariate ordinal outcomes that are repeatedly measured. Under the MAR assumption, this model accommodates missing data at any level (missing item at any time point and/or missing time point). It allows for multiple random subject effects and the estimation of item discrimination parameters for the multiple outcome items. The covariates in the model can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is described utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher-scoring solution, which provides standard errors for all model parameters, is used. A data set from a longitudinal prevention study is used to motivate the application of the proposed model. In this study, multiple ordinal items of health behavior are repeatedly measured over time. Because of a planned missing design, subjects answered only two-thirds of all items at any given time point. Copyright 2008 John Wiley & Sons, Ltd.

  14. Computerized adaptive testing: the capitalization on chance problem.

    PubMed

    Olea, Julio; Barrada, Juan Ramón; Abad, Francisco J; Ponsoda, Vicente; Cuevas, Lara

    2012-03-01

    This paper describes several simulation studies that examine the effects of capitalization on chance in item selection and ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000, and 2000 subjects), as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small-sample calibration conditions. For broad ranges of theta, the overestimation of precision (asymptotic SE) reaches levels of 40%, something that does not occur with the RMSE(theta). The problem grows as the ratio of item bank size to test length increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
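
    Capitalization on chance arises because CAT typically selects the item with maximum Fisher information at the current ability estimate, so items whose discrimination happens to be overestimated are systematically favoured. A minimal sketch of that selection rule under the 3PL model (hypothetical item bank, not the banks from the study):

```python
import math

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

# Hypothetical mini-bank of (a, b, c) triples; maximum-information CAT
# picks the item most informative at the current ability estimate.
bank = [(1.8, -0.5, 0.2), (1.0, 0.1, 0.2), (2.2, 1.5, 0.2)]
best = max(bank, key=lambda item: info_3pl(0.0, *item))
print(best)  # (1.8, -0.5, 0.2)
```

    Note that a enters the information function squared, which is why calibration error in the discrimination parameter inflates the apparent precision of the selected items.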

  15. Development of the Contact Lens User Experience: CLUE Scales

    PubMed Central

    Wirth, R. J.; Edwards, Michael C.; Henderson, Michael; Henderson, Terri; Olivares, Giovanna; Houts, Carrie R.

    2016-01-01

    ABSTRACT Purpose The field of optometry has become increasingly interested in patient-reported outcomes, reflecting a common trend occurring across the spectrum of healthcare. This article reviews the development of the Contact Lens User Experience (CLUE) system, designed to assess patient evaluations of contact lenses. CLUE was built using modern psychometric methods such as factor analysis and item response theory. Methods The qualitative process through which relevant domains were identified is outlined, as well as the process of creating the initial item banks. Psychometric analyses were conducted on the initial item banks, and refinements were made to the domains and items. Following this data-driven refinement phase, a second round of data was collected to further refine the items and obtain final item response theory item parameter estimates. Results Extensive qualitative work identified three key areas patients consider important when describing their experience with contact lenses. Based on item content and psychometric dimensionality assessments, the developing CLUE instruments were ultimately focused around four domains: comfort, vision, handling, and packaging. Item response theory parameters were estimated for the CLUE item banks (377 items), and the resulting scales were found to provide precise and reliable assignment of scores detailing users’ subjective experiences with contact lenses. Conclusions The CLUE family of instruments, as it currently exists, exhibits excellent psychometric properties. PMID:27383257

  16. Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test

    PubMed Central

    Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi

    2018-01-01

    Objective The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings. PMID:29561879

  17. Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test.

    PubMed

    Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi; Chen, Kuan-Lin

    2018-01-01

    The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted to examine construct validity and to estimate item parameters after investigation of unidimensionality, equality of slope parameters, item fit, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. The final CAT-SC item bank contained 56 items. The suggested stopping rules were (a) a reliability coefficient greater than 0.9 or (b) 14 items administered. The simulation results also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). The newly developed CAT-SC can measure self-care performance in children with DD, whose performance is comparable to that of typically developing (TD) children aged from 6 months to 12 years, as efficiently and precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings.
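
The stopping rules reported in this record (reliability coefficient greater than 0.9, or 14 items administered) can be sketched as a simple check. This is a minimal illustration, not the authors' code; the function name and the unit-variance reliability approximation (reliability ≈ 1 − SE²) are assumptions:

```python
def should_stop(se_theta: float, items_administered: int,
                reliability_cutoff: float = 0.9, max_items: int = 14) -> bool:
    """Stop the CAT when estimated reliability exceeds the cutoff
    or the maximum test length is reached.

    Marginal reliability is approximated as 1 - SE(theta)^2, which
    assumes theta is scaled to unit variance (an illustrative choice).
    """
    reliability = 1.0 - se_theta ** 2
    return reliability >= reliability_cutoff or items_administered >= max_items
```

For example, a standard error of 0.30 implies an approximate reliability of 0.91, so the CAT would stop even before the 14-item cap is reached.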

  18. Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models

    ERIC Educational Resources Information Center

    Doebler, Anna; Doebler, Philipp; Holling, Heinz

    2013-01-01

    The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter [theta] is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the given…

  19. Computing Maximum Likelihood Estimates of Loglinear Models from Marginal Sums with Special Attention to Loglinear Item Response Theory.

    ERIC Educational Resources Information Center

    Kelderman, Henk

    1992-01-01

    Describes algorithms used in the computer program LOGIMO for obtaining maximum likelihood estimates of the parameters in loglinear models. These algorithms are also useful for the analysis of loglinear item-response theory models. Presents modified versions of the iterative proportional fitting and Newton-Raphson algorithms. Simulated data…

  20. Parameter Estimation with Small Sample Size: A Higher-Order IRT Model Approach

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Hong, Yuan

    2010-01-01

    Sample size ranks as one of the most important factors that affect the item calibration task. However, due to practical concerns (e.g., item exposure), items are typically calibrated with much smaller samples than desired. To address the need for a more flexible framework that can be used in small sample item calibration, this article…

  1. Recovery of Graded Response Model Parameters: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Estimation

    ERIC Educational Resources Information Center

    Kieftenbeld, Vincent; Natesan, Prathiba

    2012-01-01

    Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…

  2. The Impact of Escape Alternative Position Change in Multiple-Choice Test on the Psychometric Properties of a Test and Its Items Parameters

    ERIC Educational Resources Information Center

    Hamadneh, Iyad Mohammed

    2015-01-01

    This study aimed at investigating the impact of changing the escape alternative's position in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination, and guessing), as well as on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple-choice achievement test…

  3. The Impact of Model Misspecification on Parameter Estimation and Item-Fit Assessment in Log-Linear Diagnostic Classification Models

    ERIC Educational Resources Information Center

    Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver

    2012-01-01

    Using a complex simulation study we investigated parameter recovery, classification accuracy, and performance of two item-fit statistics for correct and misspecified diagnostic classification models within a log-linear modeling framework. The basic manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3…

  4. Online Calibration of Polytomous Items Under the Generalized Partial Credit Model

    PubMed Central

    Zheng, Yi

    2016-01-01

    Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. Further findings were revealed for the interaction effects of the included factors, and recommendations were made accordingly. PMID:29881063
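
As a hedged sketch of the model being calibrated in this record, the generalized partial credit model's category probabilities can be computed from cumulative step logits. The parameterization below (slope a, step difficulties b_1..b_m) is one common convention, and the function name is illustrative; the record's own calibration algorithms are not reproduced:

```python
import math

def gpcm_probs(theta, a, thresholds):
    """Category response probabilities under the generalized partial
    credit model. `thresholds` holds step difficulties b_1..b_m;
    categories are scored 0..m."""
    # Cumulative sums of a * (theta - b_v); the score-0 term is 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]
```

At theta = 0 with symmetric steps (-0.5, 0.5), the two extreme categories are equally likely and the middle category is the modal response, as expected for an examinee at the item's center.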

  5. A mixed-effects regression model for longitudinal multivariate ordinal data.

    PubMed

    Liu, Li C; Hedeker, Donald

    2006-03-01

    A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.
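
The Gauss-Hermite integration step mentioned in this record can be illustrated for a single standard-normal random effect. This is a minimal sketch of the quadrature idea only; the paper's multidimensional quadrature and Fisher scoring solution are not shown, and the function name is an assumption:

```python
import numpy as np

def gh_expectation(f, n_points=21):
    """Approximate E[f(theta)] for theta ~ N(0, 1) using
    Gauss-Hermite quadrature (physicists' weight exp(-x^2))."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variables theta = sqrt(2) * x, with a 1/sqrt(pi) factor,
    # converts the Hermite weight into the standard normal density.
    return float(np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi))
```

The same rule, applied in each dimension, is what makes marginal likelihoods with normally distributed random effects computable; the cost grows with the number of random effects, which is why the quadrature is multidimensional in the paper.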

  6. A Comparison of Latent Growth Models for Constructs Measured by Multiple Items

    ERIC Educational Resources Information Center

    Leite, Walter L.

    2007-01-01

    Univariate latent growth modeling (LGM) of composites of multiple items (e.g., item means or sums) has been frequently used to analyze the growth of latent constructs. This study evaluated whether LGM of composites yields unbiased parameter estimates, standard errors, chi-square statistics, and adequate fit indexes. Furthermore, LGM was compared…

  7. Item Response Theory Modeling of the Philadelphia Naming Test

    ERIC Educational Resources Information Center

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.

    2015-01-01

    Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…

  8. Empirical Histograms in Item Response Theory with Ordinal Data

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2007-01-01

    The purpose of this research is to describe, test, and illustrate a new implementation of the empirical histogram (EH) method for ordinal items. The EH method involves the estimation of item response model parameters simultaneously with the approximation of the distribution of the random latent variable (theta) as a histogram. Software for the EH…

  9. A Comparison of Linking Methods for Estimating National Trends in International Comparative Large-Scale Assessments in the Presence of Cross-national DIF

    ERIC Educational Resources Information Center

    Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole

    2016-01-01

    Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…

  10. What You Don't Know Can Hurt You: Missing Data and Partial Credit Model Estimates

    PubMed Central

    Thomas, Sarah L.; Schmidt, Karen M.; Erbacher, Monica K.; Bergeman, Cindy S.

    2017-01-01

    The authors investigated the effect of Missing Completely at Random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data does damage the quality and precision of PCM estimates. PMID:26784376

  11. Ability Estimation and Item Calibration Using the One and Three Parameter Logistic Models: A Comparative Study. Research Report 77-1.

    ERIC Educational Resources Information Center

    Reckase, Mark D.

    Latent trait model calibration procedures were used on data obtained from a group testing program. The one-parameter model of Wright and Panchapakesan and the three-parameter logistic model of Wingersky, Wood, and Lord were selected for comparison. These models and their corresponding estimation procedures were compared, using actual and simulated…

  12. Information matrix estimation procedures for cognitive diagnostic models.

    PubMed

    Liu, Yanlou; Xin, Tao; Andersson, Björn; Tian, Wei

    2018-03-06

    Two new methods to estimate the asymptotic covariance matrix for marginal maximum likelihood estimation of cognitive diagnosis models (CDMs), the inverse of the observed information matrix and the sandwich-type estimator, are introduced. Unlike several previous covariance matrix estimators, the new methods take into account both the item and structural parameters. The relationships between the observed information matrix, the empirical cross-product information matrix, the sandwich-type covariance matrix and the two approaches proposed by de la Torre (2009, J. Educ. Behav. Stat., 34, 115) are discussed. Simulation results show that, for a correctly specified CDM and Q-matrix or with a slightly misspecified probability model, the observed information matrix and the sandwich-type covariance matrix exhibit good performance with respect to providing consistent standard errors of item parameter estimates. However, with substantial model misspecification only the sandwich-type covariance matrix exhibits robust performance. © 2018 The British Psychological Society.
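
The sandwich-type covariance estimator discussed in this record has the generic form A⁻¹ B A⁻¹, where A is the observed information and B the cross-product of per-respondent score vectors. A minimal sketch under the assumption that both ingredients are already available (this is the generic robust-covariance recipe, not the authors' CDM-specific implementation):

```python
import numpy as np

def sandwich_covariance(information, scores):
    """Sandwich covariance A^{-1} B A^{-1}: `information` is the observed
    information matrix (negative Hessian of the log-likelihood) and
    `scores` is an (n_respondents x n_parameters) matrix of per-respondent
    score vectors, so B = scores' scores. Robust to mild misspecification."""
    a_inv = np.linalg.inv(information)
    b = scores.T @ scores
    return a_inv @ b @ a_inv
```

When the model is correctly specified, A and B estimate the same matrix and the sandwich collapses to the usual inverse information; under misspecification they diverge, which is why only the sandwich form remains consistent in the simulations above.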

  13. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  14. Generalized Full-Information Item Bifactor Analysis

    PubMed Central

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682

  15. Linear Logistic Test Modeling with R

    ERIC Educational Resources Information Center

    Baghaei, Purya; Kubinger, Klaus D.

    2015-01-01

    The present paper gives a general introduction to the linear logistic test model (Fischer, 1973), an extension of the Rasch model with linear constraints on item parameters, along with eRm (an R package to estimate different types of Rasch models; Mair, Hatzinger, & Mair, 2014) functions to estimate the model and interpret its parameters. The…

  16. Differential item functioning of the patient-reported outcomes information system (PROMIS®) pain interference item bank by language (Spanish versus English).

    PubMed

    Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D

    2017-06-01

    About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS ® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS ® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS ® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS ® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS ® PI scores.

  17. A Procedure to Detect Item Bias Present Simultaneously in Several Items

    DTIC Science & Technology

    1991-04-25

    exhibit a coherent and major biasing influence at the test level. In particular, this can be true even if each individual item displays only a minor...response functions (IRFs) without the use of item parameter estimation algorithms when the sample size is too small for their use. Thissen, Steinberg...convention). A random sample of examinees is drawn from each group, and a test of N items is administered to them. Typically it is suspected that a

  18. A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.

    PubMed

    Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri

    2017-03-01

    Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.

  19. Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level

    PubMed Central

    Savalei, Victoria; Rhemtulla, Mijke

    2017-01-01

    In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately handle missing data at the item level. Item-level multiple imputation (MI), however, can handle such missing data straightforwardly. In this article, we develop an analytic approach for dealing with item-level missing data—that is, one that obtains a unique set of parameter estimates directly from the incomplete data set and does not require imputations. The proposed approach is a variant of the two-stage maximum likelihood (TSML) methodology, and it is the analytic equivalent of item-level MI. We compare the new TSML approach to three existing alternatives for handling item-level missing data: scale-level full information maximum likelihood, available-case maximum likelihood, and item-level MI. We find that the TSML approach is the best analytic approach, and its performance is similar to item-level MI. We recommend its implementation in popular software and its further study. PMID:29276371


  1. Depression symptoms across cultures: an IRT analysis of standard depression symptoms using data from eight countries.

    PubMed

    Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J

    2016-07-01

    Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting, illustrating a possible source of measurement non-invariance in prevalence estimates.

  2. Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

    PubMed Central

    Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

    2004-01-01

    Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681

  3. Comparison of Unidimensional and Multidimensional Approaches to IRT Parameter Estimation. Research Report. ETS RR-04-44

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2004-01-01

    It is common to assume during statistical analysis of a multiscale assessment that the assessment has simple structure or that it is composed of several unidimensional subtests. Under this assumption, both the unidimensional and multidimensional approaches can be used to estimate item parameters. This paper theoretically demonstrates that these…

  4. Markov Chain Monte Carlo Estimation of Item Parameters for the Generalized Graded Unfolding Model

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Stark, Stephen; Chernyshenko, Oleksandr S.

    2006-01-01

    The authors present a Markov Chain Monte Carlo (MCMC) parameter estimation procedure for the generalized graded unfolding model (GGUM) and compare it to the marginal maximum likelihood (MML) approach implemented in the GGUM2000 computer program, using simulated and real personality data. In the simulation study, test length, number of response…

  5. Life sciences payload definition and integration study. Volume 3: Preliminary equipment item specification catalog for the carry-on laboratories. [for Spacelab

    NASA Technical Reports Server (NTRS)

    1974-01-01

    All general purpose equipment items contained in the final carry-on laboratory (COL) design concepts are described in terms of specific requirements identified for COL use, hardware status, and technical parameters such as weight, volume, power, range, and precision. Estimated costs for each item are given, along with projected development times.

  6. An Empirical Investigation of the Potential Impact of Item Misfit on Test Scores. Research Report. ETS RR-17-60

    ERIC Educational Resources Information Center

    Kim, Sooyeon; Robin, Frederic

    2017-01-01

    In this study, we examined the potential impact of item misfit on the reported scores of an admission test from the subpopulation invariance perspective. The target population of the test consisted of 3 major subgroups with different geographic regions. We used the logistic regression function to estimate item parameters of the operational items…

  7. Computing Maximum Likelihood Estimates of Loglinear Models from Marginal Sums with Special Attention to Loglinear Item Response Theory. [Project Psychometric Aspects of Item Banking No. 53.] Research Report 91-1.

    ERIC Educational Resources Information Center

    Kelderman, Henk

    In this paper, algorithms are described for obtaining the maximum likelihood estimates of the parameters in log-linear models. Modified versions of the iterative proportional fitting and Newton-Raphson algorithms are described that work on the minimal sufficient statistics rather than on the usual counts in the full contingency table. This is…

  8. Do Concept Inventories Actually Measure Anything?

    ERIC Educational Resources Information Center

    Wallace, Colin S.; Bailey, Janelle M.

    2010-01-01

    Although concept inventories are among the most frequently used tools in the physics and astronomy education communities, they are rarely evaluated using item response theory (IRT). When IRT models fit the data, they offer sample-independent estimates of item and person parameters. IRT may also provide a way to measure students' learning gains…

  9. The Long-Term Sustainability of Different Item Response Theory Scaling Methods

    ERIC Educational Resources Information Center

    Keller, Lisa A.; Keller, Robert R.

    2011-01-01

    This article investigates the accuracy of examinee classification into performance categories and the estimation of the theta parameter for several item response theory (IRT) scaling techniques when applied to six administrations of a test. Previous research has investigated only two administrations; however, many testing programs equate tests…

  10. Poisson and negative binomial item count techniques for surveys with sensitive question.

    PubMed

    Tian, Guo-Liang; Tang, Man-Lai; Wu, Qin; Liu, Yin

    2017-04-01

    Although the item count technique is useful in surveys with sensitive questions, privacy of those respondents who possess the sensitive characteristic of interest may not be well protected due to a defect in its original design. In this article, we propose two new survey designs (namely the Poisson item count technique and negative binomial item count technique) which replace several independent Bernoulli random variables required by the original item count technique with a single Poisson or negative binomial random variable, respectively. The proposed models not only provide a closed-form variance estimate and a confidence interval within [0, 1] for the sensitive proportion, but also simplify the survey design of the original item count technique. Most importantly, the new designs do not leak respondents' privacy. Empirical results show that the proposed techniques perform satisfactorily in the sense that they yield accurate parameter estimates and confidence intervals.
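
The estimation logic common to item count designs can be sketched with the classic difference-in-means estimator: the treatment group's list adds the sensitive indicator to the count distribution the control group reports, so subtracting group means recovers the sensitive proportion. This is a simplified illustration of the general idea only; the specific Poisson and negative binomial estimators of this record are not reproduced, and the function name is an assumption:

```python
def ict_estimate(treatment_counts, control_counts):
    """Difference-in-means estimator of the sensitive proportion for an
    item count design: E[treatment count] - E[control count] equals the
    prevalence of the sensitive characteristic."""
    mean_treatment = sum(treatment_counts) / len(treatment_counts)
    mean_control = sum(control_counts) / len(control_counts)
    return mean_treatment - mean_control
```

For instance, treatment counts [3, 4, 5, 4] against control counts [3, 3, 4, 4] give an estimated sensitive proportion of 0.5.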

  11. Maximum Marginal Likelihood Estimation of a Monotonic Polynomial Generalized Partial Credit Model with Applications to Multiple Group Analysis.

    PubMed

    Falk, Carl F; Cai, Li

    2016-06-01

    We present a semi-parametric approach to estimating item response functions (IRF) useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model at the lowest order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and other non-parametric and semi-parametric alternatives.

  12. Item response theory analysis of Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis.

    PubMed

    Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C

    2016-03-12

    Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules: the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct IRT analyses on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis, and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses, and the CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis, but the existing CDC-MH scale is not. We suggest either using the HDCM and HDSM as they currently stand, or using the CDC-PH scale alone if the primary goal is to measure physical-health-related HRQOL.

  13. Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory.

    PubMed

    Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C

    2017-02-01

    The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Item response theory - A first approach

    NASA Astrophysics Data System (ADS)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also given rise to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models, and to present the main estimation procedures.
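    The one-, two- and three-parameter logistic models referred to above can be written down compactly. The following is a minimal illustration (parameter values are arbitrary), not code from the work itself:

    ```python
    import math

    def p_3pl(theta, a, b, c):
        """Three-parameter logistic (3PL) item response function: probability
        of a correct answer given latent trait theta, discrimination a,
        difficulty b, and pseudo-guessing lower asymptote c."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    def p_2pl(theta, a, b):
        # The 2PL is the 3PL without guessing (c = 0)
        return p_3pl(theta, a, b, 0.0)

    def p_1pl(theta, b):
        # The 1PL (Rasch-type) further fixes discrimination to a constant
        return p_2pl(theta, 1.0, b)

    # At theta == b the 2PL probability is exactly 0.5; the guessing
    # parameter lifts the 3PL probability to (1 + c) / 2 at that point.
    assert abs(p_2pl(0.0, 1.2, 0.0) - 0.5) < 1e-12
    assert abs(p_3pl(0.0, 1.2, 0.0, 0.2) - 0.6) < 1e-12
    ```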

  15. A Short Note on Estimating the Testlet Model with Different Estimators in Mplus

    ERIC Educational Resources Information Center

    Luo, Yong

    2018-01-01

    Mplus is a powerful latent variable modeling software program that has become an increasingly popular choice for fitting complex item response theory models. In this short note, we demonstrate that the two-parameter logistic testlet model can be estimated as a constrained bifactor model in Mplus with three estimators encompassing limited- and…

  16. A Multidimensional Item Response Model: Constrained Latent Class Analysis Using the Gibbs Sampler and Posterior Predictive Checks.

    ERIC Educational Resources Information Center

    Hoijtink, Herbert; Molenaar, Ivo W.

    1997-01-01

    This paper shows that a certain class of constrained latent class models may be interpreted as a special case of nonparametric multidimensional item response models. Parameters of this latent class model are estimated using an application of the Gibbs sampler, and model fit is investigated using posterior predictive checks. (SLD)

  17. The Examination of the Classification of Students into Performance Categories by Two Different Equating Methods

    ERIC Educational Resources Information Center

    Keller, Lisa A.; Keller, Robert R.; Parker, Pauline A.

    2011-01-01

    This study investigates the comparability of two item response theory based equating methods: true score equating (TSE), and estimated true equating (ETE). Additionally, six scaling methods were implemented within each equating method: mean-sigma, mean-mean, two versions of fixed common item parameter, Stocking and Lord, and Haebara. Empirical…

  18. Effects of Design Properties on Parameter Estimation in Large-Scale Assessments

    ERIC Educational Resources Information Center

    Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas

    2015-01-01

    The selection of an appropriate booklet design is an important element of large-scale assessments of student achievement. Two design properties that are typically optimized are the "balance" with respect to the positions at which the items are presented and with respect to the mutual occurrence of pairs of items in the same booklet. The purpose…

  19. Examination of the Assumptions and Properties of the Graded Item Response Model: An Example Using a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    1995-01-01

    Over 5,000 students participated in a study of the dimensionality and stability of the item parameter estimates of a mathematics performance assessment developed for the Quantitative Understanding: Amplifying Student Achievement and Reasoning (QUASAR) Project. Results demonstrate the test's dimensionality and illustrate ways to examine use of the…

  20. An Estimation Procedure for the Structural Parameters of the Unified Cognitive/IRT Model.

    ERIC Educational Resources Information Center

    Jiang, Hai; And Others

    L. V. DiBello, W. F. Stout, and L. A. Roussos (1993) have developed a new item response model, the Unified Model, which brings together the discrete, deterministic aspects of cognition favored by cognitive scientists, and the continuous, stochastic aspects of test response behavior that underlie item response theory (IRT). The Unified Model blends…

  1. Development and community-based validation of eight item banks to assess mental health.

    PubMed

    Batterham, Philip J; Sunderland, Matthew; Carragher, Natacha; Calear, Alison L

    2016-09-30

    There is a need for precise but brief screening of mental health problems in a range of settings. The development of item banks to assess depression and anxiety has resulted in new adaptive and static screeners that accurately assess severity of symptoms. However, expansion to a wider array of mental health problems is required. The current study developed item banks for eight mental health problems: social anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder, adult attention-deficit hyperactivity disorder, drug use, psychosis and suicidality. The item banks were calibrated in a population-based Australian adult sample (N=3175) by administering large item pools (45-75 items) and excluding items on the basis of local dependence or measurement non-invariance. Item Response Theory parameters were estimated for each item bank using a two-parameter graded response model. Each bank consisted of 19-47 items, demonstrating excellent fit and precision across a range of -1 to 3 standard deviations from the mean. No previous study has developed such a broad range of mental health item banks. The calibrated item banks will form the basis of a new system of static and adaptive measures to screen for a broad array of mental health problems in the community. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
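    The two-parameter graded response model used to calibrate these item banks models the cumulative category boundaries with 2PL curves and takes adjacent differences to obtain category probabilities. A minimal sketch with hypothetical parameter values:

    ```python
    import math

    def grm_category_probs(theta, a, thresholds):
        """Graded response model (Samejima): probability of each ordered
        category given latent trait theta, discrimination a, and an
        increasing list of threshold parameters b_1 < ... < b_{K-1}."""
        def p_star(b):  # P(X >= k): cumulative 2PL boundary curve
            return 1.0 / (1.0 + math.exp(-a * (theta - b)))
        cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]
        return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

    probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.0])
    assert abs(sum(probs) - 1.0) < 1e-12  # category probabilities sum to one
    assert all(p > 0 for p in probs)      # ordered thresholds => valid model
    ```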

  2. Bayes Factor Covariance Testing in Item Response Models.

    PubMed

    Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

    2017-12-01

    Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.

  3. Bayesian inference in an item response theory model with a generalized student t link function

    NASA Astrophysics Data System (ADS)

    Azevedo, Caio L. N.; Migon, Helio S.

    2012-10-01

    In this paper we introduce a new item response theory (IRT) model with a generalized Student t-link function with unknown degrees of freedom (df), named the generalized t-link (GtL) IRT model. In this model we consider only the difficulty parameter in the item response function. GtL is an alternative to the two-parameter logit and probit models, since the degrees of freedom play a role similar to that of the discrimination parameter. However, the behavior of the GtL curves differs from that of the two-parameter models and the usual Student t link, since in GtL the curves obtained from different df's can cross the probit curves at more than one latent trait level. The GtL model has properties similar to those of generalized linear mixed models, such as the existence of sufficient statistics and easy parameter interpretation. Also, many techniques of parameter estimation, model fit assessment and residual analysis developed for those models can be used for the GtL model. We develop fully Bayesian estimation and model fit assessment tools through a Metropolis-Hastings step within a Gibbs sampling algorithm. We consider a prior sensitivity analysis concerning the choice of the degrees of freedom. The simulation study indicates that the algorithm recovers all parameters properly. In addition, some Bayesian model fit assessment tools are considered. Finally, a real data set is analyzed using our approach and other usual models. The results indicate that our model fits the data better than the two-parameter models.

  4. An Extension of the Partial Credit Model with an Application to the Measurement of Change.

    ERIC Educational Resources Information Center

    Fischer, Gerhard H.; Ponocny, Ivo

    1994-01-01

    An extension to the partial credit model, the linear partial credit model, is considered under the assumption of a certain linear decomposition of the item × category parameters into basic parameters. A conditional maximum likelihood algorithm for estimating basic parameters is presented and illustrated with simulation and an empirical study. (SLD)

  5. The Least-Squares Estimation of Latent Trait Variables.

    ERIC Educational Resources Information Center

    Tatsuoka, Kikumi

    This paper presents a new method for estimating a given latent trait variable by the least-squares approach. The beta weights are obtained recursively with the help of Fourier series and expressed as functions of item parameters of response curves. The values of the latent trait variable estimated by this method and by maximum likelihood method…

  6. Psychometric properties of the SDM-Q-9 questionnaire for shared decision-making in multiple sclerosis: item response theory modelling and confirmatory factor analysis.

    PubMed

    Ballesteros, Javier; Moral, Ester; Brieva, Luis; Ruiz-Beato, Elena; Prefasi, Daniel; Maurino, Jorge

    2017-04-22

    Shared decision-making is a cornerstone of patient-centred care. The 9-item Shared Decision-Making Questionnaire (SDM-Q-9) is a brief self-assessment tool for measuring patients' perceived level of involvement in decision-making related to their own treatment and care. Information related to the psychometric properties of the SDM-Q-9 for multiple sclerosis (MS) patients is limited. The objective of this study was to assess the performance of the items composing the SDM-Q-9 and its dimensional structure in patients with relapsing-remitting MS. A non-interventional, cross-sectional study in adult patients with relapsing-remitting MS was conducted in 17 MS units throughout Spain. A nonparametric item response theory (IRT) analysis was used to assess the latent construct and dimensional structure underlying the observed responses. A parametric IRT model, the General Partial Credit Model, was fitted to obtain estimates of the relationship between the latent construct and item characteristics. The unidimensionality of the SDM-Q-9 instrument was assessed by confirmatory factor analysis. A total of 221 patients were studied (mean age = 42.1 ± 9.9 years, 68.3% female). Median Expanded Disability Status Scale score was 2.5 ± 1.5. Most patients reported taking part in each step of the decision-making process. Internal reliability of the instrument was high (Cronbach's α = 0.91) and the overall scale scalability score was 0.57, indicative of a strong scale. All items except item 1 showed scalability indices higher than 0.30. Four items (items 6 through to 9) conveyed more than half of the SDM-Q-9 overall information (67.3%). The SDM-Q-9 was a good fit for a unidimensional latent structure (comparative fit index = 0.98, root-mean-square error of approximation = 0.07). All freely estimated parameters were statistically significant (P < 0.001). 
All items presented standardized parameter estimates with salient loadings (>0.40) with the exception of item 1 which presented the lowest loading (0.26). Items 6 through to 8 were the most relevant items for shared decision-making. The SDM-Q-9 presents appropriate psychometric properties and is therefore useful for assessing different aspects of shared decision-making in patients with multiple sclerosis.

  7. Bayesian Estimation of the DINA Model with Gibbs Sampling

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew

    2015-01-01

    A Bayesian model formulation of the deterministic inputs, noisy "and" gate (DINA) model is presented. Gibbs sampling is employed to simulate from the joint posterior distribution of item guessing and slipping parameters, subject attribute parameters, and latent class probabilities. The procedure extends concepts in Béguin and Glas,…
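    The DINA response probability sampled in such a procedure depends only on whether an examinee possesses every attribute an item requires; masters answer correctly unless they slip, non-masters succeed only by guessing. A minimal sketch (the attribute patterns and Q-matrix row below are hypothetical):

    ```python
    def dina_prob(alpha, q, slip, guess):
        """DINA model: probability of a correct response for an examinee
        with binary attribute vector alpha on an item with Q-matrix row q.
        eta = 1 iff the examinee holds every attribute the item requires."""
        eta = all(a >= k for a, k in zip(alpha, q))
        return 1.0 - slip if eta else guess

    # Hypothetical item requiring attributes 1 and 3
    q = [1, 0, 1]
    assert dina_prob([1, 0, 1], q, slip=0.1, guess=0.2) == 0.9  # master
    assert dina_prob([1, 1, 0], q, slip=0.1, guess=0.2) == 0.2  # non-master
    ```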

  8. An Evaluation of a Markov Chain Monte Carlo Method for the Two-Parameter Logistic Model.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho; Cohen, Allan S.

    The accuracy of the Markov Chain Monte Carlo (MCMC) procedure Gibbs sampling was considered for estimation of item parameters of the two-parameter logistic model. Data for the Law School Admission Test (LSAT) Section 6 were analyzed to illustrate the MCMC procedure. In addition, simulated data sets were analyzed using the MCMC, marginal Bayesian…

  9. Is the General Self-Efficacy Scale a Reliable Measure to be used in Cross-Cultural Studies? Results from Brazil, Germany and Colombia.

    PubMed

    Damásio, Bruno F; Valentini, Felipe; Núñes-Rodriguez, Susana I; Kliem, Soeren; Koller, Sílvia H; Hinz, Andreas; Brähler, Elmar; Finck, Carolyn; Zenger, Markus

    2016-05-26

    This study evaluated cross-cultural measurement invariance of the General Self-Efficacy Scale (GSES) in large Brazilian (N = 2,394), representative German (N = 2,046) and Colombian (N = 1,500) samples. Initially, multiple-indicators multiple-causes (MIMIC) analyses showed that sex and age were biasing item responses in the total sample (2 and 10 items, respectively). After controlling for these two covariates, a multigroup confirmatory factor analysis (MGCFA) was employed. Configural invariance was supported. However, metric invariance was not supported for five of the 10 items, and scalar invariance was not supported for all items. We also evaluated the differences between the latent scores estimated by two models: MIMIC and MGCFA with the non-equivalent parameters unconstrained across countries. The average difference in the estimated latent scores was |.07|, and 22.8% of the scores were biased by at least .10 standardized points. Bias effects were above the mean for the German group, for which the average difference was |.09| and 33.7% of the scores were biased by at least .10. In sum, the GSES did not provide evidence of measurement invariance for use in this cross-cultural study. Moreover, our results showed that, even when controlling for sex and age effects, failing to control for item parameters in the MGCFA analyses across countries would bias the latent score estimates, with a larger effect for the German population.

  10. Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study

    PubMed Central

    2010-01-01

    Background Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies exist for these data: classical test theory (CTT), based on the observed scores, and models coming from Item Response Theory (IRT). However, whether IRT or CTT is the more appropriate method to analyse PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. Methods Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT- or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known, known to a certain extent for item parameters (with precision ranging from good to poor), or unknown and therefore had to be estimated. The powers obtained with IRT or CTT were compared and the parameters having the strongest impact on them were identified. Results When person parameters were assumed to be unknown and item parameters to be either known or not, the powers achieved using IRT or CTT were similar and always lower than the expected power using the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. For IRT, it seems important to take account of the number of items to obtain an accurate formula. PMID:20338031

  11. Better assessment of physical function: item improvement is neglected but essential

    PubMed Central

    2009-01-01

    Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354

  12. Better assessment of physical function: item improvement is neglected but essential.

    PubMed

    Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

    2009-01-01

    Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.

  13. Bias Correction for the Maximum Likelihood Estimate of Ability. Research Report. ETS RR-05-15

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2005-01-01

    Lord's bias function and the weighted likelihood estimation method are effective in reducing the bias of the maximum likelihood estimate of an examinee's ability under the assumption that the true item parameters are known. This paper presents simulation studies to determine the effectiveness of these two methods in reducing the bias when the item…

  14. DIF Testing with an Empirical-Histogram Approximation of the Latent Density for Each Group

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2011-01-01

    This research introduces, illustrates, and tests a variation of IRT-LR-DIF, called EH-DIF-2, in which the latent density for each group is estimated simultaneously with the item parameters as an empirical histogram (EH). IRT-LR-DIF is used to evaluate the degree to which items have different measurement properties for one group of people versus…

  15. Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS).

    PubMed

    Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E

    2008-01-01

    The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices, did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability > 0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed-length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
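    CAT simulations of the kind described typically administer, at each step, the remaining item with maximum Fisher information at the current ability estimate. A minimal sketch under a 2PL model (the item bank values below are hypothetical):

    ```python
    import math

    def info_2pl(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def next_item(theta, item_bank, administered):
        """Index of the not-yet-administered item with maximum information."""
        candidates = [i for i in range(len(item_bank)) if i not in administered]
        return max(candidates, key=lambda i: info_2pl(theta, *item_bank[i]))

    # Hypothetical bank of (a, b) pairs; at theta = 0 the most informative
    # item is the highly discriminating one with difficulty near zero.
    bank = [(0.8, -2.0), (2.0, 0.1), (1.0, 2.5)]
    assert next_item(0.0, bank, administered=set()) == 1
    ```

    In a full simulation the ability estimate would be updated after each response and the loop repeated until a stopping rule (e.g., a target standard error) is met.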

  16. An Evaluation of One- and Three-Parameter Logistic Tailored Testing Procedures for Use with Small Item Pools.

    ERIC Educational Resources Information Center

    McKinley, Robert L.; Reckase, Mark D.

    A two-stage study was conducted to compare the ability estimates yielded by tailored testing procedures based on the one-parameter logistic (1PL) and three-parameter logistic (3PL) models. The first stage of the study employed real data, while the second stage employed simulated data. In the first stage, response data for 3,000 examinees were…

  17. PROC IRT: A SAS Procedure for Item Response Theory

    PubMed Central

    Matlock Cole, Ki; Paek, Insu

    2017-01-01

    This article reviews the item response theory procedure (PROC IRT) in SAS/STAT 14.1 for conducting item response theory (IRT) analyses of dichotomous and polytomous datasets that are unidimensional or multidimensional. The review provides an overview of available features, including models, estimation procedures, interfacing, input, and output files. A small-scale simulation study evaluates the IRT model parameter recovery of the PROC IRT procedure. The IRT procedure in Statistical Analysis Software (SAS) may be useful for researchers who frequently utilize SAS for analyses, research, and teaching.

  18. Maximum likelihood estimation for life distributions with competing failure modes

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.

    1979-01-01

    Systems that are placed on test at time zero, function for a period, and die at some random time were studied. Failure may be due to one of several causes or modes. The parameters of the life distribution may depend upon the levels of various stress variables the item is subjected to. Maximum likelihood estimation methods are discussed. Specific methods are reported for the smallest extreme-value distributions of life. Monte Carlo results indicate the methods to be promising. Under appropriate conditions, the location parameters are nearly unbiased, the scale parameter is slightly biased, and the asymptotic covariances are rapidly approached.

  19. Language-related differential item functioning between English and German PROMIS Depression items is negligible.

    PubMed

    Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias

    2017-12-01

    To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as the criterion for R²-change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation = 10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed either by replacing the DIF items with items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Two Simple Approaches to Overcome a Problem with the Mantel-Haenszel Statistic: Comments on Wang, Bradlow, Wainer, and Muller (2008)

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Dorans, Neil J.

    2010-01-01

    The Mantel-Haenszel (MH) procedure (Mantel and Haenszel) is a popular method for estimating and testing a common two-factor association parameter in a 2 x 2 x K table. Holland and Holland and Thayer described how to use the procedure to detect differential item functioning (DIF) for tests with dichotomously scored items. Wang, Bradlow, Wainer, and…
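    The MH common odds ratio for a 2 x 2 x K table is a simple weighted ratio of cross-products across strata. A minimal sketch (the counts below are hypothetical):

    ```python
    def mh_common_odds_ratio(tables):
        """Mantel-Haenszel estimate of the common odds ratio across K strata.
        Each table is (a, b, c, d): the four cell counts of one 2x2 table,
        e.g. for DIF, (reference-correct, reference-incorrect, focal-correct,
        focal-incorrect) within one matched score stratum."""
        num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
        den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
        return num / den

    # Two hypothetical strata, each with an odds ratio of 1.0 (no DIF):
    tables = [(20, 10, 20, 10), (5, 15, 5, 15)]
    assert abs(mh_common_odds_ratio(tables) - 1.0) < 1e-12
    ```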

  1. Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.

    PubMed

    Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J

    2018-02-01

Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items; these were removed in stages, creating 8- and 3-item Inner EAR scales for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
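Short-form construction of the kind described above can be motivated by item information: under a 2PL-type model, I(θ) = a²P(θ)Q(θ), and the least informative items are dropped first. A sketch with hypothetical discrimination/location parameters (not the Inner EAR estimates):

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical (a, b) parameters for an 11-item scale
items = [(1.6, -1.0), (4.5, 0.2), (2.3, 0.5), (1.9, 1.2), (3.8, 0.0),
         (1.7, -0.4), (2.8, 0.9), (4.1, 0.3), (2.0, -1.5), (3.2, 0.6),
         (1.8, 2.0)]
ranked = sorted(items, key=lambda ab: item_information(0.0, *ab), reverse=True)
short_form = ranked[:3]  # keep the 3 most informative items at theta = 0
print(short_form)
```

In practice items are compared across the whole ability range via their information functions, not at a single θ as in this simplified ranking.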

  2. Linking Parameters Estimated with the Generalized Graded Unfolding Model: A Comparison of the Accuracy of Characteristic Curve Methods

    ERIC Educational Resources Information Center

    Anderson Koenig, Judith; Roberts, James S.

    2007-01-01

    Methods for linking item response theory (IRT) parameters are developed for attitude questionnaire responses calibrated with the generalized graded unfolding model (GGUM). One class of IRT linking methods derives the linking coefficients by comparing characteristic curves, and three of these methods---test characteristic curve (TCC), item…

  3. Estimation of Item Parameters and the GEM Algorithm.

    ERIC Educational Resources Information Center

    Tsutakawa, Robert K.

    The models and procedures discussed in this paper are related to those presented in Bock and Aitkin (1981), where they considered the 2-parameter probit model and approximated a normally distributed prior distribution of abilities by a finite and discrete distribution. One purpose of this paper is to clarify the nature of the general EM (GEM)…

  4. Cancer Health Literacy Test-30-Spanish (CHLT-30-DKspa), a New Spanish-Language Version of the Cancer Health Literacy Test (CHLT-30) for Spanish-Speaking Latinos.

    PubMed

    Echeverri, Margarita; Anderson, David; Nápoles, Anna María

    2016-01-01

    This article describes the adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. A cross-sectional field test of the Spanish version of the CHLT (CHLT-30-DKspa) was conducted among healthy Latinos in Louisiana. Diagonally weighted least squares was used to confirm the factor structure. Item response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. The mean CHLT-30-DKspa score (N = 400) was 17.13 (range = 0-30, SD = 6.65). Results confirmed a unidimensional structure, χ(2)(405) = 461.55, p = .027, comparative fit index = .993, Tucker-Lewis index = .992, root mean square error of approximation = .0180. Cronbach's alpha was .88. Items Q1-High Calorie and Q15-Tumor Spread had the lowest item-scale correlations (.148 and .288, respectively) and standardized factor loadings (.152 and .302, respectively). Items Q19-Smoking Risk, Q8-Palliative Care, and Q1-High Calorie had the highest item difficulty parameters (difficulty = 1.12, 1.21, and 2.40, respectively). Results generally support the applicability of the CHLT-30-DKspa for healthy Spanish-speaking populations, with the exception of 4 items that need to be deleted or revised and further studied: Q1, Q8, Q15, and Q19.
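Cronbach's alpha, used above for internal consistency, can be computed directly from the person-by-item score matrix. A minimal sketch with toy 0/1 data (not the CHLT responses):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for internal consistency; scores[person][item]."""
    k = len(scores[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses of 4 examinees to 3 items (Guttman-like pattern)
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(scores), 3))  # 0.75
```

Population variances are used throughout; since the same scaling factor appears in numerator and denominator, the result matches the conventional sample-variance formula.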

  5. Evaluation of the Patient-Reported Outcomes Information System (PROMIS(®)) Spanish-language physical functioning items.

    PubMed

    Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D

    2013-09-01

To evaluate the equivalence of the PROMIS® physical functioning item bank by language of administration (English versus Spanish). The PROMIS® wave 1 English-language physical functioning bank consists of 124 items, and 114 of these were translated into Spanish. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were calculated. The IRT assumption of unidimensionality was evaluated by fitting a single-factor confirmatory factor analytic model. IRT threshold and discrimination parameters were estimated using Samejima's Graded Response Model. DIF by language of administration was evaluated. Item means ranged from 2.53 (SD = 1.36) to 4.62 (SD = 0.82). Coefficient alpha was 0.99, and item-rest correlations ranged from 0.41 to 0.89. A one-factor model fit the data well (CFI = 0.971, TLI = 0.970, and RMSEA = 0.052). The slope parameters ranged from 0.45 ("Are you able to run 10 miles?") to 4.50 ("Are you able to put on a shirt or blouse?"). The threshold parameters ranged from -1.92 ("How much do physical health problems now limit your usual physical activities (such as walking or climbing stairs)?") to 6.06 ("Are you able to run 10 miles?"). Fifty of the 114 items were flagged for DIF based on an R² change criterion of 0.02 or above. The expected total score was higher for Spanish- than English-language respondents. English- and Spanish-speaking subjects with the same level of underlying physical function responded differently to 50 of 114 items. This study has important implications for the study of physical functioning among diverse populations.

  6. Theory-Based Parameterization of Semiotics for Measuring Pre-literacy Development

    NASA Astrophysics Data System (ADS)

    Bezruczko, N.

    2013-09-01

A probabilistic model was applied to the problem of measuring pre-literacy in young children. First, semiotic philosophy and contemporary cognition research were conceptually integrated to establish theoretical foundations for rating 14 characteristics of children's drawings and narratives (N = 120). Then ratings were transformed with a Rasch model, which estimated linear item parameter values that accounted for 79 percent of rater variance. Principal Components Analysis of the item residual matrix confirmed that the variance remaining after item calibration was largely unsystematic. Validation analyses found positive correlations between semiotic measures and preschool literacy outcomes. Practical implications of a semiotics dimension for preschool practice were discussed.

  7. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    ERIC Educational Resources Information Center

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  8. A New Online Calibration Method Based on Lord's Bias-Correction.

    PubMed

    He, Yinhong; Chen, Ping; Li, Yong; Zhang, Shumei

    2017-09-01

Online calibration techniques have been widely employed to calibrate new items because of their practical advantages. Method A is the simplest online calibration method and has recently attracted much attention from researchers. However, a key assumption of Method A is that it treats person-parameter estimates θ̂ (obtained by maximum likelihood estimation [MLE]) as their true values θ, so deviation of the estimated θ̂ from the true values may yield inaccurate item calibration when that deviation is nonignorable. To improve the performance of Method A, a new method, MLE-LBCI-Method A, is proposed. This new method combines a modified Lord's bias-correction method (maximum likelihood estimation-Lord's bias-correction with iteration [MLE-LBCI]) with the original Method A in an effort to correct the deviation of θ̂ that may adversely affect item calibration precision. Two simulation studies were carried out to explore the performance of both MLE-LBCI and MLE-LBCI-Method A under several scenarios. Simulation results showed that MLE-LBCI substantially improved the ML ability estimates, and MLE-LBCI-Method A outperformed Method A in almost all experimental conditions.
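The core idea that Method A exploits is that, once ability estimates are treated as known, calibrating a new 2PL item reduces to a logistic regression of the item responses on θ̂. A sketch of that reduction with simulated data and hypothetical true parameters (the MLE-LBCI correction itself is not shown):

```python
import numpy as np

def calibrate_item(theta_hat, resp, n_iter=50):
    """2PL calibration with abilities treated as known: fit
    logit P(resp = 1) = a * (theta - b) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(theta_hat), theta_hat])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(H, X.T @ (resp - p))
    c, a = beta
    return a, -c / a          # discrimination a, difficulty b

rng = np.random.default_rng(1)
theta = rng.normal(size=5000)          # stand-in for the theta-hat estimates
true_a, true_b = 1.5, -0.3             # hypothetical new-item parameters
p = 1 / (1 + np.exp(-true_a * (theta - true_b)))
resp = (rng.random(5000) < p).astype(float)
a_hat, b_hat = calibrate_item(theta, resp)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
```

Here the abilities fed in are error-free, so recovery is good; the paper's point is that replacing θ with noisy θ̂ degrades exactly this step unless the bias is corrected.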

  9. HIV/AIDS knowledge among men who have sex with men: applying the item response theory.

    PubMed

    Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland

    2014-04-01

To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview, and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, and 40.7% of the sample had knowledge levels below the average. Some beliefs still exist in this population regarding transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameters (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those at or above the median level. Item Response Theory analysis, which focuses on the individual properties of each item, yields measures that do not vary with or depend on the particular questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among men who have sex with men over time and in different geographic regions, and this psychometric model provides that advantage.
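Latent knowledge scores like those above are computed from fitted two-parameter logistic item parameters; one standard scoring rule is the expected a posteriori (EAP) estimate under a normal prior, sketched here with hypothetical item parameters (the paper's exact scoring procedure is not specified in the abstract):

```python
import math

def eap_score(resp, items, grid_lo=-4.0, grid_hi=4.0, n_grid=81):
    """EAP ability estimate under a 2PL with a standard-normal prior,
    computed on a fixed quadrature grid."""
    step = (grid_hi - grid_lo) / (n_grid - 1)
    num = den = 0.0
    for i in range(n_grid):
        th = grid_lo + i * step
        w = math.exp(-0.5 * th * th)          # N(0, 1) prior kernel
        like = 1.0
        for u, (a, b) in zip(resp, items):
            p = 1 / (1 + math.exp(-a * (th - b)))
            like *= p if u == 1 else 1 - p
        num += th * w * like
        den += w * like
    return num / den

items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]  # hypothetical (a, b)
print(round(eap_score([1, 1, 0, 0], items), 3))
print(round(eap_score([1, 1, 1, 1], items), 3))
```

Unlike the ML estimate, the EAP score stays finite for all-correct and all-incorrect response patterns, which matters for short knowledge scales.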

  10. Cancer Health Literacy Test-30-Spanish (CHLT-30-DKspa), a new Spanish- language version of the Cancer Health Literacy Test (CHLT-30) for Spanish-speaking Latinos

    PubMed Central

    Echeverri, Margarita; Anderson, David; Nápoles, Anna María

    2016-01-01

Objective Describe the adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. Methods Cross-sectional field test of the CHLT Spanish version (CHLT-30-DKspa) among healthy Latinos in Louisiana. Diagonally weighted least squares was used to confirm the factor structure. Item response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age, and education. Results Mean CHLT-30-DKspa score (N = 400) was 17.13 (range 0 to 30; SD 6.65). Results confirmed a unidimensional structure (χ²[405] = 461.55, p = .027, CFI = .993, TLI = .992, RMSEA = .0180). Cronbach's alpha was 0.88. Items Q1-High Calorie and Q15-Tumor Spread had the lowest item-scale correlations (.148 and .288) and standardized factor loadings (.152 and .302). Items Q1-High Calorie, Q8-Palliative Care, and Q19-Smoking Risk had the highest item-difficulty parameters (diff = 1.12, 1.21, and 2.40). Conclusions Results generally supported the applicability of the CHLT-30-DKspa for Spanish-speaking healthy populations, with the exception of four items that need to be deleted or revised and further studied (Q1, Q8, Q15, and Q19). Practical Implications The CHLT-30-DKspa can be used to assess cancer health literacy among Spanish-speaking populations to advance research on cancer health literacy and outcomes. PMID:27043760

  11. On the Complexity of Item Response Theory Models.

    PubMed

    Bonifay, Wes; Cai, Li

    2017-01-01

    Complexity in item response theory (IRT) has traditionally been quantified by simply counting the number of freely estimated parameters in the model. However, complexity is also contingent upon the functional form of the model. We examined four popular IRT models-exploratory factor analytic, bifactor, DINA, and DINO-with different functional forms but the same number of free parameters. In comparison, a simpler (unidimensional 3PL) model was specified such that it had 1 more parameter than the previous models. All models were then evaluated according to the minimum description length principle. Specifically, each model was fit to 1,000 data sets that were randomly and uniformly sampled from the complete data space and then assessed using global and item-level fit and diagnostic measures. The findings revealed that the factor analytic and bifactor models possess a strong tendency to fit any possible data. The unidimensional 3PL model displayed minimal fitting propensity, despite the fact that it included an additional free parameter. The DINA and DINO models did not demonstrate a proclivity to fit any possible data, but they did fit well to distinct data patterns. Applied researchers and psychometricians should therefore consider functional form-and not goodness-of-fit alone-when selecting an IRT model.

  12. Item response theory analysis of the Pain Self-Efficacy Questionnaire.

    PubMed

    Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

    2017-01-01

    The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain. 
Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
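Under the graded response model used in the PSEQ analysis above, the probability of scoring in or above category k is a logistic boundary curve, and category probabilities are differences of adjacent boundaries. A minimal sketch with a hypothetical 7-point item (not the fitted PSEQ parameters):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: P(X >= k) = logistic(a * (theta - b_k));
    category probabilities are differences of adjacent boundary curves."""
    def boundary(b):
        return 1 / (1 + math.exp(-a * (theta - b)))
    cum = [1.0] + [boundary(b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 7-category item: discrimination 1.8, ordered thresholds
probs = grm_category_probs(theta=0.0, a=1.8,
                           thresholds=[-2.0, -1.0, -0.3, 0.4, 1.1, 2.2])
print([round(p, 3) for p in probs])
```

Plotting these category probabilities against θ yields the category probability curves inspected in the study.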

  13. A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

    ERIC Educational Resources Information Center

    Guo, Rui; Zheng, Yi; Chang, Hua-Hua

    2015-01-01

    An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…

  14. A Comparison of Kernel Equating and Traditional Equipercentile Equating Methods and the Parametric Bootstrap Methods for Estimating Standard Errors in Equipercentile Equating

    ERIC Educational Resources Information Center

    Choi, Sae Il

    2009-01-01

    This study used simulation (a) to compare the kernel equating method to traditional equipercentile equating methods under the equivalent-groups (EG) design and the nonequivalent-groups with anchor test (NEAT) design and (b) to apply the parametric bootstrap method for estimating standard errors of equating. A two-parameter logistic item response…

  15. Rasch Measurement of Collaborative Problem Solving in an Online Environment.

    PubMed

    Harding, Susan-Marie E; Griffin, Patrick E

    2016-01-01

This paper describes an approach to the assessment of human-to-human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within the dyad, roles were nominated as either A or B, and students selected their own roles. The question of whether role selection affected individual student performance measures is addressed. Process stream data were captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor and collaborating with their partner through a chat box window. Process stream data were explored to identify behavioural indicators that represented elements of a conceptual framework. These indicative behaviours were coded into a series of dichotomous items representing actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty. Given these item difficulty estimates, student ability could then be estimated from the difficulty of the range of items each student demonstrated. The Rasch simple logistic model was used to review the indicators to identify those that were consistent with the assumptions of the model and were invariant across national samples, language, curriculum and age of the student. The data were analysed using one- and two-dimensional, one-parameter models. Rasch separation reliability, fit to the model, distribution of students and items on the underpinning construct, estimates for each country and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human-to-human interaction, using behavioural indicators shown to have a consistent relationship between the estimate of student ability and the probability of demonstrating the behaviour.
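Given Rasch item difficulty estimates, a student's ability can be estimated by maximum likelihood from the score equation Σ(uᵢ − Pᵢ) = 0. A minimal sketch with hypothetical difficulties on the logit scale (not the study's calibrated indicators):

```python
import math

def rasch_ability(resp, difficulties, n_iter=30):
    """ML ability estimate under the Rasch model given item difficulties,
    via Newton-Raphson on the score equation sum(u_i - P_i) = 0."""
    r = sum(resp)
    if r == 0 or r == len(resp):
        raise ValueError("no finite MLE for zero or perfect scores")
    theta = 0.0
    for _ in range(n_iter):
        ps = [1 / (1 + math.exp(-(theta - b))) for b in difficulties]
        grad = r - sum(ps)                    # score equation residual
        info = sum(p * (1 - p) for p in ps)   # test information
        theta += grad / info
    return theta

# Hypothetical difficulties (e.g. logits derived from indicator frequencies)
bs = [-1.5, -0.5, 0.0, 0.7, 1.6]
print(round(rasch_ability([1, 1, 1, 0, 0], bs), 3))
```

Under the Rasch model the raw score is a sufficient statistic, so any response pattern with the same number of demonstrated behaviours yields the same ability estimate.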

  16. Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework.

    PubMed

    Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A

    2006-11-01

To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish samples using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command "close your eyes," and follow the command "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are therefore comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
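In the DFIT framework, Raju's noncompensatory DIF index for an item is the expected squared difference between its focal- and reference-calibrated response curves, averaged over focal-group abilities. A sketch for a dichotomous 2PL item with hypothetical abilities and parameters (flagging cutoffs depend on the item type and are not reproduced here):

```python
import math

def p_2pl(theta, a, b):
    """2PL item response probability."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def ncdif(focal_thetas, focal_params, ref_params):
    """Raju's noncompensatory DIF index: mean squared difference between
    focal- and reference-calibrated item curves over focal-group abilities."""
    af, bf = focal_params
    ar, br = ref_params
    return sum((p_2pl(t, af, bf) - p_2pl(t, ar, br)) ** 2
               for t in focal_thetas) / len(focal_thetas)

# Hypothetical focal-group abilities and item parameters on a common metric
thetas = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
print(round(ncdif(thetas, (1.2, 0.6), (1.2, 0.2)), 4))  # shifted difficulty
print(round(ncdif(thetas, (1.2, 0.2), (1.2, 0.2)), 4))  # identical curves
```

The test-level DTF index aggregates these item-level differences over the whole scale, which is why items can show DIF while the total score does not.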

  17. The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

    ERIC Educational Resources Information Center

    Lee, Wooyeol; Cho, Sun-Joo

    2017-01-01

    Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…

  18. A semi-parametric within-subject mixture approach to the analyses of responses and response times.

    PubMed

    Molenaar, Dylan; Bolsinova, Maria; Vermunt, Jeroen K

    2018-05-01

    In item response theory, modelling the item response times in addition to the item responses may improve the detection of possible between- and within-subject differences in the process that resulted in the responses. For instance, if respondents rely on rapid guessing on some items but not on all, the joint distribution of the responses and response times will be a multivariate within-subject mixture distribution. Suitable parametric methods to detect these within-subject differences have been proposed. In these approaches, a distribution needs to be assumed for the within-class response times. In this paper, it is demonstrated that these parametric within-subject approaches may produce false positives and biased parameter estimates if the assumption concerning the response time distribution is violated. A semi-parametric approach is proposed which resorts to categorized response times. This approach is shown to hardly produce false positives and parameter bias. In addition, the semi-parametric approach results in approximately the same power as the parametric approach. © 2017 The British Psychological Society.

  19. The Importance of Isomorphism for Conclusions about Homology: A Bayesian Multilevel Structural Equation Modeling Approach with Ordinal Indicators.

    PubMed

    Guenole, Nigel

    2016-01-01

    We describe a Monte Carlo study examining the impact of assuming item isomorphism (i.e., equivalent construct meaning across levels of analysis) on conclusions about homology (i.e., equivalent structural relations across levels of analysis) under varying degrees of non-isomorphism in the context of ordinal indicator multilevel structural equation models (MSEMs). We focus on the condition where one or more loadings are higher on the between level than on the within level to show that while much past research on homology has ignored the issue of psychometric isomorphism, psychometric isomorphism is in fact critical to valid conclusions about homology. More specifically, when a measurement model with non-isomorphic items occupies an exogenous position in a multilevel structural model and the non-isomorphism of these items is not modeled, the within level exogenous latent variance is under-estimated leading to over-estimation of the within level structural coefficient, while the between level exogenous latent variance is overestimated leading to underestimation of the between structural coefficient. When a measurement model with non-isomorphic items occupies an endogenous position in a multilevel structural model and the non-isomorphism of these items is not modeled, the endogenous within level latent variance is under-estimated leading to under-estimation of the within level structural coefficient while the endogenous between level latent variance is over-estimated leading to over-estimation of the between level structural coefficient. The innovative aspect of this article is demonstrating that even minor violations of psychometric isomorphism render claims of homology untenable. We also show that posterior predictive p-values for ordinal indicator Bayesian MSEMs are insensitive to violations of isomorphism even when they lead to severely biased within and between level structural parameters. 
We highlight conditions where poor estimation of even correctly specified models rules out empirical examination of isomorphism and homology unless precautions are taken, such as larger Level-2 sample sizes or informative priors.

  20. The Importance of Isomorphism for Conclusions about Homology: A Bayesian Multilevel Structural Equation Modeling Approach with Ordinal Indicators

    PubMed Central

    Guenole, Nigel

    2016-01-01

    We describe a Monte Carlo study examining the impact of assuming item isomorphism (i.e., equivalent construct meaning across levels of analysis) on conclusions about homology (i.e., equivalent structural relations across levels of analysis) under varying degrees of non-isomorphism in the context of ordinal indicator multilevel structural equation models (MSEMs). We focus on the condition where one or more loadings are higher on the between level than on the within level to show that while much past research on homology has ignored the issue of psychometric isomorphism, psychometric isomorphism is in fact critical to valid conclusions about homology. More specifically, when a measurement model with non-isomorphic items occupies an exogenous position in a multilevel structural model and the non-isomorphism of these items is not modeled, the within level exogenous latent variance is under-estimated leading to over-estimation of the within level structural coefficient, while the between level exogenous latent variance is overestimated leading to underestimation of the between structural coefficient. When a measurement model with non-isomorphic items occupies an endogenous position in a multilevel structural model and the non-isomorphism of these items is not modeled, the endogenous within level latent variance is under-estimated leading to under-estimation of the within level structural coefficient while the endogenous between level latent variance is over-estimated leading to over-estimation of the between level structural coefficient. The innovative aspect of this article is demonstrating that even minor violations of psychometric isomorphism render claims of homology untenable. We also show that posterior predictive p-values for ordinal indicator Bayesian MSEMs are insensitive to violations of isomorphism even when they lead to severely biased within and between level structural parameters. 
We highlight conditions where poor estimation of even correctly specified models rules out empirical examination of isomorphism and homology unless precautions are taken, such as larger Level-2 sample sizes or informative priors. PMID:26973580

  1. A Solution to Separation and Multicollinearity in Multiple Logistic Regression

    PubMed Central

    Shen, Jianzhao; Gao, Sujuan

    2010-01-01

In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often suffer serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models, which was shown to reduce bias and the non-existence problem. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither approach solves both problems. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study. PMID:20376286

  2. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    PubMed

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often suffer serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models, which was shown to reduce bias and the non-existence problem. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither approach solves both problems. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
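The double penalized idea combines Firth's modified score (which keeps estimates finite under separation) with a ridge term (which stabilizes them under multicollinearity). A numerical sketch of that combination via Fisher scoring, on perfectly separated toy data; this is a simplified illustrative implementation, not the authors' estimator or code:

```python
import numpy as np

def double_penalized_logistic(X, y, ridge=0.1, n_iter=100):
    """Logistic regression combining Firth's bias-reducing modified score
    with a ridge penalty on the coefficients (Fisher-scoring updates)."""
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        XtWX = X.T @ (X * W[:, None])
        # Hat-matrix diagonals of W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        M = X * np.sqrt(W)[:, None]
        h = np.einsum('ij,jk,ik->i', M, np.linalg.inv(XtWX), M)
        # Firth-modified score minus the ridge gradient
        score = X.T @ (y - p + h * (0.5 - p)) - ridge * beta
        beta = beta + np.linalg.solve(XtWX + ridge * np.eye(k), score)
    return beta

# Completely separated toy data: plain ML diverges, the penalized fit does not
X = np.column_stack([np.ones(8), [-3, -2, -2, -1, 1, 2, 2, 3]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
beta = double_penalized_logistic(X, y)
print(beta)  # finite intercept and slope despite separation
```

With unpenalized ML this design has no finite maximizer; the Firth term alone already prevents that, and the ridge term additionally shrinks correlated coefficients.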

  3. Immediate list recall as a measure of short-term episodic memory: insights from the serial position effect and item response theory.

    PubMed

    Gavett, Brandon E; Horwitz, Julie E

    2012-03-01

The serial position effect shows that two interrelated cognitive processes underlie immediate recall of a supraspan word list. The current study used item response theory (IRT) methods to determine whether the serial position effect poses a threat to the construct validity of immediate list recall as a measure of verbal episodic memory. Archival data were obtained from a national sample of 4,212 volunteers aged 28-84 in the Midlife Development in the United States study. Telephone assessment yielded item-level data for a single immediate recall trial of the Rey Auditory Verbal Learning Test (RAVLT). Two-parameter logistic IRT procedures were used to estimate item parameters, and the Q1 statistic was used to evaluate item fit. A two-dimensional model fit the data better than a unidimensional model, supporting the notion that list recall is influenced by two underlying cognitive processes. IRT analyses revealed that 4 of the 15 RAVLT items (1, 12, 14, and 15) were misfit (p < .05). Item characteristic curves for items 14 and 15 decreased monotonically, implying an inverse relationship between ability level and the probability of recall. Eliminating the four misfit items provided better fit to the data and met necessary IRT assumptions. Performance on a supraspan list-learning test is influenced by multiple cognitive abilities; failure to account for the serial position of words decreases the construct validity of the test as a measure of episodic memory and may provide misleading results. IRT methods can ameliorate these problems and improve construct validity.

  4. Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

    ERIC Educational Resources Information Center

    Kim, Sooyeon; Livingston, Samuel A.

    2017-01-01

    The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…

  5. 40 CFR 63.4490 - What emission limits must I meet?

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ...) National Emission Standards for Hazardous Air Pollutants for Surface Coating of Plastic Parts and Products... estimate the relative mass of coating solids used from parameters other than coating consumption and mass solids content (e.g., design specifications for the parts or products coated and the number of items...

  6. 40 CFR 63.4490 - What emission limits must I meet?

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ...) National Emission Standards for Hazardous Air Pollutants for Surface Coating of Plastic Parts and Products... estimate the relative mass of coating solids used from parameters other than coating consumption and mass solids content (e.g., design specifications for the parts or products coated and the number of items...

  7. 40 CFR 63.4490 - What emission limits must I meet?

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ...) National Emission Standards for Hazardous Air Pollutants for Surface Coating of Plastic Parts and Products... estimate the relative mass of coating solids used from parameters other than coating consumption and mass solids content (e.g., design specifications for the parts or products coated and the number of items...

  8. Using Generalizability Analysis to Estimate Parameters for Anatomy Assessments: A Multi-institutional Study

    ERIC Educational Resources Information Center

    Byram, Jessica N.; Seifert, Mark F.; Brooks, William S.; Fraser-Cotlin, Laura; Thorp, Laura E.; Williams, James M.; Wilson, Adam B.

    2017-01-01

    With integrated curricula and multidisciplinary assessments becoming more prevalent in medical education, there is a continued need for educational research to explore the advantages, consequences, and challenges of integration practices. This retrospective analysis investigated the number of items needed to reliably assess anatomical knowledge in…

  9. Test Operations Procedure (TOP) 10-2-400 Open End Compressed Gas Driven Shock Tube

    DTIC Science & Technology

    ...gas-driven shock tube. Procedures are provided for instrumentation, test item positioning, estimation of key test parameters, operation of the shock tube, data collection, and reporting. The procedures in this document are based on the use of helium gas and Mylar film diaphragms.

  10. ROLE OF LABORATORY SAMPLING DEVICES AND LABORATORY SUBSAMPLING METHODS IN OPTIMIZING REPRESENTATIVENESS STRATEGIES

    EPA Science Inventory

    Sampling is the act of selecting items from a specified population in order to estimate the parameters of that population (e.g., selecting soil samples to characterize the properties at an environmental site). Sampling occurs at various levels and times throughout an environmenta...

  11. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary.

    PubMed

    Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.

  12. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    PubMed Central

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2016-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568

  13. A Multidimensional Computerized Adaptive Short-Form Quality of Life Questionnaire Developed and Validated for Multiple Sclerosis: The MusiQoL-MCAT.

    PubMed

    Michel, Pierre; Baumstarck, Karine; Ghattas, Badih; Pelletier, Jean; Loundou, Anderson; Boucekine, Mohamed; Auquier, Pascal; Boyer, Laurent

    2016-04-01

    The aim was to develop a multidimensional computerized adaptive short-form questionnaire, the MusiQoL-MCAT, from a fixed-length QoL questionnaire for multiple sclerosis. A total of 1992 patients were enrolled in this international cross-sectional study. The development of the MusiQoL-MCAT was based on the assessment of between-items MIRT model fit followed by real-data simulations. The MCAT algorithm was based on Bayesian maximum a posteriori estimation of latent traits and Kullback-Leibler information item selection. We examined several simulations based on a fixed number of items. Accuracy was assessed using correlations (r) between initial IRT scores and MCAT scores. Precision was assessed using the standard error of measurement (SEM) and the root mean square error (RMSE). The multidimensional graded response model was used to estimate item parameters and IRT scores. Among the MCAT simulations, the 16-item version of the MusiQoL-MCAT was selected because accuracy and precision became stable at 16 items with satisfactory levels (r ≥ 0.9, SEM ≤ 0.55, and RMSE ≤ 0.3). External validity of the MusiQoL-MCAT was satisfactory. The MusiQoL-MCAT presents satisfactory properties and can individually tailor QoL assessment to each patient, making it less burdensome to patients and better adapted for use in clinical practice.
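
    The two ingredients of the MCAT algorithm, maximum a posteriori (MAP) scoring and Kullback-Leibler (KL) item selection, can be illustrated in a unidimensional 2PL setting. This is a simplified sketch, not the published multidimensional graded-response implementation:

```python
import numpy as np

GRID = np.linspace(-4, 4, 161)  # latent-trait grid for posterior evaluation

def p2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def map_theta(responses, items):
    """MAP estimate of theta over a grid with a standard normal prior.
    responses: list of 0/1; items: list of (a, b) tuples."""
    log_post = -0.5 * GRID ** 2                  # N(0, 1) log-prior (up to a constant)
    for u, (a, b) in zip(responses, items):
        p = p2pl(GRID, a, b)
        log_post += u * np.log(p) + (1 - u) * np.log(1 - p)
    return GRID[np.argmax(log_post)]

def kl_info(theta_hat, a, b, delta=0.5):
    """KL divergence between the item's response distributions at
    theta_hat + delta and theta_hat - delta, a common KL selection index."""
    p0 = p2pl(theta_hat - delta, a, b)
    p1 = p2pl(theta_hat + delta, a, b)
    return p1 * np.log(p1 / p0) + (1 - p1) * np.log((1 - p1) / (1 - p0))

def pick_next(theta_hat, pool):
    """Select the unadministered item maximizing the KL index."""
    return max(pool, key=lambda ab: kl_info(theta_hat, *ab))
```

    As expected, the KL criterion favors items whose difficulty sits near the current trait estimate.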

  14. A Multidimensional Computerized Adaptive Short-Form Quality of Life Questionnaire Developed and Validated for Multiple Sclerosis

    PubMed Central

    Michel, Pierre; Baumstarck, Karine; Ghattas, Badih; Pelletier, Jean; Loundou, Anderson; Boucekine, Mohamed; Auquier, Pascal; Boyer, Laurent

    2016-01-01

    The aim was to develop a multidimensional computerized adaptive short-form questionnaire, the MusiQoL-MCAT, from a fixed-length QoL questionnaire for multiple sclerosis. A total of 1992 patients were enrolled in this international cross-sectional study. The development of the MusiQoL-MCAT was based on the assessment of between-items MIRT model fit followed by real-data simulations. The MCAT algorithm was based on Bayesian maximum a posteriori estimation of latent traits and Kullback–Leibler information item selection. We examined several simulations based on a fixed number of items. Accuracy was assessed using correlations (r) between initial IRT scores and MCAT scores. Precision was assessed using the standard error of measurement (SEM) and the root mean square error (RMSE). The multidimensional graded response model was used to estimate item parameters and IRT scores. Among the MCAT simulations, the 16-item version of the MusiQoL-MCAT was selected because the accuracy and precision became stable with 16 items with satisfactory levels (r ≥ 0.9, SEM ≤ 0.55, and RMSE ≤ 0.3). External validity of the MusiQoL-MCAT was satisfactory. The MusiQoL-MCAT presents satisfactory properties and can individually tailor QoL assessment to each patient, making it less burdensome to patients and better adapted for use in clinical practice. PMID:27057832

  15. Psychometric Consequences of Subpopulation Item Parameter Drift

    ERIC Educational Resources Information Center

    Huggins-Manley, Anne Corinne

    2017-01-01

    This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…

  16. An Investigation of Sample Size Splitting on ATFIND and DIMTEST

    ERIC Educational Resources Information Center

    Socha, Alan; DeMars, Christine E.

    2013-01-01

    Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…

  17. Calibration of Response Data Using MIRT Models with Simple and Mixed Structures

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2012-01-01

    It is common to assume during a statistical analysis of a multiscale assessment that the assessment is composed of several unidimensional subtests or that it has simple structure. Under this assumption, the unidimensional and multidimensional approaches can be used to estimate item parameters. These two approaches are equivalent in parameter…

  18. Computerized Adaptive Testing with Item Clones. Research Report.

    ERIC Educational Resources Information Center

    Glas, Cees A. W.; van der Linden, Wim J.

    To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…

  19. Measuring Constructs in Family Science: How Can Item Response Theory Improve Precision and Validity?

    PubMed Central

    Gordon, Rachel A.

    2014-01-01

    This article provides family scientists with an understanding of contemporary measurement perspectives and the ways in which item response theory (IRT) can be used to develop measures with desired evidence of precision and validity for research uses. The article offers a nontechnical introduction to some key features of IRT, including its orientation toward locating items along an underlying dimension and toward estimating precision of measurement for persons with different levels of that same construct. It also offers a didactic example of how the approach can be used to refine conceptualization and operationalization of constructs in the family sciences, using data from the National Longitudinal Survey of Youth 1979 (n = 2,732). Three basic models are considered: (a) the Rasch and (b) two-parameter logistic models for dichotomous items and (c) the Rating Scale Model for multicategory items. Throughout, the author highlights the potential for researchers to elevate measurement to a level on par with theorizing and testing about relationships among constructs. PMID:25663714

  20. Sampling Variances and Covariances of Parameter Estimates in Item Response Theory.

    DTIC Science & Technology

    1982-08-01

    (The OCR of this excerpt is largely garbled.) The legible portion describes a derivation obtained by substituting equation (15) into (16), with certain covariance terms (C5 and C12) treated as known, and reports that the standard errors of the estimates B1 to B5 are increased drastically by ignorance of the covariance terms C1 to C5.

  1. The PROMIS Physical Function item bank was calibrated to a standardized metric and shown to improve measurement efficiency.

    PubMed

    Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E

    2014-05-01

    To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
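
    The norming step described above, re-expressing latent trait estimates on a metric with mean 50 and standard deviation 10 in the reference population, is a linear transformation:

```python
import numpy as np

def to_t_score(theta, ref_mean, ref_sd):
    """Re-express IRT theta estimates as T-scores (mean 50, SD 10
    in the reference population used for norming)."""
    return 50.0 + 10.0 * (np.asarray(theta, dtype=float) - ref_mean) / ref_sd

# e.g., with a reference population already on a theta ~ N(0, 1) metric:
scores = to_t_score([-1.0, 0.0, 2.0], ref_mean=0.0, ref_sd=1.0)  # -> [40., 50., 70.]
```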

  2. Linking Item Parameters to a Base Scale

    ERIC Educational Resources Information Center

    Kang, Taehoon; Petersen, Nancy S.

    2012-01-01

    This paper compares three methods of item calibration--concurrent calibration, separate calibration with linking, and fixed item parameter calibration--that are frequently used for linking item parameters to a base scale. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord in "Appl Psychol Measure"…

  3. A Note on the Item Information Function of the Four-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Magis, David

    2013-01-01

    This article focuses on four-parameter logistic (4PL) model as an extension of the usual three-parameter logistic (3PL) model with an upper asymptote possibly different from 1. For a given item with fixed item parameters, Lord derived the value of the latent ability level that maximizes the item information function under the 3PL model. The…

  4. Explore the Usefulness of Person-Fit Analysis on Large-Scale Assessment

    ERIC Educational Resources Information Center

    Cui, Ying; Mousavi, Amin

    2015-01-01

    The current study applied the person-fit statistic, l[subscript z], to data from a Canadian provincial achievement test to explore the usefulness of conducting person-fit analysis on large-scale assessments. Item parameter estimates were compared before and after the misfitting student responses, as identified by l[subscript z], were removed. The…

  5. The Impact of Different Missing Data Handling Methods on DINA Model

    ERIC Educational Resources Information Center

    Sünbül, Seçil Ömür

    2018-01-01

    In this study, it was aimed to investigate the impact of different missing data handling methods on DINA model parameter estimation and classification accuracy. In the study, simulated data were used and the data were generated by manipulating the number of items and sample size. In the generated data, two different missing data mechanisms…

  6. Ordinary Least Squares Estimation of Parameters in Exploratory Factor Analysis with Ordinal Data

    ERIC Educational Resources Information Center

    Lee, Chun-Ting; Zhang, Guangjian; Edwards, Michael C.

    2012-01-01

    Exploratory factor analysis (EFA) is often conducted with ordinal data (e.g., items with 5-point responses) in the social and behavioral sciences. These ordinal variables are often treated as if they were continuous in practice. An alternative strategy is to assume that a normally distributed continuous variable underlies each ordinal variable.…

  7. The Random-Threshold Generalized Unfolding Model and Its Application of Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Liu, Chen-Wei; Wu, Shiu-Lien

    2013-01-01

    The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs…

  8. Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form.

    PubMed

    Kisala, Pamela A; Tulsky, David S; Choi, Seung W; Kirshblum, Steven C

    2015-05-01

    To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Pressure Ulcers scale. 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provide flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item "short form" and is available for both research and clinical practice.
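
    In a graded response model constrained to a constant slope, each item contributes only its ordered thresholds, and category probabilities are differences of cumulative logistic curves. A sketch with hypothetical parameter values:

```python
import numpy as np

def grm_probs(theta, a, thresholds):
    """Category probabilities under a graded response model with a
    constant slope a and ordered thresholds b_1 < ... < b_{K-1}:
    P(X >= k) = 1 / (1 + exp(-a * (theta - b_k))),
    P(X = k)  = P(X >= k) - P(X >= k+1)."""
    b = np.asarray(thresholds, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(X >= k) for k = 1..K-1
    cum = np.concatenate(([1.0], cum, [0.0]))      # pad: P(X >= 0) = 1, P(X >= K) = 0
    return cum[:-1] - cum[1:]                      # P(X = k) for k = 0..K-1

# hypothetical 4-category item with common slope a = 1.8
probs = grm_probs(theta=0.5, a=1.8, thresholds=[-1.0, 0.0, 1.2])
```

    The common-slope constraint means every item's cumulative curves are parallel; only the threshold locations shift across items.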

  9. Linking Item Parameters to a Base Scale. ACT Research Report Series, 2009-2

    ERIC Educational Resources Information Center

    Kang, Taehoon; Petersen, Nancy S.

    2009-01-01

    This paper compares three methods of item calibration--concurrent calibration, separate calibration with linking, and fixed item parameter calibration--that are frequently used for linking item parameters to a base scale. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord (1983) characteristic curve method…

  10. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations.

    PubMed

    Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert

    2016-01-01

    Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System ® (PROMIS ® ) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, "How much did pain interfere with enjoyment of social activities?" was excluded in the DIF analyses for all subgroup comparisons. 
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta ( θ ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. 
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well, although one item was problematic and was removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness.

  11. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations

    PubMed Central

    Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert

    2017-01-01

    Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods: DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results: The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, “How much did pain interfere with enjoyment of social activities?” was excluded in the DIF analyses for all subgroup comparisons. 
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Conclusions: Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. 
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well, although one item was problematic and was removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness. PMID:28983449

  12. Item selection via Bayesian IRT models.

    PubMed

    Arima, Serena

    2015-02-10

    With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
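
    The grouping step, treating items that land in the same mixture component as equivalent in difficulty and discrimination, can be approximated without the full Bayesian machinery. The sketch below fits a plain (non-Bayesian) spherical Gaussian mixture by EM to item (discrimination, difficulty) pairs and reads off hard group assignments; it is an illustration of the clustering idea, not the estimator used in the paper:

```python
import numpy as np

def gmm_item_groups(points, k, n_iter=200):
    """EM for a k-component spherical Gaussian mixture over item
    (discrimination, difficulty) pairs; returns hard group labels."""
    n, d = points.shape
    # deterministic farthest-point initialization of the component means
    means = [points[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((points - m) ** 2, axis=1) for m in means], axis=0)
        means.append(points[np.argmax(d2)])
    means = np.array(means)
    var = np.full(k, points.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each item
        d2 = ((points[:, None, :] - means[None]) ** 2).sum(-1)
        log_r = np.log(weights) - 0.5 * d * np.log(var) - 0.5 * d2 / var
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-component variances
        nk = r.sum(axis=0) + 1e-12
        weights = nk / n
        means = (r.T @ points) / nk[:, None]
        d2 = ((points[:, None, :] - means[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * nk) + 1e-6
    return r.argmax(axis=1)  # items sharing a label form an "equivalent" group
```

    Items assigned to the same component would then be candidates for pruning down to a single representative in the reduced questionnaire.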

  13. Just-in-time adaptive disturbance estimation for run-to-run control of photolithography overlay

    NASA Astrophysics Data System (ADS)

    Firth, Stacy K.; Campbell, W. J.; Edgar, Thomas F.

    2002-07-01

    One of the main challenges to implementing traditional run-to-run control in the semiconductor industry is a high mix of products in a single factory. To address this challenge, Just-in-time Adaptive Disturbance Estimation (JADE) has been developed. JADE uses a recursive weighted least-squares parameter estimation technique to identify the contributions to variation that depend on the product, as well as on the tools on which the lot was processed. As applied to photolithography overlay, JADE assigns these sources of variation to contributions from the context items: tool, product, reference tool, and reference reticle. Simulations demonstrate that JADE effectively identifies disturbances in contributing context items when the variations are known to be additive. The superior performance of JADE over traditional EWMA is also shown in these simulations. Application of JADE to data from a high-mix production facility shows that JADE still outperforms EWMA, even with the challenges of a real manufacturing environment.
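
    The EWMA baseline against which JADE is compared is a one-line recursion; tracking disturbances per context item is sketched here as one EWMA filter per (tool, product) key. This is a simplified stand-in for JADE's recursive weighted least-squares decomposition, not the published algorithm:

```python
class EwmaEstimator:
    """Classic run-to-run EWMA disturbance estimate:
    est <- w * observed + (1 - w) * est."""
    def __init__(self, weight=0.3, initial=0.0):
        self.w = weight
        self.est = initial

    def update(self, observed):
        self.est = self.w * observed + (1.0 - self.w) * self.est
        return self.est

class ContextEwma:
    """Per-context disturbance tracking: one EWMA filter per context
    key, e.g. a (tool, product) tuple (simplified stand-in for JADE's
    additive context decomposition)."""
    def __init__(self, weight=0.3):
        self.w = weight
        self.filters = {}

    def update(self, context, observed):
        f = self.filters.setdefault(context, EwmaEstimator(self.w))
        return f.update(observed)
```

    In a high-mix fab, the per-context version avoids a single global estimate being dragged around by lots from different tool/product combinations.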

  14. Measuring the quality of life in hypertension according to Item Response Theory.

    PubMed

    Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; Andrade, Dalton Francisco de; Barbetta, Pedro Alberto; Souza, Ana Célia Caetano de; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

    2017-05-04

    To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL - Mini-questionnaire of Quality of Life in Hypertension) using Item Response Theory. This analytical study was conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the Item Response Theory analysis were: evaluation of dimensionality, estimation of item parameters, and construction of the scale. The study of dimensionality was carried out on the polychoric correlation matrix together with confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted in the free software R with the aid of the psych and mirt packages. The analysis allowed visualization of the item parameters and their individual contributions to the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of quality of life in five levels. Regarding the item parameters, the items related to the somatic state performed well, as they presented better power to discriminate individuals with worse quality of life. The items related to the mental state contributed the least psychometric information to the MINICHAL. We conclude that the instrument is suitable for identifying the worsening of quality of life in hypertension. The analysis of the MINICHAL using Item Response Theory has allowed us to identify new facets of this instrument that had not been addressed in previous studies.

  15. A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing.

    PubMed

    van Rijn, Peter W; Ali, Usama S

    2017-05-01

    We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied linear and adaptive testing modes to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures. © 2017 The British Psychological Society.
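
    The hierarchical framework's generating process can be sketched directly: a two-parameter logistic model for accuracy and a log-normal model for response time, linked through separate person parameters (ability theta and speed tau). The parameterization and all values below are illustrative:

```python
import math
import random

def simulate_response(theta, tau, a, b, alpha, beta, rng):
    """One item response under a hierarchical accuracy/speed model
    (illustrative parameterization).
    Accuracy: 2PL with discrimination a and difficulty b.
    Speed: log-normal response time with time intensity beta,
    person speed tau, and time discrimination alpha."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    correct = rng.random() < p
    # Higher speed tau shortens the expected log response time.
    log_rt = rng.gauss(beta - tau, 1.0 / alpha)
    return correct, math.exp(log_rt)

rng = random.Random(1)
correct, rt = simulate_response(theta=0.5, tau=0.2, a=1.2, b=0.0,
                                alpha=2.0, beta=1.0, rng=rng)
```

    Under this model the marginal distribution of `correct` is exactly the 2PL, which is the commonality across the three frameworks noted in the abstract.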

  16. The Extent, Causes, and Importance of Context Effects on Item Parameters for Two Latent-Trait Models.

    ERIC Educational Resources Information Center

    Yen, Wendy M.

    The extent, causes, and importance of context effects on item parameters for one- and three-parameter latent-trait models were examined. Items were taken from the California Achievement Tests Reading Comprehension and Mathematics Concepts and Applications subtests. The reading items were administered to 1,678 fourth-grade students, and the…

  17. A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses.

    PubMed

    Massof, Robert W

    2014-10-01

    A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equals the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing. © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.
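
    The central identifiability result, that a shift in the patient variable and an equal shift in response bias produce identical item responses, follows directly from the model structure, in which responses depend only on the sum of the two. A minimal sketch (threshold values assumed):

```python
import math

def category_probs(theta, bias, item, thresholds):
    """Probability of responding at or above each category threshold
    when the response depends only on (theta + bias - item).
    Illustrative simplification of the framework above."""
    x = theta + bias - item
    return [1.0 / (1.0 + math.exp(-(x - t))) for t in thresholds]

# A +0.5 change in the patient variable and a +0.5 response bias
# yield identical response probabilities -- the two are confounded.
p_trait = category_probs(theta=0.5, bias=0.0, item=0.0, thresholds=[-1, 0, 1])
p_bias  = category_probs(theta=0.0, bias=0.5, item=0.0, thresholds=[-1, 0, 1])
```

    Any estimator that sees only the responses therefore attributes both kinds of change to the estimated patient variable, exactly as the simulations report.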

  18. Methods for estimating comparable prevalence rates of food insecurity experienced by adults in 147 countries and areas

    NASA Astrophysics Data System (ADS)

    Nord, Mark; Cafiero, Carlo; Viviani, Sara

    2016-11-01

    Statistical methods based on item response theory are applied to experiential food insecurity survey data from 147 countries, areas, and territories to assess data quality and develop methods to estimate national prevalence rates of moderate and severe food insecurity at equal levels of severity across countries. Data were collected from nationally representative samples of 1,000 adults in each country. A Rasch-model-based scale was estimated for each country, and data were assessed for consistency with model assumptions. A global reference scale was calculated based on item parameters from all countries. Each country's scale was adjusted to the global standard, allowing for up to 3 of the 8 scale items to be considered unique in that country if their deviance from the global standard exceeded a set tolerance. With very few exceptions, data from all countries were sufficiently consistent with model assumptions to constitute reasonably reliable measures of food insecurity and were adjustable to the global standard with fair confidence. National prevalence rates of moderate-or-severe food insecurity assessed over a 12-month recall period ranged from 3 percent to 92 percent. The correlations of national prevalence rates with national income, health, and well-being indicators provide external validation of the food security measure.
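
    The core operations, Rasch response probabilities and adjustment of a country scale to a global reference, can be sketched as below. The mean-shift alignment and tolerance rule are simplified stand-ins for the procedure described above; the function names and tolerance value are assumptions:

```python
import math

def rasch_prob(theta, b):
    # Rasch model: P(affirm) depends only on theta - b
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def align_to_global(country_items, global_items, tolerance=0.35):
    """Shift a country's item parameters onto the global metric and
    flag items whose residual deviation exceeds the tolerance as
    'unique' to that country (simplified sketch)."""
    n = len(country_items)
    shift = (sum(global_items) - sum(country_items)) / n
    adjusted = [b + shift for b in country_items]
    unique = [i for i, (adj, glob) in enumerate(zip(adjusted, global_items))
              if abs(adj - glob) > tolerance]
    return adjusted, unique

# A country scale that differs from the global scale only by a
# constant offset aligns perfectly, leaving no unique items.
adjusted, unique = align_to_global([0.5, -0.5, 1.0], [0.0, -1.0, 0.5])
```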

  19. Weighted Maximum-a-Posteriori Estimation in Tests Composed of Dichotomous and Polytomous Items

    ERIC Educational Resources Information Center

    Sun, Shan-Shan; Tao, Jian; Chang, Hua-Hua; Shi, Ning-Zhong

    2012-01-01

    For mixed-type tests composed of dichotomous and polytomous items, polytomous items often yield more information than dichotomous items. To reflect the difference between the two types of items and to improve the precision of ability estimation, an adaptive weighted maximum-a-posteriori (WMAP) estimation is proposed. To evaluate the performance of…

  20. Modeling the Psychometric Properties of Complex Performance Assessment Tasks Using Confirmatory Factor Analysis: A Multistage Model for Calibrating Tasks

    ERIC Educational Resources Information Center

    Kahraman, Nilufer; De Champlain, Andre; Raymond, Mark

    2012-01-01

    Item-level information, such as difficulty and discrimination, is invaluable to test assembly, equating, and scoring practices. Estimating these parameters within the context of large-scale performance assessments is often hindered by the use of unbalanced designs for assigning examinees to tasks and raters because such designs result in very…

  1. catcher: A Software Program to Detect Answer Copying in Multiple-Choice Tests Based on Nominal Response Model

    ERIC Educational Resources Information Center

    Kalender, Ilker

    2012-01-01

    catcher is a software program designed to compute the ω index, a common statistical index for the identification of collusion (cheating) among examinees taking an educational or psychological test. It requires (a) the responses and (b) ability estimates of individuals, and (c) item parameters to make its computations, and it outputs the results of…

  2. Planned Missing Designs to Optimize the Efficiency of Latent Growth Parameter Estimates

    ERIC Educational Resources Information Center

    Rhemtulla, Mijke; Jia, Fan; Wu, Wei; Little, Todd D.

    2014-01-01

    We examine the performance of planned missing (PM) designs for correlated latent growth curve models. Using simulated data from a model where latent growth curves are fitted to two constructs over five time points, we apply three kinds of planned missingness. The first is item-level planned missingness using a three-form design at each wave such…

  3. Using generalizability analysis to estimate parameters for anatomy assessments: A multi-institutional study.

    PubMed

    Byram, Jessica N; Seifert, Mark F; Brooks, William S; Fraser-Cotlin, Laura; Thorp, Laura E; Williams, James M; Wilson, Adam B

    2017-03-01

    With integrated curricula and multidisciplinary assessments becoming more prevalent in medical education, there is a continued need for educational research to explore the advantages, consequences, and challenges of integration practices. This retrospective analysis investigated the number of items needed to reliably assess anatomical knowledge in the context of gross anatomy and histology. A generalizability analysis was conducted on gross anatomy and histology written and practical examination items that were administered in a discipline-based format at Indiana University School of Medicine and in an integrated fashion at the University of Alabama School of Medicine and Rush University Medical College. Examination items were analyzed using a partially nested design s×(i:o) in which items were nested within occasions (i:o) and crossed with students (s). A reliability standard of 0.80 was used to determine the minimum number of items needed across examinations (occasions) to make reliable and informed decisions about students' competence in anatomical knowledge. Decision study plots are presented to demonstrate how the number of items per examination influences the reliability of each administered assessment. Using the example of a curriculum that assesses gross anatomy knowledge over five summative written and practical examinations, the results of the decision study estimated that 30 and 25 items would be needed on each written and practical examination, respectively, to reach a reliability of 0.80. This study is particularly relevant to educators who may question whether the amount of anatomy content assessed in multidisciplinary evaluations is sufficient for making judgments about the anatomical aptitude of students. Anat Sci Educ 10: 109-119. © 2016 American Association of Anatomists.
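
    The decision-study logic reduces to projecting a generalizability coefficient as the item count grows. A one-facet sketch is shown below; the study's s×(i:o) design partitions error further, and the variance components here are illustrative, not estimates from the paper:

```python
import math

def g_coefficient(var_person, var_error, n_items):
    """Generalizability coefficient when relative error variance is
    averaged over n_items (simplified one-facet D study)."""
    return var_person / (var_person + var_error / n_items)

def items_needed(var_person, var_error, target=0.80):
    """Smallest n with g_coefficient >= target, solving
    var_p / (var_p + var_e / n) = target for n."""
    n = (target * var_error) / (var_person * (1.0 - target))
    # Small epsilon guards against floating-point ceil overshoot.
    return math.ceil(n - 1e-9)

needed = items_needed(1.0, 1.0, target=0.80)
```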

  4. Correspondence of verbal descriptor and numeric rating scales for pain intensity: an item response theory calibration.

    PubMed

    Edelen, Maria Orlando; Saliba, Debra

    2010-07-01

    Assessing pain intensity in older adults is critical and challenging. There is debate about the most effective way to ask older adults to describe their pain severity, and clinicians vary in their preferred approaches, making comparison of pain intensity scores across settings difficult. A total of 3,676 residents from 71 community nursing homes across eight states were asked about pain presence. The 1,960 residents who reported pain within the past 5 days (53% of total, 70% female; age: M = 77.9, SD = 12.4) were included in analyses. Those who reported pain were also asked to provide a rating of pain intensity using either a verbal descriptor scale (VDS; mild, moderate, severe, and very severe and horrible), a numeric rating scale (NRS; 0 = no pain to 10 = worst pain imaginable), or both. We used item response theory (IRT) methods to identify the correspondence between the VDS and the NRS response options by estimating item parameters for these and five additional pain items. The sample reported moderate amounts of pain on average. Examination of the IRT location parameters for the pain intensity items indicated the following approximate correspondence: VDS mild approximately NRS 1-4, VDS moderate approximately NRS 5-7, VDS severe approximately NRS 8-9, and VDS very severe, horrible approximately NRS 10. This IRT calibration provides a crosswalk between the two response scales so that either can be used in practice depending on the preference of the clinician and respondent.

  5. Optimal Bayesian Adaptive Design for Test-Item Calibration.

    PubMed

    van der Linden, Wim J; Ren, Hao

    2015-06-01

    An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
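
    The D-optimality criterion can be sketched for a single 2PL field-test item: assign the item to the examinee whose expected response adds the most to the determinant of the accumulated information matrix for (a, b). This toy version ignores the posterior distributions and MCMC machinery of the actual design and works with point values throughout:

```python
import math

def item_info_matrix(theta, a, b):
    """Per-response Fisher information for (a, b) of a 2PL item at
    ability theta (rank-1, so D-optimality accumulates it over
    examinees)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    d = theta - b
    return [[w * d * d, -w * a * d],
            [-w * a * d, w * a * a]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def d_optimal_choice(current_info, candidate_thetas, a, b):
    """Pick the examinee ability whose response maximizes the
    determinant of the accumulated (a, b) information matrix."""
    def added(theta):
        inc = item_info_matrix(theta, a, b)
        total = [[current_info[i][j] + inc[i][j] for j in range(2)]
                 for i in range(2)]
        return det2(total)
    return max(candidate_thetas, key=added)

# Extreme abilities carry more information about the slope a, so a
# D-optimal assignment prefers them over theta = 0 for this item.
choice = d_optimal_choice([[1.0, 0.0], [0.0, 1.0]],
                          [-3.0, 0.0, 3.0], a=1.0, b=0.0)
```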

  6. Influence of Context on Item Parameters in Forced-Choice Personality Assessments

    ERIC Educational Resources Information Center

    Lin, Yin; Brown, Anna

    2017-01-01

    A fundamental assumption in computerized adaptive testing is that item parameters are invariant with respect to context--items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the…

  7. Robust Measurement via A Fused Latent and Graphical Item Response Theory Model.

    PubMed

    Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang

    2018-03-12

    Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike the classical testing theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

  8. Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form

    PubMed Central

    Kisala, Pamela A.; Tulsky, David S.; Choi, Seung W.; Kirshblum, Steven C.

    2015-01-01

    Objective To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Design Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Pressure Ulcers scale. Results 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. Conclusions The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provide flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item “short form” and is available for both research and clinical practice. PMID:26010965

  9. Overview and current management of computerized adaptive testing in licensing/certification examinations.

    PubMed

    Seo, Dong Gi

    2017-01-01

    Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees' ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations.
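
    The five components listed above can be sketched as one minimal loop: an item bank of (a, b) pairs, a starting theta of zero, maximum-information item selection, a single Newton-style scoring step per response, and termination on standard error or test length. This is a toy illustration, not the algorithm of any particular licensing examination:

```python
import math

def p2pl(theta, a, b):
    # two-parameter logistic: probability of a correct response
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    # Fisher information of a 2PL item at ability theta
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def run_cat(bank, answer, theta=0.0, se_stop=0.4, max_items=20):
    """Toy CAT loop over a bank of (a, b) items. `answer(i)` returns
    1 for a correct response to item i, 0 otherwise."""
    used, administered, total_info = set(), [], 0.0
    for _ in range(min(max_items, len(bank))):
        # item selection rule: maximize information at current theta
        idx = max((i for i in range(len(bank)) if i not in used),
                  key=lambda i: info(theta, *bank[i]))
        used.add(idx)
        administered.append(idx)
        a, b = bank[idx]
        x = answer(idx)
        # scoring procedure: one Newton-Raphson step toward the MLE
        theta += a * (x - p2pl(theta, a, b)) / max(info(theta, a, b), 1e-6)
        total_info += info(theta, a, b)
        # termination criterion: stop when the SE falls below se_stop
        if 1.0 / math.sqrt(total_info) < se_stop:
            break
    return theta, administered

bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.2, 0.5)]
theta_hat, items = run_cat(bank, lambda i: 1 if i % 2 == 0 else 0)
```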

  10. Improving measurement of injection drug risk behavior using item response theory.

    PubMed

    Janulis, Patrick

    2014-03-01

    Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
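
    The DIF screen described above, a logistic regression of the item response on the total score plus a group indicator, can be sketched without a statistics library. The gradient-ascent fitter and effect-size reading are illustrative simplifications; operational DIF tests typically use likelihood-ratio or Wald statistics:

```python
import math

def fit_logistic(X, y, steps=2000, lr=0.1):
    """Plain gradient-ascent logistic regression (no regularization);
    enough to illustrate the DIF model, not production code."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def dif_group_effect(total_scores, groups, item_responses):
    """Uniform DIF screen: regress the item response on intercept,
    total score, and group; a sizeable group coefficient flags DIF."""
    X = [[1.0, s, g] for s, g in zip(total_scores, groups)]
    return fit_logistic(X, item_responses)[2]

scores = [0.0] * 100              # centered total scores
groups = [0] * 50 + [1] * 50      # gender indicator
resp = [0] * 50 + [1] * 50        # group fully predicts the response
group_effect = dif_group_effect(scores, groups, resp)
```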

  11. Overview and current management of computerized adaptive testing in licensing/certification examinations

    PubMed Central

    2017-01-01

    Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees’ ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations. PMID:28811394

  12. Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

    PubMed

    Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

    2017-09-16

    This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). The four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of the scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age, and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except the VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.

  13. SAMICS support study. Volume 1: Cost account catalog

    NASA Technical Reports Server (NTRS)

    1977-01-01

    The Jet Propulsion Laboratory (JPL) is examining the feasibility of a new industry to produce photovoltaic solar energy collectors similar to those used on spacecraft. To do this, a standardized costing procedure was developed. The Solar Array Manufacturing Industry Costing Standards (SAMICS) support study supplies the following information: (1) SAMICS critique; (2) Standard data base--cost account structure, expense item costs, inflation rates, indirect requirements relationships, and standard financial parameter values; (3) Facilities capital cost estimating relationships; (4) Conceptual plant designs; (5) Construction lead times; (6) Production start-up times; (7) Manufacturing price estimates.

  14. The Estimation of the IRT Reliability Coefficient and Its Lower and Upper Bounds, with Comparisons to CTT Reliability Statistics

    ERIC Educational Resources Information Center

    Kim, Seonghoon; Feldt, Leonard S.

    2010-01-01

    The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient rho[subscript XX'] as a function of item response theory (IRT) parameters and present the lower and upper bounds of the coefficient. Another purpose is to examine relative performances of the IRT reliability statistics and two…

  15. The Reliability and Precision of Total Scores and IRT Estimates as a Function of Polytomous IRT Parameters and Latent Trait Distribution

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew

    2013-01-01

    A classic topic in the fields of psychometrics and measurement has been the impact of the number of scale categories on test score reliability. This study builds on previous research by further articulating the relationship between item response theory (IRT) and classical test theory (CTT). Equations are presented for comparing the reliability and…

  16. Toward DSM-V: mapping the alcohol use disorder continuum in college students.

    PubMed

    Hagman, Brett T; Cohn, Amy M

    2011-11-01

    The present study examined the dimensionality of DSM-IV Alcohol Use Disorder (AUD) criteria using Item Response Theory (IRT) methods and tested the validity of the proposed DSM-V AUD guidelines in a sample of college students. Participants were 396 college students who reported any alcohol use in the past 90 days and were aged 18 years or older. We conducted factor analyses to determine whether a one- or two-factor model provided a better fit to the AUD criteria. IRT analyses estimated item severity and discrimination parameters for each criterion. Multivariate analyses examined differences among the DSM-V diagnostic cut-off (AUD vs. No AUD) and severity qualifiers (no diagnosis, moderate, severe) across several validating measures of alcohol use. A dominant single-factor model provided the best fit to the AUD criteria. IRT analyses indicated that abuse and dependence criteria were intermixed along the latent continuum. The "legal problems" criterion had the highest severity parameter and the tolerance criterion had the lowest severity parameter. The abuse criterion "social/interpersonal problems" and dependence criterion "activities to obtain alcohol" had the highest discrimination parameter estimates. Multivariate analysis indicated that the DSM-V cut-off point, and severity qualifier groups were distinguishable on several measures of alcohol consumption, drinking consequences, and drinking restraint. Findings suggest that the AUD criteria reflect a latent variable that represents a primary disorder and provide support for the proposed DSM-V AUD criteria in a sample of college students. Continued research in other high-risk samples of college students is needed. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  17. Combining computer adaptive testing technology with cognitively diagnostic assessment.

    PubMed

    McGlohen, Meghan; Chang, Hua-Hua

    2008-08-01

    A major advantage of computerized adaptive testing (CAT) is that it allows the test to home in on an examinee's ability level in an interactive manner. The aim of the new area of cognitive diagnosis is to provide information about specific content areas in which an examinee needs help. The goal of this study was to combine the benefit of specific feedback from cognitively diagnostic assessment with the advantages of CAT. In this study, three approaches to combining these were investigated: (1) item selection based on the traditional ability level estimate (theta), (2) item selection based on the attribute mastery feedback provided by cognitively diagnostic assessment (alpha), and (3) item selection based on both the traditional ability level estimate (theta) and the attribute mastery feedback provided by cognitively diagnostic assessment (alpha). The results from these three approaches were compared for theta estimation accuracy, attribute mastery estimation accuracy, and item exposure control. The theta- and alpha-based condition outperformed the alpha-based condition with regard to theta estimation, attribute mastery pattern estimation, and item exposure control. The theta-based condition and the theta- and alpha-based condition performed similarly with regard to theta estimation, attribute mastery estimation, and item exposure control, but the theta- and alpha-based condition has an additional advantage: it uses the shadow test method, which allows the administrator to incorporate additional constraints into the item selection process (content balancing, item-type constraints, and so forth) and to select items on the basis of both the current theta and alpha estimates, all of which can be built on top of existing 3PL testing programs.

  18. A New Model for Acquiescence at the Interface of Psychometrics and Cognitive Psychology.

    PubMed

    Plieninger, Hansjörg; Heck, Daniel W

    2018-05-29

    When measuring psychological traits, one has to consider that respondents often show content-unrelated response behavior in answering questionnaires. To disentangle the target trait and two such response styles, extreme responding and midpoint responding, Böckenholt (2012a) developed an item response model based on a latent processing tree structure. We propose a theoretically motivated extension of this model to also measure acquiescence, the tendency to agree with both regular and reversed items. Substantively, our approach builds on multinomial processing tree (MPT) models that are used in cognitive psychology to disentangle qualitatively distinct processes. Accordingly, the new model for response styles assumes a mixture distribution of affirmative responses, which are either determined by the underlying target trait or by acquiescence. In order to estimate the model parameters, we rely on Bayesian hierarchical estimation of MPT models. In simulations, we show that the model provides unbiased estimates of response styles and the target trait, and we compare the new model and Böckenholt's model in a recovery study. An empirical example from personality psychology is used for illustrative purposes.
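
    A minimal sketch of a Böckenholt-style latent processing tree for a 5-point item follows. This is a simplified parameterization for illustration, not the authors' exact model; their acquiescence extension would add a further branch that yields agreement regardless of the target trait:

```python
def category_probs(t, e, m):
    """Processing-tree probabilities for a 5-point rating item.
    m = P(midpoint response), t = P(agree | not midpoint),
    e = P(extreme | directional response).
    Returns probabilities for categories 1..5
    (strongly disagree .. strongly agree)."""
    return [
        (1 - m) * (1 - t) * e,        # 1: strongly disagree
        (1 - m) * (1 - t) * (1 - e),  # 2: disagree
        m,                            # 3: midpoint
        (1 - m) * t * (1 - e),        # 4: agree
        (1 - m) * t * e,              # 5: strongly agree
    ]
```

    In the extended model, an additional acquiescence parameter would contribute probability mass to the agree categories of both regular and reversed items, producing the mixture of trait-driven and style-driven affirmative responses described above.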

  19. Psychometric properties of the neck disability index amongst patients with chronic neck pain using item response theory.

    PubMed

    Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri

    2017-05-13

    The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
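
    The internal-consistency step reported above can be illustrated with a plain implementation of Cronbach's alpha (the generic formula, not tied to the NDI data):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    n_items = len(scores[0])

    def var(xs):  # population variance; the n vs n-1 choice cancels in the ratio
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[j] for row in scores]) for j in range(n_items)]
    total_var = var([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)
```

    The function expects one row per respondent and one column per item.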

  20. Item Vector Plots for the Multidimensional Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Bryant, Damon; Davis, Larry

    2011-01-01

    This brief technical note describes how to construct item vector plots for dichotomously scored items fitting the multidimensional three-parameter logistic model (M3PLM). As multidimensional item response theory (MIRT) shows promise of being a very useful framework in the test development life cycle, graphical tools that facilitate understanding…

  1. Random Item IRT Models

    ERIC Educational Resources Information Center

    De Boeck, Paul

    2008-01-01

    It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used, and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters…

  2. Assessing the Performance of Classical Test Theory Item Discrimination Estimators in Monte Carlo Simulations

    ERIC Educational Resources Information Center

    Bazaldua, Diego A. Luna; Lee, Young-Sun; Keller, Bryan; Fellers, Lauren

    2017-01-01

    The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, resulting in mixed results regarding the preference of some discrimination estimators over others. This study analyzes the performance of various item discrimination estimators in CTT:…

  3. Nonparametric Item Response Curve Estimation with Correction for Measurement Error

    ERIC Educational Resources Information Center

    Guo, Hongwen; Sinharay, Sandip

    2011-01-01

    Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…

  4. Conditional statistical inference with multistage testing designs.

    PubMed

    Zwitser, Robert J; Maris, Gunter

    2015-03-01

    In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
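
    Conditional inference for Rasch-type models of the kind discussed here rests on conditioning on the total score; the normalizing constants are the elementary symmetric functions of the item (easiness) parameters, computable with the standard summation recurrence. A generic sketch, not the authors' code:

```python
def elementary_symmetric(eps):
    """Elementary symmetric functions gamma[0..k] of item easiness
    parameters eps (eps_i = exp(-beta_i)). gamma[r] sums the product of
    eps over all response patterns with total score r, and normalizes
    the Rasch conditional likelihood given that score."""
    gamma = [1.0]
    for e in eps:
        gamma = ([1.0]
                 + [gamma[r] + e * gamma[r - 1] for r in range(1, len(gamma))]
                 + [e * gamma[-1]])
    return gamma
```

    The conditional probability of a response pattern x with total score r is then prod(eps_i ** x_i) / gamma[r], which is free of the person parameter; this is what makes conditional likelihood inference feasible in multistage designs.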

  5. Content Validity and Psychometric Characteristics of the "Knowledge about Older Patients Quiz" for Nurses Using Item Response Theory.

    PubMed

    Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J

    2016-11-01

    To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs. © 2016, Copyright the Authors Journal compilation © 2016, The American Geriatrics Society.

  6. A Comparison of the One- and Three-Parameter Logistic Models on Measures of Test Efficiency.

    ERIC Educational Resources Information Center

    Benson, Jeri

    Two methods of item selection were used to select sets of 40 items from a 50-item verbal analogies test, and the resulting item sets were compared for relative efficiency. The BICAL program was used to select the 40 items having the best mean square fit to the one parameter logistic (Rasch) model. The LOGIST program was used to select the 40 items…

  7. Exploiting Auxiliary Information about Examinees in the Estimation of Item Parameters.

    DTIC Science & Technology

    1986-05-01

    The author is grateful to Kathleen Sheehan and Martha Stocking for their comments and suggestions.

  8. Estimating Parameters in the Generalized Graded Unfolding Model: Sensitivity to the Prior Distribution Assumption and the Number of Quadrature Points Used.

    ERIC Educational Resources Information Center

    Roberts, James S.; Donoghue, John R.; Laughlin, James E.

    The generalized graded unfolding model (J. Roberts, J. Donoghue, and J. Laughlin, 1998, 1999) is an item response theory model designed to unfold polytomous responses. The model is based on a proximity relation that postulates higher levels of expected agreement with a given statement to the extent that a respondent is located close to the…

  9. Statistical Bias in Maximum Likelihood Estimators of Item Parameters.

    DTIC Science & Technology

    1982-04-01


  10. Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Wilson, Mark

    2005-01-01

    This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…

  11. A Generalized DIF Effect Variance Estimator for Measuring Unsigned Differential Test Functioning in Mixed Format Tests

    ERIC Educational Resources Information Center

    Penfield, Randall D.; Algina, James

    2006-01-01

    One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…

  12. A Primer on the 2- and 3-Parameter Item Response Theory Models.

    ERIC Educational Resources Information Center

    Thornton, Artist

    Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…

  13. The Montgomery Åsberg and the Hamilton Ratings of Depression

    PubMed Central

    Carmody, Thomas; Rush, A. John; Bernstein, Ira; Warden, Diane; Brannan, Stephen; Burnham, Daniel; Woo, Ada; Trivedi, Madhukar

    2007-01-01

    The 17-item Hamilton Rating Scale for Depression (HRSD17) and the Montgomery Åsberg Depression Rating Scale (MADRS) are two widely used clinician-rated symptom scales. A 6-item version of the HRSD (HRSD6) was created by Bech to address the psychometric limitations of the HRSD17. The psychometric properties of these measures were compared using classical test theory (CTT) and item response theory (IRT) methods. IRT methods were used to equate total scores on any two scales. Data from two distinctly different outpatient studies of nonpsychotic major depression, a 12-month study of highly treatment-resistant patients (n = 233) and an 8-week acute-phase drug treatment trial (n = 985), were used to assess the robustness of the results. MADRS and HRSD6 items generally contributed more to the measurement of depression than HRSD17 items, as shown by higher item-total correlations and higher IRT slope parameters. The MADRS and HRSD6 were unifactorial, while the HRSD17 contained two factors. The MADRS showed about twice the precision of either the HRSD17 or the HRSD6 in estimating depression of average severity. An HRSD17 score of 7 corresponded to an 8 or 9 on the MADRS and a 4 on the HRSD6. The MADRS would be superior to the HRSD17 in the conduct of clinical trials. PMID:16769204

  14. Identifying the Source of Misfit in Item Response Theory Models.

    PubMed

    Liu, Yang; Maydeu-Olivares, Alberto

    2014-01-01

    When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X(2), (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M2 statistic applied to bivariate subtables. The unadjusted Pearson's X(2) with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X(2) is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.

  15. [Estimation model for daily transpiration of greenhouse muskmelon in its vegetative growth period].

    PubMed

    Zhang, Da-Long; Li, Jian-Ming; Wu, Pu-Te; Li, Wei-Li; Zhao, Zhi-Hua; Xu, Fei; Li, Jun

    2013-07-01

    To develop an estimation method for muskmelon transpiration in the greenhouse, an estimation model for the daily transpiration of greenhouse muskmelon in its vegetative growth period was established, based on greenhouse environmental parameters, muskmelon growth and development parameters, and soil moisture parameters. To suit the specific greenhouse environment, the aerodynamic term of the Penman-Monteith equation was modified, and a greenhouse environmental sub-model suitable for calculating the reference crop evapotranspiration in the greenhouse was derived. The crop factor sub-model was established as a linear function of leaf area index, and the soil moisture sub-model as a logarithmic function of the relative effective soil moisture content. With interval sowing, the model parameters were estimated and analyzed according to measurement data from different sowing dates in a year. The prediction accuracy of the model for sufficient irrigation and water-saving irrigation was verified against measurement data at relative soil moisture contents of 80%, 70%, and 60%; the mean relative errors were 11.5%, 16.2%, and 16.9%, respectively. The model is a useful step toward applying the Penman-Monteith equation to greenhouse environments and water-saving irrigation, with good prospects for application and extension.
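
    The multiplicative structure described in the abstract (reference evapotranspiration scaled by a linear crop factor and a logarithmic soil moisture factor) can be sketched as follows; all coefficients are hypothetical placeholders, not the fitted values from the study:

```python
import math

def daily_transpiration(et0, lai, rel_moisture,
                        kc_a=0.3, kc_b=0.4, ks_a=0.5, ks_b=1.0):
    """Sketch of T = ET0 * Kc(LAI) * Ks(soil moisture).
    et0: reference crop evapotranspiration (mm/day) from the
         greenhouse environmental sub-model.
    Kc:  crop factor sub-model, linear in leaf area index.
    Ks:  soil moisture sub-model, logarithmic in relative effective
         soil moisture (clipped to [0, 1]).
    All kc_*/ks_* coefficients are hypothetical."""
    kc = kc_a + kc_b * lai
    ks = ks_a * math.log(rel_moisture) + ks_b
    return et0 * kc * max(0.0, min(1.0, ks))
```

    At full relative moisture (1.0) the logarithmic term vanishes and Ks reduces to its intercept, so transpiration is limited only by ET0 and canopy size.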

  16. The association between quality of care and quality of life in long-stay nursing home residents with preserved cognition.

    PubMed

    Kim, Sun Jung; Park, Eun-Cheol; Kim, Sulgi; Nakagawa, Shunichi; Lung, John; Choi, Jong Bum; Ryu, Woo Sang; Min, Too Jae; Shin, Hyun Phil; Kim, Kyudam; Yoo, Ji Won

    2014-03-01

    To assess the overall quality of life of long-stay nursing home residents with preserved cognition, to examine whether the Centers for Medicare and Medicaid Service's Nursing Home Compare 5-star quality rating system reflects the overall quality of life of such residents, and to examine whether residents' demographics and clinical characteristics affect their quality of life. Quality of life was measured using the Participant Outcomes and Status Measures-Nursing Facility survey, which has 10 sections and 63 items. Total scores range from 20 (lowest possible quality of life) to 100 (highest). Long-stay nursing home residents with preserved cognition (n = 316) were interviewed. The average quality-of-life score was 71.4 (SD: 7.6; range: 45.1-93.0). Multilevel regression models revealed that quality of life was associated with physical impairment (parameter estimate = -0.728; P = .04) and depression (parameter estimate = -3.015; P = .01) but not Nursing Home Compare's overall star rating (parameter estimate = 0.683; P = .12) and not pain (parameter estimate = -0.705; P = .47). The 5-star quality rating system did not reflect the quality of life of long-stay nursing home residents with preserved cognition. Notably, pain was not associated with quality of life, but physical impairment and depression were. Copyright © 2014 American Medical Directors Association, Inc. Published by Elsevier Inc. All rights reserved.

  17. Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.

    ERIC Educational Resources Information Center

    Sachar, Jane; Suppes, Patrick

    It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…

  18. Kernel-Smoothing Estimation of Item Characteristic Functions for Continuous Personality Items: An Empirical Comparison with the Linear and the Continuous-Response Models

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2004-01-01

    This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…

  19. Evaluation of diagnostic criteria for panic attack using item response theory: findings from the National Comorbidity Survey in USA.

    PubMed

    Ietsugu, Tetsuji; Sukigara, Masune; Furukawa, Toshiaki A

    2007-12-01

    The dichotomous diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases (ICD) lose much important information concerning what each symptom can offer. This study explored the characteristics and performances of DSM-IV and ICD-10 diagnostic criteria items for panic attack using modern item response theory (IRT). The National Comorbidity Survey used the Composite International Diagnostic Interview to assess 14 DSM-IV and ICD-10 panic attack diagnostic criteria items in the general population in the USA. The dimensionality and measurement properties of these items were evaluated using dichotomous factor analysis and the two-parameter IRT model. A total of 1213 respondents reported at least one subsyndromal or syndromal panic attack in their lifetime. Factor analysis indicated that all items constitute a unidimensional construct. The two-parameter IRT model produced meaningful and interpretable results. Among items with high discrimination parameters, the difficulty parameter for "palpitation" was relatively low, while those for "choking," "fear of dying" and "paresthesia" were relatively high. Several items including "dry mouth" and "fear of losing control" had low discrimination parameters. The item characteristics of diagnostic criteria among help-seeking clinical populations may be different from those that we observed in the general population and deserve further examination. "Paresthesia," "choking" and "fear of dying" can be thought to be good indicators of severe panic attacks, while "palpitation" can discriminate well between cases and non-cases at low level of panic attack severity. Items such as "dry mouth" would contribute less to the discrimination.
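
    The two-parameter IRT model used in the analysis maps latent panic severity to an endorsement probability. A sketch with hypothetical discrimination and difficulty values echoing the reported pattern (low difficulty for "palpitations", high difficulty for "fear of dying"):

```python
import math

def endorse_prob(theta, a, b):
    """2PL item response function: P(endorse | severity theta), with
    discrimination a and difficulty (severity) parameter b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters, chosen only to mirror the reported ordering.
palpitations = (2.0, -1.5)   # endorsed even at low severity
fear_of_dying = (2.0, 1.5)   # endorsed only at high severity
```

    At theta = b the endorsement probability is exactly 0.5, which is why b is read as the severity level at which an item starts to discriminate cases; low-difficulty, high-discrimination items separate cases from non-cases at mild severity, while high-difficulty items flag severe attacks.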

  20. A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Doong, Shing H.

    2009-01-01

    The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…

  1. An alternative to Rasch analysis using triadic comparisons and multi-dimensional scaling

    NASA Astrophysics Data System (ADS)

    Bradley, C.; Massof, R. W.

    2016-11-01

    Rasch analysis is a principled approach for estimating the magnitude of some shared property of a set of items when a group of people assign ordinal ratings to them. In the general case, Rasch analysis not only estimates person and item measures on the same invariant scale, but also estimates the average thresholds used by the population to define rating categories. However, Rasch analysis fails when there is insufficient variance in the observed responses because it assumes a probabilistic relationship between person measures, item measures and the rating assigned by a person to an item. When only a single person is rating all items, there may be cases where the person assigns the same rating to many items no matter how many times he rates them. We introduce an alternative to Rasch analysis for precisely these situations. Our approach leverages multi-dimensional scaling (MDS) and requires only rank orderings of items and rank orderings of pairs of distances between items to work. Simulations show one variant of this approach - triadic comparisons with non-metric MDS - provides highly accurate estimates of item measures in realistic situations.

  2. Evaluation of the IRT Parameter Invariance Property for the MCAT.

    ERIC Educational Resources Information Center

    Kelkar, Vinaya; Wightman, Linda F.; Luecht, Richard M.

    The purpose of this study was to investigate the viability of the property of parameter invariance for the one-parameter (1P), two-parameter (2P), and three-parameter (3P) item response theory (IRT) models for the Medical College Admissions Tests (MCAT). Invariance of item parameters across different gender, ethnic, and language groups and the…

  3. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    PubMed

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
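
    A basic form of the item-fit residual idea compares the observed proportion correct in a score group with the model-implied probability, standardized by its binomial standard error. This is a generic sketch of that comparison, not the authors' specific ratio estimator:

```python
import math

def item_fit_residual(n_correct, n_total, p_model):
    """Standardized residual for one item in one score group:
    (observed proportion - model probability) / binomial SE."""
    p_obs = n_correct / n_total
    se = math.sqrt(p_model * (1.0 - p_model) / n_total)
    return (p_obs - p_model) / se
```

    When the model fits, residuals of this kind are approximately standard normal in large samples, which is the property the paper proves for its proposed statistic.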

  4. An item response theory analysis of the Psychological Inventory of Criminal Thinking Styles: comparing male and female probationers and prisoners.

    PubMed

    Walters, Glenn D

    2014-09-01

    An item response theory (IRT) analysis of the Psychological Inventory of Criminal Thinking Styles (PICTS) was performed on 26,831 (19,067 male and 7,764 female) federal probationers and compared with results obtained on 3,266 (3,039 male and 227 female) prisoners from previous research. Despite the fact that male and female federal probationers scored significantly lower on the PICTS thinking style scales than male and female prisoners, discrimination and location parameter estimates for the individual PICTS items were comparable across sex and setting. Consistent with the results of a previous IRT analysis conducted on the PICTS, the current results did not support sentimentality as a component of general criminal thinking. Findings from this study indicate that the discriminative power of the individual PICTS items is relatively stable across sex (male, female) and correctional setting (probation, prison) and that the PICTS may be measuring the same criminal thinking construct in male and female probationers and prisoners. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  5. An Improved Internal Consistency Reliability Estimate.

    ERIC Educational Resources Information Center

    Cliff, Norman

    1984-01-01

    The proposed coefficient is derived by assuming that the average Goodman-Kruskal gamma between items of identical difficulty would be the same for items of different difficulty. An estimate of covariance between items of identical difficulty leads to an estimate of the correlation between two tests with identical distributions of difficulty.…
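
    The proposed coefficient builds on the Goodman-Kruskal gamma between item pairs; gamma itself is computed from concordant and discordant pairs, ignoring ties:

```python
def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma = (C - D) / (C + D), where C and D count
    concordant and discordant pairs of observations; tied pairs are
    excluded from both counts."""
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return (c - d) / (c + d)
```

    Because gamma discards ties, it is well suited to pairs of dichotomous items of unequal difficulty, which is the situation the abstract's covariance argument addresses.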

  6. Estimating Total-Test Scores from Partial Scores in a Matrix Sampling Design.

    ERIC Educational Resources Information Center

    Sachar, Jane; Suppes, Patrick

    1980-01-01

    The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)

  7. A Comparison of Methods for Nonparametric Estimation of Item Characteristic Curves for Binary Items

    ERIC Educational Resources Information Center

    Lee, Young-Sun

    2007-01-01

    This study compares the performance of three nonparametric item characteristic curve (ICC) estimation procedures: isotonic regression, smoothed isotonic regression, and kernel smoothing. Smoothed isotonic regression, employed along with an appropriate kernel function, provides better estimates and also satisfies the assumption of strict…

  8. Development and Validation of the Behavioral Tendencies Questionnaire

    PubMed Central

    Van Dam, Nicholas T.; Brown, Anna; Mole, Tom B.; Davis, Jake H.; Britton, Willoughby B.; Brewer, Judson A.

    2015-01-01

    At a fundamental level, taxonomy of behavior and behavioral tendencies can be described in terms of approach, avoid, or equivocate (i.e., neither approach nor avoid). While there are numerous theories of personality, temperament, and character, few seem to take advantage of parsimonious taxonomy. The present study sought to implement this taxonomy by creating a questionnaire based on a categorization of behavioral temperaments/tendencies first identified in Buddhist accounts over fifteen hundred years ago. Items were developed using historical and contemporary texts of the behavioral temperaments, described as “Greedy/Faithful”, “Aversive/Discerning”, and “Deluded/Speculative”. To both maintain this categorical typology and benefit from the advantageous properties of forced-choice response format (e.g., reduction of response biases), binary pairwise preferences for items were modeled using Latent Class Analysis (LCA). One sample (n1 = 394) was used to estimate the item parameters, and the second sample (n2 = 504) was used to classify the participants using the established parameters and cross-validate the classification against multiple other measures. The cross-validated measure exhibited good nomothetic span (construct-consistent relationships with related measures) that seemed to corroborate the ideas present in the original Buddhist source documents. The final 13-block questionnaire created from the best performing items (the Behavioral Tendencies Questionnaire or BTQ) is a psychometrically valid questionnaire that is historically consistent, based in behavioral tendencies, and promises practical and clinical utility particularly in settings that teach and study meditation practices such as Mindfulness Based Stress Reduction (MBSR). PMID:26535904

  9. Development and Validation of the Behavioral Tendencies Questionnaire.

    PubMed

    Van Dam, Nicholas T; Brown, Anna; Mole, Tom B; Davis, Jake H; Britton, Willoughby B; Brewer, Judson A

    2015-01-01

    At a fundamental level, taxonomy of behavior and behavioral tendencies can be described in terms of approach, avoid, or equivocate (i.e., neither approach nor avoid). While there are numerous theories of personality, temperament, and character, few seem to take advantage of parsimonious taxonomy. The present study sought to implement this taxonomy by creating a questionnaire based on a categorization of behavioral temperaments/tendencies first identified in Buddhist accounts over fifteen hundred years ago. Items were developed using historical and contemporary texts of the behavioral temperaments, described as "Greedy/Faithful", "Aversive/Discerning", and "Deluded/Speculative". To both maintain this categorical typology and benefit from the advantageous properties of forced-choice response format (e.g., reduction of response biases), binary pairwise preferences for items were modeled using Latent Class Analysis (LCA). One sample (n1 = 394) was used to estimate the item parameters, and the second sample (n2 = 504) was used to classify the participants using the established parameters and cross-validate the classification against multiple other measures. The cross-validated measure exhibited good nomothetic span (construct-consistent relationships with related measures) that seemed to corroborate the ideas present in the original Buddhist source documents. The final 13-block questionnaire created from the best performing items (the Behavioral Tendencies Questionnaire or BTQ) is a psychometrically valid questionnaire that is historically consistent, based in behavioral tendencies, and promises practical and clinical utility particularly in settings that teach and study meditation practices such as Mindfulness Based Stress Reduction (MBSR).

  10. Construction of a memory battery for computerized administration, using item response theory.

    PubMed

    Ferreira, Aristides I; Almeida, Leandro S; Prieto, Gerardo

    2012-10-01

    In accordance with Item Response Theory, a computer-administered memory battery with six tests was constructed for use in the Portuguese adult population. A factor analysis was conducted to assess the internal structure of the tests (N = 547 undergraduate students). According to the literature, several confirmatory factor models were evaluated. Results showed a better fit for a model with two independent latent variables corresponding to verbal and non-verbal factors, reproducing the initial battery organization. Internal consistency reliabilities for the six tests ranged from alpha = .72 to .89. IRT analyses (Rasch and partial credit models) yielded good infit and outfit measures and high precision of parameter estimation. The potential utility of these memory tasks for psychological research and practice is discussed.
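    The Rasch analysis and infit/outfit diagnostics mentioned above can be sketched with a minimal joint-maximum-likelihood (JML) routine on simulated data. All values below are hypothetical; this is a didactic sketch, not the battery's actual calibration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated dichotomous responses under a Rasch model (values hypothetical)
n_persons, n_items = 500, 6
theta = rng.normal(0.0, 1.0, n_persons)        # person abilities
b = np.linspace(-1.5, 1.5, n_items)            # item difficulties
P = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.random((n_persons, n_items)) < P).astype(float)

# Joint maximum likelihood: alternate Newton steps for persons and items
th, bh = np.zeros(n_persons), np.zeros(n_items)
for _ in range(50):
    p = 1 / (1 + np.exp(-(th[:, None] - bh[None, :])))
    th += (X - p).sum(axis=1) / (p * (1 - p)).sum(axis=1)
    th = th.clip(-4, 4)                        # guard perfect/zero scores
    p = 1 / (1 + np.exp(-(th[:, None] - bh[None, :])))
    bh -= (X - p).sum(axis=0) / (p * (1 - p)).sum(axis=0)
    bh -= bh.mean()                            # identification constraint

# Outfit: mean squared standardized residual per item (~1 if the model fits)
p = 1 / (1 + np.exp(-(th[:, None] - bh[None, :])))
outfit = ((X - p) ** 2 / (p * (1 - p))).mean(axis=0)
```

Operational Rasch software uses conditional or marginal ML rather than JML (which is biased for short tests), but the fit logic is the same: outfit values far from 1 flag misfitting items.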

  11. A Comparison of Methods for Estimating Conditional Item Score Differences in Differential Item Functioning (DIF) Assessments. Research Report. ETS RR-10-15

    ERIC Educational Resources Information Center

    Moses, Tim; Miao, Jing; Dorans, Neil

    2010-01-01

    This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…

  12. Hybrid Gibbs Sampling and MCMC for CMB Analysis at Small Angular Scales

    NASA Technical Reports Server (NTRS)

    Jewell, Jeffrey B.; Eriksen, H. K.; Wandelt, B. D.; Gorski, K. M.; Huey, G.; O'Dwyer, I. J.; Dickinson, C.; Banday, A. J.; Lawrence, C. R.

    2008-01-01

    A) Gibbs sampling has now been validated as an efficient, statistically exact, and practically useful method for "low-L" analysis (as demonstrated on WMAP temperature and polarization data). B) We are extending Gibbs sampling to directly propagate uncertainties in both foreground and instrument models to total uncertainty in cosmological parameters for the entire range of angular scales relevant for Planck. C) This is made possible by the inclusion of foreground model parameters in Gibbs sampling and by hybrid MCMC and Gibbs sampling for the low signal-to-noise (high-L) regime. D) Future items to be included in the Bayesian framework: 1) integration with the hybrid likelihood (or posterior) code for cosmological parameters; 2) other uncertainties in instrumental systematics (i.e., beam uncertainties, noise estimation, calibration errors, etc.).
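    The core idea of Gibbs sampling (drawing each parameter in turn from its full conditional distribution) can be illustrated on a toy conjugate model; the data and priors below are illustrative stand-ins, not the CMB analysis itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: estimate an unknown mean and noise variance jointly
y = rng.normal(2.0, 1.5, size=200)
n = y.size

# Gibbs sampling for (mu, sigma^2) under a Jeffreys prior on sigma^2:
# alternately draw each parameter from its full conditional.
mu, sig2 = 0.0, 1.0
draws = []
for it in range(2000):
    mu = rng.normal(y.mean(), np.sqrt(sig2 / n))        # mu | sigma^2, y
    sig2 = ((y - mu) ** 2).sum() / rng.chisquare(n)     # sigma^2 | mu, y
    if it >= 500:                                       # discard burn-in
        draws.append((mu, sig2))

mu_post = float(np.mean([d[0] for d in draws]))
```

The hybrid scheme described in the abstract keeps this conditional-draw structure for the high signal-to-noise parameters and switches to MCMC proposals where the conditionals are intractable.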

  13. Effects of Item Parameter Drift on Vertical Scaling with the Nonequivalent Groups with Anchor Test (NEAT) Design

    ERIC Educational Resources Information Center

    Ye, Meng; Xin, Tao

    2014-01-01

    The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…

  14. On the validity of measuring change over time in routine clinical assessment: a close examination of item-level response shifts in psychosomatic inpatients.

    PubMed

    Nolte, S; Mierke, A; Fischer, H F; Rose, M

    2016-06-01

    Significant life events such as severe health status changes or intensive medical treatment often trigger response shifts in individuals that may hamper the comparison of measurements over time. Drawing from the Oort model, this study aims at detecting response shift at the item level in psychosomatic inpatients and evaluating its impact on the validity of comparing repeated measurements. Complete pretest and posttest data were available from 1188 patients who had filled out the ICD-10 Symptom Rating (ISR) scale at admission and discharge, on average 24 days after intake. Reconceptualization, reprioritization, and recalibration response shifts were explored applying tests of measurement invariance. In the item-level approach, all model parameters were constrained to be equal between pretest and posttest; detected non-invariances were then linked to the different types of response shift. When constraining across-occasion model parameters, model fit worsened, as indicated by a significant Satorra-Bentler chi-square difference test, suggesting the potential presence of response shifts. A close examination revealed the presence of two types of response shift, i.e., (non)uniform recalibration and both higher- and lower-level reconceptualization response shifts, leading to four model adjustments. Our analyses suggest that psychosomatic inpatients experienced some response shifts during their hospital stay. According to the hierarchy of measurement invariance, however, only one of the detected non-invariances is critical for unbiased mean comparisons over time, and it did not have a substantial impact on estimating change. Hence, the use of the ISR can be recommended for outcomes assessment in clinical routine, as change score estimates do not seem hampered by response shift effects.

  15. Estimating the Number of Examinees Who Did Not Reach the Last Item of a Section.

    ERIC Educational Resources Information Center

    Wainer, Howard

    It is important to estimate the number of examinees who reached a test item, because item difficulty is defined by the number who answered correctly divided by the number who reached the item. A new method is presented and compared to the previously used definition of three categories of response to an item: (1) answered; (2) omitted--a…

  16. A new computerized adaptive test advancing the measurement of health-related quality of life (HRQoL) in children: the Kids-CAT.

    PubMed

    Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Mülhan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U

    2015-04-01

    Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims at describing the development of a HRQoL CAT for children and adolescents: the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks showed excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conforming IRCs. In CAT simulations, seven items were needed to achieve a measurement precision between .8 and .9 (reliability). The Kids-CAT has a child-friendly design, is easily accessible online, and gives immediate feedback reports of scores. The Kids-CAT has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing the patient-doctor communication.
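    The CAT logic referenced above (administer the unused item that is most informative at the current ability estimate) can be sketched for a 2PL item bank. The parameter values below are invented for illustration; they are not Kids-CAT items.

```python
import numpy as np

# Hypothetical 2PL item bank: discrimination a, difficulty b
a = np.array([1.2, 0.8, 1.5, 1.0, 2.0, 0.9])
b = np.array([-1.0, 0.0, 0.5, 1.5, -0.5, 2.0])

def info(theta, a, b):
    """Fisher information of 2PL items at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def next_item(theta_hat, administered):
    """CAT step: pick the unused item with maximum information."""
    i = info(theta_hat, a, b)
    i[list(administered)] = -np.inf
    return int(np.argmax(i))

first = next_item(0.0, set())   # most informative item at theta = 0
```

In a full CAT, each administered response updates the ability estimate (e.g., by EAP), and the loop stops once the standard error falls below a target, which is how "seven items for reliability .8 to .9" arises.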

  17. Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks.

    PubMed

    Zhao, Yue

    2017-03-01

    In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation. Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items. The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks, implying that the impact of item misfit was insignificant. Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.

  18. A Cross-Cultural Analysis of the Infant Behavior Questionnaire Very Short Form: An Item Response Theory Analysis of Infant Temperament in New Zealand.

    PubMed

    Peterson, Elizabeth R; Mohal, Jatender; Waldie, Karen E; Reese, Elaine; Atatoa Carr, Polly E; Grant, Cameron C; Morton, Susan M B

    2017-01-01

    The Infant Behavior Questionnaire-Revised Very Short Form (IBQ-R VSF; Putnam, Helbig, Gartstein, Rothbart, & Leerkes, 2014 ) is a newly published measure of infant temperament with a 3-factor structure. Recently Peterson et al. ( 2017 ) suggested that a 5-factor structure (Positive Affectivity/Surgency, Negative Emotionality, Orienting Capacity, Affiliation/Regulation, and Fear) was more parsimonious and showed promising reliability and predictive validity in a large, diverse sample. However, little is known about the 5-factor model's precision across the range of the temperament dimensions and whether it discriminates equally well across ethnicities. A total of 5,567 mothers responded to the IBQ-R VSF in relation to their infants (N = 5,639) between 23 and 52 weeks old. Using item response theory, we fit a series of two-parameter logistic item response models and found that the 5 IBQ-R VSF temperament dimensions showed a good distribution of estimates across each latent trait range, with estimates centered close to the population mean. The IBQ-R VSF was also similarly precise across 4 ethnic groups (European, Māori, Pacific peoples, and Asians), suggesting that it can be used as a comparable measure of infant temperament in a diversity of ethnic groups.

  19. Reliability and validity of a short form household food security scale in a Caribbean community.

    PubMed

    Gulliford, Martin C; Mahabir, Deepak; Rocke, Brian

    2004-06-16

    We evaluated the reliability and validity of the short form household food security scale in a different setting from the one in which it was developed. The scale was interview administered to 531 subjects from 286 households in north central Trinidad in Trinidad and Tobago, West Indies. We evaluated the six items by fitting item response theory models to estimate item thresholds, estimating agreement among respondents in the same households and estimating the slope index of income-related inequality (SII) after adjusting for age, sex and ethnicity. Item-score correlations ranged from 0.52 to 0.79 and Cronbach's alpha was 0.87. Item responses gave within-household correlation coefficients ranging from 0.70 to 0.78. Estimated item thresholds (standard errors) from the Rasch model ranged from -2.027 (0.063) for the 'balanced meal' item to 2.251 (0.116) for the 'hungry' item. The 'balanced meal' item had the lowest threshold in each ethnic group even though there was evidence of differential functioning for this item by ethnicity. Relative thresholds of other items were generally consistent with US data. Estimation of the SII, comparing those at the bottom with those at the top of the income scale, gave relative odds for an affirmative response of 3.77 (95% confidence interval 1.40 to 10.2) for the lowest severity item, and 20.8 (2.67 to 162.5) for highest severity item. Food insecurity was associated with reduced consumption of green vegetables after additionally adjusting for income and education (0.52, 0.28 to 0.96). The household food security scale gives reliable and valid responses in this setting. Differing relative item thresholds compared with US data do not require alteration to the cut-points for classification of 'food insecurity without hunger' or 'food insecurity with hunger'. The data provide further evidence that re-evaluation of the 'balanced meal' item is required.

  20. Reliability of Summed Item Scores Using Structural Equation Modeling: An Alternative to Coefficient Alpha

    ERIC Educational Resources Information Center

    Green, Samuel B.; Yang, Yanyun

    2009-01-01

    A method is presented for estimating reliability using structural equation modeling (SEM) that allows for nonlinearity between factors and item scores. Assuming the focus is on consistency of summed item scores, this method for estimating reliability is preferred to those based on linear SEM models and to the most commonly reported estimate of…
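    For context on the SEM-based alternative discussed above, coefficient alpha itself, the most commonly reported reliability estimate, is straightforward to compute from an items-in-columns score matrix. The simulated congeneric data below are illustrative only.

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha for summed scores; items are columns of X."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = X.sum(axis=1).var(ddof=1)       # variance of summed score
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(5)
f = rng.normal(size=(300, 1))                   # common factor
X = f + rng.normal(scale=1.0, size=(300, 4))    # 4 items, unit loadings
alpha = cronbach_alpha(X)                       # population value here: 0.8
```

Alpha assumes (essential) tau-equivalence and linearity; the SEM-based estimators the abstract advocates relax exactly those assumptions.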

  1. Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28

    ERIC Educational Resources Information Center

    Guo, Hongwen; Sinharay, Sandip

    2011-01-01

    Nonparametric, or kernel, estimation of an item response curve (IRC) is of both theoretical and operational concern. This estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor, because the observed scores are contaminated by measurement error. In this study, we investigate…

  2. Mixture Item Response Theory-MIMIC Model: Simultaneous Estimation of Differential Item Functioning for Manifest Groups and Latent Classes

    ERIC Educational Resources Information Center

    Bilir, Mustafa Kuzey

    2009-01-01

    This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…

  3. A Lower Bound to the Probability of Choosing the Optimal Passing Score for a Mastery Test When There is an External Criterion [and] Estimating the Parameters of the Beta-Binomial Distribution.

    ERIC Educational Resources Information Center

    Wilcox, Rand R.

    A mastery test is frequently described as follows: an examinee responds to n dichotomously scored test items. Depending upon the examinee's observed (number correct) score, a mastery decision is made and the examinee is advanced to the next level of instruction. Otherwise, a nonmastery decision is made and the examinee is given remedial work. This…

  4. The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.

    ERIC Educational Resources Information Center

    Reckase, Mark D.; McKinley, Robert L.

    A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…

  5. An improved non-Markovian degradation model with long-term dependency and item-to-item uncertainty

    NASA Astrophysics Data System (ADS)

    Xi, Xiaopeng; Chen, Maoyin; Zhang, Hanwen; Zhou, Donghua

    2018-05-01

    It is widely noted in the literature that degradation is usually simplified into a memoryless Markovian process for the purpose of predicting the remaining useful life (RUL). However, long-term dependency actually exists in the degradation processes of some industrial systems, including electromechanical equipment, oil tankers, and large blast furnaces. This implies that the new degradation state depends not only on the current state, but also on the historical states. Such dynamic systems cannot be accurately described by traditional Markovian models. Here we present an improved non-Markovian degradation model with both long-term dependency and item-to-item uncertainty. As a typical non-stationary process with dependent increments, fractional Brownian motion (FBM) is utilized to simulate the fractal diffusion of practical degradations. The uncertainty among multiple items can be represented by a random variable of the drift. Based on this model, the unknown parameters are estimated through the maximum likelihood (ML) algorithm, while a closed-form solution to the RUL distribution is further derived using a weak convergence theorem. The practicability of the proposed model is fully verified by two real-world examples. The results demonstrate that the proposed method can effectively reduce the prediction error.
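    A minimal sketch of two ingredients above: exact (Cholesky-based) simulation of fractional Brownian motion with Hurst index H > 0.5, which produces the long-term dependency, and generalized least squares for the drift, which coincides with the ML estimate when the covariance is known. All parameter values are hypothetical, and the item-to-item random drift of the full model is reduced here to a single fixed drift.

```python
import numpy as np

rng = np.random.default_rng(3)

# FBM at times t via the Cholesky factor of its exact covariance
H, n, dt = 0.7, 150, 1.0
t = np.arange(1, n + 1) * dt
cov = 0.5 * (t[:, None]**(2 * H) + t[None, :]**(2 * H)
             - np.abs(t[:, None] - t[None, :])**(2 * H))
L = np.linalg.cholesky(cov)

# One degradation path: linear drift plus sigma * FBM diffusion
drift, sigma = 0.05, 0.1
path = drift * t + sigma * (L @ rng.standard_normal(n))

# Generalized least squares for the drift (= ML with known covariance)
Sigma_inv = np.linalg.inv(sigma**2 * cov)
drift_hat = float((t @ Sigma_inv @ path) / (t @ Sigma_inv @ t))
```

With H = 0.5 the covariance collapses to min(t_i, t_j) and the process is ordinary (Markovian) Brownian motion; H > 0.5 makes the increments positively correlated, which is the dependency a Markov model cannot capture.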

  6. Force Limited Vibration Testing: Computation C2 for Real Load and Probabilistic Source

    NASA Astrophysics Data System (ADS)

    Wijker, J. J.; de Boer, A.; Ellenbroek, M. H. M.

    2014-06-01

    To prevent over-testing of the test item during random vibration testing, Scharton proposed and discussed force limited random vibration testing (FLVT) in a number of publications, in which the factor C2 is, besides the random vibration specification, the total mass, and the turnover frequency of the load (test item), a very important parameter. A number of computational methods to estimate C2 are described in the literature, i.e., the simple and the complex two-degrees-of-freedom systems, STDFS and CTDFS, respectively. Both the STDFS and the CTDFS describe in a very reduced (simplified) manner the load and the source (the adjacent structure transferring the excitation forces to the test item, e.g., a spacecraft supporting an instrument). The motivation of this work is to establish a method for computing a realistic value of C2 for a representative force-limited random vibration test when the adjacent structure (source) is more or less unknown. Marchand formulated a conservative estimate of C2 based on the maximum modal effective mass and damping of the test item (load) when no description of the supporting structure (source) is available [13]. Marchand also gave a formal description of obtaining C2 from the maximum PSD of the acceleration and the maximum PSD of the force, both at the interface between load and source, in combination with the apparent mass and total mass of the load. This method is very convenient for computing the factor C2; however, finite element models are needed to compute the PSD spectra of both the acceleration and the force at the interface between load and source. Stevens presented the coupled systems modal approach (CSMA), in which simplified asparagus-patch models (parallel-oscillator representations) of load and source are connected, consisting of modal effective masses and the spring stiffnesses associated with the natural frequencies. When the random acceleration vibration specification is given, the CSMA method is suitable for computing the value of the parameter C2. When no mathematical model of the source is available, estimates of C2 can be found in the literature. In this paper a probabilistic mathematical representation of the unknown source is proposed, such that the asparagus-patch model of the source can be approximated. The value of C2 can then be computed with the CSMA method, knowing the apparent mass of the load and the random acceleration specification at the interface between load and source. Strength and stiffness design rules for spacecraft, instrumentation, units, etc., as given in ECSS standards and handbooks, launch vehicle user's manuals, papers, and books, are applied; a probabilistic description of the design parameters is foreseen. As an example, a simple experiment has been worked out.

  7. Using SAS PROC MCMC for Item Response Theory Models

    PubMed Central

    Samonte, Kelli

    2014-01-01

    Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian methods in the context of item response theory to serve as a useful guide for practitioners in estimating and interpreting item response theory (IRT) models. Included is a description of the estimation procedure used by SAS PROC MCMC. Syntax is provided for estimation of both dichotomous and polytomous IRT models, as well as a discussion on how to extend the syntax to accommodate more complex IRT models. PMID:29795834
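    The kind of Bayesian estimation that PROC MCMC automates can be illustrated with a hand-rolled random-walk Metropolis sampler for a single examinee's ability under a Rasch model with known item difficulties. This sketch is in Python rather than SAS purely for illustration, and all item parameters and responses are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Known (hypothetical) Rasch item difficulties and one response pattern
b = np.array([-1.0, -0.3, 0.4, 1.1])
x = np.array([1, 1, 0, 1])

def log_post(theta):
    """Log posterior: Rasch likelihood plus a standard normal prior."""
    p = 1 / (1 + np.exp(-(theta - b)))
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) - 0.5 * theta**2

# Random-walk Metropolis: propose, then accept with the MH ratio
theta, draws = 0.0, []
for it in range(5000):
    prop = theta + rng.normal(0, 0.8)
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    if it >= 1000:                      # discard burn-in
        draws.append(theta)

theta_hat = float(np.mean(draws))      # posterior mean ability
```

PROC MCMC (like WinBUGS or the R packages the abstract mentions) builds proposals, tuning, and convergence diagnostics around this same accept/reject core, and extends it to sampling item parameters jointly.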

  8. Modeling the dynamics of recognition memory testing with an integrated model of retrieval and decision making.

    PubMed

    Osth, Adam F; Jansson, Anna; Dennis, Simon; Heathcote, Andrew

    2018-08-01

    A robust finding in recognition memory is that performance declines monotonically across test trials. Despite the prevalence of this decline, there is a lack of consensus on the mechanism responsible. Three hypotheses have been put forward: (1) interference is caused by learning of the test items; (2) the test items cause a shift in the context representation used to cue memory; and (3) participants change their speed-accuracy thresholds through the course of testing. We implemented all three possibilities in a combined model of recognition memory and decision making, which inherits the memory retrieval elements of the Osth and Dennis (2015) model and uses the diffusion decision model (DDM: Ratcliff, 1978) to generate choice and response times. We applied the model to four datasets that represent three challenges, the findings that: (1) the number of test items plays a larger role in determining performance than the number of studied items, (2) performance decreases less for strong items than weak items in pure lists but not in mixed lists, and (3) lexical decision trials interspersed between recognition test trials do not increase the rate at which performance declines. Analysis of the model's parameter estimates suggests that item interference plays a weak role in explaining the effects of recognition testing, while context drift plays a very large role. These results are consistent with prior work showing a weak role for item noise in recognition memory and that retrieval is a strong cause of context change in episodic memory. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. www.common-metrics.org: a web application to estimate scores from different patient-reported outcome measures on a common scale.

    PubMed

    Fischer, H Felix; Rose, Matthias

    2016-10-19

    Recently, a growing number of item response theory (IRT) models have been published that allow estimation of a common latent variable from data derived from different patient-reported outcomes (PROs). When using data from different PROs, direct estimation of the latent variable has some advantages over the use of sum score conversion tables, but it requires substantial proficiency in the field of psychometrics to fit such models using contemporary IRT software. We developed a web application ( http://www.common-metrics.org ), which allows easier estimation of latent variable scores using IRT models that calibrate different measures on instrument-independent scales. Currently, the application allows estimation using six different IRT models for Depression, Anxiety, and Physical Function. Based on published item parameters, users of the application can directly obtain latent trait estimates using expected a posteriori (EAP) estimation for sum scores as well as for specific response patterns, Bayes modal (MAP), weighted likelihood (WLE), and maximum likelihood (ML) methods, under three different prior distributions. The obtained estimates can be downloaded and analyzed using standard statistical software. This application enhances the usability of IRT modeling for researchers by allowing comparison of latent trait estimates over different PROs, such as the Patient Health Questionnaire Depression (PHQ-9) and Anxiety (GAD-7) scales, the Center for Epidemiologic Studies Depression Scale (CES-D), the Beck Depression Inventory (BDI), PROMIS Anxiety and Depression Short Forms, and others. Advantages of this approach include comparability of data derived with different measures and tolerance against missing values. The validity of the underlying models needs to be investigated in the future.
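    The EAP scoring mentioned above can be sketched with numerical quadrature under a standard normal prior. The 2PL item parameters below are hypothetical placeholders, not the published calibrations the application uses.

```python
import numpy as np

# Hypothetical 2PL item parameters (published values would be used in practice)
a = np.array([1.3, 0.9, 1.7, 1.1])
b = np.array([-0.8, 0.2, 0.6, 1.4])

def eap(pattern, a, b, n_quad=61):
    """Expected a posteriori ability for a 0/1 response pattern,
    integrating over a standard normal prior by quadrature."""
    theta = np.linspace(-4, 4, n_quad)
    prior = np.exp(-0.5 * theta**2)
    p = 1 / (1 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
    x = np.array(pattern)[:, None]
    lik = np.prod(np.where(x == 1, p, 1 - p), axis=0)
    post = lik * prior
    post /= post.sum()
    return float((theta * post).sum())

score = eap([1, 1, 0, 0], a, b)
```

EAP for sum scores works the same way, except the likelihood is summed over all response patterns with the same total, which is why a single conversion table per instrument can be precomputed.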

  10. Evaluation of a photographic food atlas as a tool for quantifying food portion size in the United Arab Emirates

    PubMed Central

    Platat, Carine; El Mesmoudi, Najoua; El Sadig, Mohamed; Tewfik, Ihab

    2018-01-01

    Although the United Arab Emirates (UAE) has one of the highest prevalences of overweight, obesity, and type 2 diabetes in the world, validated dietary assessment aids to estimate food intake of individuals and populations in the UAE are currently lacking. We conducted two observational studies to evaluate the accuracy of a photographic food atlas which was developed as a tool for food portion size estimation in the UAE. The UAE Food Atlas presents eight portion sizes for each food. Study 1 involved portion size estimations of 13 food items consumed during the previous day. Study 2 involved portion size estimations of nine food items immediately after consumption. Differences between the food portion sizes estimated from the photographs and the weighed food portions (estimation error), as well as the percentage differences relative to the weighed food portion for each tested food item, were calculated. Four of the evaluated food items were underestimated (by -8.9% to -18.4%), while nine were overestimated (by 9.5% to 90.9%) in Study 1. Moreover, there were significant differences between estimated and eaten food portions for eight food items (P<0.05). In Study 2, one food item was underestimated (-8.1%) while eight were overestimated (range 2.52% to 82.1%). Furthermore, there were significant differences between estimated and eaten food portions (P<0.05) for six food items. The limits of agreement between the estimated and consumed food portion sizes were wide, indicating a large variability in food portion estimation errors. These findings highlight the need for further development of the UAE Food Atlas to improve the accuracy of food portion size estimations in dietary assessments. Additionally, recalling food portions from the previous day did not seem to increase food portion estimation errors in this study. PMID:29698434

  11. Asymptotic Properties of Induced Maximum Likelihood Estimates of Nonlinear Models for Item Response Variables: The Finite-Generic-Item-Pool Case.

    ERIC Educational Resources Information Center

    Jones, Douglas H.

    The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…

  12. Variation in the Readability of Items Within Surveys

    PubMed Central

    Calderón, José L.; Morales, Leo S.; Liu, Honghu; Hays, Ron D.

    2006-01-01

    The objective of this study was to estimate the variation in the readability of survey items within 2 widely used health-related quality-of-life surveys: the National Eye Institute Visual Functioning Questionnaire–25 (VFQ-25) and the Short Form Health Survey, version 2 (SF-36v2). Flesch-Kincaid and Flesch Reading Ease formulas were used to estimate readability. Individual survey item scores and descriptive statistics for each survey were calculated. Variation of individual item scores from the mean survey score was graphically depicted for each survey. The mean reading grade level and reading ease estimates for the VFQ-25 and SF-36v2 were 7.8 (fairly easy) and 6.4 (easy), respectively. Both surveys had notable variation in item readability; individual item readability scores ranged from 3.7 to 12.0 (very easy to difficult) for the VFQ-25 and 2.2 to 12.0 (very easy to difficult) for the SF-36v2. Because survey respondents may not comprehend items with readability scores that exceed their reading ability, estimating the readability of each survey item is an important component of evaluating survey readability. Standards for measuring the readability of surveys are needed. PMID:16401705
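    The Flesch-Kincaid grade level and Flesch Reading Ease formulas used above are simple functions of word, sentence, and syllable counts. The syllable heuristic below (counting vowel groups) is a crude stand-in for the dictionary-based counts that readability software uses, so scores will differ slightly from tools such as those in word processors.

```python
import re

def counts(text):
    """Crude word, sentence, and syllable counts (simplified heuristics)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return len(words), sentences, syllables

def flesch_kincaid_grade(text):
    w, s, sy = counts(text)
    return 0.39 * w / s + 11.8 * sy / w - 15.59

def flesch_reading_ease(text):
    w, s, sy = counts(text)
    return 206.835 - 1.015 * w / s - 84.6 * sy / w

grade = flesch_kincaid_grade("I can see the cat. The cat sees me.")
```

Scoring each survey item separately, as the study does, only requires calling these functions per item instead of on the whole instrument.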

  13. Quality of surgical randomized controlled trials for acute cholecystitis: assessment based on CONSORT and additional check items.

    PubMed

    Shikata, Satoru; Nakayama, Takeo; Yamagishi, Hisakazu

    2008-01-01

    In this study, we conducted a limited survey of reports of surgical randomized controlled trials, using the consolidated standards of reporting trials (CONSORT) statement and additional check items to clarify problems in the evaluation of surgical reports. A total of 13 randomized trials were selected from the two latest review articles on biliary surgery. Each randomized trial was evaluated according to 28 quality measures that comprised items from the CONSORT statement plus additional items. Analysis focused on relationships between the quality of each study and the estimated effect gap ("pooled estimate in meta-analysis" minus "estimated effect of each study"). No definite relationships were found between individual study quality and the estimated effect gap. The following items could have been described but were not provided in almost all the surgical RCT reports: "clearly defined outcomes"; "details of randomization"; "participant flow charts"; "intention-to-treat analysis"; "ancillary analyses"; and "financial conflicts of interest". The item "participation of a trial methodologist in the study" was not found in any of the reports. Although the quality of reporting trials is not always related to a biased estimation of treatment effect, the items used for quality measures must be described to enable readers to evaluate the quality and applicability of the reporting. Further development of an assessment tool is needed for items specific to surgical randomized controlled trials.

  14. An Examination of Two Procedures for Identifying Consequential Item Parameter Drift

    ERIC Educational Resources Information Center

    Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu

    2014-01-01

    The purpose of the present study was to develop and evaluate two procedures flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure…

  15. The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory

    ERIC Educational Resources Information Center

    Anil, Duygu

    2008-01-01

    In this study, the predictive power of experts' judgments of item characteristics, for conditions in which try-out administrations cannot be applied, was examined against item characteristics computed under classical test theory and the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…

  16. A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

    ERIC Educational Resources Information Center

    Lee, Guemin; Park, In-Yong

    2012-01-01

    Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

  17. A Monte Carlo Study of the Effect of Item Characteristic Curve Estimation on the Accuracy of Three Person-Fit Statistics

    ERIC Educational Resources Information Center

    St-Onge, Christina; Valois, Pierre; Abdous, Belkacem; Germain, Stephane

    2009-01-01

    To date, there have been no studies comparing parametric and nonparametric Item Characteristic Curve (ICC) estimation methods on the effectiveness of Person-Fit Statistics (PFS). The primary aim of this study was to determine if the use of ICCs estimated by nonparametric methods would increase the accuracy of item response theory-based PFS for…

  18. A study of Korean students' creativity in science using structural equation modeling

    NASA Astrophysics Data System (ADS)

    Jo, Son Mi

    Through the review of creativity research I have found that studies lack certain crucial parts: (a) a theoretical framework for the study of creativity in science, (b) studies considering the unique components related to scientific creativity, and (c) studies of the interactions among key components through simultaneous analyses. The primary purpose of this study is to explore the dynamic interactions among four components (scientific proficiency, intrinsic motivation, creative competence, context supporting creativity) related to scientific creativity under the framework of scientific creativity. A total of 295 Korean middle school students participated. Well-known and commonly used measures were selected and developed. Two scientific achievement scores and one score measured by performance-based assessment were used to measure student scientific knowledge/inquiry skills. Six items selected from the study of Lederman, Abd-El-Khalick, Bell, and Schwartz (2002) were used to assess how well students understand the nature of science. Five items were selected from the subscale of the Scientific Attitude Inventory version II (Moore & Foy, 1997) to assess student attitude toward science. The Test of Creative Thinking-Drawing Production (Urban & Jellen, 1996) was used to measure creative competence. Eight items chosen from the 15 items of the Work Preference Inventory (1994) were applied to measure students' intrinsic motivation. To assess the level of context supporting creativity, eight items were adapted from the measurement of the work environment (Amabile, Conti, Coon, Lazenby, & Herron, 1996). To assess scientific creativity, one open-ended science problem was used, and three raters rated the level of scientific creativity using the Consensual Assessment Technique (Amabile, 1996). The results show that scientific proficiency and creative competence correlate with scientific creativity. Intrinsic motivation and the context component do not predict scientific creativity. The strengths of the relationships between scientific proficiency and scientific creativity (parameter estimate = 0.43) and between creative competence and scientific creativity (parameter estimate = 0.17) are similar [χ²(1) = 0.670, p > .05]. In a follow-up analysis of the structural model, I found that creative competence and scientific proficiency act as partial mediators among three components (general creativity, scientific proficiency, and scientific creativity). The moderating effects of intrinsic motivation and the context component were also investigated, but no moderation effects were found.

  19. Assessing items on the SF-8 Japanese version for health-related quality of life: a psychometric analysis based on the nominal categories model of item response theory.

    PubMed

    Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya

    2009-06-01

    The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) that provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on the nominal categories model. Using data from an adjusted random sample from a nationally representative panel, a nominal categories model was applied to the SF-8 items to characterize coverage of the latent trait (theta). Probabilities for the response choices were described as functions of the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53% women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion, 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption, in an order consistent with the corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems]) cover mostly the negative range of theta. The information function for all items combined peaked at theta = -0.7 (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can identify persons with relatively poorer QOL more effectively than those with relatively better QOL.
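
    The nominal categories model referenced above can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code; the slope and intercept values below are made up for demonstration.

```python
import math

def nominal_category_probs(theta, slopes, intercepts):
    """Bock's nominal categories model: the probability of response
    category k is proportional to exp(a_k * theta + c_k)."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical parameters for a 4-category item, evaluated at
# theta = -0.7, where the abstract's combined information function peaked
probs = nominal_category_probs(-0.7, [0.0, 0.8, 1.6, 2.4],
                               [0.0, 0.5, 0.3, -0.6])
```

    Plotting these probabilities over a grid of theta values reproduces the category characteristic curves described in the abstract.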

  20. Underestimating numerosity of items in visual search tasks.

    PubMed

    Cassenti, Daniel N; Kelley, Troy D; Ghirardelli, Thomas G

    2010-10-01

    Previous research on numerosity judgments addressed attended items, while the present research addresses underestimation for unattended items in visual search tasks. One potential cause of underestimation for unattended items is that estimates of quantity may depend on viewing a large portion of the display within foveal vision. Another theory follows from the occupancy model: estimating quantity of items in greater proximity to one another increases the likelihood of an underestimation error. Three experimental manipulations addressed aspects of underestimation for unattended items: the size of the distracters, the distance of the target from fixation, and whether items were clustered together. Results suggested that the underestimation effect for unattended items was best explained within a Gestalt grouping framework.

  1. Testing measurement invariance of the patient-reported outcomes measurement information system pain behaviors score between the US general population sample and a sample of individuals with chronic pain.

    PubMed

    Chung, Hyewon; Kim, Jiseon; Cook, Karon F; Askew, Robert L; Revicki, Dennis A; Amtmann, Dagmar

    2014-02-01

    In order to test differences between group means, the construct measured must have the same meaning for all groups under investigation. This study examined the measurement invariance of responses to the patient-reported outcomes measurement information system (PROMIS) pain behavior (PB) item bank in two samples: the PROMIS calibration sample (Wave 1, N = 426) and a sample recruited from the American Chronic Pain Association (ACPA, N = 750). The ACPA data were collected to increase the number of participants with higher levels of pain. Multi-group confirmatory factor analysis (MG-CFA) and two item response theory (IRT)-based differential item functioning (DIF) approaches were employed to evaluate measurement invariance. MG-CFA results supported metric invariance of the PROMIS-PB, indicating that unstandardized factor loadings were equal across samples. DIF analyses revealed that the impact of the 6 items flagged for DIF was negligible. Based on the results of both the MG-CFA and IRT-based DIF approaches, we recommend retaining the original parameter estimates obtained from the combined samples.

  2. A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

    ERIC Educational Resources Information Center

    Livingston, Samuel A.; Dorans, Neil J.

    2004-01-01

    This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
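
    The curve-estimation idea can be approximated with a simple grouped-proportion sketch. The actual ETS procedure involves smoothing; this minimal, hypothetical version just bands examinees by the criterion score and tabulates option popularity per band.

```python
from collections import defaultdict

def empirical_response_curves(total_scores, choices, n_groups=5):
    """For one item: proportion of examinees selecting each option
    within bands of the criterion score (a rough response-curve sketch)."""
    lo, hi = min(total_scores), max(total_scores)
    width = (hi - lo) / n_groups or 1  # avoid zero width if all scores equal
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for s, c in zip(total_scores, choices):
        g = min(int((s - lo) / width), n_groups - 1)
        counts[g][c] += 1
        totals[g] += 1
    return {g: {c: n / totals[g] for c, n in opts.items()}
            for g, opts in counts.items()}
```

    Plotting each option's proportions across the score bands gives, at a glance, the difficulty and discriminating power of the item and the popularity of each distractor.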

  3. On the Use of Nonparametric Item Characteristic Curve Estimation Techniques for Checking Parametric Model Fit

    ERIC Educational Resources Information Center

    Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey

    2009-01-01

    The purpose of this study was to assess the model fit of the 2PL through comparison with nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that the three nonparametric procedures implemented produced ICCs similar to those of the 2PL for items simulated to fit the 2PL. However, for misfitting items,…
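
    A minimal contrast between the two approaches: a parametric 2PL curve versus a Gaussian-kernel regression of 0/1 responses on ability. This is a sketch of the general idea, not the study's procedures; the bandwidth h is a hypothetical choice.

```python
import math

def icc_2pl(theta, a, b):
    """Parametric two-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def icc_kernel(theta0, thetas, responses, h=0.5):
    """Nonparametric ICC at theta0: Gaussian-kernel-weighted average
    of 0/1 responses from examinees with abilities `thetas`."""
    weights = [math.exp(-0.5 * ((t - theta0) / h) ** 2) for t in thetas]
    return sum(w * r for w, r in zip(weights, responses)) / sum(weights)
```

    Comparing the two curves over a grid of theta values is one informal way to see where a parametric model misfits an item.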

  4. SAS and SPSS macros to calculate standardized Cronbach's alpha using the upper bound of the phi coefficient for dichotomous items.

    PubMed

    Sun, Wei; Chou, Chih-Ping; Stacy, Alan W; Ma, Huiyan; Unger, Jennifer; Gallaher, Peggy

    2007-02-01

    Cronbach's α is widely used in social science research to estimate the internal consistency reliability of a measurement scale. However, when items are not strictly parallel, the Cronbach's α coefficient provides a lower-bound estimate of true reliability, and this estimate may be further biased downward when items are dichotomous. The estimation of standardized Cronbach's α for a scale with dichotomous items can be improved by using the upper bound of the coefficient phi. SAS and SPSS macros are presented in this article to obtain standardized Cronbach's α via this method. A simulation analysis showed that Cronbach's α from upper-bound phi may be appropriate for estimating the true reliability when standardized Cronbach's α is problematic.
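
    The correction can be sketched as follows. This is a Python translation of the general idea, not the article's SAS/SPSS macros: compute phi for an item pair, divide by its marginal-determined upper bound, and feed the mean corrected correlation into the standardized-alpha formula.

```python
import math

def phi_and_max(x, y):
    """Phi coefficient for two dichotomous (0/1) items and its upper
    bound given the items' marginal proportions."""
    n = len(x)
    p1 = sum(x) / n
    p2 = sum(y) / n
    p11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    phi = (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    lo, hi = sorted([p1, p2])
    phi_max = math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))
    return phi, phi_max

def standardized_alpha(mean_r, k):
    """Standardized Cronbach's alpha for k items from the mean
    inter-item correlation: k*r / (1 + (k-1)*r)."""
    return k * mean_r / (1 + (k - 1) * mean_r)
```

    Dividing each phi by its phi_max before averaging removes the downward bias that unequal item difficulties impose on the phi coefficients.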

  5. The effects of sleep deprivation on item and associative recognition memory.

    PubMed

    Ratcliff, Roger; Van Dongen, Hans P A

    2018-02-01

    Sleep deprivation adversely affects the ability to perform cognitive tasks, but theories range from predicting an overall decline in cognitive functioning because of reduced stability in attentional networks to specific deficits in various cognitive domains or processes. We measured the effects of sleep deprivation on two memory tasks, item recognition ("was this word in the list studied?") and associative recognition ("were these two words studied in the same pair?"). These tasks test memory for information encoded a few minutes earlier and so do not address effects of sleep deprivation on working memory or consolidation after sleep. A diffusion model was used to decompose accuracy and response time distributions to produce parameter estimates of components of cognitive processing. The model assumes that over time, noisy evidence from the task stimulus is accumulated to one of two decision criteria, and parameters governing this process are extracted and interpreted in terms of distinct cognitive processes. Results showed that sleep deprivation reduces drift rate (the evidence used in the decision process), with little effect on the other components of the decision process. These results contrast with the effects of aging, which show little decline in item recognition but large declines in associative recognition. The results suggest that sleep deprivation degrades the quality of information stored in memory and that this may occur through degraded attentional processes.
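
    The diffusion-model logic described here can be illustrated with a toy simulation. This is a sketch of the general accumulate-to-bound idea, not Ratcliff's fitting procedure, and all parameter values are arbitrary.

```python
import random

def simulate_trial(drift, boundary=1.0, dt=0.001, rng=random):
    """One diffusion trial: noisy evidence accumulates from 0 until it
    crosses +boundary (correct) or -boundary (error)."""
    x, t, sd = 0.0, 0.0, dt ** 0.5  # within-trial noise sd fixed at 1
    while abs(x) < boundary:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
    return x >= boundary, t  # (accuracy, decision time)

def accuracy(drift, n_trials, seed=0):
    """Proportion correct over repeated simulated trials."""
    rng = random.Random(seed)
    return sum(simulate_trial(drift, rng=rng)[0]
               for _ in range(n_trials)) / n_trials
```

    With these settings the theoretical accuracy is 1/(1 + exp(-2*drift*boundary)), so lowering the drift rate, as the abstract reports under sleep deprivation, lowers accuracy while leaving the boundary-related components untouched.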

  6. Item response theory analysis of the mechanics baseline test

    NASA Astrophysics Data System (ADS)

    Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

    2012-02-01

    Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.

  7. Item Parameter Invariance of the Kaufman Adolescent and Adult Intelligence Test across Male and Female Samples

    ERIC Educational Resources Information Center

    Immekus, Jason C.; Maller, Susan J.

    2009-01-01

    The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…

  8. A Method of Q-Matrix Validation for the Linear Logistic Test Model

    PubMed Central

    Baghaei, Purya; Hohensinn, Christine

    2017-01-01

    The linear logistic test model (LLTM) is a well-recognized psychometric model for examining the components of difficulty in cognitive tests and validating construct theories. The plausibility of the construct model, summarized in a matrix of weights, known as the Q-matrix or weight matrix, is tested by (1) comparing the fit of LLTM with the fit of the Rasch model (RM) using the likelihood ratio (LR) test and (2) by examining the correlation between the Rasch model item parameters and LLTM reconstructed item parameters. The problem with the LR test is that it is almost always significant and, consequently, LLTM is rejected. The drawback of examining the correlation coefficient is that there is no cut-off value or lower bound for the magnitude of the correlation coefficient. In this article we suggest a simulation method to set a minimum benchmark for the correlation between item parameters from the Rasch model and those reconstructed by the LLTM. If the cognitive model is valid then the correlation coefficient between the RM-based item parameters and the LLTM-reconstructed item parameters derived from the theoretical weight matrix should be greater than those derived from the simulated matrices. PMID:28611721
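
    The proposed benchmark can be sketched as follows. This is an illustrative NumPy implementation under simplifying assumptions: Rasch difficulties are regressed on the weight matrix by ordinary least squares, and plain random binary matrices stand in for the article's simulated Q-matrices.

```python
import numpy as np

rng = np.random.default_rng(1)

def lltm_reconstruction_corr(beta, Q):
    """Regress Rasch item difficulties beta on weight matrix Q (least
    squares estimate of the basic parameters eta), then correlate beta
    with the LLTM-reconstructed difficulties Q @ eta_hat."""
    eta_hat, *_ = np.linalg.lstsq(Q, beta, rcond=None)
    recon = Q @ eta_hat
    return np.corrcoef(beta, recon)[0, 1]

def simulated_benchmark(beta, n_ops, n_sims=200):
    """Mean correlation achieved by random binary weight matrices; a
    valid theoretical Q-matrix should beat this benchmark."""
    corrs = [lltm_reconstruction_corr(
                 beta, rng.integers(0, 2, (len(beta), n_ops)))
             for _ in range(n_sims)]
    return float(np.mean(corrs))
```

    If the correlation from the theoretical weight matrix does not exceed the benchmark from random matrices, the construct model adds nothing beyond what arbitrary item groupings achieve.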

  9. The Confounding Effects of Ability, Item Difficulty, and Content Balance within Multiple Dimensions on the Estimation of Unidimensional Thetas

    ERIC Educational Resources Information Center

    Matlock, Ki Lynn

    2013-01-01

    When test forms that have equal total test difficulty and number of items vary in difficulty and length within sub-content areas, an examinee's estimated score may vary across equivalent forms, depending on how well his or her true ability in each sub-content area aligns with the difficulty of items and number of items within these areas.…

  10. Signal and array processing techniques for RFID readers

    NASA Astrophysics Data System (ADS)

    Wang, Jing; Amin, Moeness; Zhang, Yimin

    2006-05-01

    Radio Frequency Identification (RFID) has recently attracted much attention in both the technical and business communities. It has found wide applications in, for example, toll collection, supply-chain management, access control, localization tracking, real-time monitoring, and object identification. Situations may arise where the movement direction of tagged RFID items through a portal is of interest and must be determined. Doppler estimation may prove complicated or impractical for RFID readers to perform. Several alternative approaches, including the use of an array of sensors with arbitrary geometry, can be applied. In this paper, we consider direction-of-arrival (DOA) estimation techniques for application to near-field narrowband RFID problems. In particular, we examine the use of a pair of RFID antennas to track moving RFID tagged items through a portal. With two antennas, the near-field DOA estimation problem can be simplified to a far-field problem, yielding a simple way of identifying the direction of tag movement, where only one parameter, the angle, needs to be considered. In this case, tracking the moving direction of the tag simply amounts to computing the spatial cross-correlation between the data samples received at the two antennas. It is pointed out that the radiation patterns of the reader and tag antennas, particularly their phase characteristics, have a significant effect on the performance of DOA estimation. Indoor experiments were conducted in the Radar Imaging and RFID Labs at Villanova University to validate the proposed technique for estimating target movement direction.
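
    The two-antenna idea reduces to reading the phase of the spatial cross-correlation. A narrowband far-field sketch follows; the spacing and wavelength values in the test are arbitrary, and a real reader would also have to account for the antennas' phase patterns, as the paper notes.

```python
import cmath
import math

def estimate_doa(samples1, samples2, spacing, wavelength):
    """Estimate the angle of arrival (radians) from the phase of the
    spatial cross-correlation between two antennas, using the far-field
    relation delta_phi = 2*pi*spacing*sin(theta)/wavelength."""
    xcorr = sum(a * b.conjugate() for a, b in zip(samples1, samples2))
    delta_phi = cmath.phase(xcorr)
    return math.asin(delta_phi * wavelength / (2 * math.pi * spacing))
```

    Tracking the sign of the estimated angle over time indicates whether a tagged item is moving into or out of the portal.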

  11. Item Information and Discrimination Functions for Trinary PCM Items.

    ERIC Educational Resources Information Center

    Akkermans, Wies; Muraki, Eiji

    1997-01-01

    For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal and bimodal are discussed, and the locations and values of maxima are derived. Practical relevance of the results is discussed. (SLD)
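
    For a trinary item the partial credit model has two step parameters, and the unimodal-versus-bimodal behavior of the information function can be seen numerically. A minimal sketch with made-up step values:

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit model category probabilities; for a trinary item,
    deltas holds the two step parameters (delta_1, delta_2)."""
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + (theta - d))
    m = max(cum)  # subtract max for numerical stability
    exps = [math.exp(c - m) for c in cum]
    s = sum(exps)
    return [e / s for e in exps]

def pcm_information(theta, deltas):
    """Item information for the PCM: the variance of the item score
    (0, 1, 2, ...) at the given theta."""
    probs = pcm_probs(theta, deltas)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))
```

    With widely separated steps, e.g. deltas = (-3, 3), the information is larger near each step than midway between them, i.e. the function is bimodal.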

  12. Use of Robust z in Detecting Unstable Items in Item Response Theory Models

    ERIC Educational Resources Information Center

    Huynh, Huynh; Meyer, Patrick

    2010-01-01

    The first part of this paper describes the use of the robust z_R statistic to link test forms using the Rasch (or one-parameter logistic) model. The procedure is then extended to the two-parameter and three-parameter logistic and two-parameter partial credit (2PPC) models. A real set of data was used to illustrate the extension. The…
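
    One common formulation of the robust z statistic, applied to differences in item parameter estimates between two forms, centers by the median and scales by 0.74 times the interquartile range (which approximates the standard deviation under normality). This is a hedged sketch of that formulation, not the authors' exact procedure.

```python
import statistics

def robust_z(values):
    """Robust z statistics: center by the median and scale by
    0.74 * IQR; large |z| flags unstable (outlying) items."""
    med = statistics.median(values)
    q = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q[2] - q[0]
    return [(v - med) / (0.74 * iqr) for v in values]
```

    Because the median and IQR are insensitive to outliers, an item whose difficulty shifted between forms stands out without distorting the linking constant estimated from the stable items.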

  13. The efficiency of parameter estimation of latent path analysis using summated rating scale (SRS) and method of successive interval (MSI) for transformation of score to scale

    NASA Astrophysics Data System (ADS)

    Solimun, Fernandes, Adji Achmad Rinaldo; Arisoesilaningsih, Endang

    2017-12-01

    Research in various fields generally investigates systems that involve latent variables. One method to analyze a model representing such a system is path analysis. Latent variables measured using questionnaires that apply an attitude-scale model yield data in the form of scores, which should be transformed into scale data before analysis. Path coefficients, the parameter estimators, are calculated from scale data produced by the method of successive intervals (MSI) or the summated rating scale (SRS). This research identifies which data transformation method is better. Path coefficients with smaller variances are said to be more efficient, so the transformation method that produces path coefficients (parameter estimators) with smaller variances is considered better. Analysis of real data shows that, for the influence of the Attitude variable on Entrepreneurship Intention, the relative efficiency (ER) = 1, indicating that analyses using MSI- and SRS-transformed data are equally efficient. On the other hand, for simulated data with high correlations between items (0.7-0.9), the MSI method is 1.3 times more efficient than the SRS method.
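
    The efficiency comparison reduces to a ratio of variances of the path coefficients produced under each transformation. A minimal sketch with made-up numbers:

```python
import statistics

def relative_efficiency(coefs_a, coefs_b):
    """Relative efficiency of estimator A vs. estimator B: the ratio of
    the variances of their estimates (ER = 1 means equally efficient,
    ER > 1 means B is the more efficient estimator)."""
    return statistics.variance(coefs_a) / statistics.variance(coefs_b)
```
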

  14. Development and psychometric evaluation of a cardiovascular risk and disease management knowledge assessment tool.

    PubMed

    Rosneck, James S; Hughes, Joel; Gunstad, John; Josephson, Richard; Noe, Donald A; Waechter, Donna

    2014-01-01

    This article describes the systematic construction and psychometric analysis of a knowledge assessment instrument for phase II cardiac rehabilitation (CR) patients, measuring risk modification and disease management knowledge and behavioral outcomes derived from national standards relevant to secondary prevention and management of cardiovascular disease. First, using an adult curriculum based on disease-specific learning outcomes and competencies, a systematic test item development process was completed by clinical staff. Second, a panel of educational and clinical experts used an iterative process to identify the test content domain and arrive at consensus in selecting items meeting criteria. Third, the resulting 31-question instrument, the Cardiac Knowledge Assessment Tool (CKAT), was piloted with CR patients to ensure its applicability. Validity and reliability analyses were performed on pretest administrations from 3638 adults, with additional focused analyses of 1999 individuals completing both pretreatment and posttreatment administrations within 6 months. Evidence of CKAT content validity was substantiated, with 85% agreement among content experts. Evidence of construct validity was demonstrated via factor analysis identifying key underlying factors. Estimates of internal consistency, for example, Cronbach's α = .852 and Spearman-Brown split-half reliability = 0.817 on pretesting, support test reliability. Item analysis, using point-biserial correlation, measured relationships between performance on single items and total score (P < .01). Analyses using item difficulty and item discrimination indices further verified item stability and the validity of the CKAT. A knowledge instrument specifically designed for an adult CR population was systematically developed and tested in a large representative patient population, satisfying psychometric parameters, including validity and reliability.
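
    The classical item analyses named here, point-biserial item-total correlation and an upper-lower discrimination index, can be sketched as follows. This is a generic textbook implementation, not the CKAT scoring code; the 27% upper-lower split is a conventional default.

```python
import statistics

def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total scores:
    r_pb = (M1 - M0) / s * sqrt(p * q)."""
    n = len(item)
    mean1 = statistics.mean(t for i, t in zip(item, totals) if i == 1)
    mean0 = statistics.mean(t for i, t in zip(item, totals) if i == 0)
    p = sum(item) / n
    sd = statistics.pstdev(totals)
    return (mean1 - mean0) / sd * (p * (1 - p)) ** 0.5

def discrimination_index(item, totals, frac=0.27):
    """Upper-lower index: proportion correct among the top scorers
    minus the proportion correct among the bottom scorers."""
    order = sorted(range(len(item)), key=lambda i: totals[i])
    k = max(1, int(frac * len(item)))
    low = sum(item[i] for i in order[:k]) / k
    high = sum(item[i] for i in order[-k:]) / k
    return high - low
```
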

  15. Improving Measurement of Trait Competitiveness: A Rasch Analysis of the Revised Competitiveness Index With Samples From New Zealand and US University Students.

    PubMed

    Krägeloh, Christian U; Medvedev, Oleg N; Hill, Erin M; Webster, Craig S; Booth, Roger J; Henning, Marcus A

    2018-01-01

    Measuring competitiveness is necessary to fully understand variables affecting student learning. The 14-item Revised Competitiveness Index has become a widely used measure to assess trait competitiveness. The current study reports on a Rasch analysis to investigate the psychometric properties of the Revised Competitiveness Index and to improve its precision for international comparisons. Students were recruited from medical studies at a university in New Zealand, undergraduate health sciences courses at another New Zealand university, and a psychology undergraduate class at a university in the United States. Rasch model parameter estimates were affected by local dependency and item misfit. Best fit to the Rasch model (χ²(20) = 15.86, p = .73, person separation index = .95) was obtained for the Enjoyment of Competition subscale after combining locally dependent items into a subtest and discarding the highly misfitting Item 9. The only modifications required to obtain a suitable fit (χ²(25) = 25.81, p = .42, person separation index = .77) for the Contentiousness subscale were a subtest combining two locally dependent items and splitting this subtest by country to deal with differential item functioning. The results support the reliability and internal construct validity of the modified Revised Competitiveness Index. Precision of the measure may be enhanced using the ordinal-to-interval conversion algorithms presented here, allowing the use of parametric statistics without breaking fundamental statistical assumptions.

  16. Validating a multiple mini-interview question bank assessing entry-level reasoning skills in candidates for graduate-entry medicine and dentistry programmes.

    PubMed

    Roberts, Chris; Zoanetti, Nathan; Rothnie, Imogene

    2009-04-01

    The multiple mini-interview (MMI) was initially designed to test non-cognitive characteristics related to professionalism in entry-level students. However, it may be testing cognitive reasoning skills. Candidates to medical and dental schools come from diverse backgrounds and it is important for the validity and fairness of the MMI that these background factors do not impact on their scores. A suite of advanced psychometric techniques drawn from item response theory (IRT) was used to validate an MMI question bank in order to establish the conceptual equivalence of the questions. Bias against candidate subgroups of equal ability was investigated using differential item functioning (DIF) analysis. All 39 questions had a good fit to the IRT model. Of the 195 checklist items, none were found to have significant DIF after visual inspection of expected score curves, consideration of the number of applicants per category, and evaluation of the magnitude of the DIF parameter estimates. The question bank contains items that have been studied carefully in terms of model fit and DIF. Questions appear to measure a cognitive unidimensional construct, 'entry-level reasoning skills in professionalism', as suggested by goodness-of-fit statistics. The lack of items exhibiting DIF is encouraging in a contemporary high-stakes admission setting where candidates of diverse personal, cultural and academic backgrounds are assessed by common means. This IRT approach has potential to provide assessment designers with a quality control procedure that extends to the level of checklist items.

  17. The Nursing Home Physical Performance Test: A Secondary Data Analysis of Women in Long-Term Care Using Item Response Theory.

    PubMed

    Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L

    2017-04-11

    The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at the item level and opportunities for improvements. We used data from women in long-term care. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making the sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. On average, participants were 86 years old, had more than three comorbidities, and had an NHPPT score of 19.4. All items had high discrimination and were targeted toward the lower middle range of the performance continuum. After revision, the sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or a greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. The concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL, and explained variability (R2) improved from 22% to 36% for frailty. The NHPPT has good measurement characteristics at the item level. The NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed.

  18. Dual-process models of associative recognition in young and older adults: evidence from receiver operating characteristics.

    PubMed

    Healy, Michael R; Light, Leah L; Chung, Christie

    2005-07-01

    In 3 experiments, young and older adults studied lists of unrelated word pairs and were given confidence-rated item and associative recognition tests. Several different models of recognition were fit to the confidence-rating data using techniques described by S. Macho (2002, 2004). Concordant with previous findings, item recognition data were best fit by an unequal-variance signal detection theory model for both young and older adults. For both age groups, associative recognition performance was best explained by models incorporating both recollection and familiarity components. Examination of parameter estimates supported the conclusion that recollection is reduced in old age, but inferences about age differences in familiarity were highly model dependent. Implications for dual-process models of memory in old age are discussed.

  19. Accounting for Parcel-Allocation Variability in Practice: Combining Sources of Uncertainty and Choosing the Number of Allocations.

    PubMed

    Sterba, Sonya K; Rights, Jason D

    2016-01-01

    Item parceling remains widely used under conditions that can lead to parcel-allocation variability in results. Hence, researchers may be interested in quantifying and accounting for parcel-allocation variability within sample. To do so in practice, three key issues need to be addressed. First, how can we combine sources of uncertainty arising from sampling variability and parcel-allocation variability when drawing inferences about parameters in structural equation models? Second, on what basis can we choose the number of repeated item-to-parcel allocations within sample? Third, how can we diagnose and report proportions of total variability per estimate arising due to parcel-allocation variability versus sampling variability? This article addresses these three methodological issues. Developments are illustrated using simulated and empirical examples, and software for implementing them is provided.
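
    The combination of the two uncertainty sources can be sketched with a Rubin-style pooling rule across M repeated allocations. This illustrates the general idea only; the article develops the specifics, including how to choose the number of allocations.

```python
import statistics

def pool_across_allocations(estimates, variances):
    """Combine one parameter's estimates across M item-to-parcel
    allocations: the pooled estimate is the mean, and the total variance
    adds within-allocation sampling variance W to between-allocation
    variance B (scaled Rubin-style by 1 + 1/M)."""
    m = len(estimates)
    pooled = statistics.mean(estimates)
    w = statistics.mean(variances)       # sampling variability
    b = statistics.variance(estimates)   # parcel-allocation variability
    total = w + (1 + 1 / m) * b
    prop_allocation = (1 + 1 / m) * b / total
    return pooled, total, prop_allocation
```

    The returned proportion is one way to report how much of the total variability in an estimate is attributable to parcel allocation rather than sampling.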

  20. Item Selection and Ability Estimation Procedures for a Mixed-Format Adaptive Test

    ERIC Educational Resources Information Center

    Ho, Tsung-Han; Dodd, Barbara G.

    2012-01-01

    In this study we compared five item selection procedures using three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures used were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler…

  1. Estimating Ordinal Reliability for Likert-Type and Ordinal Item Response Data: A Conceptual, Empirical, and Practical Guide

    ERIC Educational Resources Information Center

    Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D.

    2012-01-01

    This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…

  2. A Feedback Control Strategy for Enhancing Item Selection Efficiency in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Weissman, Alexander

    2006-01-01

    A computerized adaptive test (CAT) may be modeled as a closed-loop system, where item selection is influenced by trait level ([theta]) estimation and vice versa. When discrepancies exist between an examinee's estimated and true [theta] levels, nonoptimal item selection is a likely result. Nevertheless, examinee response behavior consistent with…

  3. IRTPRO 2.1 for Windows (Item Response Theory for Patient-Reported Outcomes)

    ERIC Educational Resources Information Center

    Paek, Insu; Han, Kyung T.

    2013-01-01

    This article reviews a new item response theory (IRT) model estimation program, IRTPRO 2.1, for Windows that is capable of unidimensional and multidimensional IRT model estimation for existing and user-specified constrained IRT models for dichotomously and polytomously scored item response data. (Contains 1 figure and 2 notes.)

  4. A Note on Item-Restscore Association in Rasch Models

    ERIC Educational Resources Information Center

    Kreiner, Svend

    2011-01-01

    To rule out the need for a two-parameter item response theory (IRT) model during item analysis by Rasch models, it is important to check the Rasch model's assumption that all items have the same item discrimination. Biserial and polyserial correlation coefficients measuring the association between items and restscores are often used in an informal…

  5. Item Pool Design for an Operational Variable-Length Computerized Adaptive Test

    ERIC Educational Resources Information Center

    He, Wei; Reckase, Mark D.

    2014-01-01

    For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…

  6. Large Sample Confidence Intervals for Item Response Theory Reliability Coefficients

    ERIC Educational Resources Information Center

    Andersson, Björn; Xin, Tao

    2018-01-01

    In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature and therefore the variability associated with the estimated reliability…

  7. Stability of INFIT and OUTFIT Compared to Simulated Estimates in Applied Setting.

    PubMed

    Hodge, Kari J; Morgan, Grant B

    Residual-based fit statistics are commonly used as an indication of the extent to which the item response data fit the Rasch model. Fit statistic estimates are influenced by sample size, and rule-of-thumb cutoffs may result in incorrect conclusions about the extent to which the model fits the data. Estimates obtained in this analysis were compared to 250 simulated data sets to examine the stability of the estimates. All INFIT estimates were within the rule-of-thumb range of 0.7 to 1.3. However, only 82% of the INFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items' INFIT distributions using this 95% confidence-like interval, an 18 percentage point difference in items that were classified as acceptable. Forty-eight percent of OUTFIT estimates fell within the 0.7 to 1.3 rule-of-thumb range, whereas 34% of OUTFIT estimates fell within the 2.5th and 97.5th percentiles of the simulated items' OUTFIT distributions, a 14 percentage point difference in items that were classified as acceptable. When using the rule-of-thumb ranges for fit estimates, the magnitude of misfit was smaller than with the 95% confidence interval of the simulated distribution. The findings indicate that the use of confidence intervals as critical values for fit statistics leads to different model-data fit conclusions than traditional rule-of-thumb critical values.
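The fit statistics compared above have standard textbook forms: the squared standardized residual is averaged unweighted (OUTFIT) or weighted by the model variance (INFIT). A minimal sketch, assuming known person and item parameters rather than the study's full estimation pipeline:

```python
import numpy as np

def rasch_fit_stats(X, theta, b):
    """OUTFIT and INFIT mean-square statistics per item under a
    dichotomous Rasch model (textbook formulas; a sketch, not the
    exact software pipeline used in the study)."""
    X = np.asarray(X, float)
    P = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))  # model probabilities
    W = P * (1 - P)                                       # binomial variances
    Z2 = (X - P) ** 2 / W                                 # squared standardized residuals
    outfit = Z2.mean(axis=0)                              # unweighted mean square
    infit = ((X - P) ** 2).sum(axis=0) / W.sum(axis=0)    # information-weighted mean square
    return outfit, infit

rng = np.random.default_rng(0)
theta = rng.normal(size=2000)                  # simulated person parameters
b = np.array([-1.0, 0.0, 1.0])                 # three item difficulties
P = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.random(P.shape) < P).astype(float)    # data generated from the model
outfit, infit = rasch_fit_stats(X, theta, b)
# with model-consistent data, both statistics hover near 1.0
```

Simulating many such model-consistent data sets and taking the 2.5th and 97.5th percentiles of each item's statistic yields the confidence-like intervals the study contrasts with the 0.7 to 1.3 rule of thumb.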

  8. Examining the Impact of Drifted Polytomous Anchor Items on Test Characteristic Curve (TCC) Linking and IRT True Score Equating. Research Report. ETS RR-12-09

    ERIC Educational Resources Information Center

    Li, Yanmei

    2012-01-01

    In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…

  9. A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Thurman, Carol

    2009-01-01

    The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…

  10. Meteor Crater (Barringer Meteorite Crater), Arizona: Summary of Impact Conditions

    NASA Astrophysics Data System (ADS)

    Roddy, D. J.; Shoemaker, E. M.

    1995-09-01

    Meteor Crater in northern Arizona represents the most abundant type of impact feature in our Solar System, i.e., the simple bowl-shaped crater. Excellent exposures and preservation of this large crater and its ejecta blanket have made it a critical data set in both terrestrial and planetary cratering research. Recognition of the value of the crater was initiated in the early 1900's by Daniel Moreau Barringer, whose 27 years of exploration championed its impact origin [1]. In 1960, Shoemaker presented information that conclusively demonstrated that Meteor Crater was formed by hypervelocity impact [2]. This led the U.S. Geological Survey to use the crater extensively in the 1960-70's as a prime training site for the Apollo astronauts. Today, Meteor Crater continues to serve as an important research site for the international science community, as well as an educational site for over 300,000 visitors per year. Since the late 1950's, studies of this crater have presented an increasingly clearer view of this impact and its effects and have provided an improved view of impact cratering in general. 
To expand on this data set, we are preparing an upgraded summary on the Meteor Crater event following the format in [3], including information and interpretations on: 1) Inferred origin and age of the impacting body, 2) Inferred ablation and deceleration history in Earth's atmosphere, 3) Estimated speed, trajectory, angle of impact, and bow shock conditions, 4) Estimated coherence, density, size, and mass of impacting body, 5) Composition of impacting body (Canyon Diablo meteorite), 6) Estimated kinetic energy coupled to target rocks and atmosphere, 7) Terrain conditions at time of impact and age of impact, 8) Estimated impact dynamics, such as pressures in air, meteorite, and rocks, 9) Inferred and estimated material partitioning into vapor, melt, and fragments, 10) Crater and near-field ejecta parameters, 11) Rock unit distributions in ejecta blanket, 12) Estimated far-field rock and meteorite ejecta parameters, 13) Inferred and estimated cloud-rise and fall-out conditions, 14) Late-stage meteorite falls after impact, 15) Estimated damage effect ranges, 16) Erosion of crater and ejecta blanket, 17) New topographic and digital maps of crater and ejecta blanket, 18) Other. (Suggestions are welcome) This compilation will contain expanded discussions of new data as well as revised interpretations of existing information. For example, in Item 1, we suggest the impacting body most likely formed during a collision in the main asteroid belt that fragmented the iron-nickel core of an asteroid some 0.5 billion years ago. The fragments remained in space until about 50,000+/-3000 yrs ago, when they were captured by the Earth's gravitational field. In Item 3, the trajectory of the impacting body is interpreted by EMS as traveling north-northwest at a relatively low impact angle. The presence of both shocked meteorite fragments and melt spherules indicates the meteorite had a velocity in the range of about 13 to 20 km/s, probably in the lower part of this range [4]. 
In Item 4, the coherent meteorite diameter is estimated to have been 45 to 50 m with a mass of 300,000 to 400,000 tons, i.e., large enough to experience less than 1% in both mass ablation and velocity deceleration. During this time, minor flake-off of the meteorite's exterior produced a limited number of smaller fragments that followed the main mass to the impact site but at greatly reduced velocities. In Item 6, we estimate the kinetic energy of impact to be in the range of 20 to 40 Mt depending on the energy coupling functions used and corrections for angle of oblique impact. At impact, terrain conditions were about as we see them today, a gently rolling plain with outcrops of Moenkopi and a meter or so of soil cover. In Item 18, EMS estimates production of a Meteor Crater-size event should occur on the continents about every 50,000 years; interestingly, this is the age of Meteor Crater. References: [1] Barringer D. M. (1906) Proc. Acad. Nat. Sci. Philadelphia, 57, 861-886. [2] Shoemaker E. M. (1960) Intl. Geol. Congress, Rept. 18, 418-434. [3] Roddy D. J. (1978) Proc. LPS 9th, 3891-3930. [4] Roddy D. J. et al. (1980) Proc. LPSC 11th, 2275-2307.

  11. Methods for Linking Item Parameters.

    DTIC Science & Technology

    1981-08-01

    within and across data sets; all proportion-correct distributions were quite platykurtic. Biserial item-total correlations had relatively consistent...would produce a distribution of a-parameters which had a larger mean and standard deviation, was more positively skewed, and was somewhat more platykurtic

  12. Internal consistency of the self-reporting questionnaire-20 in occupational groups

    PubMed Central

    Santos, Kionna Oliveira Bernardes; Carvalho, Fernando Martins; de Araújo, Tânia Maria

    2016-01-01

    ABSTRACT OBJECTIVE To assess the internal consistency of the measurements of the Self-Reporting Questionnaire (SRQ-20) in different occupational groups. METHODS A validation study was conducted with data from four surveys of groups of workers, using similar methods. A total of 9,959 workers were studied. In all surveys, common mental disorders were assessed via the SRQ-20. The internal consistency analysis considered the items belonging to dimensions extracted by tetrachoric factor analysis for each study. Item homogeneity assessment compared estimates of Cronbach's alpha (KR-20), the alpha applied to a tetrachoric correlation matrix, and stratified Cronbach's alpha. RESULTS The SRQ-20 dimensions showed adequate values, considering the reference parameters. The internal consistency of the instrument items, assessed by stratified Cronbach's alpha, was high (> 0.80) in the four studies. CONCLUSIONS The SRQ-20 showed good internal consistency in the professional categories evaluated. However, there is still a need for studies using alternative methods and additional information able to refine the accuracy of latent variable measurement instruments, as in the case of common mental disorders. PMID:27007682
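For reference, the baseline coefficient compared above can be computed directly; for dichotomous items such as the SRQ-20's, Cronbach's alpha reduces to KR-20. A minimal sketch with toy data:

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an n_persons x n_items score matrix.

    For dichotomous (0/1) items this formula coincides with KR-20.
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]                         # number of items
    item_vars = X.var(axis=0, ddof=1)      # per-item variances
    total_var = X.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy dichotomous data: 6 respondents x 4 items
X = [[1, 1, 1, 0],
     [1, 0, 1, 1],
     [0, 0, 0, 0],
     [1, 1, 1, 1],
     [0, 1, 0, 0],
     [1, 1, 0, 1]]
print(round(cronbach_alpha(X), 3))  # -> 0.667
```

The tetrachoric and stratified variants compared in the study replace the Pearson covariances here with tetrachoric correlations, or apply alpha within strata (dimensions) and combine; those require a polychoric estimation routine not sketched here.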

  13. Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study

    ERIC Educational Resources Information Center

    Sydorenko, Tetyana

    2011-01-01

    This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…

  14. Effects of Content Balancing and Item Selection Method on Ability Estimation in Computerized Adaptive Tests

    ERIC Educational Resources Information Center

    Sahin, Alper; Ozbasi, Durmus

    2017-01-01

    Purpose: This study aims to reveal effects of content balancing and item selection method on ability estimation in computerized adaptive tests by comparing Fisher's maximum information (FMI) and likelihood weighted information (LWI) methods. Research Methods: Four groups of examinees (250, 500, 750, 1000) and a bank of 500 items with 10 different…

  15. Characterizing Sources of Uncertainty in Item Response Theory Scale Scores

    ERIC Educational Resources Information Center

    Yang, Ji Seung; Hansen, Mark; Cai, Li

    2012-01-01

    Traditional estimators of item response theory scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, the authors review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical…

  16. Estimating the Reliability of a Test Battery Composite or a Test Score Based on Weighted Item Scoring

    ERIC Educational Resources Information Center

    Feldt, Leonard S.

    2004-01-01

    In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
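One classical result of the kind this article describes: if the parts' measurement errors are uncorrelated, the reliability of a weighted composite follows from the weighted error variances of the parts. A hedged sketch (the covariance matrix, weights, and part reliabilities below are hypothetical, and this is the standard textbook formula, not necessarily the exact estimator the article derives):

```python
import numpy as np

def composite_reliability(w, cov, rel):
    """Reliability of a weighted composite C = sum_i w_i * X_i.

    w   : part weights
    cov : covariance matrix of the observed part scores
    rel : reliability coefficient of each part
    With uncorrelated errors, error variances add, so
    rho_C = 1 - sum(w_i^2 * sigma_i^2 * (1 - rho_i)) / var(C).
    """
    w, rel = np.asarray(w, float), np.asarray(rel, float)
    cov = np.asarray(cov, float)
    var_c = w @ cov @ w                            # variance of the composite
    err = (w**2 * np.diag(cov) * (1 - rel)).sum()  # weighted error variance
    return 1 - err / var_c

cov = [[4.0, 1.2], [1.2, 9.0]]   # two test parts (hypothetical values)
print(round(composite_reliability([1.0, 2.0], cov, [0.8, 0.9]), 3))  # -> 0.902
```

Note that the composite can be more reliable than either part alone, which is one rationale for differential weighting.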

  17. A Quasi-Parametric Method for Fitting Flexible Item Response Functions

    ERIC Educational Resources Information Center

    Liang, Longjuan; Browne, Michael W.

    2015-01-01

    If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…

  18. Measuring positive and negative affect in older adults over 56 days: comparing trait level scoring methods using the partial credit model.

    PubMed

    Erbacher, Monica K; Schmidt, Karen M; Boker, Steven M; Bergeman, Cindy S

    2012-01-01

    Positive affect (PA) and negative affect (NA) are important constructs in health and well-being research. Good longitudinal measurement is crucial to conducting meaningful research on relationships between affect, health, and well-being across the lifespan. One common affect measure, the PANAS, has been evaluated thoroughly with factor analysis, but not with Rasch-based latent trait models (RLTMs) such as the Partial Credit Model (PCM), and not longitudinally. Current longitudinal RLTMs can computationally handle few occasions of data. The present study compares four methods of anchoring PCMs across 56 occasions to longitudinally evaluate the psychometric properties of the PANAS plus additional items. Anchoring item parameters on mean parameter values across occasions produced more desirable results than using no anchor, using first occasion parameters as anchors, or allowing anchor values to vary across occasions. Results indicated problems with NA items, including poor category utilization, gaps in the item distribution, and a lack of easy-to-endorse items. PA items had much more desirable psychometric qualities.

  19. Item Response Theory Using Hierarchical Generalized Linear Models

    ERIC Educational Resources Information Center

    Ravand, Hamdollah

    2015-01-01

    Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…

  20. IRT Item Parameter Scaling for Developing New Item Pools

    ERIC Educational Resources Information Center

    Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua

    2017-01-01

    Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…

  1. An Effect Size Measure for Raju's Differential Functioning for Items and Tests

    ERIC Educational Resources Information Center

    Wright, Keith D.; Oshima, T. C.

    2015-01-01

    This study established an effect size measure for the noncompensatory differential item functioning (NCDIF) index from Raju's differential functioning of items and tests framework. The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…

  2. Lord's Wald Test for Detecting DIF in Multidimensional IRT Models: A Comparison of Two Estimation Approaches

    ERIC Educational Resources Information Center

    Lee, Soo; Suh, Youngsuk

    2018-01-01

    Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…

  3. An Item Response Theory (IRT) analysis of the Short Inventory of Problems-Alcohol and Drugs (SIP-AD) among non-treatment seeking men-who-have-sex-with-men: evidence for a shortened 10-item SIP-AD.

    PubMed

    Hagman, Brett T; Kuerbis, Alexis N; Morgenstern, Jon; Bux, Donald A; Parsons, Jeffrey T; Heidinger, Bram E

    2009-11-01

    The Short Inventory of Problems-Alcohol and Drugs (SIP-AD) is a 15-item measure that concurrently assesses negative consequences associated with alcohol and illicit drug use. Current psychometric evaluation has been limited to classical test theory (CTT) statistics, and it has not been validated among non-treatment-seeking men-who-have-sex-with-men (MSM). Methods from item response theory (IRT) can improve upon CTT by providing an in-depth analysis of how each item performs across the underlying latent trait it is purported to measure. The present study examined the psychometric properties of the SIP-AD using methods from both IRT and CTT in a non-treatment-seeking MSM sample (N = 469). Participants were recruited from the New York City area and were asked to participate in a series of studies examining club drug use. Results indicated that five items on the SIP-AD demonstrated poor item fit or significant differential item functioning (DIF) across race/ethnicity and HIV status. These five items were dropped and two-parameter IRT analyses were conducted on the remaining 10 items, which indicated a restricted range of item location parameters (-.99 to -.15) plotted at the lower end of the latent negative-consequences severity continuum, and reasonably high discrimination parameters (1.30 to 2.22). Additional CTT statistics were compared between the original 15-item SIP-AD and the refined 10-item SIP-AD, and the differences were negligible, with the refined 10-item SIP-AD showing a high degree of reliability and validity. Findings suggest the SIP-AD can be shortened to 10 items and appears to be an unbiased, reliable, and valid measure among non-treatment-seeking MSM.
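The two-parameter model used in these analyses gives the probability of endorsing an item as a logistic function of the latent trait, governed by a discrimination and a location parameter. A minimal sketch with illustrative values in the ranges reported above (not the paper's exact estimates):

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of endorsing an
    item with discrimination a and location (difficulty) b at trait
    level theta."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# An item located low on the severity continuum (b = -0.6) with fairly
# high discrimination (a = 1.8): someone at the average trait level
# (theta = 0) is already likely to endorse it.
print(round(p_2pl(0.0, 1.8, -0.6), 3))  # -> 0.746
```

Location parameters clustered at the low end, as reported for the retained SIP-AD items, mean the scale measures most precisely among respondents with relatively mild negative-consequence severity.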

  4. Item Parameter Changes and Equating: An Examination of the Effects of Lack of Item Parameter Invariance on Equating and Score Accuracy for Different Proficiency Levels

    ERIC Educational Resources Information Center

    Store, Davie

    2013-01-01

    The impact of particular types of context effects on actual scores is less understood although there has been some research carried out regarding certain types of context effects under the nonequivalent anchor test (NEAT) design. In addition, the issue of the impact of item context effects on scores has not been investigated extensively when item…

  5. Automatic portion estimation and visual refinement in mobile dietary assessment

    PubMed Central

    Woo, Insoo; Otsmo, Karl; Kim, SungYe; Ebert, David S.; Delp, Edward J.; Boushey, Carol J.

    2011-01-01

    As concern for obesity grows, the need for automated and accurate methods to monitor nutrient intake becomes essential as dietary intake provides a valuable basis for managing dietary imbalance. Moreover, as mobile devices with built-in cameras have become ubiquitous, one potential means of monitoring dietary intake is photographing meals using mobile devices and having an automatic estimate of the nutrient contents returned. One of the challenging problems of the image-based dietary assessment is the accurate estimation of food portion size from a photograph taken with a mobile digital camera. In this work, we describe a method to automatically calculate portion size of a variety of foods through volume estimation using an image. These “portion volumes” utilize camera parameter estimation and model reconstruction to determine the volume of food items, from which nutritional content is then extrapolated. In this paper, we describe our initial results of accuracy evaluation using real and simulated meal images and demonstrate the potential of our approach. PMID:22242198

  6. Automatic portion estimation and visual refinement in mobile dietary assessment

    NASA Astrophysics Data System (ADS)

    Woo, Insoo; Otsmo, Karl; Kim, SungYe; Ebert, David S.; Delp, Edward J.; Boushey, Carol J.

    2010-01-01

    As concern for obesity grows, the need for automated and accurate methods to monitor nutrient intake becomes essential as dietary intake provides a valuable basis for managing dietary imbalance. Moreover, as mobile devices with built-in cameras have become ubiquitous, one potential means of monitoring dietary intake is photographing meals using mobile devices and having an automatic estimate of the nutrient contents returned. One of the challenging problems of the image-based dietary assessment is the accurate estimation of food portion size from a photograph taken with a mobile digital camera. In this work, we describe a method to automatically calculate portion size of a variety of foods through volume estimation using an image. These "portion volumes" utilize camera parameter estimation and model reconstruction to determine the volume of food items, from which nutritional content is then extrapolated. In this paper, we describe our initial results of accuracy evaluation using real and simulated meal images and demonstrate the potential of our approach.

  7. A Comparison of Strategies for Estimating Conditional DIF

    ERIC Educational Resources Information Center

    Moses, Tim; Miao, Jing; Dorans, Neil J.

    2010-01-01

    In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size…

  8. An NCME Instructional Module on Estimating Item Response Theory Models Using Markov Chain Monte Carlo Methods

    ERIC Educational Resources Information Center

    Kim, Jee-Seon; Bolt, Daniel M.

    2007-01-01

    The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…

  9. Large capacity temporary visual memory.

    PubMed

    Endress, Ansgar D; Potter, Mary C

    2014-04-01

    Visual working memory (WM) capacity is thought to be limited to 3 or 4 items. However, many cognitive activities seem to require larger temporary memory stores. Here, we provide evidence for a temporary memory store with much larger capacity than past WM capacity estimates. Further, based on previous WM research, we show that a single factor--proactive interference--is sufficient to bring capacity estimates down to the range of previous WM capacity estimates. Participants saw a rapid serial visual presentation of 5-21 pictures of familiar objects or words presented at rates of 4/s or 8/s, respectively, and thus too fast for strategies such as rehearsal. Recognition memory was tested with a single probe item. When new items were used on all trials, no fixed memory capacities were observed, with estimates of up to 9.1 retained pictures for 21-item lists, and up to 30.0 retained pictures for 100-item lists, and no clear upper bound to how many items could be retained. Further, memory items were not stored in a temporally stable form of memory but decayed almost completely after a few minutes. In contrast, when, as in most WM experiments, a small set of items was reused across all trials, thus creating proactive interference among items, capacity remained in the range reported in previous WM experiments. These results show that humans have a large-capacity temporary memory store in the absence of proactive interference, and raise the question of whether temporary memory in everyday cognitive processing is severely limited, as in WM experiments, or has the much larger capacity found in the present experiments.
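Capacity estimates like "9.1 retained pictures" are conventionally derived from hit and false-alarm rates with a formula such as Cowan's K for single-probe recognition; whether the authors used exactly this estimator is an assumption here, so treat the sketch as illustrative:

```python
def cowan_k(n_items, hit_rate, fa_rate):
    """Cowan's K capacity estimate for single-probe recognition:
    k = N * (hits - false alarms). A standard estimator of this kind
    underlies capacity figures such as those quoted above (sketch,
    not necessarily the authors' exact formula)."""
    return n_items * (hit_rate - fa_rate)

# Hypothetical rates for a 21-item list
print(round(cowan_k(21, 0.73, 0.30), 2))  # -> 9.03 retained items
```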

  10. Fitting the Rasch Model to Account for Variation in Item Discrimination

    ERIC Educational Resources Information Center

    Weitzman, R. A.

    2009-01-01

    Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…

  11. Two Approaches to Estimation of Classification Accuracy Rate under Item Response Theory

    ERIC Educational Resources Information Center

    Lathrop, Quinn N.; Cheng, Ying

    2013-01-01

    Within the framework of item response theory (IRT), there are two recent lines of work on the estimation of classification accuracy (CA) rate. One approach estimates CA when decisions are made based on total sum scores, the other based on latent trait estimates. The former is referred to as the Lee approach, and the latter, the Rudner approach,…
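The Rudner approach mentioned above works from latent trait estimates; its expected classification accuracy can be sketched by asking, for each examinee, how likely the true trait level is to fall on the same side of the cut score as the point estimate (normality of the estimate around the truth is assumed, and the examinee values below are hypothetical):

```python
import math

def rudner_accuracy(theta_hat, se, cut):
    """Rudner-style expected classification accuracy: average, over
    examinees, of the probability that the true trait level lies on
    the same side of the cut as the estimate, assuming
    theta ~ N(theta_hat, se^2)."""
    def phi(z):  # standard normal CDF via the error function
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    probs = []
    for t, s in zip(theta_hat, se):
        p_below = phi((cut - t) / s)
        probs.append(p_below if t < cut else 1 - p_below)
    return sum(probs) / len(probs)

# Three examinees near or far from a cut of 0.0 (illustrative numbers):
# the examinee at 0.1 is classified much less reliably than the others.
print(round(rudner_accuracy([-1.2, 0.1, 1.5], [0.3, 0.3, 0.3], 0.0), 3))
```

The Lee approach instead works from the distribution of total sum scores given theta; it is not sketched here.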

  12. Inverse MDS: Inferring Dissimilarity Structure from Multiple Item Arrangements

    PubMed Central

    Kriegeskorte, Nikolaus; Mur, Marieke

    2012-01-01

    The pairwise dissimilarities of a set of items can be intuitively visualized by a 2D arrangement of the items, in which the distances reflect the dissimilarities. Such an arrangement can be obtained by multidimensional scaling (MDS). We propose a method for the inverse process: inferring the pairwise dissimilarities from multiple 2D arrangements of items. Perceptual dissimilarities are classically measured using pairwise dissimilarity judgments. However, alternative methods including free sorting and 2D arrangements have previously been proposed. The present proposal is novel (a) in that the dissimilarity matrix is estimated by “inverse MDS” based on multiple arrangements of item subsets, and (b) in that the subsets are designed by an adaptive algorithm that aims to provide optimal evidence for the dissimilarity estimates. The subject arranges the items (represented as icons on a computer screen) by means of mouse drag-and-drop operations. The multi-arrangement method can be construed as a generalization of simpler methods: It reduces to pairwise dissimilarity judgments if each arrangement contains only two items, and to free sorting if the items are categorically arranged into discrete piles. Multi-arrangement combines the advantages of these methods. It is efficient (because the subject communicates many dissimilarity judgments with each mouse drag), psychologically attractive (because dissimilarities are judged in context), and can characterize continuous high-dimensional dissimilarity structures. We present two procedures for estimating the dissimilarity matrix: a simple weighted-aligned-average of the partial dissimilarity matrices and a computationally intensive algorithm, which estimates the dissimilarity matrix by iteratively minimizing the error of MDS-predictions of the subject’s arrangements. The Matlab code for interactive arrangement and dissimilarity estimation is available from the authors upon request. PMID:22848204
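The simpler of the two estimators described above, the weighted-aligned-average of partial dissimilarity matrices, can be sketched as follows (the RMS-based scale alignment and NaN-masked averaging are one plausible reading of the idea, not the authors' exact algorithm):

```python
import numpy as np

def weighted_aligned_average(partials):
    """Average several partial dissimilarity matrices from different
    item arrangements (a sketch of the abstract's simple estimator).

    partials : list of full n x n matrices with np.nan wherever an
               item pair did not appear together in that arrangement.
    """
    stack = np.stack(partials)
    # Scale-align: each arrangement has an arbitrary overall scale,
    # so divide each partial matrix by its own RMS distance.
    rms = np.sqrt(np.nanmean(stack**2, axis=(1, 2), keepdims=True))
    aligned = stack / rms
    # Entrywise average over the arrangements that contain each pair.
    return np.nanmean(aligned, axis=0)

D1 = np.array([[0, 1, 2], [1, 0, 3], [2, 3, 0]], float)
D2 = 2 * D1                          # same structure at a doubled scale
D2[0, 2] = D2[2, 0] = np.nan         # pair (0, 2) absent from arrangement 2
est = weighted_aligned_average([D1, D2])
```

The abstract's second, iterative estimator instead minimizes the error of MDS predictions of each arrangement, which requires a full MDS implementation and is not sketched here.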

  13. Evidence against global attention filters selective for absolute bar-orientation in human vision.

    PubMed

    Inverso, Matthew; Sun, Peng; Chubb, Charles; Wright, Charles E; Sperling, George

    2016-01-01

    The finding that an item of type A pops out from an array of distractors of type B typically is taken to support the inference that human vision contains a neural mechanism that is activated by items of type A but not by items of type B. Such a mechanism might be expected to yield a neural image in which items of type A produce high activation and items of type B low (or zero) activation. Access to such a neural image might further be expected to enable accurate estimation of the centroid of an ensemble of items of type A intermixed with to-be-ignored items of type B. Here, it is shown that as the number of items in stimulus displays is increased, performance in estimating the centroids of horizontal (vertical) items amid vertical (horizontal) distractors degrades much more quickly and dramatically than does performance in estimating the centroids of white (black) items among black (white) distractors. Together with previous findings, these results suggest that, although human vision does possess bottom-up neural mechanisms sensitive to abrupt local changes in bar-orientation, and although human vision does possess and utilize top-down global attention filters capable of selecting multiple items of one brightness or of one color from among others, it cannot use a top-down global attention filter capable of selecting multiple bars of a given absolute orientation and filtering bars of the opposite orientation in a centroid task.

  14. Maximum Likelihood Item Easiness Models for Test Theory Without an Answer Key

    PubMed Central

    Batchelder, William H.

    2014-01-01

    Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce two extensions to the basic model in order to account for item rating easiness/difficulty. The first extension is a multiplicative model and the second is an additive model. We show how the multiplicative model is related to the Rasch model. We describe several maximum-likelihood estimation procedures for the models and discuss issues of model fit and identifiability. We describe how the CCT models could be used to give alternative consensus-based measures of reliability. We demonstrate the utility of both the basic and extended models on a set of essay rating data and give ideas for future research. PMID:29795812

  15. Calibration of the Test of Relational Reasoning.

    PubMed

    Dumas, Denis; Alexander, Patricia A

    2016-10-01

    Relational reasoning, or the ability to discern meaningful patterns within a stream of information, is a critical cognitive ability associated with academic and professional success. Importantly, relational reasoning has been described as taking multiple forms, depending on the type of higher order relations being drawn between and among concepts. However, the reliable and valid measurement of such a multidimensional construct of relational reasoning has been elusive. The Test of Relational Reasoning (TORR) was designed to tap 4 forms of relational reasoning (i.e., analogy, anomaly, antinomy, and antithesis). In this investigation, the TORR was calibrated and scored using multidimensional item response theory in a large, representative undergraduate sample. The bifactor model was identified as the best-fitting model, and used to estimate item parameters and construct reliability. To improve the usefulness of the TORR to educators, scaled scores were also calculated and presented. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  16. RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT.

    PubMed

    Carlis, John; Bruso, Kelsey

    2012-03-01

    Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis seeking confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, the K predicted by RSQRT and the K predicted by the Bayesian information criterion (BIC) were the same. RSQRT has a lower cost of O(log log n), versus O(n^2) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing.
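    The abstract does not give the exact RSQRT formula, so the sketch below uses the classic square-root rule of thumb, K ≈ sqrt(n/2), purely as a stand-in for a closed-form, data-size-based predictor of K:

```python
import math

def rule_of_thumb_k(n_items):
    """Closed-form cluster-count predictor. This is the classic sqrt(n/2)
    rule of thumb, used here only to illustrate the idea of predicting K
    directly from the item count; it is NOT the paper's RSQRT formula."""
    return max(1, round(math.sqrt(n_items / 2.0)))

# Unlike BIC model selection, which refits a model for every candidate K,
# a closed-form rule costs essentially nothing to evaluate.
k = rule_of_thumb_k(200)  # -> 10
```

The cost contrast in the abstract (O(log log n) versus O(n^2)) reflects exactly this difference: a direct formula versus repeated model fitting and scoring.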

  17. RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT

    PubMed Central

    Bruso, Kelsey

    2012-01-01

    Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis seeking confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, the K predicted by RSQRT and the K predicted by the Bayesian information criterion (BIC) were the same. RSQRT has a lower cost of O(log log n), versus O(n^2) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing. PMID:22773923

  18. Internet Gaming Disorder as a formative construct: Implications for conceptualization and measurement.

    PubMed

    van Rooij, Antonius J; Van Looy, Jan; Billieux, Joël

    2017-07-01

    Some people have serious problems controlling their Internet and video game use. The DSM-5 now includes a proposal for 'Internet Gaming Disorder' (IGD) as a condition in need of further study. Various studies aim to validate the proposed diagnostic criteria for IGD and multiple new scales have been introduced that cover the suggested criteria. Using a structured approach, we demonstrate that IGD might be better interpreted as a formative construct, as opposed to the current practice of conceptualizing it as a reflective construct. Incorrectly approaching a formative construct as a reflective one causes serious problems in scale development, including: (i) incorrect reliance on item-to-total scale correlation to exclude items and incorrectly relying on indices of inter-item reliability that do not fit the measurement model (e.g., Cronbach's α); (ii) incorrect interpretation of composite or mean scores that assume all items are equal in contributing value to a sum score; and (iii) biased estimation of model parameters in statistical models. We show that these issues are impacting current validation efforts through two recent examples. A reinterpretation of IGD as a formative construct has broad consequences for current validation efforts and provides opportunities to reanalyze existing data. We discuss three broad implications for current research: (i) composite latent constructs should be defined and used in models; (ii) item exclusion and selection should not rely on item-to-total scale correlations; and (iii) existing definitions of IGD should be enriched further. © 2016 The Authors. Psychiatry and Clinical Neurosciences © 2016 Japanese Society of Psychiatry and Neurology.

  19. A signal detection-item response theory model for evaluating neuropsychological measures.

    PubMed

    Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

    2018-02-05

    Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
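    The signal detection quantities that SD-IRT inherits, memory discrimination and response bias, can be computed from hit and false-alarm rates with the standard equal-variance Gaussian model; the rates below are illustrative:

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, fa_rate):
    """Equal-variance Gaussian signal detection: discrimination d' and
    response bias c from hit and false-alarm rates (both strictly
    between 0 and 1)."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

d, c = dprime_and_criterion(0.85, 0.20)  # hypothetical recognition rates
```

In an SD-IRT parameterization, item parameters shift these same quantities per item, which is how the IRT machinery (difficulty, ability, conditional error) attaches to the signal detection constructs.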

  20. Development of the Sexual Minority Adolescent Stress Inventory

    PubMed Central

    Schrager, Sheree M.; Goldbach, Jeremy T.; Mamey, Mary Rose

    2018-01-01

    Although construct measurement is critical to explanatory research and intervention efforts, rigorous measure development remains a notable challenge. For example, though the primary theoretical model for understanding health disparities among sexual minority (e.g., lesbian, gay, bisexual) adolescents is minority stress theory, nearly all published studies of this population rely on minority stress measures with poor psychometric properties and development procedures. In response, we developed the Sexual Minority Adolescent Stress Inventory (SMASI) with N = 346 diverse adolescents ages 14–17, using a comprehensive approach to de novo measure development designed to produce a measure with desirable psychometric properties. After exploratory factor analysis on 102 candidate items informed by a modified Delphi process, we applied item response theory techniques to the remaining 72 items. Discrimination and difficulty parameters and item characteristic curves were estimated overall, within each of 12 initially derived factors, and across demographic subgroups. Two items were removed for excessive discrimination and three were removed following reliability analysis. The measure demonstrated configural and scalar invariance for gender and age; a three-item factor was excluded for demonstrating substantial differences by sexual identity and race/ethnicity. The final 64-item measure comprised 11 subscales and demonstrated excellent overall (α = 0.98), subscale (α range 0.75–0.96), and test–retest (scale r > 0.99; subscale r range 0.89–0.99) reliabilities. Subscales represented a mix of proximal and distal stressors, including domains of internalized homonegativity, identity management, intersectionality, and negative expectancies (proximal) and social marginalization, family rejection, homonegative climate, homonegative communication, negative disclosure experiences, religion, and work domains (distal). Thus, the SMASI development process illustrates a method to incorporate information from multiple sources, including item response theory models, to guide item selection in building a psychometrically sound measure. We posit that similar methods can be used to improve construct measurement across all areas of psychological research, particularly in areas where a strong theoretical framework exists but existing measures are limited. PMID:29599737

  1. Item Response Theory and Health Outcomes Measurement in the 21st Century

    PubMed Central

    Hays, Ron D.; Morales, Leo S.; Reise, Steve P.

    2006-01-01

    Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
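    The "standard errors conditional on trait level" advantage can be made concrete with the two-parameter logistic model, where the conditional SE is the inverse square root of the test information; the item parameters below are hypothetical:

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def conditional_se(theta, items):
    """Conditional standard error of measurement: 1 / sqrt(test information),
    where 2PL item information is a^2 * P * (1 - P)."""
    info = sum(a * a * p_2pl(theta, a, b) * (1 - p_2pl(theta, a, b))
               for a, b in items)
    return 1.0 / math.sqrt(info)

items = [(1.2, -1.0), (1.0, 0.0), (1.5, 1.0)]  # hypothetical (a, b) pairs
se_mid = conditional_se(0.0, items)  # near the items' difficulties
se_ext = conditional_se(3.0, items)  # far from any item's difficulty
```

Precision varies with trait level: the SE grows as theta moves away from where the items are informative, which is what classical test theory's single reliability coefficient cannot express.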

  2. Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2011-01-01

    A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…
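    The classical item statistics this note targets, item difficulty and corrected item-total correlation, can be computed directly; the response matrix below is a toy example:

```python
def item_analysis(responses):
    """Classical item statistics from a persons-by-items matrix of 0/1
    scores: (difficulty, corrected item-total correlation) per item,
    where 'corrected' means the item is excluded from the total."""
    n_persons, n_items = len(responses), len(responses[0])

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return cov / (vx * vy) ** 0.5

    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total minus item j
        stats.append((sum(item) / n_persons, pearson(item, rest)))
    return stats

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # toy 0/1 responses
stats = item_analysis(data)
```

The latent variable modeling procedure in the paper adds what these point estimates lack: interval estimates for the same quantities.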

  3. Sex differences in memory estimates for pictures and words with multiple recall trials.

    PubMed

    Ionescu, Marcos D

    2004-04-01

    Undergraduate students (23 men and 23 women) provided memory performance estimates before and after each of three recall trials involving 80 stimuli (40 pictures and 40 words). No sex differences were found across trials for the total recall of items or for the recall of pictures and words separately. A significant increase in recall for pictures (not words) was found for both sexes across trials. The previous results of Ionescu were replicated on the first and second recall trials: men underestimated their performance on the pictures and women underestimated their performance on the word items. These differences in postrecall estimates were not found after the third recall trial: men and women alike underestimated their performance on both the picture and word items. The disappearance of item-specific sex differences in postrecall estimates for the third recall trial does not imply that men and women become more accurate at estimating their actual performance with multiple recall trials.

  4. Automatic Item Generation of Probability Word Problems

    ERIC Educational Resources Information Center

    Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

    2009-01-01

    Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…

  5. A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests

    ERIC Educational Resources Information Center

    Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.

    2010-01-01

    Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…

  6. Item Response Theory with Estimation of the Latent Density Using Davidian Curves

    ERIC Educational Resources Information Center

    Woods, Carol M.; Lin, Nan

    2009-01-01

    Davidian-curve item response theory (DC-IRT) is introduced, evaluated with simulations, and illustrated using data from the Schedule for Nonadaptive and Adaptive Personality Entitlement scale. DC-IRT is a method for fitting unidimensional IRT models with maximum marginal likelihood estimation, in which the latent density is estimated,…

  7. Bi-Factor Multidimensional Item Response Theory Modeling for Subscores Estimation, Reliability, and Classification

    ERIC Educational Resources Information Center

    Md Desa, Zairul Nor Deana

    2012-01-01

    In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, the multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscores reliability, and subscores classification. Both the compensatory and partially compensatory MIRT…

  8. Evaluation of adding item-response theory analysis for evaluation of the European Board of Ophthalmology Diploma examination.

    PubMed

    Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José

    2013-11-01

    To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20 ≥ 0.90) to be kept as the examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics, and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has had a positive influence on the statistical performance of EBOD as a whole and of its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good, though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011: 0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the crucial parameter for comparison across examinations. Individual item performance analysis is worthwhile as a secondary analysis, but drawing final conclusions from it alone is more difficult, as performance parameters need to be related to one another, as shown by IRT analysis. Therefore, IRT analysis has proved beneficial for the statistical analysis of EBOD. The introduction of negative marking has led to a significant increase in reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations. © 2013 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
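    The KR-20 reliability coefficient used throughout this abstract can be sketched as follows (using the population variance of total scores; the response matrix is a toy example):

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous items:
    KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / var(total)),
    where p_i is the proportion correct on item i and q_i = 1 - p_i."""
    n, k = len(responses), len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq / var_total)

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # toy 0/1 responses
reliability = kr20(data)
```

KR-20 is the dichotomous-item special case of Cronbach's alpha, which is why a single scalar like the 0.92 reported after negative marking summarizes the whole examination.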

  9. Variance Difference between Maximum Likelihood Estimation Method and Expected A Posteriori Estimation Method Viewed from Number of Test Items

    ERIC Educational Resources Information Center

    Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S.

    2016-01-01

    The aim of this study is to determine the variance difference between maximum likelihood and expected a posteriori estimation methods viewed from the number of test items of an aptitude test. The variance represents the accuracy achieved by both the maximum likelihood and Bayes estimation methods. The test consists of three subtests, each with 40 multiple-choice…
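    The ML-versus-EAP contrast can be sketched on a quadrature grid: EAP is the posterior mean under a prior, so it shrinks toward the prior mean and has smaller variance than ML. The 2PL item parameters below are hypothetical, not the study's:

```python
import math

ITEMS = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5)]  # hypothetical 2PL (a, b)
GRID = [g / 10.0 for g in range(-40, 41)]      # ability grid on [-4, 4]

def likelihood(theta, pattern):
    """Likelihood of a 0/1 response pattern under the 2PL model."""
    L = 1.0
    for (a, b), x in zip(ITEMS, pattern):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        L *= p if x else (1 - p)
    return L

def ml_estimate(pattern):
    """Maximum likelihood ability estimate (grid maximizer)."""
    return max(GRID, key=lambda t: likelihood(t, pattern))

def eap_estimate(pattern):
    """Expected a posteriori estimate: posterior mean under a standard
    normal prior, evaluated on the same grid."""
    post = [likelihood(t, pattern) * math.exp(-t * t / 2) for t in GRID]
    return sum(t * w for t, w in zip(GRID, post)) / sum(post)
```

For an all-correct pattern the ML estimate runs to the edge of the grid while EAP stays finite, which is the shrinkage behind the variance difference the study measures.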

  10. Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

    ERIC Educational Resources Information Center

    Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

    2015-01-01

    Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…

  11. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  12. A Calibration to Predict the Concentrations of Impurities in Plutonium Oxide by Prompt Gamma Analysis Revision 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Narlesky, Joshua Edward; Kelly, Elizabeth J.

    2015-09-10

    This report documents the new PG calibration regression equations. These calibration equations incorporate new data that have become available since revision 1 of “A Calibration to Predict the Concentrations of Impurities in Plutonium Oxide by Prompt Gamma Analysis” was issued [3]. The calibration equations are based on a weighted least squares (WLS) approach for the regression. The WLS method gives each data point its proper amount of influence over the parameter estimates, which yields two big advantages: more precise parameter estimates and better, more defensible estimates of uncertainties. The WLS approach makes sense both statistically and experimentally because the variances increase with concentration, and there are physical reasons that the higher measurements are less reliable and should be less influential. The new magnesium calibration includes a correction for sodium and separate calibration equations for items with and without chlorine. These additional calibration equations allow for better predictions and smaller uncertainties for sodium in materials with and without chlorine. Chlorine and sodium have separate equations for RICH materials; again, these equations give better predictions and smaller uncertainties for chlorine and sodium in RICH materials.
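    The WLS idea in this report, weighting each point by the inverse of its variance so noisier high-concentration measurements have less influence, can be sketched for a simple straight-line calibration (the data are illustrative, not the report's):

```python
def wls_line(x, y, variances):
    """Weighted least squares fit of y = b0 + b1*x, with weights
    w_i = 1/variance_i, so higher-variance points influence the
    parameter estimates less."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
    b0 = my - b1 * mx
    return b0, b1

# Variances grow with concentration, mirroring the report's rationale.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
b0, b1 = wls_line(x, y, [0.1, 0.2, 0.4, 0.8])
```

With heteroscedastic noise, this weighting is what delivers the report's claimed advantages: more precise parameter estimates and more defensible uncertainty estimates.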

  13. Large capacity temporary visual memory

    PubMed Central

    Endress, Ansgar D.; Potter, Mary C.

    2014-01-01

    Visual working memory (WM) capacity is thought to be limited to three or four items. However, many cognitive activities seem to require larger temporary memory stores. Here, we provide evidence for a temporary memory store with much larger capacity than past WM capacity estimates. Further, based on previous WM research, we show that a single factor — proactive interference — is sufficient to bring capacity estimates down to the range of previous WM capacity estimates. Participants saw a rapid serial visual presentation (RSVP) of 5 to 21 pictures of familiar objects or words presented at rates of 4/s or 8/s, respectively, and thus too fast for strategies such as rehearsal. Recognition memory was tested with a single probe item. When new items were used on all trials, no fixed memory capacities were observed, with estimates of up to 9.1 retained pictures for 21-item lists, and up to 30.0 retained pictures for 100-item lists, and no clear upper bound to how many items could be retained. Further, memory items were not stored in a temporally stable form of memory, but decayed almost completely after a few minutes. In contrast, when, as in most WM experiments, a small set of items was reused across all trials, thus creating proactive interference among items, capacity remained in the range reported in previous WM experiments. These results show that humans have a large-capacity temporary memory store in the absence of proactive interference, and raise the question of whether temporary memory in everyday cognitive processing is severely limited as in WM experiments, or has the much larger capacity found in the present experiments. PMID:23937181

  14. The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.

    ERIC Educational Resources Information Center

    Finney, Sara J.; Smith, Russell W.; Wise, Steven L.

    Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…

  15. Optimization of injection molding process parameters for a plastic cell phone housing component

    NASA Astrophysics Data System (ADS)

    Rajalingam, Sokkalingam; Vasant, Pandian; Khe, Cheng Seong; Merican, Zulkifli; Oo, Zeya

    2016-11-01

    Injection molding is one of the most widely used processes for producing thin-walled plastic items. However, setting optimal process parameters is difficult, as poor settings can produce defects such as shrinkage in the molded parts. This study aims to determine optimum injection molding process parameters that reduce shrinkage defects in a plastic cell phone housing component. The currently used machine settings produced shrinkage as well as mis-specified lengths and widths, with dimensions below the specification limit. Thus, further experiments were needed to identify optimum process parameters that keep length and width close to their targets with minimal variation. The mold temperature, injection pressure, and screw rotation speed were used as process parameters in this research. Response Surface Methodology (RSM) was applied to find the optimal molding process parameters, and the major factors influencing the responses were identified using the analysis of variance (ANOVA) technique. Verification runs showed that the shrinkage defect can be minimized with the optimal settings found by RSM.

  16. An Experimental Study of the Effect of Judges' Knowledge of Item Data on Two Forms of the Angoff Standard Setting Method.

    ERIC Educational Resources Information Center

    Garrido, Mariquita; Payne, David A.

    Minimum competency cut-off scores on a statistics exam were estimated under four conditions: the Angoff judging method with item data (n=20), and without data available (n=19); and the Modified Angoff method with (n=19), and without (n=19) item data available to judges. The Angoff method required free response percentage estimates (0-100) percent,…

  17. Time Requirements for the Different Item Types Proposed for Use in the Revised SAT®. Research Report No. 2007-3. ETS RR-07-35

    ERIC Educational Resources Information Center

    Bridgeman, Brent; Laitusis, Cara Cahalan; Cline, Frederick

    2007-01-01

    The current study used three data sources to estimate time requirements for different item types on the now current SAT Reasoning Test™. First, we estimated times from a computer-adaptive version of the SAT® (SAT CAT) that automatically recorded item times. Second, we observed students as they answered SAT questions under strict time limits and…

  18. Designing P-Optimal Item Pools in Computerized Adaptive Tests with Polytomous Items

    ERIC Educational Resources Information Center

    Zhou, Xuechun

    2012-01-01

    Current CAT applications consist of predominantly dichotomous items, and CATs with polytomously scored items are limited. To ascertain the best approach to polytomous CAT, a significant amount of research has been conducted on item selection, ability estimation, and impact of termination rules based on polytomous IRT models. Few studies…

  19. Gender-Based Differential Item Performance in Mathematics Achievement Items.

    ERIC Educational Resources Information Center

    Doolittle, Allen E.; Cleary, T. Anne

    1987-01-01

    Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)

  20. On the asymptotic standard error of a class of robust estimators of ability in dichotomous item response models.

    PubMed

    Magis, David

    2014-11-01

    In item response theory, the classical estimators of ability are highly sensitive to response disturbances and can return strongly biased estimates of the true underlying ability level. Robust methods were introduced to lessen the impact of such aberrant responses on the estimation process. The computation of asymptotic (i.e., large-sample) standard errors (ASE) for these robust estimators, however, has not yet been fully considered. This paper focuses on a broad class of robust ability estimators, defined by an appropriate selection of the weight function and the residual measure, for which the ASE is derived from the theory of estimating equations. The maximum likelihood (ML) and the robust estimators, together with their estimated ASEs, are then compared in a simulation study by generating random guessing disturbances. It is concluded that both the estimators and their ASE perform similarly in the absence of random guessing, while the robust estimator and its estimated ASE are less biased and outperform their ML counterparts in the presence of random guessing with large impact on the item response process. © 2013 The British Psychological Society.
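    A robust ability estimator of the general form described here solves a weighted score equation in which aberrant responses are downweighted. The weight and residual choices below (Huber-type weight on a standardized residual) are illustrative, not necessarily those studied in the paper:

```python
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def robust_theta(pattern, items, cutoff=1.0):
    """Robust ability estimate: solve sum_i w(r_i) * a_i * (x_i - P_i) = 0
    by bisection on [-4, 4], where r_i is a standardized residual and the
    Huber-type weight w downweights responses with |r_i| > cutoff."""
    def weight(r):
        return 1.0 if abs(r) <= cutoff else cutoff / abs(r)

    def score(theta):
        s = 0.0
        for (a, b), x in zip(items, pattern):
            p = p_2pl(theta, a, b)
            r = (x - p) / math.sqrt(p * (1 - p))  # standardized residual
            s += weight(r) * a * (x - p)
        return s

    lo, hi = -4.0, 4.0  # score is positive at lo and negative at hi here
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical (a, b) pairs
theta_hat = robust_theta((1, 1, 0), items)
```

Setting the weight identically to 1 recovers the ordinary ML score equation; the cutoff controls how strongly guessing-like responses are discounted.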

  1. Multiple choice questions can be designed or revised to challenge learners' critical thinking.

    PubMed

    Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A

    2013-12-01

    Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive psychology (but not physiology) experts and analyzed statistically in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.

  2. Item Response Theory with Estimation of the Latent Population Distribution Using Spline-Based Densities

    ERIC Educational Resources Information Center

    Woods, Carol M.; Thissen, David

    2006-01-01

    The purpose of this paper is to introduce a new method for fitting item response theory models with the latent population distribution estimated from the data using splines. A spline-based density estimation system provides a flexible alternative to existing procedures that use a normal distribution, or a different functional form, for the…

  3. An Analysis of Variance Approach for the Estimation of Response Time Distributions in Tests

    ERIC Educational Resources Information Center

    Attali, Yigal

    2010-01-01

    Generalizability theory and analysis of variance methods are employed, together with the concept of objective time pressure, to estimate response time distributions and the degree of time pressure in timed tests. By estimating response time variance components due to person, item, and their interaction, and fixed effects due to item types and…

  4. Direct Estimation of Correlation as a Measure of Association Strength Using Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Wang, Wen-Chung

    2004-01-01

    The Pearson correlation is used to depict effect sizes in the context of item response theory. A multidimensional Rasch model is used to directly estimate the correlation between latent traits. Monte Carlo simulations were conducted to investigate whether the population correlation could be accurately estimated and whether the bootstrap method…

  5. Using Linear Equating to Map PROMIS(®) Global Health Items and the PROMIS-29 V2.0 Profile Measure to the Health Utilities Index Mark 3.

    PubMed

    Hays, Ron D; Revicki, Dennis A; Feeny, David; Fayers, Peter; Spritzer, Karen L; Cella, David

    2016-10-01

    Preference-based health-related quality of life (HR-QOL) scores are useful as outcome measures in clinical studies, for monitoring the health of populations, and for estimating quality-adjusted life-years. This was a secondary analysis of data collected in an internet survey as part of the Patient-Reported Outcomes Measurement Information System (PROMIS(®)) project. To estimate Health Utilities Index Mark 3 (HUI-3) preference scores, we used the ten PROMIS(®) global health items, the PROMIS-29 V2.0 single pain intensity item and seven multi-item scales (physical functioning, fatigue, pain interference, depressive symptoms, anxiety, ability to participate in social roles and activities, sleep disturbance), and the PROMIS-29 V2.0 items. Linear regression analyses were used to identify significant predictors, followed by simple linear equating to avoid regression to the mean. The regression models explained 48% (global health items), 61% (PROMIS-29 V2.0 scales), and 64% (PROMIS-29 V2.0 items) of the variance in the HUI-3 preference score. Linear equated scores were similar to observed scores, although differences tended to be larger for older study participants. HUI-3 preference scores can be estimated from the PROMIS(®) global health items or PROMIS-29 V2.0. The estimated HUI-3 scores from the PROMIS(®) health measures can be used for economic applications and as a measure of overall HR-QOL in research.
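
Simple linear equating, used here in place of regression prediction, just matches the mean and standard deviation of the two score distributions, so equated scores keep the full spread of the target scale instead of shrinking toward its mean. A minimal sketch with made-up predicted and observed HUI-3 scores:

```python
import statistics

def linear_equate(x, y):
    """Return a function mapping scores on scale x to scale y by matching
    means and standard deviations (linear equating). Unlike least-squares
    prediction, the equated scores reproduce y's mean AND spread."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return lambda score: my + (sy / sx) * (score - mx)

# Hypothetical regression-predicted scores and observed HUI-3 scores.
predicted = [0.40, 0.55, 0.60, 0.70, 0.85]
observed = [0.30, 0.50, 0.58, 0.72, 0.90]
to_hui3 = linear_equate(predicted, observed)
print(to_hui3(0.62))   # the mean of `predicted` maps onto the mean of `observed`
```
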

  6. Gender Differential Item Functioning on a National Field-Specific Test: The Case of PhD Entrance Exam of TEFL in Iran

    ERIC Educational Resources Information Center

    Ahmadi, Alireza; Bazvand, Ali Darabi

    2016-01-01

    Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance in a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response…

  7. Application of a General Polytomous Testlet Model to the Reading Section of a Large-Scale English Language Assessment. Research Report. ETS RR-10-21

    ERIC Educational Resources Information Center

    Li, Yanmei; Li, Shuhong; Wang, Lin

    2010-01-01

    Many standardized educational tests include groups of items based on a common stimulus, known as "testlets". Standard unidimensional item response theory (IRT) models are commonly used to model examinees' responses to testlet items. However, it is known that local dependence among testlet items can lead to biased item parameter estimates…

  8. Classification framework for partially observed dynamical systems

    NASA Astrophysics Data System (ADS)

    Shen, Yuan; Tino, Peter; Tsaneva-Atanasova, Krasimira

    2017-04-01

    We present a general framework for classifying partially observed dynamical systems based on the idea of learning in the model space. In contrast to the existing approaches using point estimates of model parameters to represent individual data items, we employ posterior distributions over model parameters, thus taking into account in a principled manner the uncertainty due to both the generative (observational and/or dynamic noise) and observation (sampling in time) processes. We evaluate the framework on two test beds: a biological pathway model and a stochastic double-well system. Crucially, we show that the classification performance is not impaired when the model structure used for inferring posterior distributions is much simpler than the observation-generating model structure, provided the reduced-complexity inferential model structure captures the essential characteristics needed for the given classification task.

  9. Guessing and the Rasch Model

    ERIC Educational Resources Information Center

    Holster, Trevor A.; Lake, J.

    2016-01-01

    Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
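
The lower-asymptote ("guessing") parameter at issue is easiest to see in the 3PL item response function itself; with c = 0 and a = 1 it collapses to the Rasch/1PL form. The parameter values below are purely illustrative.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic IRF: c is the lower asymptote
    ('guessing'), a the discrimination, b the difficulty.
    With a = 1 and c = 0 this reduces to the Rasch/1PL model."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# A very low-ability examinee still answers near the guessing floor
# under the 3PL, but near zero under the Rasch model:
print(p_3pl(-3.0, a=1.2, b=0.0, c=0.25))   # stays near the 0.25 floor
print(p_3pl(-3.0, a=1.0, b=0.0, c=0.0))    # Rasch: about 0.047
```
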

  10. Separability of Item and Person Parameters in Response Time Models.

    ERIC Educational Resources Information Center

    Van Breukelen, Gerard J. P.

    1997-01-01

    Discusses two forms of separability of item and person parameters in the context of response time models. The first is "separate sufficiency," and the second is "ranking independence." For each form a theorem stating sufficient conditions is proved. The two forms are shown to include several cases of models from psychometric…

  11. Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children.

    PubMed

    Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra

    2012-03-13

    Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
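
The infit/outfit screening rule quoted above (acceptable mean squares within 0.6-1.4) can be sketched for the dichotomous Rasch case; the rating scale model versions are analogous but sum over category thresholds. Abilities, difficulty, and responses below are invented for illustration.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def infit_outfit(responses, theta, b):
    """Infit (information-weighted) and outfit (unweighted) mean-square
    fit statistics for one item under the dichotomous Rasch model.
    responses: 0/1 answers to this item; theta: person abilities;
    b: the item's difficulty. Values near 1 indicate good fit; the
    abstract's screening rule flags items outside (0.6, 1.4)."""
    e = [rasch_p(t, b) for t in theta]          # expected scores
    w = [p * (1 - p) for p in e]                # model variances
    sq = [(x - p) ** 2 for x, p in zip(responses, e)]
    outfit = sum(s / v for s, v in zip(sq, w)) / len(responses)
    infit = sum(sq) / sum(w)
    return infit, outfit

thetas = [-1.0, -0.5, 0.0, 0.5, 1.0]
answers = [0, 0, 1, 1, 1]          # orderly, Guttman-like response pattern
print(infit_outfit(answers, thetas, b=0.0))
```

A perfectly ordered pattern like this one yields mean squares below 1 (the responses are more deterministic than the model expects), which is why the rule has a lower bound as well as an upper one.
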

  12. How Small the Number of Test Items Can Be for the Basis of Estimating the Operating Characteristics of the Discrete Responses to Unknown Test Items.

    ERIC Educational Resources Information Center

    Samejima, Fumiko; Changas, Paul S.

    Methods and approaches for estimating the operating characteristics of discrete item responses without assuming any mathematical form have been developed and expanded. It has been shown that, even when the test information function of a given test is not constant over the ability interval of interest, the test can still be used as the Old Test.…

  13. Use of Matrix Sampling Procedures to Assess Achievement in Solving Open Addition and Subtraction Sentences.

    ERIC Educational Resources Information Center

    Montague, Margariete A.

    This study investigated the feasibility of concurrently and randomly sampling examinees and items in order to estimate group achievement. Seven 32-item tests reflecting a 640-item universe of simple open sentences were used such that item selection (random, systematic) and assignment (random, systematic) of items (four, eight, sixteen) to forms…

  14. An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

    ERIC Educational Resources Information Center

    Ali, Usama S.; Chang, Hua-Hua

    2014-01-01

    Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…

  15. Estimating procedure for major highway construction bid item cost : final report.

    DOT National Transportation Integrated Search

    1978-06-01

    The present procedure for estimating construction bid item cost makes use of the quarterly weighted average unit price report coupled with engineering judgement. The limitation to this method is that this report format provides only the lowest bid da...

  16. Item Parameter Drift as an Indication of Differential Opportunity to Learn: An Exploration of Item Flagging Methods & Accurate Classification of Examinees

    ERIC Educational Resources Information Center

    Sukin, Tia M.

    2010-01-01

    The presence of outlying anchor items is an issue faced by many testing agencies. The decision to retain or remove an item is a difficult one, especially when the content representation of the anchor set becomes questionable by item removal decisions. Additionally, the reason for the aberrancy is not always clear, and if the performance of the…

  17. Explaining and Controlling for the Psychometric Properties of Computer-Generated Figural Matrix Items

    ERIC Educational Resources Information Center

    Freund, Philipp Alexander; Hofer, Stefan; Holling, Heinz

    2008-01-01

    Figural matrix items are a popular task type for assessing general intelligence (Spearman's g). Items of this kind can be constructed rationally, allowing the implementation of computerized generation algorithms. In this study, the influence of different task parameters on the degree of difficulty in matrix items was investigated. A sample of N =…

  18. Least Squares Distance Method of Cognitive Validation and Analysis for Binary Items Using Their Item Response Theory Parameters

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    2007-01-01

    The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…

  19. An Approach to Biased Item Identification Using Latent Trait Measurement Theory.

    ERIC Educational Resources Information Center

    Rudner, Lawrence M.

    Because it is a true-score model employing item parameters that are independent of the examinee sample, item characteristic curve (ICC) theory offers several advantages over classical measurement theory. In this paper an approach to biased item identification using ICC theory is described and applied. The ICC theory approach is attractive in that…

  20. Wrong-Site Surgery, Retained Surgical Items, and Surgical Fires : A Systematic Review of Surgical Never Events.

    PubMed

    Hempel, Susanne; Maggard-Gibbons, Melinda; Nguyen, David K; Dawes, Aaron J; Miake-Lye, Isomi; Beroes, Jessica M; Booth, Marika J; Miles, Jeremy N V; Shanman, Roberta; Shekelle, Paul G

    2015-08-01

    Serious, preventable surgical events, termed never events, continue to occur despite considerable patient safety efforts. To examine the incidence and root causes of and interventions to prevent wrong-site surgery, retained surgical items, and surgical fires in the era after the implementation of the Universal Protocol in 2004. We searched 9 electronic databases for entries from 2004 through June 30, 2014, screened references, and consulted experts. Two independent reviewers identified relevant publications in June 2014. One reviewer used a standardized form to extract data and a second reviewer checked the data. Strength of evidence was established by the review team. Data extraction was completed in January 2015. Incidence of wrong-site surgery, retained surgical items, and surgical fires. We found 138 empirical studies that met our inclusion criteria. Incidence estimates for wrong-site surgery in US settings varied by data source and procedure (median estimate, 0.09 events per 10,000 surgical procedures). The median estimate for retained surgical items was 1.32 events per 10,000 procedures, but estimates varied by item and procedure. The per-procedure surgical fire incidence is unknown. A frequently reported root cause was inadequate communication. Methodologic challenges associated with investigating changes in rare events limit the conclusions of 78 intervention evaluations. Limited evidence supported the Universal Protocol (5 studies), education (4 studies), and team training (4 studies) interventions to prevent wrong-site surgery. Limited evidence exists to prevent retained surgical items by using data-matrix-coded sponge-counting systems (5 pertinent studies). Evidence for preventing surgical fires was insufficient, and intervention effects were not estimable. Current estimates for wrong-site surgery and retained surgical items are 1 event per 100,000 and 1 event per 10,000 procedures, respectively, but the precision is uncertain, and the per-procedure prevalence of surgical fires is not known. Root-cause analyses suggest the need for improved communication. Despite promising approaches and global Universal Protocol evaluations, empirical evidence for interventions is limited.

  1. Cross-cultural differences in knee functional status outcomes in a polyglot society represented true disparities not biased by differential item functioning.

    PubMed

    Deutscher, Daniel; Hart, Dennis L; Crane, Paul K; Dickstein, Ruth

    2010-12-01

    Comparative effectiveness research across cultures requires unbiased measures that accurately detect clinical differences between patient groups. The purpose of this study was to assess the presence and impact of differential item functioning (DIF) in knee functional status (FS) items administered using computerized adaptive testing (CAT) as a possible cause for observed differences in outcomes between 2 cultural patient groups in a polyglot society. This study was a secondary analysis of prospectively collected data. We evaluated data from 9,134 patients with knee impairments from outpatient physical therapy clinics in Israel. Items were analyzed for DIF related to sex, age, symptom acuity, surgical history, exercise history, and language used to complete the functional survey (Hebrew versus Russian). Several items exhibited DIF, but unadjusted FS estimates and FS estimates that accounted for DIF were essentially equal (intraclass correlation coefficient [2,1]>.999). No individual patient had a difference between unadjusted and adjusted FS estimates as large as the median standard error of the unadjusted estimates. Differences between groups defined by any of the covariates considered were essentially unchanged when using adjusted instead of unadjusted FS estimates. The greatest group-level impact was <0.3% of 1 standard deviation of the unadjusted FS estimates. Complete data where patients answered all items in the scale would have been preferred for DIF analysis, but only CAT data were available. Differences in FS outcomes between groups of patients with knee impairments who answered the knee CAT in Hebrew or Russian in Israel most likely reflected true differences that may reflect societal disparities in this health outcome.

  2. Computerized Adaptive Testing: Overview and Introduction.

    ERIC Educational Resources Information Center

    Meijer, Rob R.; Nering, Michael L.

    1999-01-01

    Provides an overview of computerized adaptive testing (CAT) and introduces contributions to this special issue. CAT elements discussed include item selection, estimation of the latent trait, item exposure, measurement precision, and item-bank development. (SLD)
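
Two of the CAT elements listed, item selection and latent-trait estimation, can be sketched for the 2PL model: pick the unanswered item with maximum Fisher information at the current ability estimate, then update the estimate by Newton-Raphson. The item bank below is hypothetical and tiny; real banks and selection rules (e.g. with exposure control) are far richer.

```python
import math

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    p = p2pl(theta, a, b)
    return a * a * p * (1 - p)       # Fisher information for a 2PL item

def cat_step(theta, bank, answered):
    """Pick the unanswered item with maximum information at the
    current ability estimate (one common CAT selection rule)."""
    remaining = [i for i in range(len(bank)) if i not in answered]
    return max(remaining, key=lambda i: information(theta, *bank[i]))

def update_theta(theta, items, xs, steps=20):
    """Newton-Raphson ML ability update for 2PL responses xs (0/1)."""
    for _ in range(steps):
        g = sum(a * (x - p2pl(theta, a, b)) for (a, b), x in zip(items, xs))
        h = -sum(information(theta, a, b) for a, b in items)
        if abs(h) < 1e-12:
            break
        theta -= g / h
    return theta

bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0), (2.0, 0.5)]   # (a, b) pairs
first = cat_step(0.0, bank, set())
print(first)   # index 3: the a = 2.0 item is most informative at theta = 0
```
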

  3. A novel method for expediting the development of patient-reported outcome measures and an evaluation across several populations

    PubMed Central

    Garrard, Lili; Price, Larry R.; Bott, Marjorie J.; Gajewski, Byron J.

    2016-01-01

    Item response theory (IRT) models provide an appropriate alternative to the classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped (Sinharay & Johnson, 2003; Sinharay, Johnson, & Stern, 2006). This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function approach, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROMs development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach (Vehtari & Lampinen, 2002) is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts’ bias (in a statistical sense and not to be confused with psychometric bias as in differential item functioning) toward the estimation of item-to-domain correlations. Results support the utilization of content subject experts’ information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject experts’ content information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject experts to efficiently use the OBID approach and reduce potential bias during PROMs development. PMID:27667878

  4. A novel method for expediting the development of patient-reported outcome measures and an evaluation across several populations.

    PubMed

    Garrard, Lili; Price, Larry R; Bott, Marjorie J; Gajewski, Byron J

    2016-10-01

    Item response theory (IRT) models provide an appropriate alternative to the classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped (Sinharay & Johnson, 2003; Sinharay, Johnson, & Stern, 2006). This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function approach, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROMs development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach (Vehtari & Lampinen, 2002) is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts' bias (in a statistical sense and not to be confused with psychometric bias as in differential item functioning) toward the estimation of item-to-domain correlations. Results support the utilization of content subject experts' information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject experts' content information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject experts to efficiently use the OBID approach and reduce potential bias during PROMs development.

  5. Readability and Comprehension of the Geriatric Depression Scale and PROMIS® Physical Function Items in Older African Americans and Latinos

    PubMed Central

    Paz, Sylvia H.; Jones, Loretta; Calderón, José L.; Hays, Ron D.

    2016-01-01

    Background Depression and physical function are especially important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) Physical Function Item Bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. Objective To estimate the readability of the GDS and PROMIS® Physical Function items and to assess their comprehensibility by a sample of African American and Latino elderly. Methods Readability was estimated using the Flesch-Kincaid (F-K) and Flesch-Reading-Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS items by minority elderly was evaluated with 30 cognitive interviews. Results Readability estimates for a number of the English and Spanish GDS and PROMIS physical functioning items exceeded the recommended 5th grade level, or the items were rated as fairly difficult, difficult, or very difficult to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS items was considered confusing and responses potentially uninterpretable because they were based on physical aids. Conclusions Problems with item wording and response options of the GDS and PROMIS Physical Function items may negatively affect reliability and validity of measurement when used with minority elderly. PMID:27599978
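
The two readability formulas are standard and easy to reproduce. Note the assumptions in this sketch: the syllable counter is a deliberately crude vowel-group heuristic (production tools use pronunciation dictionaries), and the example item is invented, not an actual PROMIS item.

```python
import re

def count_syllables(word):
    """Very crude syllable count: number of contiguous vowel groups
    (an assumption; real readability tools use better heuristics)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text):
    """Flesch-Kincaid grade level and Flesch Reading Ease for English
    text, using the standard published coefficients."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences      # words per sentence
    spw = syllables / len(words)      # syllables per word
    fk_grade = 0.39 * wps + 11.8 * spw - 15.59
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    return fk_grade, fre

item = "Are you able to run five miles without stopping?"
print(readability(item))
```
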

  6. Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

    PubMed

    Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

    2006-11-01

    We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
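
The nested-model logic can be sketched with plain binary logistic regression (the abstract's models are ordinal, but the uniform-DIF comparison works the same way): fit the item with ability only, then with ability plus group, and inspect the group coefficient and the change in the ability coefficient. The data below are simulated with deliberate uniform DIF; the gradient-ascent fitter is a toy stand-in, not PARSCALE or Stata.

```python
import math, random

def fit_logistic(X, y, lr=0.2, iters=2500):
    """Plain gradient-ascent binary logistic regression with intercept.
    A toy stand-in for the ordinal logistic models in the abstract."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for row, t in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            p = 1.0 / (1.0 + math.exp(-z))
            grad[0] += t - p
            for j, xj in enumerate(row):
                grad[j + 1] += (t - p) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

random.seed(1)
# Simulate one item with uniform DIF: the focal group (g = 1) is less
# likely to answer correctly at the same ability level (shift of -0.8).
data, resp = [], []
for _ in range(500):
    theta = random.gauss(0, 1)
    g = random.randint(0, 1)
    p = 1.0 / (1.0 + math.exp(-(theta - 0.8 * g)))
    data.append([theta, g])
    resp.append(1 if random.random() < p else 0)

w_base = fit_logistic([[row[0]] for row in data], resp)   # ability only
w_full = fit_logistic(data, resp)                         # ability + group
# A clearly nonzero group weight, with little change in the ability
# weight between the two fits, is the signature of uniform DIF.
print(w_base, w_full)
```
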

  7. Can health care providers recognise a fibromyalgia personality?

    PubMed

    Da Silva, José A P; Jacobs, Johannes W G; Branco, Jaime C; Canaipa, Rita; Gaspar, M Filomena; Griep, Ed N; van Helmond, Toon; Oliveira, Paula J; Zijlstra, Theo J; Geenen, Rinie

    2017-01-01

    To determine if experienced health care providers (HCPs) can recognise patients with fibromyalgia (FM) based on a limited set of personality items, exploring the existence of a FM personality. From the 240-item NEO-PI-R personality questionnaire, 8 HCPs from two different countries each selected 20 items they considered most discriminative of FM personality. Then, evaluating the scores on these items of 129 female patients with FM and 127 female controls, each HCP rated the probability of FM for each individual on a 0-10 scale. Personality characteristics (domains and facets) of selected items were determined. Scores of patients with FM and controls on the eight 20-item sets, and HCPs' estimates of each individual's probability of FM were analysed for their discriminative value. The eight 20-item sets discriminated for FM, with areas under the receiver operating characteristic curve ranging from 0.71-0.81. The estimated probabilities for FM showed, in general, percentages of correct classifications above 50%, with rising correct percentages for higher estimated probabilities. The most often chosen and discriminatory items were predominantly of the domain neuroticism (all with higher scores in FM), followed by some items of the facet trust (lower scores in FM). HCPs can, based on a limited set of items from a personality questionnaire, distinguish patients with FM from controls with a statistically significant probability. The HCPs' expectation that personality in FM patients is associated with higher levels for aspects of neuroticism (proneness to psychological distress) and lower scores for aspects of trust, proved to be correct.

  8. An approach for estimating item sensitivity to within-person change over time: An illustration using the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog).

    PubMed

    Dowling, N Maritza; Bolt, Daniel M; Deng, Sien

    2016-12-01

    When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer's Disease Assessment Scale-Cognitive subscale, including participants representing the full-spectrum of disease status who were enrolled in the multisite Alzheimer's Disease Neuroimaging Initiative. Five of the 13 Alzheimer's Disease Assessment Scale-Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline more consistent with theoretical expectations in comparison to the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition.

  9. Score Equating and Item Response Theory: Some Practical Considerations.

    ERIC Educational Resources Information Center

    Cook, Linda L.; Eignor, Daniel R.

    The purposes of this paper are five-fold to discuss: (1) when item response theory (IRT) equating methods should provide better results than traditional methods; (2) which IRT model, the three-parameter logistic or the one-parameter logistic (Rasch), is the most reasonable to use; (3) what unique contributions IRT methods can offer the equating…

  10. Test Design and Speededness

    ERIC Educational Resources Information Center

    van der Linden, Wim J.

    2011-01-01

    A critical component of test speededness is the distribution of the test taker's total time on the test. A simple set of constraints on the item parameters in the lognormal model for response times is derived that can be used to control the distribution when assembling a new test form. As the constraints are linear in the item parameters, they can…
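
Under the lognormal response-time model the constraints are tractable because each item's time is lognormal given the test taker's speed: with ln T_i ~ N(beta_i - tau, 1/alpha_i^2) and conditional independence across items, the total-time mean and variance are sums over items. The item parameters below are invented for illustration.

```python
import math

def total_time_stats(items, tau=0.0):
    """Mean and variance of a test taker's total time under the
    lognormal response-time model: ln T_i ~ N(beta_i - tau, 1/alpha_i^2),
    with item times conditionally independent given speed tau.
    items: list of (alpha_i, beta_i) pairs; tau: the taker's speed."""
    mean = var = 0.0
    for alpha, beta in items:
        s2 = 1.0 / alpha ** 2
        m = math.exp(beta - tau + s2 / 2)        # lognormal mean
        mean += m
        var += (math.exp(s2) - 1) * m * m        # lognormal variance
    return mean, var

# Hypothetical item time parameters (alpha = precision of the log-time,
# beta = time demand of the item, in log-seconds).
form = [(2.0, 4.0), (1.5, 4.2), (2.5, 3.8)]
mu, v = total_time_stats(form)
print(mu, math.sqrt(v))   # expected total seconds and its SD
```

Because the expected total time is a sum of terms exp(beta_i - tau + 1/(2 alpha_i^2)), constraints on the beta_i (linear on the log scale) translate directly into control of this distribution when assembling a form.
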

  11. The pack size effect: Influence on consumer perceptions of portion sizes.

    PubMed

    Hieke, Sophie; Palascha, Aikaterini; Jola, Corinne; Wills, Josephine; Raats, Monique M

    2016-01-01

    Larger portions as well as larger packs can lead to larger prospective consumption estimates, larger servings and increased consumption, described as 'portion-size effects' and 'pack size effects'. Although related, the effects of pack sizes on portion estimates have received less attention. While it is not possible to generalize consumer behaviour across cultures, external cues taken from pack size may affect us all. We thus examined whether pack sizes influence portion size estimates across cultures, leading to a general 'pack size effect'. We compared portion size estimates based on digital presentations of different product pack sizes of solid and liquid products. The study with 13,177 participants across six European countries consisted of three parts. Parts 1 and 2 asked participants to indicate the number of portions present in a combined photographic and text-based description of different pack sizes. The estimated portion size was calculated as the quotient of the content weight or volume of the food presented and the number of stated portions. In Part 3, participants stated the number of food items that make up a portion when presented with packs of food containing either a small or a large number of items. The estimated portion size was calculated as the item weight times the item number. For all three parts and across all countries, we found that participants' portion estimates were based on larger portions for larger packs compared to smaller packs (Parts 1 and 2), as well as on more items to make up a portion (Part 3); portions were thus estimated to be larger in all cases. Considering that the larger estimated portions are likely to be consumed, there are implications for energy intake and weight status.

  12. Enhancing the Equating of Item Difficulty Metrics: Estimation of Reference Distribution. Research Report. ETS RR-14-07

    ERIC Educational Resources Information Center

    Ali, Usama S.; Walker, Michael E.

    2014-01-01

    Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…

  13. Reliability and norms for the 10-item self-motivation inventory: The TIGER Study

    USDA-ARS?s Scientific Manuscript database

    The Self-Motivation Inventory (SMI) has been shown to be a predictor of exercise dropout. The original SMI of 40 items has been shortened to 10 items and the psychometric qualities of the 10-item SMI are not known. To estimate the reliability of a 10-item SMI and develop norms for an ethnically dive...

  14. Applications of computerized adaptive testing (CAT) to the assessment of headache impact.

    PubMed

    Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew

    2003-12-01

    To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). 
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were 91.3% accurate for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
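    The adaptive item selection driving such CAT administrations can be illustrated with a maximum-information rule. The sketch below assumes a two-parameter logistic (2PL) model and invented item parameters; it does not reproduce the actual HIT calibrations or the DYNHA software:

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsing an item at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information contributed by a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, item_bank, administered):
    """CAT step: pick the not-yet-administered item with maximum information."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_info(theta, *item_bank[i]))

bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.2), (1.0, 1.5)]  # invented (a, b) pairs
first = next_item(0.0, bank, set())
print(first)  # item 2: the steepest item with difficulty near theta = 0
```

    Iterating this rule, and re-estimating theta after each response until a precision target is met, is what produces the large reductions in respondent burden reported above.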

  15. Differential Weight Procedure of the Conditional P.D.F. Approach for Estimating the Operating Characteristics of Discrete Item Responses.

    ERIC Educational Resources Information Center

    Samejima, Fumiko

    A method is proposed that increases the accuracies of estimation of the operating characteristics of discrete item responses, especially when the true operating characteristic is represented by a steep curve, and also at the lower and upper ends of the ability distribution where the estimation tends to be inaccurate because of the smaller number…

  16. Item Information in the Rasch Model. Project Psychometric Aspects of Item Banking No. 34. Research Report 88-7.

    ERIC Educational Resources Information Center

    Engelen, Ron J. H.; And Others

    Fisher's information measure for the item difficulty parameter in the Rasch model and its marginal and conditional formulations are investigated. It is shown that expected item information in the unconditional model equals information in the marginal model, provided the assumption of sampling examinees from an ability distribution is made. For the…

  17. Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches

    ERIC Educational Resources Information Center

    Kopf, Julia; Zeileis, Achim; Strobl, Carolin

    2015-01-01

    Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…

  18. Limits on Log Cross-Product Ratios for Item Response Models. Research Report. ETS RR-06-10

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Holland, Paul W.; Sinharay, Sandip

    2006-01-01

    Bounds are established for log cross-product ratios (log odds ratios) involving pairs of items for item response models. First, expressions for bounds on log cross-product ratios are provided for unidimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model.…

  19. Calibration of an Item Bank for the Assessment of Basque Language Knowledge

    ERIC Educational Resources Information Center

    Lopez-Cuadrado, Javier; Perez, Tomas A.; Vadillo, Jose A.; Gutierrez, Julian

    2010-01-01

    The main requisite for a functional computerized adaptive testing system is the need of a calibrated item bank. This text presents the tasks carried out during the calibration of an item bank for assessing knowledge of Basque language. It has been done in terms of the 3-parameter logistic model provided by the item response theory. Besides, this…

  20. Influence of item distribution pattern and abundance on efficiency of benthic core sampling

    USGS Publications Warehouse

    Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.

    2014-01-01

    Core sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm2), distribution of benthic items, and item density affected the bias and precision of estimates of density, the detection probability of items, and the time-costs. When items were distributed randomly versus clumped, bias decreased and precision increased with increasing sample size and increased slightly with increasing core sampler size. Bias and precision were only affected by benthic item density at very low values (500–1,000 items/m2). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small diameter core samples was always more time-efficient than taking fewer large diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
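    The simulation logic (random item placement, a circular core sampler, detection defined as capturing at least one item) can be sketched outside a GIS. The parameters below are illustrative, not the authors' settings; under complete spatial randomness the detection probability should approach 1 − exp(−density × core area):

```python
import math
import random

def simulate_detection(density_per_m2, core_area_cm2, n_cores=2000, seed=7):
    """Monte Carlo sketch (not the authors' GIS workflow): scatter items
    uniformly over a 1 m x 1 m plot, drop a circular core sampler at a
    random location, and record whether it captures >= 1 item."""
    rng = random.Random(seed)
    radius = math.sqrt(core_area_cm2 / 10_000.0 / math.pi)  # core radius in m
    hits = 0
    for _ in range(n_cores):
        items = [(rng.random(), rng.random()) for _ in range(density_per_m2)]
        cx = rng.uniform(radius, 1.0 - radius)  # keep the core inside the plot
        cy = rng.uniform(radius, 1.0 - radius)
        if any((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for x, y in items):
            hits += 1
    return hits / n_cores

# With random (Poisson-like) placement, simulation should track theory closely;
# clumped distributions would fall below this benchmark.
p_sim = simulate_detection(500, core_area_cm2=50)
p_theory = 1.0 - math.exp(-500 * 50 / 10_000)
print(round(p_sim, 3), round(p_theory, 3))
```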

  1. Psychometric characteristics of daily diaries for the Patient-Reported Outcomes Measurement Information System (PROMIS®): a preliminary investigation.

    PubMed

    Schneider, Stefan; Choi, Seung W; Junghaenel, Doerte U; Schwartz, Joseph E; Stone, Arthur A

    2013-09-01

    The Patient-Reported Outcomes (PRO) Measurement Information System (PROMIS(®)) has developed assessment tools for numerous PROs, most using a 7-day recall format. We examined whether modifying the recall period for use in daily diary research would affect the psychometric characteristics of several PROMIS measures. Daily versions of short-forms for three PROMIS domains (pain interference, fatigue, depression) were administered to a general population sample (n = 100) for 28 days. Analyses used multilevel item response theory (IRT) models. We examined differential item functioning (DIF) across recall periods by comparing the IRT parameters from the daily data with the PROMIS 7-day recall IRT parameters. Additionally, we examined whether the IRT parameters for day-to-day within-person changes are invariant to those for between-person (cross-sectional) differences in PROs. Dimensionality analyses of the daily data suggested a single dimension for each PRO domain, consistent with PROMIS instruments. One-third of the daily items showed uniform DIF when compared with PROMIS 7-day recall, but the impact of DIF on the scale level was minor. IRT parameters for within-person changes differed from between-person parameters for 3 depression items, which were more sensitive for measuring change than between-person differences, but not for pain interference and fatigue items. Notably, mean scores from daily diaries were significantly lower than the PROMIS 7-day recall norms. The results provide initial evidence supporting the adaptation of PROMIS measures for daily diary research. However, scores from daily diaries cannot be directly interpreted on PROMIS norms established for 7-day recall.

  2. Prediction of Reliability in Biographical Questionnaires.

    ERIC Educational Resources Information Center

    Starry, Allan R.

    The objectives of this study were (1) to develop a general classification system for life history items, (2) to determine test-retest reliability estimates, and (3) to estimate resistance to examinee faking, for representative biographical questionnaires. Two 100-item questionnaires were constructed through random assignment by content area of 200…

  3. Reliability of Test Scores in Nonparametric Item Response Theory.

    ERIC Educational Resources Information Center

    Sijtsma, Klaas; Molenaar, Ivo W.

    1987-01-01

    Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four "classical" lower bounds to reliability. (Author/JAZ)

  4. IRT-Estimated Reliability for Tests Containing Mixed Item Formats

    ERIC Educational Resources Information Center

    Shu, Lianghua; Schwarz, Richard D.

    2014-01-01

    As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…

  5. Robust Estimation of Latent Ability in Item Response Models

    ERIC Educational Resources Information Center

    Schuster, Christof; Yuan, Ke-Hai

    2011-01-01

    Because of response disturbances such as guessing, cheating, or carelessness, item response models often can only approximate the "true" individual response probabilities. As a consequence, maximum-likelihood estimates of ability will be biased. Typically, the nature and extent to which response disturbances are present is unknown, and, therefore,…

  6. Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2010-01-01

    This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

  7. A Violation of the Conditional Independence Assumption in the Two-High-Threshold Model of Recognition Memory

    ERIC Educational Resources Information Center

    Chen, Tina; Starns, Jeffrey J.; Rotello, Caren M.

    2015-01-01

    The 2-high-threshold (2HT) model of recognition memory assumes that test items result in distinct internal states: they are either detected or not, and the probability of responding at a particular confidence level that an item is "old" or "new" depends on the state-response mapping parameters. The mapping parameters are…

  8. Some Memories are Odder than Others: Judgments of Episodic Oddity Violate Known Decision Rules

    PubMed Central

    O’Connor, Akira R.; Guhl, Emily N.; Cox, Justin C.; Dobbins, Ian G.

    2011-01-01

    Current decision models of recognition memory are based almost entirely on one paradigm, single item old/new judgments accompanied by confidence ratings. This task results in receiver operating characteristics (ROCs) that are well fit by both signal-detection and dual-process models. Here we examine an entirely new recognition task, the judgment of episodic oddity, whereby participants select the mnemonically odd members of triplets (e.g., a new item hidden among two studied items). Using the only two known signal-detection rules of oddity judgment derived from the sensory perception literature, the unequal variance signal-detection model predicted that an old item among two new items would be easier to discover than a new item among two old items. In contrast, four separate empirical studies demonstrated the reverse pattern: triplets with two old items were the easiest to resolve. This finding was anticipated by the dual-process approach as the presence of two old items affords the greatest opportunity for recollection. Furthermore, a bootstrap-fed Monte Carlo procedure using two independent datasets demonstrated that the dual-process parameters typically observed during single item recognition correctly predict the current oddity findings, whereas unequal variance signal-detection parameters do not. Episodic oddity judgments represent a case where dual- and single-process predictions qualitatively diverge and the findings demonstrate that novelty is “odder” than familiarity. PMID:22833695

  9. Readability and Comprehension of the Geriatric Depression Scale and PROMIS® Physical Function Items in Older African Americans and Latinos.

    PubMed

    Paz, Sylvia H; Jones, Loretta; Calderón, José L; Hays, Ron D

    2017-02-01

    Depression and physical function are particularly important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS ® ) physical function item bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. The aim of this study was to estimate the readability of the GDS and PROMIS ® physical function items and to assess their comprehensibility using a sample of African American and Latino elderly. Readability was estimated using the Flesch-Kincaid and Flesch Reading Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS ® items by minority elderly was evaluated with 30 cognitive interviews. Readability estimates of a number of items in English and Spanish of the GDS and PROMIS ® physical functioning items exceed the U.S. recommended 5th-grade threshold for vulnerable populations, or were rated as 'fairly difficult', 'difficult', or 'very difficult' to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS ® items was considered confusing, and interpreting responses was problematic because they were based on using physical aids. Problems with item wording and response options of the GDS and PROMIS ® physical function items may reduce reliability and validity of measurement when used with minority elderly.
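    The readability formulae named in this record use published constants; a short sketch shows how they are computed from word, sentence, and syllable counts. The example counts are hypothetical, and a real analysis would need a dictionary or heuristic syllable counter:

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease: higher = easier; 30-49 reads as 'difficult'
    and below 30 as 'very difficult'."""
    asl = total_words / total_sentences      # average sentence length
    asw = total_syllables / total_words      # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw

def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level (U.S. school grade)."""
    asl = total_words / total_sentences
    asw = total_syllables / total_words
    return 0.39 * asl + 11.8 * asw - 15.59

# Hypothetical item text: 10 words, 1 sentence, 14 syllables.
print(round(flesch_reading_ease(10, 1, 14), 1))   # ~78.2 ('fairly easy')
print(round(flesch_kincaid_grade(10, 1, 14), 1))  # ~4.8, i.e. below 5th grade
```

    An item scoring above the recommended 5th-grade level on the grade formula, or in the 'difficult' FRE band, is the kind of item the study flags for vulnerable populations.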

  10. Calibration of the Dutch-Flemish PROMIS Pain Behavior item bank in patients with chronic pain.

    PubMed

    Crins, M H P; Roorda, L D; Smits, N; de Vet, H C W; Westhovens, R; Cella, D; Cook, K F; Revicki, D; van Leeuwen, J; Boers, M; Dekker, J; Terwee, C B

    2016-02-01

    The aims of the current study were to calibrate the item parameters of the Dutch-Flemish PROMIS Pain Behavior item bank using a sample of Dutch patients with chronic pain and to evaluate cross-cultural validity between the Dutch-Flemish and the US PROMIS Pain Behavior item banks. Furthermore, reliability and construct validity of the Dutch-Flemish PROMIS Pain Behavior item bank were evaluated. The 39 items in the bank were completed by 1042 Dutch patients with chronic pain. To evaluate unidimensionality, a one-factor confirmatory factor analysis (CFA) was performed. A graded response model (GRM) was used to calibrate the items. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was evaluated. Reliability of the item bank was also examined and construct validity was studied using several legacy instruments, e.g. the Roland Morris Disability Questionnaire. CFA supported the unidimensionality of the Dutch-Flemish PROMIS Pain Behavior item bank (CFI = 0.960, TLI = 0.958); the data also fit the GRM and demonstrated good coverage across the pain behavior construct (threshold parameters range: -3.42 to 3.54). Analysis showed good cross-cultural validity (only six DIF items), reliability (Cronbach's α = 0.95) and construct validity (all correlations ≥0.53). The Dutch-Flemish PROMIS Pain Behavior item bank was found to have good cross-cultural validity, reliability and construct validity. The development of the Dutch-Flemish PROMIS Pain Behavior item bank will serve as the basis for Dutch-Flemish PROMIS short forms and computer adaptive testing (CAT). © 2015 European Pain Federation - EFIC®
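    In the graded response model used for this calibration, each polytomous item gets a discrimination parameter and ordered threshold parameters, and category probabilities are differences of adjacent cumulative logistic curves. A minimal sketch with invented parameter values (not the bank's calibrated estimates):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: cumulative curves P(X >= k) = logistic(a*(theta - b_k));
    each category probability is the difference of adjacent cumulative curves.
    `thresholds` is the increasing list of between-category parameters b_k."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Invented 5-category pain-behavior item (a and thresholds illustrative only).
probs = grm_category_probs(theta=0.5, a=2.0, thresholds=[-1.5, -0.5, 0.8, 1.9])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # the five category probabilities sum to 1.0
```

    Threshold parameters spread across roughly -3.4 to 3.5, as reported above, are what give the bank good coverage of the latent pain behavior continuum.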

  11. Estimating the Effective System Dead Time Parameter for Correlated Neutron Counting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Croft, Stephen; Cleveland, Steve; Favalli, Andrea

    Neutron time correlation analysis is one of the main technical nuclear safeguards techniques used to verify declarations of, or to independently assay, special nuclear materials. Quantitative information is generally extracted from the neutron-event pulse train, collected from moderated assemblies of 3He proportional counters, in the form of correlated count rates that are derived from event-triggered coincidence gates. These count rates, most commonly referred to as singles, doubles and triples rates etc., when extracted using shift-register autocorrelation logic, are related to the reduced factorial moments of the time correlated clusters of neutrons emerging from the measurement items. Correcting these various rates for dead time losses has received considerable attention recently. The dead time losses for the higher moments in particular, and especially for large mass (high rate and highly multiplying) items, can be significant. Consequently, even in thoughtfully designed systems, accurate dead time treatments are needed if biased mass determinations are to be avoided. In support of this effort, in this paper we discuss a new approach to experimentally estimate the effective system dead time of neutron coincidence counting systems. It involves counting a random neutron source (e.g. AmLi is a good approximation to a source without correlated emission) and relating the second and higher moments of the neutron number distribution recorded in random triggered interrogation coincidence gates to the effective value of dead time parameter. We develop the theoretical basis of the method and apply it to the Oak Ridge Large Volume Active Well Coincidence Counter using sealed AmLi radionuclide neutron sources and standard multiplicity shift register electronics. The method is simple to apply compared to the predominant present approach, which involves using a set of 252Cf sources of wide emission rate; it gives excellent precision in a conveniently short time, and it yields consistent results as a function of the order of the moment used to extract the dead time parameter. This latter observation is reassuring in that it suggests the assumptions underpinning the theoretical analysis are fit for practical application purposes. However, we found that the effective dead time parameter obtained is not constant, as might be expected for a parameter that in the dead time model is characteristic of the detector system, but rather varies systematically with gate width.
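    The moments-based estimator itself is specific to this paper, but the dead time model such corrections feed is standard. As a hedged illustration (not the authors' method), the familiar non-paralyzable relation m = n/(1 + nτ) can be inverted either for the true rate or for the dead time parameter when the true rate is known, e.g. from a calibrated random source:

```python
def true_rate_nonparalyzable(measured_rate, dead_time_s):
    """Invert m = n / (1 + n*tau) to recover the true event rate n from the
    measured rate m under a non-paralyzable dead time tau."""
    return measured_rate / (1.0 - measured_rate * dead_time_s)

def dead_time_from_rates(true_rate, measured_rate):
    """Solve m = n / (1 + n*tau) for tau when the true rate is known,
    e.g. from a calibrated random (AmLi-like) source."""
    return (true_rate - measured_rate) / (true_rate * measured_rate)

# Hypothetical numbers: 100 kcps true rate, 2 microsecond dead time.
n, tau = 100_000.0, 2e-6
m = n / (1.0 + n * tau)  # the rate a dead-time-afflicted system would report
print(round(m))                                  # 83333
print(round(true_rate_nonparalyzable(m, tau)))   # 100000
```

    At 100 kcps a 2 µs dead time already hides roughly one event in six, which is why accurate dead time treatments matter for high-rate, highly multiplying items.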

  12. Estimating the Effective System Dead Time Parameter for Correlated Neutron Counting

    DOE PAGES

    Croft, Stephen; Cleveland, Steve; Favalli, Andrea; ...

    2017-04-29

    Neutron time correlation analysis is one of the main technical nuclear safeguards techniques used to verify declarations of, or to independently assay, special nuclear materials. Quantitative information is generally extracted from the neutron-event pulse train, collected from moderated assemblies of 3He proportional counters, in the form of correlated count rates that are derived from event-triggered coincidence gates. These count rates, most commonly referred to as singles, doubles and triples rates etc., when extracted using shift-register autocorrelation logic, are related to the reduced factorial moments of the time correlated clusters of neutrons emerging from the measurement items. Correcting these various rates for dead time losses has received considerable attention recently. The dead time losses for the higher moments in particular, and especially for large mass (high rate and highly multiplying) items, can be significant. Consequently, even in thoughtfully designed systems, accurate dead time treatments are needed if biased mass determinations are to be avoided. In support of this effort, in this paper we discuss a new approach to experimentally estimate the effective system dead time of neutron coincidence counting systems. It involves counting a random neutron source (e.g. AmLi is a good approximation to a source without correlated emission) and relating the second and higher moments of the neutron number distribution recorded in random triggered interrogation coincidence gates to the effective value of dead time parameter. We develop the theoretical basis of the method and apply it to the Oak Ridge Large Volume Active Well Coincidence Counter using sealed AmLi radionuclide neutron sources and standard multiplicity shift register electronics. The method is simple to apply compared to the predominant present approach, which involves using a set of 252Cf sources of wide emission rate; it gives excellent precision in a conveniently short time, and it yields consistent results as a function of the order of the moment used to extract the dead time parameter. This latter observation is reassuring in that it suggests the assumptions underpinning the theoretical analysis are fit for practical application purposes. However, we found that the effective dead time parameter obtained is not constant, as might be expected for a parameter that in the dead time model is characteristic of the detector system, but rather varies systematically with gate width.

  13. Estimating the effective system dead time parameter for correlated neutron counting

    NASA Astrophysics Data System (ADS)

    Croft, Stephen; Cleveland, Steve; Favalli, Andrea; McElroy, Robert D.; Simone, Angela T.

    2017-11-01

    Neutron time correlation analysis is one of the main technical nuclear safeguards techniques used to verify declarations of, or to independently assay, special nuclear materials. Quantitative information is generally extracted from the neutron-event pulse train, collected from moderated assemblies of 3He proportional counters, in the form of correlated count rates that are derived from event-triggered coincidence gates. These count rates, most commonly referred to as singles, doubles and triples rates etc., when extracted using shift-register autocorrelation logic, are related to the reduced factorial moments of the time correlated clusters of neutrons emerging from the measurement items. Correcting these various rates for dead time losses has received considerable attention recently. The dead time losses for the higher moments in particular, and especially for large mass (high rate and highly multiplying) items, can be significant. Consequently, even in thoughtfully designed systems, accurate dead time treatments are needed if biased mass determinations are to be avoided. In support of this effort, in this paper we discuss a new approach to experimentally estimate the effective system dead time of neutron coincidence counting systems. It involves counting a random neutron source (e.g. AmLi is a good approximation to a source without correlated emission) and relating the second and higher moments of the neutron number distribution recorded in random triggered interrogation coincidence gates to the effective value of dead time parameter. We develop the theoretical basis of the method and apply it to the Oak Ridge Large Volume Active Well Coincidence Counter using sealed AmLi radionuclide neutron sources and standard multiplicity shift register electronics. 
The method is simple to apply compared to the predominant present approach which involves using a set of 252Cf sources of wide emission rate, it gives excellent precision in a conveniently short time, and it yields consistent results as a function of the order of the moment used to extract the dead time parameter. This latter observation is reassuring in that it suggests the assumptions underpinning the theoretical analysis are fit for practical application purposes. However, we found that the effective dead time parameter obtained is not constant, as might be expected for a parameter that in the dead time model is characteristic of the detector system, but rather, varies systematically with gate width.

  14. Practical Considerations about Expected A Posteriori Estimation in Adaptive Testing: Adaptive A Priori, Adaptive Correction for Bias, and Adaptive Integration Interval.

    ERIC Educational Resources Information Center

    Raiche, Gilles; Blais, Jean-Guy

    In a computerized adaptive test (CAT), it would be desirable to obtain an acceptable precision of the proficiency level estimate using an optimal number of items. Decreasing the number of items is accompanied, however, by a certain degree of bias when the true proficiency level differs significantly from the a priori estimate. G. Raiche (2000) has…

  15. The Effects of Small Sample Size on Identifying Polytomous DIF Using the Liu-Agresti Estimator of the Cumulative Common Odds Ratio

    ERIC Educational Resources Information Center

    Carvajal, Jorge; Skorupski, William P.

    2010-01-01

    This study is an evaluation of the behavior of the Liu-Agresti estimator of the cumulative common odds ratio when identifying differential item functioning (DIF) with polytomously scored test items using small samples. The Liu-Agresti estimator has been proposed by Penfield and Algina as a promising approach for the study of polytomous DIF but no…
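    For dichotomous items, the common odds ratio at issue reduces to the Mantel-Haenszel estimator, which the Liu-Agresti statistic generalizes to polytomously scored items. A sketch with invented stratum counts (not the study's simulated data):

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across matched score strata for a
    dichotomous item. Each stratum is a 2x2 table given as
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect);
    the Liu-Agresti estimator extends this idea to polytomous items."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Invented counts for three ability strata; an odds ratio near 1 suggests no DIF.
strata = [(40, 10, 35, 15), (30, 20, 28, 22), (15, 35, 12, 38)]
print(round(mantel_haenszel_or(strata), 2))  # ~1.38
```

    With small samples the stratum cells become sparse, which is precisely the behavior of the estimator that the study evaluates.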

  16. A general diagnostic model applied to language testing data.

    PubMed

    von Davier, Matthias

    2008-11-01

    Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well-known models, such as univariate and multivariate versions of the Rasch model and the two-parameter logistic item response theory model, the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL Internet-based testing.

  17. The Research Identity Scale: Psychometric Analyses and Scale Refinement

    ERIC Educational Resources Information Center

    Jorgensen, Maribeth F.; Schweinle, William E.

    2018-01-01

    The 68-item Research Identity Scale (RIS) was informed through qualitative exploration of research identity development in master's-level counseling students and practitioners. Classical psychometric analyses revealed the items had strong validity and reliability and a single factor. A one-parameter Rasch analysis and item review was used to…

  18. Parent outcome expectancies for purchasing fruit and vegetables: a validation.

    PubMed

    Baranowski, Tom; Watson, Kathy; Missaghian, Mariam; Broadfoot, Alison; Baranowski, Janice; Cullen, Karen; Nicklas, Theresa; Fisher, Jennifer; O'Donnell, Sharon

    2007-03-01

    To validate four scales -- outcome expectancies for purchasing fruit and for purchasing vegetables, and comparative outcome expectancies for purchasing fresh fruit and for purchasing fresh vegetables versus other forms of fruit and vegetables (F&V). Survey instruments were administered twice, separated by 6 weeks. Recruited in front of supermarkets and grocery stores; interviews conducted by telephone. One hundred and sixty-one food shoppers with children (18 years or younger). Single dimension scales were specified for fruit and for vegetable purchasing outcome expectancies, and for comparative (fresh vs. other) fruit and vegetable purchasing outcome expectancies. Item Response Theory parameter estimates revealed easily interpreted patterns in the sequence of items by difficulty of response. Fruit and vegetable purchasing and fresh fruit comparative purchasing outcome expectancy scales were significantly correlated with home F&V availability, after controlling for social desirability of response. Comparative fresh vegetable outcome expectancy scale was significantly bivariately correlated with home vegetable availability, but not after controlling for social desirability. These scales are available to help better understand family F&V purchasing decisions.

  19. Response-Order Effects in Survey Methods: A Randomized Controlled Crossover Study in the Context of Sport Injury Prevention.

    PubMed

    Chan, Derwin K; Ivarsson, Andreas; Stenling, Andreas; Yang, Sophie X; Chatzisarantis, Nikos L; Hagger, Martin S

    2015-12-01

    Consistency tendency is characterized by the propensity for participants responding to subsequent items in a survey consistent with their responses to previous items. This method effect might contaminate the results of sport psychology surveys using cross-sectional design. We present a randomized controlled crossover study examining the effect of consistency tendency on the motivational pathway (i.e., autonomy support → autonomous motivation → intention) of self-determination theory in the context of sport injury prevention. Athletes from Sweden (N = 341) responded to the survey printed in either low interitem distance (IID; consistency tendency likely) or high IID (consistency tendency suppressed) on two separate occasions, with a one-week interim period. Participants were randomly allocated into two groups, and they received the survey of different IID at each occasion. Bayesian structural equation modeling showed that low IID condition had stronger parameter estimates than high IID condition, but the differences were not statistically significant.

  20. Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain

    PubMed Central

    Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.

    2015-01-01

    The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178
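
    The graded response model used for this calibration can be sketched numerically. In the minimal Python sketch below (function names and the example parameter values are hypothetical illustrations, not the published PROMIS item estimates), each item has a discrimination parameter and ordered threshold parameters, and each category probability is the difference of adjacent cumulative logistic curves:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait level; a: item discrimination; thresholds: ordered
    threshold parameters b_1 < ... < b_{K-1} for a K-category item.
    """
    # Cumulative probability of responding in category k or higher,
    # bracketed by 1 (category >= lowest) and 0 (category > highest).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b)))
                   for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item with thresholds inside the reported range
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.0, 0.0, 1.5])
```

    The probabilities sum to one by construction, which is why the model suits items with more than two ordered response options.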

  1. Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain.

    PubMed

    Crins, Martine H P; Roorda, Leo D; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B

    2015-01-01

    The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.

  2. Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.

    ERIC Educational Resources Information Center

    Brutten, Sheila R.; And Others

    A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…

  3. Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Yen, Yung-Chin; Ho, Rong-Guey; Liao, Wen-Wei; Chen, Li-Ju

    2012-01-01

    In a test, the score is closer to the examinee's actual ability when careless mistakes are corrected. In computerized adaptive testing (CAT), however, changing the answer to one item might make the following items no longer appropriate for estimating the examinee's ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability…

  4. Application of a Method of Estimating DIF for Polytomous Test Items.

    ERIC Educational Resources Information Center

    Camilli, Gregory; Congdon, Peter

    1999-01-01

    Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)

  5. Developing Multidimensional Likert Scales Using Item Factor Analysis: The Case of Four-Point Items

    ERIC Educational Resources Information Center

    Asún, Rodrigo A.; Rdz-Navarro, Karina; Alvarado, Jesús M.

    2016-01-01

    This study compares the performance of two approaches in analysing four-point Likert rating scales with a factorial model: the classical factor analysis (FA) and the item factor analysis (IFA). For FA, maximum likelihood and weighted least squares estimations using Pearson correlation matrices among items are compared. For IFA, diagonally weighted…

  6. Estimating the Reliability of Single-Item Life Satisfaction Measures: Results from Four National Panel Studies

    ERIC Educational Resources Information Center

    Lucas, Richard E.; Donnellan, M. Brent

    2012-01-01

    Life satisfaction is often assessed using single-item measures. However, estimating the reliability of these measures can be difficult because internal consistency coefficients cannot be calculated. Existing approaches use longitudinal data to isolate occasion-specific variance from variance that is either completely stable or variance that…

  7. Tools of Robustness for Item Response Theory.

    ERIC Educational Resources Information Center

    Jones, Douglas H.

    This paper briefly demonstrates a few of the possibilities of a systematic application of robustness theory, concentrating on the estimation of ability when the true item response model does and does not fit the data. The definition of the maximum likelihood estimator (MLE) of ability is briefly reviewed. After introducing the notion of…

  8. 48 CFR 52.223-9 - Estimate of Percentage of Recovered Material Content for EPA-Designated Items.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 48 Federal Acquisition Regulations System 2 2011-10-01 2011-10-01 false Estimate of Percentage of... Regulations System FEDERAL ACQUISITION REGULATION (CONTINUED) CLAUSES AND FORMS SOLICITATION PROVISIONS AND... has been discarded for disposal or recovery, having completed its life as a consumer item...

  9. 48 CFR 52.223-9 - Estimate of Percentage of Recovered Material Content for EPA-Designated Items.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 48 Federal Acquisition Regulations System 2 2010-10-01 2010-10-01 false Estimate of Percentage of... Regulations System FEDERAL ACQUISITION REGULATION (CONTINUED) CLAUSES AND FORMS SOLICITATION PROVISIONS AND... has been discarded for disposal or recovery, having completed its life as a consumer item...

  10. Estimates of self-reported dietary behavior related to oral health among adolescents according to the type of food

    PubMed Central

    do AMARAL, Regiane Cristina; SCABAR, Luiz Felipe; SLATER, Betzabeth; FRAZÃO, Paulo

    2014-01-01

    Objective: To compare estimates of food behavior related to oral health obtained through a self-report measure and 24-hour dietary recalls (R24h). Method: We applied three R24h and one self-report measure in 87 adolescents. The estimates for eleven food items were compared at individual and group levels. Results: No significant differences in mean values were found for ice cream, vegetables, and biscuits without filling. For the remaining items, the values reported by the adolescents were higher than the values estimated by R24h. The percentage of adolescents who reported an intake frequency of 1 or more times/day was higher than the value obtained through R24h for all food items except soft drinks. The highest values of crude agreement between the instruments, individually, were found for biscuits without filling (75.9%) and ice cream (72.4%). Conclusion: The results suggest that adolescents tend to report a degree of exposure to the food items larger than what they actually experience in their daily lives. PMID:25466475

  11. [Reproducibility, internal consistency, and construct validity of KIDSCREEN-27 in Brazilian adolescents].

    PubMed

    Farias, José Cazuza de; Loch, Mathias Roberto; Lima, Antônio José de; Sales, Joana Marcela; Ferreira, Flávia Emília Leite de Lima

    2017-09-28

    The objective of this two-part study was to estimate the reproducibility, internal consistency, and construct validity of KIDSCREEN-27, a questionnaire to measure health-related quality of life, in Brazilian adolescents. One study component estimated reproducibility (176 adolescents, 59.7% females, 64.7% 10 to 12 years of age), and another estimated internal consistency and validity (1,321 adolescents, 53.7% females, 56.9% 10 to 12 years of age). The studies were conducted with adolescents of both sexes in public schools in the municipality of João Pessoa, Paraíba State, Brazil. KIDSCREEN-27 consists of 27 items distributed across five domains (physical well-being, 5 items; psychological well-being, 7 items; parents and social support, 7 items; autonomy and relationship with parents, 4 items; school environment, 4 items). Reproducibility was estimated by the intra-class correlation coefficient (ICC). Confirmatory factor analysis was used to assess construct validity, and the composite reliability index (CRI) was used to verify the questionnaire's internal consistency. ICCs were greater than or equal to 0.70 (0.70 to 0.96). Factor loads were greater than 0.40, except for five items (0.28 to 0.39). The model's goodness-of-fit indices were adequate (χ2/df = 2.79; RMR = 0.035; RMSEA = 0.037; GFI = 0.951; AGFI = 0.941; CFI = 0.908; TLI = 0.901). CRI varied from 0.65 to 0.70 in the domains and was 0.90 for the questionnaire. KIDSCREEN-27 reached satisfactory levels of reproducibility, internal consistency, and construct validity and can be used to assess health-related quality of life in Brazilian adolescents 10 to 15 years of age.
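
    The reproducibility criterion above (ICC ≥ 0.70) can be illustrated with a short sketch. Assuming a one-way random-effects ICC(1,1) — the abstract does not specify which ICC variant was used, so this is only one plausible form — the coefficient compares between-subject and within-subject variance across repeated administrations:

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for n subjects measured on k occasions.

    scores: list of per-subject lists, one value per occasion.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-subjects and within-subjects mean squares from one-way ANOVA
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    wms = sum((x - m) ** 2 for row, m in zip(scores, row_means)
              for x in row) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)

# Perfectly reproducible test-retest scores give ICC = 1.0
icc = icc_oneway([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

    When the repeated measurements are pure noise, the within-subject mean square dominates and the coefficient drops toward (or below) zero.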

  12. Effectiveness of Item Response Theory (IRT) Proficiency Estimation Methods under Adaptive Multistage Testing. Research Report. ETS RR-15-11

    ERIC Educational Resources Information Center

    Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry

    2015-01-01

    The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…

  13. Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

    ERIC Educational Resources Information Center

    Cao, Yi; Lu, Ru; Tao, Wei

    2014-01-01

    The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…

  14. Development of the Abbreviated Masculine Gender Role Stress Scale

    PubMed Central

    Swartout, Kevin M.; Parrott, Dominic J.; Cohn, Amy M.; Hagman, Brett T.; Gallagher, Kathryn E.

    2014-01-01

    Data gathered from six independent samples (n = 1,729) that assessed men's masculine gender role stress in college and community males were aggregated and used to determine the reliability and validity of an abbreviated version of the Masculine Gender Role Stress Scale (MGRS scale). The 15 items with the highest item-to-total scale correlations were used to create an abbreviated MGRS scale. Psychometric properties of each of the 15 items were examined with Item Response Theory (IRT) analysis, using the discrimination and threshold parameters. IRT results showed that the abbreviated scale may hold promise at capturing the same amount of information as the full 40-item scale. Relative to the 40-item scale, the total score of the abbreviated MGRS scale demonstrated comparable convergent validity using the measurement domains of masculine identity, hyper-masculinity, trait anger, anger expression, and alcohol involvement. An abbreviated MGRS scale may be recommended for use in clinical practice and research settings to reduce cost, time, and patient/participant burden. Additionally, IRT analyses identified items with higher discrimination and threshold parameters that may be used to screen for problematic gender role stress in men who may be seen in routine clinical or medical practice. PMID:25528163

  15. Development of the Abbreviated Masculine Gender Role Stress Scale.

    PubMed

    Swartout, Kevin M; Parrott, Dominic J; Cohn, Amy M; Hagman, Brett T; Gallagher, Kathryn E

    2015-06-01

    Data gathered from 6 independent samples (n = 1,729) that assessed men's masculine gender role stress in college and community males were aggregated and used to determine the reliability and validity of an abbreviated version of the Masculine Gender Role Stress (MGRS) Scale. The 15 items with the highest item-to-total scale correlations were used to create an abbreviated MGRS Scale. Psychometric properties of each of the 15 items were examined with item response theory (IRT) analysis, using the discrimination and threshold parameters. IRT results showed that the abbreviated scale may hold promise at capturing the same amount of information as the full 40-item scale. Relative to the 40-item scale, the total score of the abbreviated MGRS Scale demonstrated comparable convergent validity using the measurement domains of masculine identity, hypermasculinity, trait anger, anger expression, and alcohol involvement. An abbreviated MGRS Scale may be recommended for use in clinical practice and research settings to reduce cost, time, and patient/participant burden. Additionally, IRT analyses identified items with higher discrimination and threshold parameters that may be used to screen for problematic gender role stress in men who may be seen in routine clinical or medical practice.

  16. Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

    PubMed

    Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

    2017-01-01

    The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19; it was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). A goodness-of-fit test of the 2-parameter item response model over the difficulty range of 0.5 to 2.0 revealed that 12 items had an ideal correct answer rate. We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
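
    The 2-parameter logistic model behind these analyses, and the discrimination bins reported in the abstract, can be sketched as follows. This is a minimal illustration: the probability function is the standard 2PL form, the binning function simply mirrors the cut-offs quoted above, and the example values are hypothetical.

```python
import math

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def discrimination_label(a):
    """Bin an estimated discrimination parameter using the abstract's cut-offs."""
    if a < 0.01:
        return "none"
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"

# An examinee whose ability theta equals the item difficulty b answers
# correctly with probability 0.5, regardless of discrimination.
p = p_correct_2pl(theta=0.0, a=1.0, b=0.0)
```

    Items labeled "none" or "very low" here correspond to the flat response curves that contribute little information to ability estimation, which is consistent with the low overall reliability the study reports.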

  17. UAS CNPC Satellite Link Performance - Sharing Spectrum with Terrestrial Systems

    NASA Technical Reports Server (NTRS)

    Kerczewski, Robert J.; Wilson, Jeffrey D.; Bishop, William D.

    2016-01-01

    In order to provide for the safe integration of unmanned aircraft systems into the National Airspace System, the control and non-payload communications (CNPC) link connecting the ground-based pilot with the unmanned aircraft must be highly reliable. A specific requirement is that it must operate using aviation safety radiofrequency spectrum. The 2012 World Radiocommunication Conference (WRC-12) provided a potentially suitable allocation for a radio line-of-sight (LOS), terrestrial-based CNPC link at 5030-5091 MHz. For a beyond radio line-of-sight (BLOS), satellite-based CNPC link, aviation safety spectrum allocations are currently inadequate. Therefore, the 2015 WRC will consider the use of Fixed Satellite Service (FSS) bands to provide BLOS CNPC under Agenda Item 1.5. This agenda item requires studies to be conducted to allow for the consideration of how unmanned aircraft can employ FSS for BLOS CNPC while maintaining existing systems. Since terrestrial Fixed Service systems also use the frequency bands under consideration in Agenda Item 1.5, one of the required studies considered spectrum sharing between earth stations on board unmanned aircraft and Fixed Service station receivers. Studies carried out by NASA have concluded that such sharing is possible under parameters previously established by the International Telecommunication Union. As preparation for WRC-15 has progressed, additional study parameters for Agenda Item 1.5 have been proposed, and some studies using these parameters have been added. This paper examines the study results for the original parameters as well as results considering some of the more recently proposed parameters to provide insight into the complicated process of resolving WRC-15 Agenda Item 1.5 and achieving a solution for BLOS CNPC for unmanned aircraft.

  18. A Bayesian Semiparametric Item Response Model with Dirichlet Process Priors

    ERIC Educational Resources Information Center

    Miyazaki, Kei; Hoshino, Takahiro

    2009-01-01

    In Item Response Theory (IRT), item characteristic curves (ICCs) are illustrated through logistic models or normal ogive models, and the probability that examinees give the correct answer is usually a monotonically increasing function of their ability parameters. However, since only limited patterns of shapes can be obtained from logistic models…

  19. A Two-Parameter Latent Trait Model. Methodology Project.

    ERIC Educational Resources Information Center

    Choppin, Bruce

    On well-constructed multiple-choice tests, the most serious threat to measurement is not variation in item discrimination, but the guessing behavior that may be adopted by some students. Ways of ameliorating the effects of guessing are discussed, especially for problems in latent trait models. A new item response model, including an item parameter…

  20. IRT-ZIP Modeling for Multivariate Zero-Inflated Count Data

    ERIC Educational Resources Information Center

    Wang, Lijuan

    2010-01-01

    This study introduces an item response theory-zero-inflated Poisson (IRT-ZIP) model to investigate psychometric properties of multiple items and predict individuals' latent trait scores for multivariate zero-inflated count data. In the model, two link functions are used to capture two processes of the zero-inflated count data. Item parameters are…
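
    The zero-inflated Poisson component underlying the IRT-ZIP model can be sketched independently of the IRT links. In the sketch below (names are hypothetical; in the full model the zero probability and the Poisson rate would each be tied to the latent trait through its own link function), a zero response can arise from either of the two processes:

```python
import math

def zip_pmf(y, pi_zero, lam):
    """P(Y = y) under a zero-inflated Poisson distribution.

    pi_zero: probability of a structural zero; lam: Poisson rate.
    A zero can come from either the zero process or the count process.
    """
    pois = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * pois
    return (1.0 - pi_zero) * pois
```

    Mixing the two processes inflates the probability of zero relative to a plain Poisson with the same rate, which is what makes the model suitable for count items where many respondents report none of the behavior.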

  1. The relative effect of noise at different times of day: An analysis of existing survey data

    NASA Technical Reports Server (NTRS)

    Fields, J. M.

    1986-01-01

    This report examines survey evidence on the relative impact of noise at different times of day and assesses the survey methodology which produces that evidence. Analyses of the regression of overall (24-hour) annoyance on noise levels in different time periods can provide direct estimates of the value of the parameters in human reaction models which are used in environmental noise indices such as LDN and CNEL. In this report these analyses are based on the original computer tapes containing the responses of 22,000 respondents from ten studies of response to noise in residential areas. The estimates derived from these analyses are found to be so inaccurate that they do not provide useful information for policy or scientific purposes. The possibility that the type of questionnaire item could be biasing the estimates of the time-of-day weightings is considered but not supported by the data. Two alternatives to the conventional noise reaction model (adjusted energy model) are considered but not supported by the data.

  2. The relative effect of noise at different times of day: An analysis of existing survey data

    NASA Astrophysics Data System (ADS)

    Fields, J. M.

    1986-04-01

    This report examines survey evidence on the relative impact of noise at different times of day and assesses the survey methodology which produces that evidence. Analyses of the regression of overall (24-hour) annoyance on noise levels in different time periods can provide direct estimates of the value of the parameters in human reaction models which are used in environmental noise indices such as LDN and CNEL. In this report these analyses are based on the original computer tapes containing the responses of 22,000 respondents from ten studies of response to noise in residential areas. The estimates derived from these analyses are found to be so inaccurate that they do not provide useful information for policy or scientific purposes. The possibility that the type of questionnaire item could be biasing the estimates of the time-of-day weightings is considered but not supported by the data. Two alternatives to the conventional noise reaction model (adjusted energy model) are considered but not supported by the data.

  3. Measurement properties of the WOMAC LK 3.1 pain scale.

    PubMed

    Stratford, P W; Kennedy, D M; Woodhouse, L J; Spadoni, G F

    2007-03-01

    The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is applied extensively to patients with osteoarthritis of the hip or knee. Previous work has challenged the validity of its physical function scale; however, an extensive evaluation of its pain scale has not been reported. Our purpose was to estimate the internal consistency, factorial validity, test-retest reliability, and standard error of measurement (SEM) of the WOMAC LK 3.1 pain scale. Four hundred and seventy-four patients with osteoarthritis of the hip or knee awaiting arthroplasty were administered the WOMAC. Estimates of internal consistency (coefficient alpha), factorial validity (confirmatory factor analysis), and the SEM based on internal consistency (SEM(IC)) were obtained. Test-retest reliability [Type 2,1 intraclass correlation coefficients (ICC)] and a corresponding SEM(TRT) were estimated on a subsample of 36 patients. Our estimates were: internal consistency alpha = 0.84; SEM(IC) = 1.48; Type 2,1 ICC = 0.77; SEM(TRT) = 1.69. Confirmatory factor analysis failed to support a single-factor structure of the pain scale with uncorrelated error terms. Two comparable models provided excellent fit: (1) a model with correlated error terms between the walking and stairs items, and between the night and sit items (chi2 = 0.18, P = 0.98); (2) a two-factor model with walking and stairs items loading on one factor, night and sit items loading on a second factor, and the standing item loading on both factors (chi2 = 0.18, P = 0.98). Our examination of the factorial structure of the WOMAC pain scale failed to support a single factor, and internal consistency analysis yielded a coefficient less than optimal for individual patient use. An alternative strategy to summing the five item responses for individual patient application would be to interpret item responses separately or to sum only those items that display homogeneity.
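
    The two reliability-based quantities in this record follow standard formulas: Cronbach's alpha from item variances, and the SEM from SD·√(1 − reliability). A minimal sketch (hypothetical data and function names, not the WOMAC data):

```python
import statistics as st

def cronbach_alpha(items):
    """Cronbach's alpha from item-score columns (one list of scores per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(st.pvariance(col) for col in items)
    return (k / (k - 1)) * (1.0 - item_var_sum / st.pvariance(totals))

def sem_from_reliability(score_sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return score_sd * (1.0 - reliability) ** 0.5

# Two perfectly parallel items give alpha = 1.0; for a scale with SD 10 and
# reliability 0.84 (the alpha reported above), the SEM would be 4.0.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
sem = sem_from_reliability(10.0, 0.84)
```

    The same SEM formula applies whether the reliability estimate is internal consistency (SEM(IC)) or test-retest (SEM(TRT)); only the coefficient plugged in changes.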

  4. Item Selection and Pre-equating with Empirical Item Characteristic Curves.

    ERIC Educational Resources Information Center

    Livingston, Samuel A.

    An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…

  5. Cost analysis of advanced turbine blade manufacturing processes

    NASA Technical Reports Server (NTRS)

    Barth, C. F.; Blake, D. E.; Stelson, T. S.

    1977-01-01

    A rigorous analysis was conducted to estimate relative manufacturing costs for high technology gas turbine blades prepared by three candidate materials process systems. The manufacturing costs for the same turbine blade configuration of directionally solidified eutectic alloy, an oxide dispersion strengthened superalloy, and a fiber reinforced superalloy were compared on a relative basis to the costs of the same blade currently in production utilizing the directional solidification process. An analytical process cost model was developed to quantitatively perform the cost comparisons. The impact of individual process yield factors on costs was also assessed as well as effects of process parameters, raw materials, labor rates and consumable items.

  6. Development and validation of a socioculturally competent trust in physician scale for a developing country setting.

    PubMed

    Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

    2015-05-03

    Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. To develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross-sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item-to-total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2-parameter logistic Samejima graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician, and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item-to-total correlations were acceptable for all 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12-item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12-item trust in physician scale has good construct validity and internal consistency. Published by the BMJ Publishing Group Limited.

  7. Development and validation of a socioculturally competent trust in physician scale for a developing country setting

    PubMed Central

    Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

    2015-01-01

    Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives: To develop and validate a new trust in physician scale for a developing country setting. Methods: Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross-sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item-to-total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2-parameter logistic Samejima graded response model was fit and item characteristics assessed. Results: Competence, assurance of treatment, respect for the physician, and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item-to-total correlations were acceptable for all 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12-item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions: The final 12-item trust in physician scale has good construct validity and internal consistency. PMID:25941182

  8. Umyuangcaryaraq "Reflecting": multidimensional assessment of reflective processes on the consequences of alcohol use among rural Yup'ik Alaska Native youth.

    PubMed

    Allen, James; Fok, Carlotta Ching Ting; Henry, David; Skewes, Monica

    2012-09-01

    Concerns in some settings regarding the accuracy and ethics of employing direct questions about alcohol use suggest need for alternative assessment approaches with youth. Umyuangcaryaraq is a Yup'ik Alaska Native word meaning "Reflecting." The Reflective Processes Scale was developed as a youth measure tapping awareness and thinking over potential negative consequences of alcohol misuse as a protective factor that includes cultural elements often shared by many other Alaska Native and American Indian cultures. This study assessed multidimensional structure, item functioning, and validity. Responses from 284 rural Alaska Native youth allowed bifactor analysis to assess structure, estimates of location and discrimination parameters, and convergent and discriminant validity. A bifactor model of the scale items with three content factors provided excellent fit to observed data. Item response theory analysis suggested a binary response format as optimal. Evidence of convergent and discriminant validity was established. The measure provides an assessment of reflective processes about alcohol that Alaska Native youth engage in when thinking about reasons not to drink. The concept of reflective processes has potential to extend understandings of cultural variation in mindfulness, alcohol expectancies research, and culturally mediated protective factors in Alaska Native and American Indian youth.

  9. Development and evaluation of the PI-G: a three-scale measure based on the German translation of the PROMIS ® pain interference item bank.

    PubMed

    Farin, Erik; Nagl, Michaela; Gramm, Lukas; Heyduck, Katja; Glattacker, Manuela

    2014-05-01

The study aim was to translate the PROMIS(®) pain interference (PI) item bank (41 items) into German, test its psychometric properties in patients with chronic low back pain and develop static subforms. We surveyed N = 262 patients undergoing rehabilitation who were asked to fill out questionnaires at the beginning and 2 weeks after the end of rehabilitation, applying the Oswestry Disability Index (ODI) and Pain Disability Index (PDI) in addition to the PROMIS(®) PI items. For psychometric testing, a 1-parameter item response theory (IRT) model was used. Exploratory and confirmatory factor analyses as well as reliability and construct validity analyses were conducted. The assumptions regarding IRT scaling of the translated PROMIS(®) PI item bank as a whole were not confirmed. However, we succeeded in devising three static subforms (PI-G scales: PI mental 13 items, PI functional 11 items, PI physical 4 items), revealing good psychometric properties. The PI-G scales in their static form can be recommended for use in German-speaking countries. Their strengths versus the ODI and PDI are that pain interference is assessed in a differentiated manner and that several psychometric values are somewhat better than those associated with the ODI and PDI (distribution properties, IRT model fit, reliability). To develop an IRT-scaled item bank of the German translations of the PROMIS(®) PI items, it would be useful to have additional studies (e.g., with larger sample sizes and using a 2-parameter IRT model).

  10. Taking the Missing Propensity into Account When Estimating Competence Scores: Evaluation of Item Response Theory Models for Nonignorable Omissions

    ERIC Educational Resources Information Center

    Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H.

    2015-01-01

    When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically…

  11. Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level

    ERIC Educational Resources Information Center

    Savalei, Victoria; Rhemtulla, Mijke

    2017-01-01

    In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately…

  12. A Mixture Rasch Model with a Covariate: A Simulation Study via Bayesian Markov Chain Monte Carlo Estimation

    ERIC Educational Resources Information Center

    Dai, Yunyun

    2013-01-01

    Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…

  13. Two Prophecy Formulas for Assessing the Reliability of Item Response Theory-Based Ability Estimates

    ERIC Educational Resources Information Center

    Raju, Nambury S.; Oshima, T.C.

    2005-01-01

    Two new prophecy formulas for estimating item response theory (IRT)-based reliability of a shortened or lengthened test are proposed. Some of the relationships between the two formulas, one of which is identical to the well-known Spearman-Brown prophecy formula, are examined and illustrated. The major assumptions underlying these formulas are…
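The well-known Spearman-Brown prophecy formula referenced above predicts the reliability of a test whose length is multiplied by a factor k. A minimal sketch:

```python
def spearman_brown(reliability, length_factor):
    """Spearman-Brown prophecy: predicted reliability when test length
    is multiplied by `length_factor` (2.0 doubles the test, 0.5 halves it)."""
    k = length_factor
    return k * reliability / (1.0 + (k - 1.0) * reliability)

# Doubling a test with reliability 0.70 raises it to about 0.82
doubled = spearman_brown(0.70, 2.0)
```

The formula is monotone in k: lengthening a test always raises predicted reliability, and k = 1 returns the original value unchanged.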

  14. Using SAS PROC MCMC for Item Response Theory Models

    ERIC Educational Resources Information Center

    Ames, Allison J.; Samonte, Kelli

    2015-01-01

    Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian…

  15. Learning User Preferences for Sets of Objects

    NASA Technical Reports Server (NTRS)

    desJardins, Marie; Eaton, Eric; Wagstaff, Kiri L.

    2006-01-01

    Most work on preference learning has focused on pairwise preferences or rankings over individual items. In this paper, we present a method for learning preferences over sets of items. Our learning method takes as input a collection of positive examples--that is, one or more sets that have been identified by a user as desirable. Kernel density estimation is used to estimate the value function for individual items, and the desired set diversity is estimated from the average set diversity observed in the collection. Since this is a new learning problem, we introduce a new evaluation methodology and evaluate the learning method on two data collections: synthetic blocks-world data and a new real-world music data collection that we have gathered.
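The kernel density step described above can be sketched directly: the value of a candidate item is the estimated density of the user's liked items at that point in feature space. The one-dimensional features and bandwidth below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Feature values (1-D here) of items a user has marked as desirable
liked = rng.normal(loc=2.0, scale=0.5, size=200)

def kde_value(x, samples, bandwidth=0.3):
    """Gaussian kernel density estimate of the item value function."""
    x = np.atleast_1d(x).astype(float)
    z = (x[:, None] - samples[None, :]) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum(axis=1)
    return dens / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi))

candidates = np.array([0.0, 2.0, 4.0])
scores = kde_value(candidates, liked)   # higher density = more preferred
best = candidates[np.argmax(scores)]
```

A full set-preference learner would combine these per-item values with the diversity term estimated from the positive example sets; this sketch covers only the item-value component.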

  16. Applying Item Response Theory methods to design a learning progression-based science assessment

    NASA Astrophysics Data System (ADS)

    Chen, Jing

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009--2010. The written assessment was developed in several formats, such as Constructed Response (CR), Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary across items, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2 and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Future data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC, MTF options. Item writers can follow these recommendations to write better learning progression-based items.
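The first design step described above (defining level boundaries as the means of item thresholds across a set of good items, then classifying students by the boundaries their ability estimate crosses) can be sketched with hypothetical threshold values:

```python
import numpy as np

# Hypothetical threshold parameters (d1, d2, d3) for four polytomous items
thresholds = np.array([
    [-1.2,  0.1, 1.3],
    [-0.9,  0.0, 1.1],
    [-1.1, -0.1, 1.2],
    [-1.0,  0.2, 1.4],
])

# Level boundaries on the IRT scale: mean of each threshold across items
boundaries = thresholds.mean(axis=0)

def classify(theta):
    """Assign a learning-progression level by counting boundaries crossed."""
    return int(np.sum(theta >= boundaries))

level = classify(0.5)   # ability between the second and third boundaries
```

Similar threshold parameters across items are exactly what makes these boundaries meaningful: if the d-values diverged widely, a single cut point would classify a student differently depending on which items were administered.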

  17. Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQLTM 4.0 Generic Core Scales in school children

    PubMed Central

    2012-01-01

    Background Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQLTM 4.0 Generic Core Scales. Methods The PedsQLTM 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. Conclusions This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
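The infit and outfit mean-square statistics used as item-fit criteria above can be sketched for the dichotomous Rasch case (the study's rating scale model extends this to polytomous items); the simulated data below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(7)

def rasch_fit_stats(y, theta, b):
    """Infit and outfit mean squares for one item under the Rasch model.

    y     : 0/1 responses of N persons to the item
    theta : ability estimates of the N persons
    b     : estimated item difficulty
    """
    p = 1.0 / (1.0 + np.exp(-(theta - b)))   # model-expected probabilities
    w = p * (1.0 - p)                        # response variances
    z2 = (y - p) ** 2 / w                    # squared standardized residuals
    outfit = z2.mean()                       # unweighted mean square
    infit = (z2 * w).sum() / w.sum()         # information-weighted mean square
    return infit, outfit

# Simulate model-consistent responses: both statistics should be near 1.0
theta = rng.normal(size=2000)
b = 0.0
y = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-(theta - b)))).astype(float)
infit, outfit = rasch_fit_stats(y, theta, b)
```

Values near 1.0 indicate the item behaves as the model expects; the study's acceptance window of 0.6 to 1.4 brackets that ideal.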

  18. Syndromes of Self-Reported Psychopathology for Ages 18-59 in 29 Societies.

    PubMed

    Ivanova, Masha Y; Achenbach, Thomas M; Rescorla, Leslie A; Tumer, Lori V; Ahmeti-Pronaj, Adelina; Au, Alma; Maese, Carmen Avila; Bellina, Monica; Caldas, J Carlos; Chen, Yi-Chuen; Csemy, Ladislav; da Rocha, Marina M; Decoster, Jeroen; Dobrean, Anca; Ezpeleta, Lourdes; Fontaine, Johnny R J; Funabiki, Yasuko; Guðmundsson, Halldór S; Harder, Valerie S; de la Cabada, Marie Leiner; Leung, Patrick; Liu, Jianghong; Mahr, Safia; Malykh, Sergey; Maras, Jelena Srdanovic; Markovic, Jasminka; Ndetei, David M; Oh, Kyung Ja; Petot, Jean-Michel; Riad, Geylan; Sakarya, Direnc; Samaniego, Virginia C; Sebre, Sandra; Shahini, Mimoza; Silvares, Edwiges; Simulioniene, Roma; Sokoli, Elvisa; Talcott, Joel B; Vazquez, Natalia; Zasepa, Ewa

    2015-06-01

    This study tested the multi-society generalizability of an eight-syndrome assessment model derived from factor analyses of American adults' self-ratings of 120 behavioral, emotional, and social problems. The Adult Self-Report (ASR; Achenbach and Rescorla 2003) was completed by 17,152 18-59-year-olds in 29 societies. Confirmatory factor analyses tested the fit of self-ratings in each sample to the eight-syndrome model. The primary model fit index (Root Mean Square Error of Approximation) showed good model fit for all samples, while secondary indices showed acceptable to good fit. Only 5 (0.06%) of the 8,598 estimated parameters were outside the admissible parameter space. Confidence intervals indicated that sampling fluctuations could account for the deviant parameters. Results thus supported the tested model in societies differing widely in social, political, and economic systems, languages, ethnicities, religions, and geographical regions. Although other items, societies, and analytic methods might yield different results, the findings indicate that adults in very diverse societies were willing and able to rate themselves on the same standardized set of 120 problem items. Moreover, their self-ratings fit an eight-syndrome model previously derived from self-ratings by American adults. The support for the statistically derived syndrome model is consistent with previous findings for parent, teacher, and self-ratings of 1½-18-year-olds in many societies. The ASR and its parallel collateral-report instrument, the Adult Behavior Checklist (ABCL), may offer mental health professionals practical tools for the multi-informant assessment of clinical constructs of adult psychopathology that appear to be meaningful across diverse societies.

  19. Validation of the Adolescent Concerns Measure (ACM): evidence from exploratory and confirmatory factor analysis.

    PubMed

    Ang, Rebecca P; Chong, Wan Har; Huan, Vivien S; Yeo, Lay See

    2007-01-01

    This article reports the development and initial validation of scores obtained from the Adolescent Concerns Measure (ACM), a scale which assesses concerns of Asian adolescent students. In Study 1, findings from exploratory factor analysis using 619 adolescents suggested a 24-item scale with four correlated factors--Family Concerns (9 items), Peer Concerns (5 items), Personal Concerns (6 items), and School Concerns (4 items). Initial estimates of convergent validity for ACM scores were also reported. The four-factor structure of ACM scores derived from Study 1 was confirmed via confirmatory factor analysis in Study 2 using a two-fold cross-validation procedure with a separate sample of 811 adolescents. Support was found for both the multidimensional and hierarchical models of adolescent concerns using the ACM. Internal consistency and test-retest reliability estimates were adequate for research purposes. ACM scores show promise as a reliable and potentially valid measure of Asian adolescents' concerns.

  20. A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

    ERIC Educational Resources Information Center

    Yao, Lihua; Schwarz, Richard D.

    2006-01-01

    Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…
