Molenaar, Dylan; de Boeck, Paul
2018-06-01
In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models
ERIC Educational Resources Information Center
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol
2016-01-01
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Different Approaches to Covariate Inclusion in the Mixture Rasch Model
ERIC Educational Resources Information Center
Li, Tongyun; Jiao, Hong; Macready, George B.
2016-01-01
The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…
ERIC Educational Resources Information Center
Bilir, Mustafa Kuzey
2009-01-01
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan
2016-01-01
This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability. PMID:26941699
Tijmstra, Jesper; Bolsinova, Maria; Jeon, Minjeong
2018-01-10
This article proposes a general mixture item response theory (IRT) framework that allows for classes of persons to differ with respect to the type of processes underlying the item responses. Through the use of mixture models, nonnested IRT models with different structures can be estimated for different classes, and class membership can be estimated for each person in the sample. If researchers are able to provide competing measurement models, this mixture IRT framework may help them deal with some violations of measurement invariance. To illustrate this approach, we consider a two-class mixture model, where a person's responses to Likert-scale items containing a neutral middle category are either modeled using a generalized partial credit model, or through an IRTree model. In the first model, the middle category ("neither agree nor disagree") is taken to be qualitatively similar to the other categories, and is taken to provide information about the person's endorsement. In the second model, the middle category is taken to be qualitatively different and to reflect a nonresponse choice, which is modeled using an additional latent variable that captures a person's willingness to respond. The mixture model is studied using simulation studies and is applied to an empirical example.
ERIC Educational Resources Information Center
Maij-de Meij, Annette M.; Kelderman, Henk; van der Flier, Henk
2008-01-01
Mixture item response theory (IRT) models aid the interpretation of response behavior on personality tests and may provide possibilities for improving prediction. Heterogeneity in the population is modeled by identifying homogeneous subgroups that conform to different measurement models. In this study, mixture IRT models were applied to the…
ERIC Educational Resources Information Center
Dai, Yunyun
2013-01-01
Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
Item selection via Bayesian IRT models.
Arima, Serena
2015-02-10
With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
A semi-parametric within-subject mixture approach to the analyses of responses and response times.
Molenaar, Dylan; Bolsinova, Maria; Vermunt, Jeroen K
2018-05-01
In item response theory, modelling the item response times in addition to the item responses may improve the detection of possible between- and within-subject differences in the process that resulted in the responses. For instance, if respondents rely on rapid guessing on some items but not on all, the joint distribution of the responses and response times will be a multivariate within-subject mixture distribution. Suitable parametric methods to detect these within-subject differences have been proposed. In these approaches, a distribution needs to be assumed for the within-class response times. In this paper, it is demonstrated that these parametric within-subject approaches may produce false positives and biased parameter estimates if the assumption concerning the response time distribution is violated. A semi-parametric approach is proposed which resorts to categorized response times. This approach is shown to hardly produce false positives and parameter bias. In addition, the semi-parametric approach results in approximately the same power as the parametric approach. © 2017 The British Psychological Society.
A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)
ERIC Educational Resources Information Center
Arenson, Ethan A.; Karabatsos, George
2017-01-01
Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…
Mixture IRT Model with a Higher-Order Structure for Latent Traits
ERIC Educational Resources Information Center
Huang, Hung-Yu
2017-01-01
Mixture item response theory (IRT) models have been suggested as an efficient method of detecting the different response patterns derived from latent classes when developing a test. In testing situations, multiple latent traits measured by a battery of tests can exhibit a higher-order structure, and mixtures of latent classes may occur on…
ERIC Educational Resources Information Center
Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.
2015-01-01
The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…
An introduction to mixture item response theory models.
De Ayala, R J; Santiago, S Y
2017-02-01
Mixture item response theory (IRT) allows one to address situations that involve a mixture of latent subpopulations that are qualitatively different but within which a measurement model based on a continuous latent variable holds. In this modeling framework, one can characterize students by both their location on a continuous latent variable as well as by their latent class membership. For example, in a study of risky youth behavior this approach would make it possible to estimate an individual's propensity to engage in risky youth behavior (i.e., on a continuous scale) and to use these estimates to identify youth who might be at the greatest risk given their class membership. Mixture IRT can be used with binary response data (e.g., true/false, agree/disagree, endorsement/not endorsement, correct/incorrect, presence/absence of a behavior), Likert response scales, partial correct scoring, nominal scales, or rating scales. In the following, we present mixture IRT modeling and two examples of its use. Data needed to reproduce analyses in this article are available as supplemental online materials at http://dx.doi.org/10.1016/j.jsp.2016.01.002. Copyright © 2016 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
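As a minimal sketch of the kind of model this entry describes, and not a formula taken from the article itself, a mixture Rasch model for binary responses can be written as below. All symbols are notation assumed here for illustration: \theta_i is the continuous latent variable for person i, g indexes the G latent classes, b_{jg} is the difficulty of item j in class g, \pi_g is the class proportion (with \sum_g \pi_g = 1), and f_g is the class-specific density of \theta.

```latex
P(X_{ij} = 1 \mid \theta_i, g)
  = \frac{\exp(\theta_i - b_{jg})}{1 + \exp(\theta_i - b_{jg})},
\qquad
P(\mathbf{x}_i)
  = \sum_{g=1}^{G} \pi_g \int \prod_{j=1}^{J}
    P(X_{ij} = 1 \mid \theta, g)^{x_{ij}}
    \bigl[1 - P(X_{ij} = 1 \mid \theta, g)\bigr]^{1 - x_{ij}}
    f_g(\theta)\, d\theta
```

The first expression is the within-class measurement model on the continuous scale; the second shows how class membership enters as a weighted sum over classes, which is what lets a person be described both by a location and by a latent class.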
ERIC Educational Resources Information Center
Dardick, William R.; Mislevy, Robert J.
2016-01-01
A new variant of the iterative "data = fit + residual" data-analytical approach described by Mosteller and Tukey is proposed and implemented in the context of item response theory psychometric models. Posterior probabilities from a Bayesian mixture model of a Rasch item response theory model and an unscalable latent class are expressed…
Model Selection Methods for Mixture Dichotomous IRT Models
ERIC Educational Resources Information Center
Li, Feiming; Cohen, Allan S.; Kim, Seock-Ho; Cho, Sun-Joo
2009-01-01
This study examines model selection indices for use with dichotomous mixture item response theory (IRT) models. Five indices are considered: Akaike's information coefficient (AIC), Bayesian information coefficient (BIC), deviance information coefficient (DIC), pseudo-Bayes factor (PsBF), and posterior predictive model checks (PPMC). The five…
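For orientation, the two simplest indices in this list can be computed directly from a fitted model's maximized log-likelihood. The sketch below uses the standard textbook definitions of AIC and BIC; the function names and the numeric inputs are hypothetical and are not taken from the study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = 2k - 2 log L, where k is the number of free parameters."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = k ln(n) - 2 log L, where n is the number of observations."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits of a one-class and a two-class model to the same data.
# The log-likelihoods and parameter counts are made-up illustration values.
candidates = {
    "one_class": {"log_likelihood": -5120.4, "n_params": 20},
    "two_class": {"log_likelihood": -5065.9, "n_params": 41},
}
n_examinees = 1000

for name, fit in candidates.items():
    print(
        name,
        round(aic(fit["log_likelihood"], fit["n_params"]), 1),
        round(bic(fit["log_likelihood"], fit["n_params"], n_examinees), 1),
    )
```

Lower values indicate the preferred model under each index; because BIC's penalty grows with the sample size, the two indices can disagree about the number of classes.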
Mixture Rasch model for guessing group identification
NASA Astrophysics Data System (ADS)
Siow, Hoo Leong; Mahdi, Rasidah; Siew, Eng Ling
2013-04-01
Several alternative dichotomous Item Response Theory (IRT) models have been introduced to account for the guessing effect in multiple-choice assessment. The guessing effect in these models has been considered to be item-related. In the most classic case, pseudo-guessing in the three-parameter logistic IRT model is modeled to be the same for all the subjects but may vary across items. This is not realistic because subjects can guess worse or better than the pseudo-guessing parameter implies. A derivation from the three-parameter logistic IRT model improves the situation by incorporating ability in guessing. However, it does not model a non-monotone function. This paper proposes to study guessing from a subject-related aspect, namely guessing as test-taking behavior. A mixture Rasch model is employed to detect latent groups. A hybrid of the mixture Rasch and three-parameter logistic IRT models is proposed to model behavior-based guessing from the subjects' ways of responding to the items. The subjects are assumed to simply choose a response at random. An information criterion is proposed to identify the behavior-based guessing group. Results show that the proposed model selection criterion provides a promising method to identify the guessing group modeled by the hybrid model.
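To make the contrast between item-related and subject-related guessing concrete, the sketch below shows the standard three-parameter logistic response function next to a simple two-class blend in which one class answers at random. This is an illustration under assumed notation, not the hybrid model proposed in the paper; the function names, the number of options, and the class proportion are all hypothetical.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Standard 3PL: c is an item-level lower asymptote shared by all subjects."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_random_guess(n_options: int) -> float:
    """A subject who picks a response at random succeeds with probability 1/m."""
    return 1.0 / n_options


def p_two_class(theta: float, a: float, b: float, c: float,
                n_options: int, pi_guess: float) -> float:
    """Marginal success probability when a proportion pi_guess of subjects
    belongs to a random-guessing class and the rest follow the 3PL.
    (Assumed structure for illustration only.)"""
    return pi_guess * p_random_guess(n_options) + (1.0 - pi_guess) * p_3pl(theta, a, b, c)


# Hypothetical item and subject values.
print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
print(p_two_class(theta=0.0, a=1.0, b=0.0, c=0.2, n_options=4, pi_guess=0.5))
```

In the 3PL the guessing floor belongs to the item, whereas in the two-class blend the random-response probability belongs to a group of subjects and does not depend on their ability.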
Latent Transition Analysis with a Mixture Item Response Theory Measurement Model
ERIC Educational Resources Information Center
Cho, Sun-Joo; Cohen, Allan S.; Kim, Seock-Ho; Bottge, Brian
2010-01-01
A latent transition analysis (LTA) model was described with a mixture Rasch model (MRM) as the measurement model. Unlike the LTA, which was developed with a latent class measurement model, the LTA-MRM permits within-class variability on the latent variable, making it more useful for measuring treatment effects within latent classes. A simulation…
Modeling Working Memory Tasks on the Item Level
ERIC Educational Resources Information Center
Luo, Dasen; Chen, Guopeng; Zen, Fanlin; Murray, Bronwyn
2010-01-01
Item responses to Digit Span and Letter-Number Sequencing were analyzed to develop a better-refined model of the two working memory tasks using the finite mixture (FM) modeling method. Models with ordinal latent traits were found to better account for the independent sources of the variability in the tasks than those with continuous traits, and…
An Investigation of Item Fit Statistics for Mixed IRT Models
ERIC Educational Resources Information Center
Chon, Kyong Hee
2009-01-01
The purpose of this study was to investigate procedures for assessing model fit of IRT models for mixed format data. In this study, various IRT model combinations were fitted to data containing both dichotomous and polytomous item responses, and the suitability of the chosen model mixtures was evaluated based on a number of model fit procedures.…
ERIC Educational Resources Information Center
Fischer, Sebastian; Freund, Philipp Alexander
2014-01-01
The Adaption-Innovation Inventory (AII), originally developed by Kirton (1976), is a widely used self-report instrument for measuring problem-solving styles at work. The present study investigates how scores on the AII are affected by different response styles. Data are collected from a combined sample (N = 738) of students, employees, and…
A Zero- and K-Inflated Mixture Model for Health Questionnaire Data
Finkelman, Matthew D.; Green, Jennifer Greif; Gruber, Michael J.; Zaslavsky, Alan M.
2011-01-01
In psychiatric assessment, Item Response Theory (IRT) is a popular tool to formalize the relation between the severity of a disorder and associated responses to questionnaire items. Practitioners of IRT sometimes make the assumption of normally distributed severities within a population; while convenient, this assumption is often violated when measuring psychiatric disorders. Specifically, there may be a sizable group of respondents whose answers place them at an extreme of the latent trait spectrum. In this article, a zero- and K-inflated mixture model is developed to account for the presence of such respondents. The model is fitted using an expectation-maximization (E-M) algorithm to estimate the percentage of the population at each end of the continuum, concurrently analyzing the remaining “graded component” via IRT. A method to perform factor analysis for only the graded component is introduced. In assessments of oppositional defiant disorder and conduct disorder, the zero- and K-inflated model exhibited better fit than the standard IRT model. PMID:21365673
A Mixture Approach to Vagueness and Ambiguity
Verheyen, Steven; Storms, Gert
2013-01-01
When asked to indicate which items from a set of candidates belong to a particular natural language category, inter-individual differences occur: individuals disagree about which items should be considered category members. The premise of this paper is that these inter-individual differences in semantic categorization reflect both ambiguity and vagueness. Categorization differences are said to be due to ambiguity when individuals employ different criteria for categorization. For instance, individuals may disagree whether hiking or darts is the better example of sports because they emphasize respectively whether an activity is strenuous and whether rules apply. Categorization differences are said to be due to vagueness when individuals employ different cut-offs for separating members from non-members. For instance, the decision to include hiking in the sports category or not may hinge on how strenuous different individuals require sports to be. This claim is supported by the application of a mixture model to categorization data for eight natural language categories. The mixture model can identify latent groups of categorizers who regard different items as likely category members (i.e., ambiguity), with categorizers within each of the groups differing in their propensity to provide membership responses (i.e., vagueness). The identified subgroups are shown to emphasize different sets of category attributes when making their categorization decisions. PMID:23667627
CLUSTERING SOUTH AFRICAN HOUSEHOLDS BASED ON THEIR ASSET STATUS USING LATENT VARIABLE MODELS
McParland, Damien; Gormley, Isobel Claire; McCormick, Tyler H.; Clark, Samuel J.; Kabudula, Chodziwadziwa Whiteson; Collinson, Mark A.
2014-01-01
The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure—this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight to the different socio-economic strata within the Agincourt region. PMID:25485026
Examining the Effectiveness of Test Accommodation Using DIF and a Mixture IRT Model
ERIC Educational Resources Information Center
Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal
2012-01-01
This study examined the validity of test accommodation in third-eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…
ERIC Educational Resources Information Center
Zhang, Danhui; Orrill, Chandra; Campbell, Todd
2015-01-01
The purpose of this study was to investigate whether mixture Rasch models followed by qualitative item-by-item analysis of selected Programme for International Student Assessment (PISA) mathematics and science items offered insight into knowledge students invoke in mathematics and science separately and combined. The researchers administered an…
Grouping Influences Output Interference in Short-term Memory: A Mixture Modeling Study.
Kang, Min-Suk; Oh, Byung-Il
2016-01-01
Output interference is a source of forgetting induced by recalling. We investigated how grouping influences output interference in short-term memory. In Experiment 1, the participants were asked to remember four colored items. Those items were grouped by temporal coincidence as well as spatial alignment: two items were presented in the first memory array and two were presented in the second, and the items in both arrays were either vertically or horizontally aligned as well. The participants then performed two recall tasks in sequence by selecting a color presented at a cued location from a color wheel. In the same-group condition, the participants reported both items from the same memory array; however, in the different-group condition, the participants reported one item from each memory array. We analyzed participant responses with a mixture model, which yielded two measures: guess rate and precision of recalled memories. The guess rate in the second recall was higher for the different-group condition than for the same-group condition; however, the memory precisions obtained for both conditions were similarly degraded in the second recall. In Experiment 2, we varied the probability of the same- and different-group conditions with a ratio of 3 to 7. We expected output interference to be higher in the same-group condition than in the different-group condition. This is because items of the other group are more likely to be probed in the second recall phase and, thus, protecting those items during the first recall phase leads to a better performance. Nevertheless, the same pattern of results was robustly reproduced, suggesting grouping shields the grouped items from output interference because of the secured accessibility. We discussed how grouping influences output interference.
A signal detection-item response theory model for evaluating neuropsychological measures.
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
2018-02-05
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
Rasch Mixture Models for DIF Detection
Strobl, Carolin; Zeileis, Achim
2014-01-01
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch mixture models is sensitive to the specification of the ability distribution even when the conditional maximum likelihood approach is used. It is demonstrated in a simulation study how differences in ability can influence the latent classes of a Rasch mixture model. If the aim is only DIF detection, it is not of interest to uncover such ability differences as one is only interested in a latent group structure regarding the item difficulties. To avoid any confounding effect of ability differences (or impact), a new score distribution for the Rasch mixture model is introduced here. It ensures the estimation of the Rasch mixture model to be independent of the ability distribution and thus restricts the mixture to be sensitive to latent structure in the item difficulties only. Its usefulness is demonstrated in a simulation study, and its application is illustrated in a study of verbal aggression. PMID:29795819
ERIC Educational Resources Information Center
Wu, Pei-Chen; Huang, Tsai-Wei
2010-01-01
This study applied the mixed Rasch model to investigate person heterogeneity in the Beck Depression Inventory-II-Chinese version (BDI-II-C) and its effects on dimensionality and construct validity. Person heterogeneity was reflected by two latent classes that differ qualitatively. Additionally, person heterogeneity adversely affected the…
A Mixture Rasch Model-Based Computerized Adaptive Test for Latent Class Identification
ERIC Educational Resources Information Center
Jiao, Hong; Macready, George; Liu, Junhui; Cho, Youngmi
2012-01-01
This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on the Kullback-Leibler (KL) information were proposed and compared with the reversed and the adaptive KL information under simulated testing conditions. When item separation was…
A New Model for Acquiescence at the Interface of Psychometrics and Cognitive Psychology.
Plieninger, Hansjörg; Heck, Daniel W
2018-05-29
When measuring psychological traits, one has to consider that respondents often show content-unrelated response behavior in answering questionnaires. To disentangle the target trait and two such response styles, extreme responding and midpoint responding, Böckenholt (2012a) developed an item response model based on a latent processing tree structure. We propose a theoretically motivated extension of this model to also measure acquiescence, the tendency to agree with both regular and reversed items. Substantively, our approach builds on multinomial processing tree (MPT) models that are used in cognitive psychology to disentangle qualitatively distinct processes. Accordingly, the new model for response styles assumes a mixture distribution of affirmative responses, which are either determined by the underlying target trait or by acquiescence. In order to estimate the model parameters, we rely on Bayesian hierarchical estimation of MPT models. In simulations, we show that the model provides unbiased estimates of response styles and the target trait, and we compare the new model and Böckenholt's model in a recovery study. An empirical example from personality psychology is used for illustrative purposes.
Code of Federal Regulations, 2010 CFR
2010-07-01
... nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. 126.28 Section 126..., ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. (a) When any item of ammonium nitrate, ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate...
Code of Federal Regulations, 2014 CFR
2014-07-01
... nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. 126.28 Section 126..., ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. (a) When any item of ammonium nitrate, ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate...
Code of Federal Regulations, 2013 CFR
2013-07-01
... nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. 126.28 Section 126..., ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. (a) When any item of ammonium nitrate, ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate...
Code of Federal Regulations, 2011 CFR
2011-07-01
... nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. 126.28 Section 126..., ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. (a) When any item of ammonium nitrate, ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate...
Code of Federal Regulations, 2012 CFR
2012-07-01
... nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. 126.28 Section 126..., ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate; general provisions. (a) When any item of ammonium nitrate, ammonium nitrate fertilizers, fertilizer mixtures, or nitro carbo nitrate...
ERIC Educational Resources Information Center
Aryadoust, Vahid
2015-01-01
The present study uses a mixture Rasch model to examine latent differential item functioning in English as a foreign language listening tests. Participants (n = 250) took a listening and lexico-grammatical test and completed the metacognitive awareness listening questionnaire comprising problem solving (PS), planning and evaluation (PE), mental…
Evaluation of Student Performance through a Multidimensional Finite Mixture IRT Model.
Bacci, Silvia; Bartolucci, Francesco; Grilli, Leonardo; Rampichini, Carla
2017-01-01
In the Italian academic system, a student can enroll for an exam immediately after the end of the teaching period or can postpone it; in this second case the exam result is missing. We propose an approach for the evaluation of a student performance throughout the course of study, accounting also for nonattempted exams. The approach is based on an item response theory model that includes two discrete latent variables representing student performance and priority in selecting the exams to take. We explicitly account for nonignorable missing observations as the indicators of attempted exams also contribute to measure the performance (within-item multidimensionality). The model also allows for individual covariates in its structural part.
Failure of self-consistency in the discrete resource model of visual working memory.
Bays, Paul M
2018-06-03
The discrete resource model of working memory proposes that each individual has a fixed upper limit on the number of items they can store at one time, due to division of memory into a few independent "slots". According to this model, responses on short-term memory tasks consist of a mixture of noisy recall (when the tested item is in memory) and random guessing (when the item is not in memory). This provides two opportunities to estimate capacity for each observer: first, based on their frequency of random guesses, and second, based on the set size at which the variability of stored items reaches a plateau. The discrete resource model makes the simple prediction that these two estimates will coincide. Data from eight published visual working memory experiments provide strong evidence against such a correspondence. These results present a challenge for discrete models of working memory that impose a fixed capacity limit. Copyright © 2018 The Author. Published by Elsevier Inc. All rights reserved.
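As a rough illustration of the first capacity estimate mentioned in this abstract, the sketch below assumes the usual slot-model reading: with capacity K and set size N, a tested item is in memory with probability min(1, K/N), and every other response is a random guess. The function names are hypothetical and the formula is a common textbook form, not necessarily the exact estimator used in the paper.

```python
def expected_guess_rate(set_size, capacity):
    """Guess rate implied by a fixed-capacity slot model (assumed form)."""
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    # The tested item occupies a slot with probability min(1, K / N).
    return 1.0 - min(1.0, capacity / set_size)


def capacity_from_guess_rate(set_size, guess_rate):
    """Invert the relation above: K = N * (1 - g), valid when K <= N."""
    if not 0.0 <= guess_rate <= 1.0:
        raise ValueError("guess_rate must lie in [0, 1]")
    return set_size * (1.0 - guess_rate)
```

Under this assumption, an observer who guesses on half of the trials at set size 6 would be credited with a capacity of 3 items; the abstract's point is that this figure should agree with the set size at which response variability plateaus.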
Precision of working memory for speech sounds.
Joseph, Sabine; Iverson, Paul; Manohar, Sanjay; Fox, Zoe; Scott, Sophie K; Husain, Masud
2015-01-01
Memory for speech sounds is a key component of models of verbal working memory (WM). But how good is verbal WM? Most investigations assess this using binary report measures to derive a fixed number of items that can be stored. However, recent findings in visual WM have challenged such "quantized" views by employing measures of recall precision with an analogue response scale. WM for speech sounds might rely on both continuous and categorical storage mechanisms. Using a novel speech matching paradigm, we measured WM recall precision for phonemes. Vowel qualities were sampled from a formant space continuum. A probe vowel had to be adjusted to match the vowel quality of a target on a continuous, analogue response scale. Crucially, this provided an index of the variability of a memory representation around its true value and thus allowed us to estimate how memories were distorted from the original sounds. Memory load affected the quality of speech sound recall in two ways. First, there was a gradual decline in recall precision with increasing number of items, consistent with the view that WM representations of speech sounds become noisier with an increase in the number of items held in memory, just as for vision. Based on multidimensional scaling (MDS), the level of noise appeared to be reflected in distortions of the formant space. Second, as memory load increased, there was evidence of greater clustering of participants' responses around particular vowels. A mixture model captured both continuous and categorical responses, demonstrating a shift from continuous to categorical memory with increasing WM load. This suggests that direct acoustic storage can be used for single items, but when more items must be stored, categorical representations must be used.
2016-01-01
We investigated whether intentional forgetting impacts only the likelihood of later retrieval from long-term memory or whether it also impacts the fidelity of those representations that are successfully retrieved. We accomplished this by combining an item-method directed forgetting task with a testing procedure and modeling approach inspired by the delayed-estimation paradigm used in the study of visual short-term memory (STM). Abstract or concrete colored images were each followed by a remember (R) or forget (F) instruction and sometimes by a visual probe requiring a speeded detection response (E1–E3). Memory was tested using an old–new (E1–E2) or remember-know-no (E3) recognition task followed by a continuous color judgment task (E2–E3); a final experiment included only the color judgment task (E4). Replicating the existing literature, more “old” or “remember” responses were made to R than F items and RTs to postinstruction visual probes were longer following F than R instructions. Color judgments were more accurate for successfully recognized or recollected R than F items (E2–E3); a mixture model confirmed a decrease to both the probability of retrieving the F items as well as the fidelity of the representation of those F items that were retrieved (E4). We conclude that intentional forgetting is an effortful process that not only reduces the likelihood of successfully encoding an item for later retrieval, but also produces an impoverished memory trace even when those items are retrieved; these findings draw a parallel between the control of memory representations within working and long-term memory. PMID:26709589
Fawcett, Jonathan M; Lawrence, Michael A; Taylor, Tracy L
2016-01-01
We investigated whether intentional forgetting impacts only the likelihood of later retrieval from long-term memory or whether it also impacts the fidelity of those representations that are successfully retrieved. We accomplished this by combining an item-method directed forgetting task with a testing procedure and modeling approach inspired by the delayed-estimation paradigm used in the study of visual short-term memory (STM). Abstract or concrete colored images were each followed by a remember (R) or forget (F) instruction and sometimes by a visual probe requiring a speeded detection response (E1-E3). Memory was tested using an old-new (E1-E2) or remember-know-no (E3) recognition task followed by a continuous color judgment task (E2-E3); a final experiment included only the color judgment task (E4). Replicating the existing literature, more "old" or "remember" responses were made to R than F items and RTs to postinstruction visual probes were longer following F than R instructions. Color judgments were more accurate for successfully recognized or recollected R than F items (E2-E3); a mixture model confirmed a decrease to both the probability of retrieving the F items as well as the fidelity of the representation of those F items that were retrieved (E4). We conclude that intentional forgetting is an effortful process that not only reduces the likelihood of successfully encoding an item for later retrieval, but also produces an impoverished memory trace even when those items are retrieved; these findings draw a parallel between the control of memory representations within working and long-term memory. (c) 2015 APA, all rights reserved.
Attention mediates the flexible allocation of visual working memory resources.
Emrich, Stephen M; Lockhart, Holly A; Al-Aidroos, Naseem
2017-07-01
Though it is clear that it is impossible to store an unlimited amount of information in visual working memory (VWM), the limiting mechanisms remain elusive. While several models of VWM limitations exist, these typically characterize changes in performance as a function of the number of to-be-remembered items. Here, we examine whether changes in spatial attention could better account for VWM performance, independent of load. Across 2 experiments, performance was better predicted by the prioritization of memory items (i.e., attention) than by the number of items to be remembered (i.e., memory load). This relationship followed a power law, and held regardless of whether performance was assessed based on overall precision or any of 3 measures in a mixture model. Moreover, at large set sizes, even minimally attended items could receive a small proportion of resources, without any evidence for a discrete-capacity on the number of items that could be maintained in VWM. Finally, the observed data were best fit by a variable-precision model in which response error was related to the proportion of resources allocated to each item, consistent with a model of VWM in which performance is determined by the continuous allocation of attentional resources during encoding. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Balanced Rotating Spray Tank and Pipe Cleaning and Cleanliness Verification System
NASA Technical Reports Server (NTRS)
Caimi, Raoul E. B. (Inventor); Thaxton, Eric A. (Inventor)
1998-01-01
A system for cleaning and verifying the cleanliness of the interior surfaces of hollow items, such as small bottles, tanks, pipes and tubes, employs a rotating spray head for supplying a gas-liquid cleaning mixture to the item's surface at a supersonic velocity. The spray head incorporates a plurality of nozzles having diverging cross sections so that the incoming gas-liquid mixture is first converged within the spray head and then diverged through the nozzles, thereby accelerating the mixture to a supersonic velocity. In the preferred embodiment, three nozzles are employed; one forwardly facing nozzle at the end of the spray head and two oppositely facing angled nozzles exiting on opposite sides of the spray head which balance each other, and therefore impart no net side load on the spray head. A drive mechanism is provided to rotate the spray head and at the same time move the head back and forth within the item to be cleaned. The drive mechanism acts on a long metal tube to which the spray head is fixed, and thus no moving parts are exposed to the interior surfaces of the items to be cleaned, thereby reducing the risk of contamination.
ERIC Educational Resources Information Center
de Jong, Martijn G.; Steenkamp, Jan-Benedict E. M.
2010-01-01
We present a class of finite mixture multilevel multidimensional ordinal IRT models for large scale cross-cultural research. Our model is proposed for confirmatory research settings. Our prior for item parameters is a mixture distribution to accommodate situations where different groups of countries have different measurement operations, while…
Glycaemic Response to Quality Protein Maize Grits
Panlasigui, Leonora N.; Bayaga, Cecile L. T.; Barrios, Erniel B.; Cochon, Kim L.
2010-01-01
Background. Carbohydrates have varied rates of digestion and absorption that induce different hormonal and metabolic responses in the body. Given the abundance of carbohydrate sources in the Philippines, the determination of the glycaemic index (GI) of local foods may prove beneficial in promoting health and decreasing the risk of diabetes in the country. Methods. The GI of Quality Protein Maize (QPM) grits, milled rice, and the mixture of these two food items was determined in ten female subjects. Using a randomized crossover design, the control bread and three test foods were given on separate occasions after an overnight fast. Blood samples were collected through finger prick at time intervals of 0, 15, 30, 45, 60, 90, and 120 min and analyzed for glucose concentrations. Results. The computed incremental area under the glucose response curve (IAUC) varies significantly across test foods (P < .0379) with the pure QPM grits yielding the lowest IAUC relative to the control by 46.38. Resulting GI values of the test foods (bootstrapped) were 80.36 (SEM 14.24), 119.78 (SEM 18.81), and 93.17 (SEM 27.27) for pure QPM grits, milled rice, and rice-QPM grits mixture, respectively. Conclusion. Pure QPM corn grits has a lower glycaemic response compared to milled rice and the rice-corn grits mixture, which may be related in part to differences in their dietary fibre composition and physicochemical characteristics. Pure QPM corn grits may be a more health beneficial food for diabetic and hyperlipidemic individuals. PMID:20862364
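To make the IAUC and GI quantities in this abstract concrete, here is a minimal sketch. It assumes a simplified incremental-area rule (trapezoids over glucose increments above the fasting value, with increments below baseline clipped to zero) and the usual definition of GI as the test-food IAUC expressed as a percentage of the reference-food IAUC. The function names are hypothetical, and the study's exact area rule and bootstrap procedure are not reproduced.

```python
def incremental_auc(times_min, glucose):
    """Trapezoidal area above the fasting (time 0) glucose value.

    Simplification: increments below baseline are clipped to zero
    rather than handled by interpolating the crossing point.
    """
    if len(times_min) != len(glucose) or len(times_min) < 2:
        raise ValueError("need matching sequences with at least two points")
    baseline = glucose[0]
    increments = [max(g - baseline, 0.0) for g in glucose]
    area = 0.0
    for i in range(1, len(times_min)):
        width = times_min[i] - times_min[i - 1]
        area += width * (increments[i - 1] + increments[i]) / 2.0
    return area


def glycaemic_index(iauc_test, iauc_reference):
    """GI as a percentage of the reference food's IAUC."""
    if iauc_reference <= 0:
        raise ValueError("reference IAUC must be positive")
    return 100.0 * iauc_test / iauc_reference
```

In use, `times_min` would be the sampling schedule given in the abstract (0, 15, 30, 45, 60, 90, 120 min), with one glucose series per subject and food; per-subject GI values would then be averaged or bootstrapped.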
Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Egeland, Thore
2018-01-12
The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Applying a recently proposed semantic theory of survey response, we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the survey. Exploring the degree to which individual responses are influenced by semantics, we hypothesized that in many cases, information about attitude strength is actually filtered out as noise in the commonly used correlation matrix. We developed a procedure to separate the semantic influence from attitude strength in individual response patterns, and compared these results to, respectively, the observed sample correlation matrices and the semantic similarity structures arising from text analysis algorithms. This was done with four datasets, comprising a total of 7,787 subjects and 27,461,502 observed item pair responses. As we argued, attitude strength seemed to account for much information about the individual respondents. However, this information did not seem to carry over into the observed sample correlation matrices, which instead converged around the semantic structures offered by the survey items. This is potentially disturbing for the traditional understanding of what survey data represent. We argue that this approach contributes to a better understanding of the cognitive processes involved in survey responses. In turn, this could help us make better use of the data that such methods provide.
Discrete-Slots Models of Visual Working-Memory Response Times
Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.
2014-01-01
Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956
Development and assessment of the Quality of Life in Childhood Epilepsy Questionnaire (QOLCE-16).
Goodwin, Shane W; Ferro, Mark A; Speechley, Kathy N
2018-03-01
The aim of this study was to develop and validate a brief version of the Quality of Life in Childhood Epilepsy Questionnaire (QOLCE). A secondary aim was to compare the results described in previously published studies using the QOLCE-55 with those obtained using the new brief version. Data come from 373 children involved in the Health-related Quality of Life in Children with Epilepsy Study, a multicenter prospective cohort study. Item response theory (IRT) methods were used to assess dimensionality and item properties and to guide the selection of items. Replication of results using the brief measure was conducted with multiple regression, multinomial regression, and latent mixture modeling techniques. IRT methods identified a bi-factor graded response model that best fits the data. Thirty-nine items were removed, resulting in a 16-item QOLCE (QOLCE-16) with an equal number of items in all 4 domains of functioning (Cognitive, Emotional, Social, and Physical). Model fit was excellent: Comparative Fit Index = 0.99; Tucker-Lewis Index = 0.99; root mean square error of approximation = 0.052 (90% confidence interval [CI] 0.041-0.064); weighted root mean square = 0.76. Results that were reported previously using the QOLCE-55 and QOLCE-76 were comparable to those generated using the QOLCE-16. The QOLCE-16 is a multidimensional measure of health-related quality of life (HRQoL) with good psychometric properties and a short-estimated completion time. It is notable that the items were calibrated using multidimensional IRT methods to create a measure that conforms to conventional definitions of HRQoL. The QOLCE-16 is an appropriate measure for both clinicians and researchers wanting to record HRQoL information in children with epilepsy. Wiley Periodicals, Inc. © 2018 International League Against Epilepsy.
A Mixed Effects Randomized Item Response Model
ERIC Educational Resources Information Center
Fox, J.-P.; Wyrick, Cheryl
2008-01-01
The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
19 CFR 127.28 - Special merchandise.
Code of Federal Regulations, 2010 CFR
2010-04-01
... a chemical substance or mixture, as these items are defined in section 3, Toxic Substances Control... pest or any other form of plant or animal life (other than man or other than bacteria, virus, or other... upon public notice of not less than 6 or more than 10 days. (i) Chemical substances, mixtures, and...
19 CFR 127.28 - Special merchandise.
Code of Federal Regulations, 2012 CFR
2012-04-01
... a chemical substance or mixture, as these items are defined in section 3, Toxic Substances Control... pest or any other form of plant or animal life (other than man or other than bacteria, virus, or other... upon public notice of not less than 6 or more than 10 days. (i) Chemical substances, mixtures, and...
19 CFR 127.28 - Special merchandise.
Code of Federal Regulations, 2011 CFR
2011-04-01
... a chemical substance or mixture, as these items are defined in section 3, Toxic Substances Control... pest or any other form of plant or animal life (other than man or other than bacteria, virus, or other... upon public notice of not less than 6 or more than 10 days. (i) Chemical substances, mixtures, and...
19 CFR 127.28 - Special merchandise.
Code of Federal Regulations, 2013 CFR
2013-04-01
... a chemical substance or mixture, as these items are defined in section 3, Toxic Substances Control... pest or any other form of plant or animal life (other than man or other than bacteria, virus, or other... upon public notice of not less than 6 or more than 10 days. (i) Chemical substances, mixtures, and...
19 CFR 127.28 - Special merchandise.
Code of Federal Regulations, 2014 CFR
2014-04-01
... a chemical substance or mixture, as these items are defined in section 3, Toxic Substances Control... pest or any other form of plant or animal life (other than man or other than bacteria, virus, or other... upon public notice of not less than 6 or more than 10 days. (i) Chemical substances, mixtures, and...
Detecting Social Desirability Bias Using Factor Mixture Models
ERIC Educational Resources Information Center
Leite, Walter L.; Cooper, Lou Ann
2010-01-01
Based on the conceptualization that social desirability bias (SDB) is a discrete event resulting from an interaction between a scale's items, the testing situation, and the respondent's latent trait on a social desirability factor, we present a method that makes use of factor mixture models to identify which examinees are most likely to provide…
Rasch Mixture Models for DIF Detection: A Comparison of Old and New Score Specifications
ERIC Educational Resources Information Center
Frick, Hannah; Strobl, Carolin; Zeileis, Achim
2015-01-01
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch…
ERIC Educational Resources Information Center
Lee, HwaYoung; Beretvas, S. Natasha
2014-01-01
Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…
Using a Mixture IRT Model to Understand English Learner Performance on Large-Scale Assessments
ERIC Educational Resources Information Center
Shea, Christine A.
2013-01-01
The purpose of this study was to determine whether an eighth grade state-level math assessment contained items that function differentially (DIF) for English Learner students (EL) as compared to English Only students (EO) and if so, what factors might have caused DIF. To determine this, Differential Item Functioning (DIF) analysis was employed.…
Whatever you do, don't look at the...: Evaluating guidance by an exclusionary attentional template.
Beck, Valerie M; Luck, Steven J; Hollingworth, Andrew
2018-04-01
People can use a target template consisting of one or more features to guide attention and gaze to matching objects in a search array. But can we also use feature information to guide attention away from known irrelevant items? Some studies found a benefit from foreknowledge of a distractor feature, whereas others found a cost. Importantly, previous work has largely relied on end-of-trial manual responses; it is unclear how feature-guided avoidance might unfold as candidate objects are inspected. In the current experiments, participants were cued with a distractor feature to avoid, then performed a visual search task while eye movements were recorded. Participants initially fixated a to-be-avoided object more frequently than predicted by chance, but they also demonstrated avoidance of cue-matching objects later in the trial. When provided more time between cue stimulus and search array, participants continued to be initially captured by a cued-color item. Furthermore, avoidance of cue-matching objects later in the trial was not contingent on initial capture by a cue-matching object. These results suggest that the conflicting findings in previous negative-cue experiments may be explained by a mixture of two independent processes: initial attentional capture by memory-matching items and later avoidance of known irrelevant items. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1976-10-01
The Environmental Protection Agency is currently undertaking programs that measure the exhaust emissions of in-use vehicles. One of these programs, the Emission Factors Program (EFP), has generated data indicating that a high percentage of in-use 1975 automobiles have exhaust emissions exceeding the Federal emission standards for 1975-1976 light-duty vehicles. Typical failing vehicles have very high CO emissions. High CO emissions may be indicative of improper adjustment of either the idle mixture or the choke. Since idle mixture and choke adjustments are easily accessible and adjusted on most cars, it seems probable that the maladjustment of these two items may be responsible for some of the high emission levels measured in the EFP. In order to further investigate these possibilities, a test program was conducted by the EPA to quantify the effects of various engine maladjustments on exhaust emissions. This test program would help identify maladjustments resulting in the types of failures encountered in the EFP.
2000-12-01
A SKIP FLAG INDICATING THE RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP...RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5, NOTE 2, IN APPENDIX D. G-52...RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5
Stephan-Otto, Christian; Siddi, Sara; Senior, Carl; Cuevas-Esteban, Jorge; Cambra-Martí, Maria Rosa; Ochoa, Susana; Brébion, Gildas
2017-09-01
Previous research suggests that visual hallucinations in schizophrenia consist of mental images mistaken for percepts due to failure of the reality-monitoring processes. However, the neural substrates that underpin such dysfunction are currently unknown. We conducted a brain imaging study to investigate the role of visual mental imagery in visual hallucinations. Twenty-three patients with schizophrenia and 26 healthy participants were administered a reality-monitoring task whilst undergoing an fMRI protocol. At the encoding phase, a mixture of pictures of common items and labels designating common items were presented. On the memory test, participants were requested to remember whether a picture of the item had been presented or merely its label. Visual hallucination scores were associated with a liberal response bias reflecting propensity to erroneously remember pictures of the items that had in fact been presented as words. At encoding, patients with visual hallucinations differentially activated the right fusiform gyrus when processing the words they later remembered as pictures, which suggests the formation of visual mental images. On the memory test, the whole patient group activated the anterior cingulate and medial superior frontal gyrus when falsely remembering pictures. However, no differential activation was observed in patients with visual hallucinations, whereas in the healthy sample, the production of visual mental images at encoding led to greater activation of a fronto-parietal decisional network on the memory test. Visual hallucinations are associated with enhanced visual imagery and possibly with a failure of the reality-monitoring processes that enable discrimination between imagined and perceived events. Copyright © 2017 Elsevier Ltd. All rights reserved.
Massof, Robert W
2014-10-01
A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equal the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing. © The Author(s) 2013.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Felton, D.L.
1985-02-01
Research progress is reported in the following areas: (1) evaluation of possible health effects among nuclear workers; (2) dose-effect relationship studies of carcinogenesis from both nuclear materials and complex mixtures; (3) microbial mutagenesis studies with 6-aminochrysene and benzo(a)pyrene in coal-derived complex mixtures; and (4) a variety of studies relating to noncarcinogenic and nonmutagenic endpoints, including teratology, perinatal studies and studies to determine absorption, metabolism, and doses to critical tissues and organs of coal-derived mixtures and radionuclides. Items have been individually abstracted for the data base. (ACR)
Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee
2013-07-01
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
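For orientation, the sketch below shows the classical standardized residual in the form it is commonly given in IRT textbooks: within an ability interval, the observed proportion correct minus the model-predicted probability, divided by the binomial standard error. This is an assumed textbook form used as the comparison baseline, not the new residual proposed in this paper, and the function name is hypothetical.

```python
import math


def standardized_residual(observed_prop, expected_prob, n_examinees):
    """Classical item-fit residual for one ability interval (assumed form).

    observed_prop: observed proportion correct in the interval
    expected_prob: model-implied probability of a correct response
    n_examinees:   number of examinees in the interval
    """
    if n_examinees <= 0:
        raise ValueError("n_examinees must be positive")
    if not 0.0 < expected_prob < 1.0:
        raise ValueError("expected_prob must lie strictly between 0 and 1")
    # Binomial standard error of a proportion under the fitted model.
    std_error = math.sqrt(expected_prob * (1.0 - expected_prob) / n_examinees)
    return (observed_prop - expected_prob) / std_error
```

Values far from zero in many intervals would suggest that the fitted item characteristic curve does not match the observed response pattern.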
7 CFR 1000.40 - Classes of utilization.
Code of Federal Regulations, 2013 CFR
2013-01-01
..., sour half-and-half, sour cream mixtures containing non-milk items; yogurt, including yogurt containing beverages with 20 percent or more yogurt by weight and kefir, and any other semi-solid product resembling a...
7 CFR 1000.40 - Classes of utilization.
Code of Federal Regulations, 2012 CFR
2012-01-01
..., sour half-and-half, sour cream mixtures containing non-milk items; yogurt, including yogurt containing beverages with 20 percent or more yogurt by weight and kefir, and any other semi-solid product resembling a...
7 CFR 1000.40 - Classes of utilization.
Code of Federal Regulations, 2014 CFR
2014-01-01
..., sour half-and-half, sour cream mixtures containing non-milk items; yogurt, including yogurt containing beverages with 20 percent or more yogurt by weight and kefir, and any other semi-solid product resembling a...
An NCME Instructional Module on Polytomous Item Response Theory Models
ERIC Educational Resources Information Center
Penfield, Randall David
2014-01-01
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gillispie, Obie William; Worl, Laura Ann; Veirs, Douglas Kirk
A mixture of chlorine-containing, impure plutonium oxides has been produced and has been given the name Master Blend. This large quantity of well-characterized chlorine-containing material is available for use in the Integrated Surveillance and Monitoring Program for shelf-life experiments. It is intended to be representative of materials packaged to meet DOE-STD-3013. The Master Blend contains a mixture of items produced in Los Alamos National Laboratory’s (LANL) electro-refining pyrochemical process in the late 1990s. Twenty items were crushed and sieved, calcined to 800 °C for four hours, and blended multiple times. This process resulted in four batches of Master Blend. Calorimetry and density data on material from the four batches indicate homogeneity.
Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
ERIC Educational Resources Information Center
Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.
2011-01-01
The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
A competitive binding model predicts the response of mammalian olfactory receptors to mixtures
NASA Astrophysics Data System (ADS)
Singh, Vijay; Murphy, Nicolle; Mainland, Joel; Balasubramanian, Vijay
Most natural odors are complex mixtures of many odorants, but due to the large number of possible mixtures only a small fraction can be studied experimentally. To get a realistic understanding of the olfactory system we need methods to predict responses to complex mixtures from single odorant responses. Focusing on mammalian olfactory receptors (ORs in mouse and human), we propose a simple biophysical model for odor-receptor interactions where only one odor molecule can bind to a receptor at a time. The resulting competition for occupancy of the receptor accounts for the experimentally observed nonlinear mixture responses. We first fit a dose-response relationship to individual odor responses and then use those parameters in a competitive binding model to predict mixture responses. With no additional parameters, the model predicts responses of 15 (of 18 tested) receptors to within 10 - 30 % of the observed values, for mixtures with 2, 3 and 12 odorants chosen from a panel of 30. Extensions of our basic model with odorant interactions lead to additional nonlinearities observed in mixture response like suppression, cooperativity, and overshadowing. Our model provides a systematic framework for characterizing and parameterizing such mixing nonlinearities from mixture response data.
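The competitive-binding idea summarized in this abstract can be illustrated with a minimal sketch. It uses a generic textbook form in which each odorant occupies the receptor in proportion to its concentration divided by its EC50, and all odorants compete for the same site. The function name `mixture_response`, the parameter names, and the assumption of a Hill coefficient of 1 are illustrative choices made here; the abstract does not state the authors' exact parameterization.

```python
def mixture_response(concs, ec50s, efficacies, f_max=1.0):
    """Predicted receptor response to an odorant mixture.

    Hypothetical sketch of a competitive binding model:
    only one molecule can occupy the receptor at a time, so every
    odorant contributes to the shared denominator, while only its
    efficacy-weighted occupancy contributes to the response.
    """
    if not (len(concs) == len(ec50s) == len(efficacies)):
        raise ValueError("concs, ec50s and efficacies must have equal length")
    # Relative occupancy drive of each odorant (concentration in EC50 units).
    occupancy = [c / k for c, k in zip(concs, ec50s)]
    # Efficacy-weighted drive: odorants with efficacy 0 bind but do not activate.
    drive = sum(e * o for e, o in zip(efficacies, occupancy))
    return f_max * drive / (1.0 + sum(occupancy))


if __name__ == "__main__":
    # A full agonist alone at its EC50.
    print(mixture_response([1.0], [1.0], [1.0]))
    # The same agonist plus a competitor with zero efficacy: the response drops.
    print(mixture_response([1.0, 1.0], [1.0, 1.0], [1.0, 0.0]))
```

With single-odorant parameters fixed in advance, a form like this needs no extra parameters to produce a mixture prediction, which is the property the abstract emphasizes.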
ERIC Educational Resources Information Center
Fukuhara, Hirotaka; Kamata, Akihito
2011-01-01
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Item Response Models for Examinee-Selected Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei
2012-01-01
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
ERIC Educational Resources Information Center
Lee, Woo-yeol; Cho, Sun-Joo
2017-01-01
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Guenole, Nigel; Brown, Anna A; Cooper, Andrew J
2018-06-01
This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
A Quasi-Parametric Method for Fitting Flexible Item Response Functions
ERIC Educational Resources Information Center
Liang, Longjuan; Browne, Michael W.
2015-01-01
If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks
Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.
2013-01-01
Objective: To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods: Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results: Two pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions: Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904
van Wijk, Michiel; de Bruijn, Paulien J A; Sabelis, Maurice W
2010-11-01
Phytoseiulus persimilis is a predatory mite that in absence of vision relies on the detection of herbivore-induced plant odors to locate its prey, the two-spotted spider-mite Tetranychus urticae. This herbivorous prey is feeding on leaves of a wide variety of plant species in different families. The predatory mites respond to numerous structurally different compounds. However, typical spider-mite induced plant compounds do not attract more predatory mites than plant compounds not associated with prey. Because the mites are sensitive to many compounds, components of odor mixtures may affect each other's perception. Although the response to pure compounds has been well documented, little is known how interactions among compounds affect the response to odor mixtures. We assessed the relation between the mites' responses elicited by simple mixtures of two compounds and by the single components of these mixtures. The preference for the mixture was compared to predictions under three conceptual models, each based on one of the following assumptions: (1) the responses elicited by each of the individual components can be added to each other; (2) they can be averaged; or (3) one response overshadows the other. The observed response differed significantly from the response predicted under the additive response, average response, and overshadowing response model in 52, 36, and 32% of the experimental tests, respectively. Moreover, the behavioral responses elicited by individual compounds and their binary mixtures were determined as a function of the odor concentration. The relative contribution of each component to the behavioral response elicited by the mixture varied with the odor concentration, even though the ratio of both compounds in the mixture was kept constant. Our experiments revealed that compounds that elicited no response had an effect on the response elicited by binary mixtures that they were part of. The results are not consistent with the hypothesis that P. persimilis perceives odor mixtures as a collection of strictly elemental objects. They suggest that odor mixtures rather are perceived as one synthetic whole.
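The three conceptual models named in this abstract (additive, average, overshadowing) can be written down as a small sketch. The function name `predict_mixture` and the signed response scale are hypothetical. In particular, reading "overshadowing" as "the component response with the larger magnitude wins" is an assumption made here, since the abstract does not say which component dominates.

```python
def predict_mixture(resp_a, resp_b):
    """Predicted response to a binary mixture under three simple models.

    resp_a and resp_b are the responses to the single components on an
    arbitrary signed scale (hypothetical units, e.g. a preference score).
    """
    return {
        # (1) component responses add up
        "additive": resp_a + resp_b,
        # (2) component responses are averaged
        "average": (resp_a + resp_b) / 2.0,
        # (3) assumed reading: the larger-magnitude response dominates
        "overshadowing": max(resp_a, resp_b, key=abs),
    }


if __name__ == "__main__":
    for model, value in predict_mixture(0.5, 0.25).items():
        print(model, value)
```

Comparing an observed mixture response against each of the three predictions is the kind of test the abstract reports.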
Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J
2004-01-01
Background: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka
2016-01-01
Background: Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods: Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results: The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions: The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items
ERIC Educational Resources Information Center
Aybek, Eren Can; Demirtasli, R. Nukhet
2017-01-01
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
ERIC Educational Resources Information Center
Ito, Kyoko; Sykes, Robert C.
This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
49 CFR 174.9 - Safety and security inspection and acceptance.
Code of Federal Regulations, 2010 CFR
2010-10-01
... of this subchapter, rail carload quantities of ammonium nitrate or ammonium nitrate mixtures in solid... accordance with § 174.50. (d) Where an indication of tampering or suspicious item is found, a carrier must...
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
Writing, Evaluating and Assessing Data Response Items in Economics.
ERIC Educational Resources Information Center
Trotman-Dickenson, D. I.
1989-01-01
Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…
Item Response Modeling with Sum Scores
ERIC Educational Resources Information Center
Johnson, Timothy R.
2013-01-01
One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means
ERIC Educational Resources Information Center
Polak, Marike; De Rooij, Mark; Heiser, Willem J.
2012-01-01
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Sensitivity of the immature rat uterotrophic assay to mixtures of estrogens.
Tinwell, Helen; Ashby, John
2004-01-01
We have evaluated whether mixtures of estrogens, present in the mix at doses that are individually inactive in the immature rat uterotrophic assay, can give a uterotrophic response. Seven chemicals were evaluated: nonylphenol, bisphenol A (BPA), methoxychlor, genistein (GEN), estradiol, diethylstilbestrol, and ethinyl estradiol. Dose responses in the uterotrophic assay were constructed for each chemical. The first series of experiments involved evaluating binary mixtures of BPA and GEN at dose levels that gave moderate uterotrophic responses when tested individually. The mixtures generally showed an intermediate or reduced uterotrophic effect compared with when the components of the mixture were tested alone at the dose used in the mixture. The next series of experiments used a multicomponent (complex) mixture of all seven chemicals evaluated at doses that gave either weakly positive or inactive uterotrophic responses when tested individually in the assay. Doses that were nominally equi-uterotrophic ranged over approximately six orders of magnitude for the seven chemicals. Doses of agents that gave a weak uterotrophic response when tested individually gave a marginally enhanced positive response in the assay when tested combined as a mixture. Doses of agents that gave a negative uterotrophic response when tested individually gave a positive response when tested as a mixture. These data indicate that a variety of different estrogen receptor (ER) agonists, present individually at subeffective doses, can act simultaneously to evoke an ER-regulated response. However, translating these findings into the process of environmental hazard assessment will be difficult. The simple addition of the observed, or predicted, activities for the components of a mixture is confirmed here to be inappropriate and to overestimate the actual effect induced by the mixture. Equally, isobole analysis is only suitable for two- or three-component mixtures, and concentration addition requires access to dose-response data and EC50 values (concentration giving 50% of the maximum response) for the individual components of the mixture--requirements that will rarely be fulfilled for complex environmental samples. Given these uncertainties, we conclude that it may be most expedient to select and bioassay whole environmental mixtures of potential concern. PMID:15064164
Effects of binary taste stimuli on the neural activity of the hamster chorda tympani
1980-01-01
Binary mixtures of taste stimuli were applied to the tongue of the hamster and the reaction of the whole chorda tympani was recorded. Some of the chemicals that were paired in mixtures (HCl, NH4Cl, NaCl, CaCl2, sucrose, and D-phenylalanine) have similar tastes to human and/or hamster, and/or common stimulatory effects on individual fibers of the hamster chorda tympani; other pairs of these chemicals have dissimilar tastes and/or distinct neural stimulatory effects. The molarity of each chemical with approximately the same effect on the activity of the nerve as 0.01 M NaCl was selected, and an established relation between stimulus concentration and response allowed estimation of the effect of a "mixture" of two concentrations of one chemical. Each mixture elicited a response that was smaller than the sum of the responses to its components. However, responses to some mixtures approached this sum, and responses to other mixtures closely approached the response to a "mixture" of two concentrations of one chemical. Responses of the former variety were generated by mixtures of an electrolyte and a nonelectrolyte and the latter by mixtures of two electrolytes or two nonelectrolytes. But, beyond the distinction between electrolytes and nonelectrolytes, the whole-nerve response to a mixture could not be predicted from the known neural or psychophysical effects of its components. PMID:7411114
Structure-reactivity modeling using mixture-based representation of chemical reactions.
Polishchuk, Pavel; Madzhidov, Timur; Gimadiev, Timur; Bodrov, Andrey; Nugmanov, Ramil; Varnek, Alexandre
2017-09-01
We describe a novel approach of reaction representation as a combination of two mixtures: a mixture of reactants and a mixture of products. In turn, each mixture can be encoded using an earlier reported approach involving simplex descriptors (SiRMS). The feature vector representing these two mixtures results from either concatenated product and reactant descriptors or the difference between descriptors of products and reactants. This reaction representation doesn't need an explicit labeling of a reaction center. A rigorous "product-out" cross-validation (CV) strategy has been suggested. Unlike the naïve "reaction-out" CV approach based on a random selection of items, the proposed one provides a more realistic estimate of prediction accuracy for reactions resulting in novel products. The new methodology has been applied to model rate constants of E2 reactions. It has been demonstrated that the use of the fragment control domain applicability approach significantly increases prediction accuracy of the models. The models obtained with the new "mixture" approach performed better than those that required either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) reaction center labeling.
Daly, Justine B; Campbell, Elizabeth M; Wiggers, John H; Considine, Robyn J
2002-06-01
This study aimed to determine the prevalence of responsible hospitality policies in a group of licensed premises associated with alcohol-related harm. During March 1999, 108 licensed premises with one or more police-identified alcohol-related incidents in the previous 3 months received a visit from a police officer. A 30-item audit checklist was used to determine the responsible hospitality policies being undertaken by each premises within eight policy domains: display required signage (three items); responsible host practices to prevent intoxication and under-age drinking (five items); written policies and guidelines for responsible service (three items); discouraging inappropriate promotions (three items); safe transport (two items); responsible management issues (seven items); physical environment (three items) and entry conditions (four items). No premises were undertaking all 30 items. Eighty per cent of the premises were undertaking 20 of the 30 items. All premises were undertaking at least 17 of the items. The proportion of premises undertaking individual items ranged from 16% to 100%. Premises were less likely to report having and providing written responsible hospitality documentation to staff, using door charges and having entry/re-entry rules. Significant differences between rural and urban premises were evident for four policies. Clubs were significantly more likely than hotels to have a written responsible service of alcohol policy and to clearly display codes of dress and conditions of entry. This study provides an indication of the extent and nature of responsible hospitality policies in a sample of licensed premises that are associated with a broad range of alcohol related harms. The finding that a large majority of such premises appear to adopt responsible hospitality policies suggests a need to assess the validity and reliability of tools used in the routine assessment of such policies, and of the potential for harm from licensed premises.
Rademaker, Rosanne L; van de Ven, Vincent G; Tong, Frank; Sack, Alexander T
2017-01-01
Neuroimaging studies have demonstrated that activity patterns in early visual areas predict stimulus properties actively maintained in visual working memory. Yet, the mechanisms by which such information is represented remain largely unknown. In this study, observers remembered the orientations of 4 briefly presented gratings, one in each quadrant of the visual field. A 10Hz Transcranial Magnetic Stimulation (TMS) triplet was applied directly at stimulus offset, or midway through a 2-second delay, targeting early visual cortex corresponding retinotopically to a sample item in the lower hemifield. Memory for one of the four gratings was probed at random, and participants reported this orientation via method of adjustment. Recall errors were smaller when the visual field location targeted by TMS overlapped with that of the cued memory item, compared to errors for stimuli probed diagonally to TMS. This implied topographic storage of orientation information, and a memory-enhancing effect at the targeted location. Furthermore, early pulses impaired performance at all four locations, compared to late pulses. Next, response errors were fit empirically using a mixture model to characterize memory precision and guess rates. Memory was more precise for items proximal to the pulse location, irrespective of pulse timing. Guesses were more probable with early TMS pulses, regardless of stimulus location. Thus, while TMS administered at the offset of the stimulus array might disrupt early-phase consolidation in a non-topographic manner, TMS also boosts the precise representation of an item at its targeted retinotopic location, possibly by increasing attentional resources or by injecting a beneficial amount of noise.
van de Ven, Vincent G.; Tong, Frank; Sack, Alexander T.
2017-01-01
Neuroimaging studies have demonstrated that activity patterns in early visual areas predict stimulus properties actively maintained in visual working memory. Yet, the mechanisms by which such information is represented remain largely unknown. In this study, observers remembered the orientations of 4 briefly presented gratings, one in each quadrant of the visual field. A 10Hz Transcranial Magnetic Stimulation (TMS) triplet was applied directly at stimulus offset, or midway through a 2-second delay, targeting early visual cortex corresponding retinotopically to a sample item in the lower hemifield. Memory for one of the four gratings was probed at random, and participants reported this orientation via method of adjustment. Recall errors were smaller when the visual field location targeted by TMS overlapped with that of the cued memory item, compared to errors for stimuli probed diagonally to TMS. This implied topographic storage of orientation information, and a memory-enhancing effect at the targeted location. Furthermore, early pulses impaired performance at all four locations, compared to late pulses. Next, response errors were fit empirically using a mixture model to characterize memory precision and guess rates. Memory was more precise for items proximal to the pulse location, irrespective of pulse timing. Guesses were more probable with early TMS pulses, regardless of stimulus location. Thus, while TMS administered at the offset of the stimulus array might disrupt early-phase consolidation in a non-topographic manner, TMS also boosts the precise representation of an item at its targeted retinotopic location, possibly by increasing attentional resources or by injecting a beneficial amount of noise. PMID:28384347
de Bruijn, Paulien J. A.; Sabelis, Maurice W.
2010-01-01
Phytoseiulus persimilis is a predatory mite that in the absence of vision relies on the detection of herbivore-induced plant odors to locate its prey, the two-spotted spider-mite Tetranychus urticae. This herbivorous prey feeds on leaves of a wide variety of plant species in different families. The predatory mites respond to numerous structurally different compounds. However, typical spider-mite induced plant compounds do not attract more predatory mites than plant compounds not associated with prey. Because the mites are sensitive to many compounds, components of odor mixtures may affect each other’s perception. Although the response to pure compounds has been well documented, little is known about how interactions among compounds affect the response to odor mixtures. We assessed the relation between the mites’ responses elicited by simple mixtures of two compounds and by the single components of these mixtures. The preference for the mixture was compared to predictions under three conceptual models, each based on one of the following assumptions: (1) the responses elicited by each of the individual components can be added to each other; (2) they can be averaged; or (3) one response overshadows the other. The observed response differed significantly from the response predicted under the additive response, average response, and overshadowing response model in 52, 36, and 32% of the experimental tests, respectively. Moreover, the behavioral responses elicited by individual compounds and their binary mixtures were determined as a function of the odor concentration. The relative contribution of each component to the behavioral response elicited by the mixture varied with the odor concentration, even though the ratio of both compounds in the mixture was kept constant. Our experiments revealed that compounds that elicited no response had an effect on the response elicited by binary mixtures that they were part of. The results are not consistent with the hypothesis that P. persimilis perceives odor mixtures as a collection of strictly elemental objects. They suggest that odor mixtures are instead perceived as one synthetic whole. Electronic supplementary material: The online version of this article (doi:10.1007/s10886-010-9858-3) contains supplementary material, which is available to authorized users. PMID:20872172
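The three conceptual models named in this abstract can be sketched numerically. This is only an illustration under assumptions the abstract does not state: responses are taken to be plain numbers on a common scale, the function names are hypothetical, and "overshadowing" is read here as the component response with the larger magnitude winning. The authors' actual scaling and test procedure may differ.

```python
def predict_additive(r_a, r_b):
    """Model 1: component responses add (assumed unbounded scale)."""
    return r_a + r_b


def predict_average(r_a, r_b):
    """Model 2: the mixture response is the mean of the component responses."""
    return (r_a + r_b) / 2.0


def predict_overshadowing(r_a, r_b):
    """Model 3: one response overshadows the other.

    Assumption: the response with the larger absolute value dominates.
    """
    return r_a if abs(r_a) >= abs(r_b) else r_b


if __name__ == "__main__":
    # Hypothetical component responses, not data from the study.
    r_a, r_b = 0.2, -0.5
    for model in (predict_additive, predict_average, predict_overshadowing):
        print(model.__name__, model(r_a, r_b))
```

Comparing an observed mixture response with each of the three predictions is the kind of check the abstract describes; the statistical test used for that comparison is not given here.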
Item Response Data Analysis Using Stata Item Response Theory Package
ERIC Educational Resources Information Center
Yang, Ji Seung; Zheng, Xiaying
2018-01-01
The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…
Item Response Models for Local Dependence among Multiple Ratings
ERIC Educational Resources Information Center
Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan
2014-01-01
Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Item response theory - A first approach
NASA Astrophysics Data System (ADS)
Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar
2017-07-01
Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
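As a rough illustration of the one-, two- and three-parameter logistic models mentioned above, the sketch below uses the common textbook parameterization. The symbols a (discrimination), b (difficulty) and c (lower asymptote, or guessing) and the function names are assumptions, not taken from the abstract, and no scaling constant such as D = 1.7 is applied.

```python
import math


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: person ability; a: discrimination; b: difficulty;
    c: lower asymptote (guessing). Textbook form, assumed here.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def irf_2pl(theta, a, b):
    """2PL model: the 3PL with no guessing (c = 0)."""
    return irf_3pl(theta, a=a, b=b, c=0.0)


def irf_1pl(theta, b):
    """1PL model: the 2PL with discrimination fixed at 1."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)


if __name__ == "__main__":
    # Print the three curves at a few ability levels for one hypothetical item.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, irf_1pl(theta, 0.0), irf_2pl(theta, 1.5, 0.0),
              irf_3pl(theta, 1.5, 0.0, 0.2))
```

Writing the simpler models as special cases of the 3PL keeps the nesting explicit: fixing c = 0 gives the 2PL, and additionally fixing a gives the 1PL.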
A Multidimensional Ideal Point Item Response Theory Model for Binary Data
ERIC Educational Resources Information Center
Maydeu-Olivares, Albert; Hernandez, Adolfo; McDonald, Roderick P.
2006-01-01
We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model…
1989-01-01
In vivo electrophysiological recordings from populations of olfactory receptor neurons in the channel catfish, Ictalurus punctatus, clearly showed that responses to binary and trinary mixtures of amino acids were predictable with knowledge obtained from previous cross-adaptation studies of the relative independence of the respective binding sites of the component stimuli. All component stimuli, from which equal aliquots were drawn to form the mixtures, were adjusted in concentration to provide for approximately equal response magnitudes. The magnitude of the response to a mixture whose component amino acids showed significant cross-reactivity was equivalent to the response to any single component used to form that mixture. A mixture whose component amino acids showed minimal cross-adaptation produced a significantly larger relative response than a mixture whose components exhibited considerable cross-reactivity. This larger response approached the sum of the responses to the individual component amino acids tested at the resulting concentrations in the mixture, even though olfactory receptor dose-response functions for amino acids in this species are characterized by extreme sensory compression (i.e., successive concentration increments produce progressively smaller physiological responses). Thus, the present study indicates that the response to sensory stimulation of olfactory receptor sites is more enhanced by the activation of different receptor site types than by stimulus interaction at a single site type. PMID:2703818
0-6621 : developing a mixture-based specification for flexible base.
DOT National Transportation Integrated Search
2012-08-01
The Texas Department of Transportation (TxDOT) currently utilizes Item 247, Flexible Base, to specify a foundation course of flexible base utilized in a pavement. Base materials are not allowed to be used by the contractors until the ...
ADDITIVITY ASSESSMENT OF TRIHALOMETHANE MIXTURES BY PROPORTIONAL RESPONSE ADDITION
If additivity is known or assumed, the toxicity of a chemical mixture may be predicted from the dose response curves of the individual chemicals comprising the mixture. As single chemical data are abundant and mixture data sparse, mixture risk methods that utilize single chemical...
Spacing and lag effects in free recall of pure lists.
Kahana, Michael J; Howard, Marc W
2005-02-01
Repeating list items leads to better recall when the repetitions are separated by several unique items than when they are presented successively; the spacing effect refers to improved recall for spaced versus successive repetition (lag > 0 vs. lag = 0); the lag effect refers to improved recall for long lags versus short lags. Previous demonstrations of the lag effect have utilized lists containing a mixture of items with varying degrees of spacing. Because differential rehearsal of items in mixed lists may exaggerate any effects of spacing, it is important to demonstrate these effects in pure lists. As in Toppino and Schneider (1999), we found an overall advantage for recall of spaced lists. We further report the first demonstration of a lag effect in pure lists, with significantly better recall for lists with widely spaced repetitions than for those with moderately spaced repetitions.
A Two-Decision Model for Responses to Likert-Type Items
ERIC Educational Resources Information Center
Thissen-Roe, Anne; Thissen, David
2013-01-01
Extreme response set, the tendency to prefer the lowest or highest response option when confronted with a Likert-type response scale, can lead to misfit of item response models such as the generalized partial credit model. Recently, a series of intrinsically multidimensional item response models have been hypothesized, wherein tendency toward…
Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E
2016-04-01
We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
Raykov, Tenko; Marcoulides, George A
2016-04-01
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].
Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto
2013-06-01
To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, 2011 using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the graded response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. According to the "Item Response Theory" analysis of its items, the instrument is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models
ERIC Educational Resources Information Center
Lee, Wooyeol; Cho, Sun-Joo
2017-01-01
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
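The two recovery criteria named here, bias and RMSE, have standard definitions that can be sketched as follows. The function names and the example values are hypothetical and not from the study; each function assumes a list of estimates of one parameter across Monte Carlo replications.

```python
import math


def bias(estimates, true_value):
    """Mean signed deviation of the estimates from the true parameter."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root mean-square error of the estimates around the true parameter."""
    squared = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical difficulty estimates from three replications.
    replications = [1.1, 0.9, 1.3]
    print(bias(replications, 1.0), rmse(replications, 1.0))
```

Bias can be near zero while RMSE is large, since positive and negative errors cancel in the former but not in the latter, which is one reason simulation studies usually report both.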
ERIC Educational Resources Information Center
Tay, Louis; Vermunt, Jeroen K.; Wang, Chun
2013-01-01
We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
On Multidimensional Item Response Theory: A Coordinate-Free Approach. Research Report. ETS RR-07-30
ERIC Educational Resources Information Center
Antal, Tamás
2007-01-01
A coordinate-free definition of complex-structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the classical unidimensional item response theory models. The main theorem of the…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
ERIC Educational Resources Information Center
Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.
2016-01-01
Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
1990-01-01
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Liquid class predictor for liquid handling of complex mixtures
Seglke, Brent W [San Ramon, CA; Lekin, Timothy P [Livermore, CA
2008-12-09
A method of establishing liquid classes of complex mixtures for liquid handling equipment. The mixtures are composed of components and the equipment has equipment parameters. The first step comprises preparing a response curve for the components. The next step comprises using the response curve to prepare a response indicator for the mixtures. The next step comprises deriving a model that relates the components and the mixtures to establish the liquid classes.
Processing of odor mixtures in the zebrafish olfactory bulb.
Tabor, Rico; Yaksi, Emre; Weislogel, Jan-Marek; Friedrich, Rainer W
2004-07-21
Components of odor mixtures often are not perceived individually, suggesting that neural representations of mixtures are not simple combinations of the representations of the components. We studied odor responses to binary mixtures of amino acids and food extracts at different processing stages in the olfactory bulb (OB) of zebrafish. Odor-evoked input to the OB was measured by imaging Ca2+ signals in afferents to olfactory glomeruli. Activity patterns evoked by mixtures were predictable within narrow limits from the component patterns, indicating that mixture interactions in the peripheral olfactory system are weak. OB output neurons, the mitral cells (MCs), were recorded extra- and intracellularly and responded to odors with stimulus-dependent temporal firing rate modulations. Responses to mixtures of amino acids often were dominated by one of the component responses. Responses to mixtures of food extracts, in contrast, were more distinct from both component responses. These results show that mixture interactions can result from processing in the OB. Moreover, our data indicate that mixture interactions in the OB become more pronounced with increasing overlap of input activity patterns evoked by the components. Emerging from these results are rules of mixture interactions that may explain behavioral data and provide a basis for understanding the processing of natural odor stimuli in the OB.
Perceived freedom-responsibility covariation among Cypriot adolescents.
Frangou, Georgia; Wilkerson, Keith; McGahan, Joseph R
2008-04-01
Participants were 67 Cypriot adolescents who responded to propositions regarding positive, negative, and noncontingent relations between freedom and responsibility. The authors framed items so that half dealt with freedom given responsibility, and the other half dealt with responsibility given freedom. Results indicated participants were more likely to endorse positive-contingency items than they were negative and noncontingency items when items were framed around freedom given responsibility. However, when items were framed around responsibility given freedom, no such differences emerged. The authors discuss results relative to cultural and sociopolitical differences and similarities between children in Cyprus and participants in the United States and implications concerning the present study and previous studies regarding these constructs.
ERIC Educational Resources Information Center
Pohl, Steffi; Gräfe, Linda; Rose, Norman
2014-01-01
Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
ERIC Educational Resources Information Center
DeMars, Christine E.
2012-01-01
In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…
Estimation of Item Response Theory Parameters in the Presence of Missing Data
ERIC Educational Resources Information Center
Finch, Holmes
2008-01-01
Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
Examination of Different Item Response Theory Models on Tests Composed of Testlets
ERIC Educational Resources Information Center
Kogar, Esin Yilmaz; Kelecioglu, Hülya
2017-01-01
The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…
A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing
ERIC Educational Resources Information Center
Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.
2013-01-01
The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Bi-dimensional acculturation and cultural response set in CES-D among Korean immigrants
Kim, Eunjung; Seo, Kumin; Cain, Kevin C.
2017-01-01
This study examined a cultural response set to positive affect items and depressive symptom items in CES-D among 172 Korean immigrants. Bi-dimensional acculturation approach, which considers maintenance of Korean Orientation and adoption of American Orientation, was utilized. As Korean immigrants increased American Orientation, they tended to score higher on positive affect items, while no changes occurred in depressive symptom items. Korean Orientation was not related to either positive affect items or depressive symptom items. Korean immigrants have response bias toward positive affect items in CES-D, which decreases as they adopt more American Orientation. CES-D lacks cultural equivalence for Korean immigrants. PMID:20701420
An analysis of high school students' perceptions and academic performance in laboratory experiences
NASA Astrophysics Data System (ADS)
Mirchin, Robert Douglas
This research study is an investigation of student-laboratory (i.e., lab) learning based on students' perceptions of experiences using questionnaire data and evidence of their science-laboratory performance based on paper-and-pencil assessments using Maryland-mandated criteria, Montgomery County Public Schools (MCPS) criteria, and published laboratory questions. A 20-item questionnaire consisting of 18 Likert-scale items and 2 open-ended items that addressed what students liked most and least about lab was administered to students before labs were observed. A pre-test and post-test assessing laboratory achievement were administered before and after the laboratory experiences. The three labs observed were: soda distillation, stoichiometry, and separation of a mixture. Five significant results or correlations were found. For soda distillation, there were two positive correlations. Student preference for analyzing data was positively correlated with achievement on the data analysis dimension of the lab rubric. A student preference for using numbers and graphs to analyze data was positively correlated with achievement on the analysis dimension of the lab rubric. For the separating a mixture lab data the following pairs of correlations were significant. Student preference for doing chemistry labs where numbers and graphs were used to analyze data had a positive correlation with writing a correctly worded hypothesis. Student responses that lab experiences help them learn science positively correlated with achievement on the data dimension of the lab rubric. The only negative correlation found related to the first result where students' preference for computers was inversely correlated to their performance on analyzing data on their lab report. Other findings included the following: students like actual experimental work most and the write-up and analysis of a lab the least. 
It is recommended that lab science instruction be inquiry-based, hands-on, and that students be tested for lab content acquisition. The final conclusion of the study is that students expressed a preference for working in groups and working with materials and equipment as opposed to individual, non-group work and analyzing data.
Vegetable parenting practices scale. Item response modeling analyses
Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom
2015-01-01
Objective: To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method: Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results: One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions: Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.
Polak, Marike; de Rooij, Mark; Heiser, Willem J
2012-09-01
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.
Item response theory analysis of the Pain Self-Efficacy Questionnaire.
Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K
2017-01-01
The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain. 
Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
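The graded response model used in this study can be illustrated with a short sketch of how category probabilities follow from one discrimination parameter and a set of ordered thresholds. It uses the usual logistic (Samejima-type) form, which is an assumption about the parameterization; the function name and the parameter values are hypothetical and are not the PSEQ estimates.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta: latent trait value; a: item discrimination;
    thresholds: increasing list of K-1 values for K response categories.
    In this form, each threshold is the theta at which the probability of
    responding in that category or higher equals 0.5.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond top).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative terms.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # A 7-point response scale needs 6 thresholds (values are made up).
    probs = grm_category_probs(0.0, 1.5, [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
    print([round(p, 3) for p in probs])
```

Plotting these probabilities over a range of theta values gives category probability curves of the kind the abstract refers to.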
Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.
2014-01-01
Introduction: The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods: We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion: Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
Item Response Theory Using Hierarchical Generalized Linear Models
ERIC Educational Resources Information Center
Ravand, Hamdollah
2015-01-01
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Item Response Theory Equating Using Bayesian Informative Priors.
ERIC Educational Resources Information Center
de la Torre, Jimmy; Patz, Richard J.
This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Instrument Formatting with Computer Data Entry in Mind.
ERIC Educational Resources Information Center
Boser, Judith A.; And Others
Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response items; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…
Makovski, Tal; Pertzov, Yoni
2015-01-01
Visual working memory (VWM) and attention have a number of features in common, but despite extensive research it is still unclear how the two interact. Can focused attention improve VWM precision? Can it protect VWM from interference? Here we used a partial-report, continuous-response orientation memory task to examine how attention and interference affect different aspects of VWM and how they interact with one another. Both attention and interference were orthogonally manipulated during the retention interval. Attention was manipulated by presenting informative retro-cues, whereas interference was manipulated by introducing a secondary interfering task. Mixture-model analyses revealed that retro-cues, compared to uninformative cues, improved all aspects of performance: Attention increased recall precision and decreased guessing rate and swap-errors (reporting a wrong item in memory). Similarly, performing a secondary task impaired all aspects of the VWM task. In particular, an interaction between retro-cue and secondary task interference was found primarily for swap-errors. Together these results suggest that both the quantity and quality of VWM representations are sensitive to attention cueing and interference modulations, and they highlight the role of attention in protecting the feature-location associations needed to access the correct items in memory.
Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…
ERIC Educational Resources Information Center
Jones, Douglas H.
The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…
Limits on Log Cross-Product Ratios for Item Response Models. Research Report. ETS RR-06-10
ERIC Educational Resources Information Center
Haberman, Shelby J.; Holland, Paul W.; Sinharay, Sandip
2006-01-01
Bounds are established for log cross-product ratios (log odds ratios) involving pairs of items for item response models. First, expressions for bounds on log cross-product ratios are provided for unidimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model.…
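As an illustrative aside, the quantity bounded in the report above, the log cross-product ratio for a pair of dichotomous items, can be computed from a 2x2 table of joint response counts. The sketch below only defines the statistic; it does not reproduce the bounds derived in the report, and the function name and counts are hypothetical.

```python
import math

def log_cross_product_ratio(n11, n10, n01, n00):
    """Log odds ratio for two dichotomous items.

    n11: both items correct, n10: only item 1 correct,
    n01: only item 2 correct, n00: neither correct.
    All four counts are assumed to be positive.
    """
    return math.log((n11 * n00) / (n10 * n01))

# Hypothetical counts, for illustration only (not taken from the report).
lcpr = log_cross_product_ratio(40, 10, 20, 30)
```

A value of zero would indicate no marginal association between the two items; positive values indicate that correct responses tend to occur together.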
Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R
2015-01-01
A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.
Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.
2016-01-01
A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568
Systematic Proteomic Approach to Characterize the Impacts of ...
Chemical interactions have posed a big challenge in toxicity characterization and human health risk assessment of environmental mixtures. To characterize the impacts of chemical interactions on protein and cytotoxicity responses to environmental mixtures, we established a systems biology approach integrating proteomics, bioinformatics, statistics, and computational toxicology to measure expression or phosphorylation levels of 21 critical toxicity pathway regulators and 445 downstream proteins in human BEAS-2B cells treated with 4 concentrations of nickel, 2 concentrations each of cadmium and chromium, as well as 12 defined binary and 8 defined ternary mixtures of these metals in vitro. Multivariate statistical analysis and mathematical modeling of the metal-mediated proteomic response patterns showed a high correlation between changes in protein expression or phosphorylation and cellular toxic responses to both individual metals and metal mixtures. Of the identified correlated proteins, only a small set of proteins including HIF-1a is likely to be responsible for selective cytotoxic responses to different metals and metal mixtures. Furthermore, support vector machine learning was utilized to computationally predict protein responses to uncharacterized metal mixtures using experimentally generated protein response profiles corresponding to known metal mixtures. This study provides a novel proteomic approach for characterization and prediction of toxicities of
Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris
2016-04-01
The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. © The Author(s) 2015.
Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items
ERIC Educational Resources Information Center
Cher Wong, Cheow
2015-01-01
Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory
ERIC Educational Resources Information Center
Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi
2016-01-01
High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…
ERIC Educational Resources Information Center
Sengul Avsar, Asiye; Tavsancil, Ezel
2017-01-01
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three samples sizes (100, 250 and 500)--were generated by conducting 20…
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
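The relationship described in the abstract above can be sketched in a few lines, assuming the standard logistic form of the Rasch model with ability and difficulty on the same logit scale. The function name and the example values are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model.

    Both arguments are on the same logit scale; the probability
    depends only on their difference.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical candidate and items, for illustration only.
p_matched = rasch_probability(0.0, 0.0)   # ability equals difficulty
p_easy = rasch_probability(1.0, -1.0)     # candidate well above the item
```

When ability equals difficulty the probability is one half, which is why item difficulties on this scale can be read as the ability level at which a candidate has even odds of success.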
Item Response Theory Models for Wording Effects in Mixed-Format Scales
ERIC Educational Resources Information Center
Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu
2015-01-01
Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…
Vegetable parenting practices scale: Item response modeling analyses
USDA-ARS's Scientific Manuscript database
Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...
A HO-IRT Based Diagnostic Assessment System with Constructed Response Items
ERIC Educational Resources Information Center
Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei
2011-01-01
The aim of the present study was to develop an on-line assessment system with constructed response items in the context of elementary mathematics curriculum. The system recorded the problem solving process of constructed response items and transferred the process to response codes for further analyses. An inference mechanism based on artificial…
ERIC Educational Resources Information Center
Sen, Rohini
2012-01-01
In the last five decades, research on the uses of response time has extended into the field of psychometrics (Schnipke & Scrams, 1999; van der Linden, 2006; van der Linden, 2007), where interest has centered around the usefulness of response time information in item calibration and person measurement within an item response theory framework.…
A Primer on the 2- and 3-Parameter Item Response Theory Models.
ERIC Educational Resources Information Center
Thornton, Artist
Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…
ERIC Educational Resources Information Center
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-01-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…
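For readers unfamiliar with the three-parameter logistic model named above, a minimal sketch of its item response function follows. It assumes the common parameterization with discrimination a, difficulty b, and guessing parameter c, and omits the optional scaling constant (often written D = 1.7) that some programs apply. The parameter values are hypothetical and are not estimates from the study.

```python
import math

def three_pl_probability(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing parameter), 0 <= c < 1.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical five-option multiple-choice item (c = 0.2), illustration only.
p_at_difficulty = three_pl_probability(theta=0.5, a=1.2, b=0.5, c=0.2)
p_low_ability = three_pl_probability(theta=-6.0, a=1.2, b=0.5, c=0.2)
```

At theta equal to b the probability is halfway between c and 1, and for very low ability it approaches c rather than zero, which is how the model accommodates guessing on multiple-choice items.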
Enhanced sensitization and elicitation responses caused by mixtures of common fragrance allergens.
Bonefeld, Charlotte Menné; Nielsen, Morten Milek; Rubin, Ingrid Maria Cecilia; Vennegaard, Marie Torp; Dabelsteen, Sally; Gimenéz-Arnau, Elena; Lepoittevin, Jean-Pierre; Geisler, Carsten; Johansen, Jeanne Duus
2011-12-01
Perfumes are complex mixtures composed of many fragrance ingredients, many of which are known to be only weak allergens when tested individually. It is therefore surprising that fragrance contact allergy is one of the most common forms of contact allergy. To investigate whether mixing different fragrance allergens leads to increased sensitization potency, and to examine the difference in the challenge response to one chemical in mice sensitized either with the mixture of allergens or with only the relevant allergen. CBA mice were sensitized with three different concentrations of three fragrance allergens alone or as a mixture. The sensitization and elicitation responses were measured by ear thickness plus infiltration of B and T cells and T cell proliferation in the draining lymph nodes. We found a dose-dependent sensitization response for each of the allergens. An increased response was seen when the allergens were mixed. A stronger challenge response to cinnamal was seen in mice sensitized with the allergen mixture than in mice sensitized with cinnamal alone. Our findings suggest that mixtures of allergens increase the primary response that potentiates the generation of memory T cells in response to the specific allergen. Thus, allergen mixtures enhance both induction and elicitation of contact allergy. © 2011 John Wiley & Sons A/S.
Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M
2017-05-01
In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Sekely, Angela; Taylor, Graeme J; Bagby, R Michael
2018-03-17
The Toronto Structured Interview for Alexithymia (TSIA) was developed to provide a structured interview method for assessing alexithymia. One drawback of this instrument is the amount of time it takes to administer and score. The current study used item response theory (IRT) methods to analyze data from a large heterogeneous multi-language sample (N = 842) to investigate whether a subset of items could be selected to create a short version of the instrument. Samejima's (1969) graded response model was used to fit the item responses. Items providing maximum information were retained in the short model, resulting in the elimination of 12 of the original 24 items. Despite the 50% reduction in the number of items, 65.22% of the information was retained. Further studies are needed to validate the short version. A short version of the TSIA is potentially of practical value to clinicians and researchers with time constraints. Copyright © 2018. Published by Elsevier B.V.
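The graded response model mentioned above can be illustrated with a short sketch of its category probabilities, assuming the usual logistic form in which each ordered threshold defines a cumulative probability of responding in that category or higher. The function name, discrimination, and thresholds are hypothetical and are not TSIA estimates.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    thresholds must be in increasing order; K thresholds give K + 1
    ordered response categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical four-category item, for illustration only.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

The probabilities sum to one by construction, and with thresholds placed symmetrically around the respondent's trait level the lowest and highest categories are equally likely.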
EVALUATING QUANTITATIVE FORMULAS FOR DOSE-RESPONSE ASSESSMENT OF CHEMICAL MIXTURES
Risk assessment formulas are often distinguished from dose-response models by being rough but necessary. The evaluation of these rough formulas is described here, using the example of mixture risk assessment. Two conditions make the dose-response part of mixture risk assessment d...
Contextual behavior and neural circuits
Lee, Inah; Lee, Choong-Hee
2013-01-01
Animals including humans engage in goal-directed behavior flexibly in response to items and their background, which is called contextual behavior in this review. Although the concept of context has long been studied, there are differences among researchers in defining and experimenting with the concept. The current review aims to provide a categorical framework within which not only the neural mechanisms of contextual information processing but also the contextual behavior can be studied in more concrete ways. For this purpose, we categorize contextual behavior into three subcategories as follows by considering the types of interactions among context, item, and response: contextual response selection, contextual item selection, and contextual item–response selection. Contextual response selection refers to the animal emitting different types of responses to the same item depending on the context in the background. Contextual item selection occurs when there are multiple items that need to be chosen in a contextual manner. Finally, when multiple items and multiple contexts are involved, contextual item–response selection takes place whereby the animal either chooses an item or inhibits such a response depending on item–context paired association. The literature suggests that the rhinal cortical regions and the hippocampal formation play key roles in mnemonically categorizing and recognizing contextual representations and the associated items. In addition, it appears that the fronto-striatal cortical loops in connection with the contextual information-processing areas critically control the flexible deployment of adaptive action sets and motor responses for maximizing goals. We suggest that contextual information processing should be investigated in experimental settings where contextual stimuli and resulting behaviors are clearly defined and measurable, considering the dynamic top-down and bottom-up interactions among the neural systems for contextual behavior. 
PMID:23675321
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
2017-06-01
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
ERIC Educational Resources Information Center
Sahin, Alper; Anil, Duygu
2017-01-01
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
ERIC Educational Resources Information Center
Arce-Ferrer, Alvaro J.; Bulut, Okan
2017-01-01
This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
ERIC Educational Resources Information Center
Tian, Wei; Cai, Li; Thissen, David; Xin, Tao
2013-01-01
In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…
Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.
2011-01-01
Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
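To show how difficulty (severity) and discrimination jointly determine where an item is useful for screening, the sketch below evaluates the item information function of a two-parameter logistic model, assuming the standard form I(theta) = a^2 * P * (1 - P). The two items are hypothetical; their parameters were merely chosen to fall inside the ranges reported in the abstract above.

```python
import math

def two_pl_probability(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def two_pl_information(theta, a, b):
    """Fisher information contributed by a 2PL item at trait level theta."""
    p = two_pl_probability(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical items, illustration only: same severity, different discrimination.
info_sharp = two_pl_information(theta=2.0, a=2.0, b=2.0)
info_flat = two_pl_information(theta=2.0, a=0.4, b=2.0)
```

Information peaks at theta equal to b with height a^2 / 4, so a severe and highly discriminating item contributes most of its precision among respondents high on the latent continuum.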
Generalizability in Item Response Modeling
ERIC Educational Resources Information Center
Briggs, Derek C.; Wilson, Mark
2007-01-01
An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random…
Quantifying Local, Response Dependence between Two Polytomous Items Using the Rasch Model
ERIC Educational Resources Information Center
Andrich, David; Humphry, Stephen M.; Marais, Ida
2012-01-01
Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…
Using Response Times for Item Selection in Adaptive Testing
ERIC Educational Resources Information Center
van der Linden, Wim J.
2008-01-01
Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the…
The Influence of Item Response Indecision on the Self-Directed Search
ERIC Educational Resources Information Center
Sampson, James P., Jr.; Shy, Jonathan D.; Hartley, Sarah Lucas; Reardon, Robert C.; Peterson, Gary W.
2009-01-01
Students (N = 247) responded to Self-Directed Search (SDS) per the standard response format and were also instructed to record a question mark (?) for items about which they were uncertain (item response indecision [IRI]). The initial responses of the 114 participants with a (?) were then reversed and a second SDS summary code was obtained and…
Improving measurement of injection drug risk behavior using item response theory.
Janulis, Patrick
2014-03-01
Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
Measuring sexual orientation in adolescent health surveys: evaluation of eight school-based surveys.
Saewyc, Elizabeth M; Bauer, Greta R; Skay, Carol L; Bearinger, Linda H; Resnick, Michael D; Reis, Elizabeth; Murphy, Aileen
2004-10-01
To examine the performance of various items measuring sexual orientation within 8 school-based adolescent health surveys in the United States and Canada from 1986 through 1999. Analyses examined nonresponse and unsure responses to sexual orientation items compared with other survey items, demographic differences in responses, tests for response set bias, and congruence of responses to multiple orientation items; analytical methods included frequencies, contingency tables with Chi-square, and ANOVA with least significant differences (LSD) post hoc tests; all analyses were conducted separately by gender. In all surveys, nonresponse rates for orientation questions were similar to other sexual questions, but not higher; younger students, immigrants, and students with learning disabilities were more likely to skip items or select "unsure." Sexual behavior items had the lowest nonresponse, but fewer than half of all students reported sexual behavior, limiting its usefulness for indicating orientation. Item placement in the survey, wording, and response set bias all appeared to influence nonresponse and unsure rates. Specific recommendations include standardizing wording across future surveys, and pilot testing items with diverse ages and ethnic groups of teens before use. All three dimensions of orientation should be assessed where possible; when limited to single items, sexual attraction may be the best choice. Specific wording suggestions are offered for future surveys.
Hettinger, Thomas P.; Savoy, Lawrence D.; Frank, Marion E.
2012-01-01
Component signaling in taste mixtures containing both beneficial and dangerous chemicals depends on peripheral processing. Unidirectional mixture suppression of chorda tympani (CT) nerve responses to sucrose by quinine and acid is documented for golden hamsters (Mesocricetus auratus). To investigate mixtures of NaCl and acids, we recorded multifiber responses to 50 mM NaCl, 1 and 3 mM citric acid and acetic acid, 250 μM citric acid, 20 mM acetic acid, and all binary combinations of each acid with NaCl (with and without 30 μM amiloride added). By blocking epithelial Na+ channels, amiloride treatment separated amiloride-sensitive NaCl-specific responses from amiloride-insensitive electrolyte-generalist responses, which encompass all of the CT response to the acids as well as responses to NaCl. Like CT sucrose responses, the amiloride-sensitive NaCl responses were suppressed by as much as 50% by citric acid (P = 0.001). The amiloride-insensitive electrolyte-generalist responses to NaCl + acid mixtures approximated the sum of NaCl and acid component responses. Thus, although NaCl-specific responses to NaCl were weakened in NaCl–acid mixtures, electrolyte-generalist responses to acid and NaCl, which tastes KCl-like, were transmitted undiminished in intensity to the central nervous system. The 2 distinct CT pathways are consistent with known rodent behavioral discriminations. PMID:22451526
40 CFR 1065.720 - Liquefied petroleum gas.
Code of Federal Regulations, 2011 CFR
2011-07-01
... CONTROLS ENGINE-TESTING PROCEDURES Engine Fluids, Test Fuels, Analytical Gases and Other Calibration....720—Test Fuel Specifications for Liquefied Petroleum Gas Item Value Reference procedure 1 Propane... test fuel must not yield a persistent oil ring when you add 0.3 ml of solvent residue mixture to a...
40 CFR 1065.720 - Liquefied petroleum gas.
Code of Federal Regulations, 2013 CFR
2013-07-01
... CONTROLS ENGINE-TESTING PROCEDURES Engine Fluids, Test Fuels, Analytical Gases and Other Calibration....720—Test Fuel Specifications for Liquefied Petroleum Gas Item Value Reference procedure 1 Propane... test fuel must not yield a persistent oil ring when you add 0.3 ml of solvent residue mixture to a...
40 CFR 1065.720 - Liquefied petroleum gas.
Code of Federal Regulations, 2012 CFR
2012-07-01
... CONTROLS ENGINE-TESTING PROCEDURES Engine Fluids, Test Fuels, Analytical Gases and Other Calibration....720—Test Fuel Specifications for Liquefied Petroleum Gas Item Value Reference procedure 1 Propane... test fuel must not yield a persistent oil ring when you add 0.3 ml of solvent residue mixture to a...
ERIC Educational Resources Information Center
Li, Yanmei; Li, Shuhong; Wang, Lin
2010-01-01
Many standardized educational tests include groups of items based on a common stimulus, known as "testlets". Standard unidimensional item response theory (IRT) models are commonly used to model examinees' responses to testlet items. However, it is known that local dependence among testlet items can lead to biased item parameter estimates…
Assessing the Utility of Item Response Theory Models: Differential Item Functioning.
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd
The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…
ERIC Educational Resources Information Center
Eignor, Daniel R.; Douglass, James B.
This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…
ERIC Educational Resources Information Center
Magnus, Brooke E.; Thissen, David
2017-01-01
Questionnaires that include items eliciting count responses are becoming increasingly common in psychology. This study proposes methodological techniques to overcome some of the challenges associated with analyzing multivariate item response data that exhibit zero inflation, maximum inflation, and heaping at preferred digits. The modeling…
Nested Logit Models for Multiple-Choice Item Response Data
ERIC Educational Resources Information Center
Suh, Youngsuk; Bolt, Daniel M.
2010-01-01
Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all…
The Dutch Identity: A New Tool for the Study of Item Response Models.
ERIC Educational Resources Information Center
Holland, Paul W.
1990-01-01
The Dutch Identity is presented as a useful tool for expressing the basic equations of item response models that relate the manifest probabilities to the item response functions and the latent trait distribution. Ways in which the identity may be exploited are suggested and illustrated. (SLD)
Item response theory analysis of the mechanics baseline test
NASA Astrophysics Data System (ADS)
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
2012-02-01
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
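As a minimal sketch of the kind of item response model described above, the following assumes the standard two-parameter logistic (2PL) form; the abstract does not state which parameterization the authors fit. The function name `p_correct_2pl` and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct answer under a 2PL model (assumed form).

    theta: student skill, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: a student whose skill equals the item difficulty
# has an even chance of answering correctly, whatever the discrimination.
print(p_correct_2pl(theta=0.0, a=1.2, b=0.0))  # 0.5
```

Under this form, an item with a small discrimination value separates students of different skill only weakly, which is one way to read the abstract's remark about questions that are not effective in discriminating between students.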
Sample Invariance of the Structural Equation Model and the Item Response Model: A Case Study.
ERIC Educational Resources Information Center
Breithaupt, Krista; Zumbo, Bruno D.
2002-01-01
Evaluated the sample invariance of item discrimination statistics in a case study using real data, responses of 10 random samples of 500 people to a depression scale. Results lend some support to the hypothesized superiority of a two-parameter item response model over the common form of structural equation modeling, at least when responses are…
A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments
ERIC Educational Resources Information Center
Wolkowitz, Amanda A.; Skorupski, William P.
2013-01-01
When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…
Detecting interaction in chemical mixtures can be complicated by differences in the shapes of the dose-response curves of the individual components (e.g. mixtures of full and partial agonists with differing response maxima). We present an analysis scheme where flexible single che...
NASA Astrophysics Data System (ADS)
Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth
The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose "I don't know" rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the "I don't know" response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the "I don't know" response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.
Park, Jong Cook; Kim, Kwang Sig
2012-03-01
The reliability of a test is determined by the characteristics of its items. Item analysis can be carried out with classical test theory and item response theory. The purpose of the study was to compare discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point biserial correlation coefficient (C(pbs)) was compared to the method of extreme group (D), the biserial correlation coefficient (C(bs)), the item-total correlation coefficient (C(it)), and the corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r²) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The difficulty logits ranged from -0.82 to 0.80 with standard errors from 0.37 to 0.76; the ability logits ranged from -3.69 to 3.19 with standard errors from 0.45 to 1.03. Items 9 and 23 had outfit ≥ 1.3. Students 1, 5, 7, 18, 26, 30, and 32 had fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee's ability parameter with standard errors. The fit statistics can identify bad items and unpredictable examinee responses.
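The two quantities compared in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the point biserial coefficient is computed as the Pearson correlation between a 0/1 item score and the total score, and the Rasch probability uses the usual logistic form. The function names and the small data set are hypothetical.

```python
import math


def point_biserial(item, total):
    """Pearson correlation between a dichotomous (0/1) item and total scores."""
    n = len(item)
    mean_x = sum(item) / n
    mean_y = sum(total) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item, total))
    var_x = sum((x - mean_x) ** 2 for x in item)
    var_y = sum((y - mean_y) ** 2 for y in total)
    return cov / math.sqrt(var_x * var_y)


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical scores for six examinees on one item and on the whole test.
item_scores = [1, 1, 0, 1, 0, 0]
total_scores = [20, 18, 9, 15, 11, 7]
print(point_biserial(item_scores, total_scores))
print(rasch_probability(ability=0.5, difficulty=-0.82))
```

A high point biserial value indicates that examinees who answered the item correctly also tended to score well overall, which is the classical notion of discrimination that the study sets against the Rasch fit statistics.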
Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)
ERIC Educational Resources Information Center
Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn
2018-01-01
The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Measuring Student Learning with Item Response Theory
ERIC Educational Resources Information Center
Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.
2008-01-01
We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Higher-Order Item Response Models for Hierarchical Latent Traits
ERIC Educational Resources Information Center
Huang, Hung-Yu; Wang, Wen-Chung; Chen, Po-Hsi; Su, Chi-Ming
2013-01-01
Many latent traits in the human sciences have a hierarchical structure. This study aimed to develop a new class of higher order item response theory models for hierarchical latent traits that are flexible in accommodating both dichotomous and polytomous items, to estimate both item and person parameters jointly, to allow users to specify…
Evaluating Item Fit for Multidimensional Item Response Models
ERIC Educational Resources Information Center
Zhang, Bo; Stone, Clement A.
2008-01-01
This research examines the utility of the S-X² statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…
An Item Response Theory Model for Test Bias.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
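For reference, the three-parameter logistic model named in this abstract is usually written as below. This is the textbook form and is given as an assumption; the abstract does not show the exact parameterization used, and some programs include an additional scaling constant (often 1.7) in the exponent. Here $a_i$, $b_i$, and $c_i$ denote the discrimination, difficulty, and guessing parameters of item $i$, and $\theta$ is the ability parameter.

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp\bigl(-a_i(\theta - b_i)\bigr)}
```

As $\theta$ becomes very low, $P_i(\theta)$ approaches $c_i$, so the guessing parameter acts as the lower asymptote of the item characteristic curve.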
Wilkerson, Keith; McGahan, Joseph R; Stevens, Rick; Williamson, David; Low, Jean
2009-12-01
The goal of this study was to determine whether differential response formats to covariation problems influence corresponding response latencies. The authors provided participants with 3 trials of 16 statements addressing positive and negative relations between freedom and responsibility. The authors framed half of the items around responsibility given freedom and the other half around freedom given responsibility. Response formats comprised true-false, agree-disagree, and yes-no answers as a between-participants factor. Results indicated that the manipulation of response format did not affect latencies. However, latencies differed according to the framing of the items. For items framed around freedom given responsibility, latencies were shorter. In addition, participants were more likely to report a positive relation between freedom and responsibility when items were framed around freedom given responsibility. The authors discuss implications relative to previous research in this area and give recommendations for future research.
Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong
2018-06-01
Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings in order to reduce the burden on patients. However, the generalizability of these findings warrants further investigation.
Automatic Scoring of Paper-and-Pencil Figural Responses. Research Report.
ERIC Educational Resources Information Center
Martinez, Michael E.; And Others
Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
Introduction to Multilevel Item Response Theory Analysis: Descriptive and Explanatory Models
ERIC Educational Resources Information Center
Sulis, Isabella; Toland, Michael D.
2017-01-01
Item response theory (IRT) models are the main psychometric approach for the development, evaluation, and refinement of multi-item instruments and scaling of latent traits, whereas multilevel models are the primary statistical method when considering the dependence between person responses when primary units (e.g., students) are nested within…
An Extension of IRT-Based Equating to the Dichotomous Testlet Response Theory Model
ERIC Educational Resources Information Center
Tao, Wei; Cao, Yi
2016-01-01
Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…
ERIC Educational Resources Information Center
Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D.
2012-01-01
This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…
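As a point of reference for the conventional coefficient mentioned above, the following is a minimal sketch of Cronbach's alpha computed directly from raw item scores. It does not implement the ordinal reliability coefficients that the paper discusses. The function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha from raw scores.

    scores: list of respondents, each a list of k item scores.
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses: 4 respondents, 3 items.
responses = [[1, 2, 1], [3, 3, 4], [4, 5, 4], [5, 4, 5]]
print(round(cronbach_alpha(responses), 3))
```

The calculation treats the Likert categories as interval-scaled numbers, which is the conventional practice that the abstract sets against ordinal alternatives.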
IRTPRO 2.1 for Windows (Item Response Theory for Patient-Reported Outcomes)
ERIC Educational Resources Information Center
Paek, Insu; Han, Kyung T.
2013-01-01
This article reviews a new item response theory (IRT) model estimation program, IRTPRO 2.1, for Windows that is capable of unidimensional and multidimensional IRT model estimation for existing and user-specified constrained IRT models for dichotomously and polytomously scored item response data. (Contains 1 figure and 2 notes.)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adebambo, Oluwadamilare A.; Ray, Paul D.; Shea, Damian
Exposure to elevated levels of the toxic metals inorganic arsenic (iAs) and cadmium (Cd) represents a major global health problem. These metals often occur as mixtures in the environment, creating the potential for interactive or synergistic biological effects different from those observed in single exposure conditions. In the present study, environmental mixtures collected from two waste sites in China and comparable mixtures prepared in the laboratory were tested for toxicogenomic response in placental JEG-3 cells. These cells serve as a model for evaluating cellular responses to exposures during pregnancy. One of the mixtures was predominated by iAs and one by Cd. Six gene biomarkers were measured in order to evaluate the effects from the metal mixtures using dose and time-course experiments including: heme oxygenase 1 (HO-1) and metallothionein isoforms (MT1A, MT1F and MT1G) previously shown to be preferentially induced by exposure to either iAs or Cd, and metal transporter genes aquaporin-9 (AQP9) and ATPase, Cu²⁺ transporting, beta polypeptide (ATP7B). There was a significant increase in the mRNA expression levels of ATP7B, HO-1, MT1A, MT1F, and MT1G in mixture-treated cells compared to the iAs or Cd only-treated cells. Notably, the genomic responses were observed at concentrations significantly lower than levels found at the environmental collection sites. These data demonstrate that metal mixtures increase the expression of gene biomarkers in placental JEG-3 cells in a synergistic manner. Taken together, the data suggest that toxic metals that co-occur may induce detrimental health effects that are currently underestimated when analyzed as single metals. Highlights: • Toxicogenomic responses of environmental metal mixtures assessed • Induction of ATP7B, HO-1, MT1A, MT1F and MT1G by metal mixtures observed in placental cells • Higher gene induction in response to metal mixtures versus single metal treatments.
The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence.
ERIC Educational Resources Information Center
Ackerman, Terry A.
One of the important underlying assumptions of all item response theory (IRT) models is that of local independence. This assumption requires that the response to an item on a test not be influenced by the response to any other items. This assumption is often taken for granted, with little or no scrutiny of the response process required to answer…
Item response theory scoring and the detection of curvilinear relationships.
Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A
2017-03-01
Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process (wherein respondents agree only to items that reflect their own standing on the measured variable) as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity.
Assessing Construct Validity Using Multidimensional Item Response Theory.
ERIC Educational Resources Information Center
Ackerman, Terry A.
The concept of a user-specified validity sector is discussed. The idea of the validity sector combines the work of M. D. Reckase (1986) and R. Shealy and W. Stout (1991). Reckase developed a methodology to represent an item in a multidimensional latent space as a vector. Item vectors are computed using multidimensional item response theory item…
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2007-01-01
The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…
Item Response Theory and Health Outcomes Measurement in the 21st Century
Hays, Ron D.; Morales, Leo S.; Reise, Steve P.
2006-01-01
Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
Predicting the shock compression response of heterogeneous powder mixtures
NASA Astrophysics Data System (ADS)
Fredenburg, D. A.; Thadhani, N. N.
2013-06-01
A model framework for predicting the dynamic shock-compression response of heterogeneous powder mixtures using readily obtained measurements from quasi-static tests is presented. Low-strain-rate compression data are first analyzed to determine the region of the bulk response over which particle rearrangement does not contribute to compaction. This region is then fit to determine the densification modulus of the mixture, σD, a newly defined parameter describing the resistance of the mixture to yielding. The measured densification modulus, reflective of the diverse yielding phenomena that occur at the meso-scale, is implemented into a rate-independent formulation of the P-α model, which is combined with an isobaric equation of state to predict the low and high stress dynamic compression response of heterogeneous powder mixtures. The framework is applied to two metal + metal-oxide (thermite) powder mixtures, and good agreement between the model and experiment is obtained for all mixtures at stresses near and above those required to reach full density. At lower stresses, rate-dependencies of the constituents, and specifically those of the matrix constituent, determine the ability of the model to predict the measured response in the incomplete compaction regime.
Formation of dioxins from incineration of foods found in domestic garbage.
Katami, Takeo; Yasuhara, Akio; Shibamoto, Takayuki
2004-02-15
There has been great concern about the large amounts of garbage produced by domestic households in the modern world. One of the major sources of dioxins (PCDDs, PCDFs, and coplanar PCBs) in the environment is the combustion of domestic waste materials. Exhaust gases from an incinerator in which mixtures of 67 food items (including fruits, vegetables, pasta, seafoods, meats, processed foods, and seasoned foods) were burned were analyzed for dioxins. Gases collected at the chimney port (9.15 ng/g) contained less total dioxins than those collected at the chamber port (29.1 ng/g). The levels of Cl1-Cl6-PCDDs and Cl1-Cl5-PCDFs were much lower in the gas collected at the chimney port than in the gas collected at the chamber port. The levels of Cl7-Cl8-PCDDs and Cl6-Cl8-PCDFs were higher in the gas collected at the chimney port than in the gas collected at the chamber port. A total of Cl4-Cl8-PCDDs (1.84-3.04 ng/g) comprised over 80% of the total PCDDs formed (2.24-4.00 ng/g). Total PCDFs (16.2-22.6 ng/g) comprised 78-86% of the total dioxins formed (18.9-29.1 ng/g). The PCDFs formed in the greatest amounts were M1CDFs (9.68-10.7 ng/g). Mixtures of commonly consumed food items produced ppb levels of total dioxins in exhaust gases upon combustion, suggesting that incineration of domestic food wastes is one of the sources of dioxins in the environment. A mixture containing some seasoned foods, such as mayonnaise spread on bread, produced more dioxins (29.1 ng/g) than a mixture without seasoned foods did (18.9 ng/g).
Mixture toxicity revisited from a toxicogenomic perspective.
Altenburger, Rolf; Scholz, Stefan; Schmitt-Jansen, Mechthild; Busch, Wibke; Escher, Beate I
2012-03-06
The advent of new genomic techniques has raised expectations that central questions of mixture toxicology such as for mechanisms of low dose interactions can now be answered. This review provides an overview on experimental studies from the past decade that address diagnostic and/or mechanistic questions regarding the combined effects of chemical mixtures using toxicogenomic techniques. From 2002 to 2011, 41 studies were published with a focus on mixture toxicity assessment. Primarily multiplexed quantification of gene transcripts was performed, though metabolomic and proteomic analysis of joint exposures have also been undertaken. It is now standard to explicitly state criteria for selecting concentrations and provide insight into data transformation and statistical treatment with respect to minimizing sources of undue variability. Bioinformatic analysis of toxicogenomic data, by contrast, is still a field with diverse and rapidly evolving tools. The reported combined effect assessments are discussed in the light of established toxicological dose-response and mixture toxicity models. Receptor-based assays seem to be the most advanced toward establishing quantitative relationships between exposure and biological responses. Often transcriptomic responses are discussed based on the presence or absence of signals, where the interpretation may remain ambiguous due to methodological problems. The majority of mixture studies design their studies to compare the recorded mixture outcome against responses for individual components only. This stands in stark contrast to our existing understanding of joint biological activity at the levels of chemical target interactions and apical combined effects. By joining established mixture effect models with toxicokinetic and -dynamic thinking, we suggest a conceptual framework that may help to overcome the current limitation of providing mainly anecdotal evidence on mixture effects. 
To achieve this we suggest (i) to design studies to establish quantitative relationships between dose and time dependency of responses and (ii) to adopt mixture toxicity models. Moreover, (iii) utilization of novel bioinformatic tools and (iv) stress response concepts could be productive to translate multiple responses into hypotheses on the relationships between general stress and specific toxicity reactions of organisms.
The effect of response modality on immediate serial recall in dementia of the Alzheimer type.
Macé, Anne-Laure; Ergis, Anne-Marie; Caza, Nicole
2012-09-01
Contrary to traditional models of verbal short-term memory (STM), psycholinguistic accounts assume that temporary retention of verbal materials is an intrinsic property of word processing. Therefore, memory performance will depend on the nature of the STM tasks, which vary according to the linguistic representations they engage. The aim of this study was to explore the effect of response modality on verbal STM performance in individuals with dementia of the Alzheimer Type (DAT), and its relationship with the patients' word-processing deficits. Twenty individuals with mild DAT and 20 controls were tested on an immediate serial recall (ISR) task using the same items across two response modalities (oral and picture pointing) and completed a detailed language assessment. When scoring of ISR performance was based on item memory regardless of item order, a response modality effect was found for all participants, indicating that they recalled more items with picture pointing than with oral response. However, this effect was less marked in patients than in controls, resulting in an interaction. Interestingly, when recall of both item and order was considered, results indicated similar performance between response modalities in controls, whereas performance was worse for pointing than for oral response in patients. Picture-naming performance was also reduced in patients relative to controls. However, in the word-to-picture matching task, a similar pattern of responses was found between groups for incorrectly named pictures of the same items. The finding of a response modality effect in item memory for all participants is compatible with the assumption that semantic influences are greater in picture pointing than in oral response, as predicted by psycholinguistic models. Furthermore, patients' performance was modulated by their word-processing deficits, showing a reduced advantage relative to controls. 
Overall, the response modality effect observed in this study for item memory suggests that verbal STM performance is intrinsically linked with word processing capacities in both healthy controls and individuals with mild DAT, supporting psycholinguistic models of STM.
ERIC Educational Resources Information Center
Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.
2012-01-01
This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
ERIC Educational Resources Information Center
Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.
Current interest in the assessment of measurement equivalence emphasizes two methods of analysis, linear, and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…
A Polytomous Item Response Theory Analysis of Social Physique Anxiety Scale
ERIC Educational Resources Information Center
Fletcher, Richard B.; Crocker, Peter
2014-01-01
The present study investigated the social physique anxiety scale's factor structure and item properties using confirmatory factor analysis and item response theory. An additional aim was to identify differences in response patterns between groups (gender). A large sample of high school students aged 11-15 years (N = 1,529) consisting of n =…
Item Response Theory at Subject- and Group-Level. Research Report 90-1.
ERIC Educational Resources Information Center
Tobi, Hilde
This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California Assessment Program. In the Netherlands, these…
ERIC Educational Resources Information Center
Schilling, Stephen G.
2007-01-01
In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
ERIC Educational Resources Information Center
Anderson, Daniel; Kahn, Joshua D.; Tindal, Gerald
2017-01-01
Unidimensionality and local independence are two common assumptions of item response theory. The former implies that all items measure a common latent trait, while the latter implies that responses are independent, conditional on respondents' location on the latent trait. Yet, few tests are truly unidimensional. Unmodeled dimensions may result in…
ERIC Educational Resources Information Center
Crino, Michael D.; And Others
1985-01-01
The random response technique was compared to a direct questionnaire, administered to college students, to investigate whether or not the responses predicted the social desirability of the item. Results suggest support for the hypothesis. A 33-item version of the Marlowe-Crowne Social Desirability Scale which was used is included. (GDC)
Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea
2016-12-01
To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, no unidimensionality or differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. 
Published by Elsevier Inc. All rights reserved.
The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.
Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi
2018-04-17
The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.
Development and validation of an item response theory-based Social Responsiveness Scale short form.
Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T
2017-09-01
Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L
2015-12-01
Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC, 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item-reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items were removed with extreme response distribution or trait-fit. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of a SCD screener.
Cross-Cultural Validation of the Quality of Life in Hand Eczema Questionnaire (QOLHEQ).
Ofenloch, Robert F; Oosterhaven, Jart A F; Susitaival, Päivikki; Svensson, Åke; Weisshaar, Elke; Minamoto, Keiko; Onder, Meltem; Schuttelaar, Marie Louise A; Bulbul Baskan, Emel; Diepgen, Thomas L; Apfelbacher, Christian
2017-07-01
The Quality of Life in Hand Eczema Questionnaire (QOLHEQ) is the only instrument assessing disease-specific health-related quality of life in patients with hand eczema. It is available in eight language versions. In this study we assessed if the items of different language versions of the QOLHEQ yield comparable values across countries. An international multicenter study was conducted with participating centers in Finland, Germany, Japan, The Netherlands, Sweden, and Turkey. Methods of item response theory were applied to each subscale to assess differential item functioning for items among countries. Overall, 662 hand eczema patients were recruited into the study. Single items were removed or split according to the item response theory model by country to resolve differential item functioning. After this adjustment, none of the four subscales of the QOLHEQ showed significant misfit to the item response theory model (P < 0.01), and a Person Separation Index of greater than 0.7 showed good internal consistency for each subscale. By adapting the scoring of the QOLHEQ using the methods of item response theory, it was possible to obtain QOLHEQ values that are comparable across countries. Cross-cultural variations in the interpretation of single items were resolved. The QOLHEQ is now ready to be used in international studies assessing the health-related quality of life impact of hand eczema. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Analyzing force concept inventory with item response theory
NASA Astrophysics Data System (ADS)
Wang, Jing; Bao, Lei
2010-10-01
Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
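The three per-question parameters named in this abstract (difficulty, discrimination, and probability of a correct guess) are the parameters of the standard three-parameter logistic (3PL) item response function. The abstract does not state which parameterization the authors used, so the following is only a minimal sketch of the textbook form; the function name and the example parameter values are illustrative assumptions, not estimates from the study.

```python
import math


def prob_correct_3pl(theta, a, b, c):
    """Textbook 3PL item response function (illustrative sketch).

    theta: student proficiency on the latent scale
    a:     item discrimination
    b:     item difficulty
    c:     lower asymptote, i.e. probability of a correct guess
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item: moderate discrimination, average difficulty,
    # five answer options (guessing probability 0.2).
    for theta in (-2.0, 0.0, 2.0):
        p = prob_correct_3pl(theta, a=1.5, b=0.0, c=0.2)
        print(f"theta={theta:+.1f}  P(correct)={p:.3f}")
```

When proficiency equals the item difficulty, the probability sits halfway between the guessing floor and 1; for very low proficiency it approaches the guessing floor rather than zero.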
Item Response Theory Models for Performance Decline during Testing
ERIC Educational Resources Information Center
Jin, Kuan-Yu; Wang, Wen-Chung
2014-01-01
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.
ERIC Educational Resources Information Center
Kaskowitz, Gary S.; De Ayala, R. J.
2001-01-01
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
ERIC Educational Resources Information Center
Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.
2011-01-01
Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…
Doherty-Torstrick, Emily R; Walton, Kate E; Barsky, Arthur J; Fallon, Brian A
2016-10-01
The DSM-5 diagnosis of illness anxiety disorder adds avoidance as a component of a behavioral response to illness fears - one that was not present in prior DSM criteria of hypochondriasis. However, maladaptive avoidance as a necessary or useful criterion has yet to be empirically supported. 195 individuals meeting DSM-IV criteria for hypochondriasis based on structured interview completed a variety of self-report and clinician-administered assessments. Data on maladaptive avoidance were obtained using the six-item subscale of the clinician-administered Hypochondriasis - Yale Brown Obsessive Compulsive Scale - Modified. To determine if avoidance emerged as a useful indicator in hypochondriasis, we compared the relative fit of continuous latent trait, categorical latent class, and hybrid factor mixture models. A two-class factor mixture model fit the data best, with Class 1 (n=147) exhibiting a greater level of severity of avoidance than Class 2 (n=48). The more severely avoidant group was found to have higher levels of hypochondriacal symptom severity, functional impairment, and anxiety, as well as lower quality of life. These results suggest that avoidance may be a valid behavioral construct and a useful component of the new diagnostic criteria of illness anxiety in the DSM-5, with implications for somatic symptom disorder. Copyright © 2016 Elsevier Inc. All rights reserved.
Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd
2017-01-01
The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
Shedden-Mora, Meike C.; Löwe, Bernd
2017-01-01
Objective: The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods: Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. Results: The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion: The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
Do large-scale assessments measure students' ability to integrate scientific knowledge?
NASA Astrophysics Data System (ADS)
Lee, Hee-Sun
2010-03-01
Large-scale assessments are used as a means to diagnose the current status of student achievement in science and compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as Trends in International Math and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of student open-ended responses do open-ended items become a valid and reliable tool to assess students' knowledge integration ability.
Accounting for Local Dependence with the Rasch Model: The Paradox of Information Increase.
Andrich, David
Test theories imply statistical, local independence. Where local independence is violated, models of modern test theory that account for it have been proposed. One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation between two items in the dichotomous Rasch model, this paper derives three related implications. First, it formalises how the polytomous Rasch model for an item constituted by summing the scores of the dependent items absorbs the dependence in its threshold structure. Second, it shows that as a consequence the unit when the dependence is accounted for is not the same as if the items had no response dependence. Third, it explains the paradox, known, but not explained in the literature, that the greater the dependence of the constituent items the greater the apparent information in the constituted polytomous item when it should provide less information.
Sun, Sol Z; Fidalgo, Celia; Barense, Morgan D; Lee, Andy C H; Cant, Jonathan S; Ferber, Susanne
2017-11-01
Interference disrupts information processing across many timescales, from immediate perception to memory over short and long durations. The widely held similarity assumption states that as similarity between interfering information and memory contents increases, so too does the degree of impairment. However, information is lost from memory in different ways. For instance, studied content might be erased in an all-or-nothing manner. Alternatively, information may be retained but the precision might be degraded or blurred. Here, we asked whether the similarity of interfering information to memory contents might differentially impact these 2 aspects of forgetting. Observers studied colored images of real-world objects, each followed by a stream of interfering objects. Across 4 experiments, we manipulated the similarity between the studied object and the interfering objects in circular color space. After interference, memory for object color was tested continuously on a color wheel, which in combination with mixture modeling, allowed for estimation of how erasing and blurring differentially contribute to forgetting. In contrast to the similarity assumption, we show that highly dissimilar interfering items caused the greatest increase in random guess responses, suggesting a greater frequency of memory erasure (Experiments 1-3). Moreover, we found that observers were generally able to resist interference from highly similar items, perhaps through surround suppression (Experiments 1 and 4). Finally, we report that interference from items of intermediate similarity tended to blur or decrease memory precision (Experiments 3 and 4). These results reveal that the nature of visual similarity can differentially alter how information is lost from memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
ERIC Educational Resources Information Center
Kleinke, David J.
Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…
Person Response Functions and the Definition of Units in the Social Sciences
ERIC Educational Resources Information Center
Engelhard, George, Jr.; Perkins, Aminah F.
2011-01-01
Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2016-01-01
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…
ERIC Educational Resources Information Center
Fu, Jianbin
2016-01-01
The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…
ERIC Educational Resources Information Center
Tsutakawa, Robert K.; Lin, Hsin Ying
Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…
Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model
ERIC Educational Resources Information Center
Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim
2017-01-01
We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
Mallinckrodt, Brent; Tekie, Yacob T
2016-11-01
The Working Alliance Inventory (WAI) has made great contributions to psychotherapy research. However, studies suggest the 7-point response format and 3-factor structure of the client version may have psychometric problems. This study used Rasch item response theory (IRT) to (a) improve WAI response format, (b) compare two brief 12-item versions (WAI-sr; WAI-s), and (c) develop a new 16-item Brief Alliance Inventory (BAI). Archival data from 1786 counseling center and community clients were analyzed. IRT findings suggested problems with crossed category thresholds. A rescoring scheme that combines neighboring responses to create 5- and 4-point scales sharply reduced these problems. Although subscale variance was reduced by 11-26%, rescoring yielded improved reliability and generally higher correlations with therapy process (session depth and smoothness) and outcome measures (residual gain symptom improvement). The 16-item BAI was designed to maximize "bandwidth" of item difficulty and preserve a broader range of WAI sensitivity than WAI-s or WAI-sr. Comparisons suggest the BAI performed better in several respects than the WAI-s or WAI-sr and equivalent to the full WAI on several performance indicators.
Khorramdel, Lale; von Davier, Matthias
2014-01-01
This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test if RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences could be found in giving extreme responses. Moreover, it is shown how to score rating data to correct for RS after being proved to exist in the data.
Prisciandaro, James J; Tolliver, Bryan K
2016-11-15
The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limited to outpatients; DIF analysis only sensitive to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.
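The Graded Response Model mentioned in this abstract assigns each ordered response category a probability given the latent severity. As a rough illustration of how category probabilities arise in the standard (Samejima) formulation, here is a minimal sketch; the function name and the parameter values are hypothetical and are not taken from the YMRS or MADRS analyses.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model (sketch).

    theta:      latent trait value (e.g. symptom severity)
    a:          item discrimination
    thresholds: increasing category thresholds b_1 < ... < b_{K-1}

    Returns a list of K probabilities, one per ordered category.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (k = 0) and 0 (k = K).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # Hypothetical four-category item.
    probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
    print([round(p, 3) for p in probs])
```

Under this formulation, a category whose probability is never the largest at any trait level adds little incremental information, which is one way to read the abstract's remark about inefficient response categories.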
Better assessment of physical function: item improvement is neglected but essential
2009-01-01
Introduction: Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods: The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results: We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions: Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
2009-01-01
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10
ERIC Educational Resources Information Center
Livingston, Samuel A.; Dorans, Neil J.
2004-01-01
This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
ERIC Educational Resources Information Center
Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia
2016-01-01
The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…
ERIC Educational Resources Information Center
Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.
2016-01-01
Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…
ERIC Educational Resources Information Center
Swygert, Kimberly A.
In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…
Item response theory in personality assessment: a demonstration using the MMPI-2 depression scale.
Childs, R A; Dahlstrom, W G; Kemp, S M; Panter, A T
2000-03-01
Item response theory (IRT) analyses have, over the past 3 decades, added much to our understanding of the relationships among and characteristics of test items, as revealed in examinees' response patterns. Assessment instruments used outside the educational context have only infrequently been analyzed using IRT, however. This study demonstrates the relevance of IRT to personality data through analyses of Scale 2 (the Depression Scale) on the revised Minnesota Multiphasic Personality Inventory (MMPI-2). A rich set of hypotheses regarding the items on this scale, including contrasts among the Harris-Lingoes and Wiener-Harmon subscales and differences in the items' measurement characteristics for men and women, are investigated through the IRT analyses.
Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S
2018-05-01
To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
Reliability and validity of a short form household food security scale in a Caribbean community.
Gulliford, Martin C; Mahabir, Deepak; Rocke, Brian
2004-06-16
We evaluated the reliability and validity of the short form household food security scale in a different setting from the one in which it was developed. The scale was interview administered to 531 subjects from 286 households in north central Trinidad in Trinidad and Tobago, West Indies. We evaluated the six items by fitting item response theory models to estimate item thresholds, estimating agreement among respondents in the same households and estimating the slope index of income-related inequality (SII) after adjusting for age, sex and ethnicity. Item-score correlations ranged from 0.52 to 0.79 and Cronbach's alpha was 0.87. Item responses gave within-household correlation coefficients ranging from 0.70 to 0.78. Estimated item thresholds (standard errors) from the Rasch model ranged from -2.027 (0.063) for the 'balanced meal' item to 2.251 (0.116) for the 'hungry' item. The 'balanced meal' item had the lowest threshold in each ethnic group even though there was evidence of differential functioning for this item by ethnicity. Relative thresholds of other items were generally consistent with US data. Estimation of the SII, comparing those at the bottom with those at the top of the income scale, gave relative odds for an affirmative response of 3.77 (95% confidence interval 1.40 to 10.2) for the lowest severity item, and 20.8 (2.67 to 162.5) for highest severity item. Food insecurity was associated with reduced consumption of green vegetables after additionally adjusting for income and education (0.52, 0.28 to 0.96). The household food security scale gives reliable and valid responses in this setting. Differing relative item thresholds compared with US data do not require alteration to the cut-points for classification of 'food insecurity without hunger' or 'food insecurity with hunger'. The data provide further evidence that re-evaluation of the 'balanced meal' item is required.
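As a rough illustration of what the reported Rasch thresholds mean, the sketch below computes the probability of an affirmative response under the standard Rasch model. The two thresholds are the values quoted in the abstract; the latent severity values `theta` and the function name are hypothetical choices made only for this example, not part of the study.

```python
import math


def rasch_probability(theta: float, threshold: float) -> float:
    """Probability of an affirmative response under the Rasch model.

    theta is the respondent's latent food-insecurity severity and
    threshold is the item threshold, both on the logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - threshold)))


# Item thresholds quoted in the abstract (logits).
BALANCED_MEAL = -2.027
HUNGRY = 2.251

# Hypothetical severity levels, chosen only for illustration.
for theta in (-2.0, 0.0, 2.0):
    p_meal = rasch_probability(theta, BALANCED_MEAL)
    p_hungry = rasch_probability(theta, HUNGRY)
    print(f"theta={theta:+.1f}  balanced meal={p_meal:.3f}  hungry={p_hungry:.3f}")
```

At any given severity, the item with the lower threshold ('balanced meal') is more likely to be affirmed than the item with the higher threshold ('hungry'), which is why the former marks the mild end of the scale.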
Computerized Adaptive Testing with Item Clones. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; van der Linden, Wim J.
To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…
Pitchford, Melanie; Ball, Linden J.; Hunt, Thomas E.; Steel, Richard
2017-01-01
We report a study examining the role of ‘cognitive miserliness’ as a determinant of poor performance on the standard three-item Cognitive Reflection Test (CRT). The cognitive miserliness hypothesis proposes that people often respond incorrectly on CRT items because of an unwillingness to go beyond default, heuristic processing and invest time and effort in analytic, reflective processing. Our analysis (N = 391) focused on people’s response times to CRT items to determine whether predicted associations are evident between miserly thinking and the generation of incorrect, intuitive answers. Evidence indicated only a weak correlation between CRT response times and accuracy. Item-level analyses also failed to demonstrate predicted response-time differences between correct analytic and incorrect intuitive answers for two of the three CRT items. We question whether participants who give incorrect intuitive answers on the CRT can legitimately be termed cognitive misers and whether the three CRT items measure the same general construct. PMID:29099840
Development of the Contact Lens User Experience: CLUE Scales
Wirth, R. J.; Edwards, Michael C.; Henderson, Michael; Henderson, Terri; Olivares, Giovanna; Houts, Carrie R.
2016-01-01
ABSTRACT Purpose The field of optometry has become increasingly interested in patient-reported outcomes, reflecting a common trend occurring across the spectrum of healthcare. This article reviews the development of the Contact Lens User Experience: CLUE system designed to assess patient evaluations of contact lenses. CLUE was built using modern psychometric methods such as factor analysis and item response theory. Methods The qualitative process through which relevant domains were identified is outlined as well as the process of creating initial item banks. Psychometric analyses were conducted on the initial item banks and refinements were made to the domains and items. Following this data-driven refinement phase, a second round of data was collected to further refine the items and obtain final item response theory item parameter estimates. Results Extensive qualitative work identified three key areas patients consider important when describing their experience with contact lenses. Based on item content and psychometric dimensionality assessments, the developing CLUE instruments were ultimately focused around four domains: comfort, vision, handling, and packaging. Item response theory parameters were estimated for the CLUE item banks (377 items), and the resulting scales were found to provide precise and reliable assignment of scores detailing users’ subjective experiences with contact lenses. Conclusions The CLUE family of instruments, as it currently exists, exhibits excellent psychometric properties. PMID:27383257
Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander
2014-04-01
An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.
Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Yutaka, Ono; Furukawa, Toshiaki A.
2017-01-01
Background Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D). To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative study. Methods Data were drawn from the National Survey of Midlife Development in the United States (MIDUS), which comprises four subsamples: (1) a national random digit dialing (RDD) sample, (2) oversamples from five metropolitan areas, (3) siblings of individuals from the RDD sample, and (4) a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: “none of the time,” “a little of the time,” “some of the time,” “most of the time,” and “all of the time.” The pattern of total score distribution and item responses were analyzed using graphical analysis and exponential regression model. Results The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from “a little of the time” to “all of the time” on log-normal scales, while “none of the time” response was not related to this exponential pattern. Discussion The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales. PMID:28289560
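If score frequencies follow an exponential pattern, f(x) = A·exp(-λx), then log f(x) is linear in x with slope -λ. The sketch below shows one simple way to estimate such a rate, by ordinary least squares on log counts. The data are synthetic and the function name is hypothetical; this is not the authors' regression procedure, which the abstract does not specify in detail.

```python
import math


def fit_exponential_rate(scores, counts):
    """Estimate the rate of an exponential pattern by least squares.

    Regresses log(count) on score and returns the negated slope,
    i.e. lambda in count = A * exp(-lambda * score).
    """
    logs = [math.log(c) for c in counts]
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, logs))
    return -(sxy / sxx)


# Synthetic frequencies with a known rate of 0.3 (illustrative only).
# Score 0 is left out, mirroring the abstract's note that the lower end
# of the distribution departs from the exponential pattern.
scores = list(range(1, 25))
counts = [1000.0 * math.exp(-0.3 * x) for x in scores]

rate = fit_exponential_rate(scores, counts)
print(f"estimated rate: {rate:.3f}")
```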
The e-MSWS-12: improving the multiple sclerosis walking scale using item response theory.
Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D
2016-12-01
The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS)-related walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87% of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ² = 19.02, df = 5, p < 0.01), and Item 11 showed DIF based on sex (χ² = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12 .
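For readers unfamiliar with category response curves, the sketch below evaluates them for a single item under Samejima's graded response model: cumulative probabilities P(X ≥ k) follow logistic curves, and each category probability is the difference between adjacent cumulative probabilities. The slope and thresholds are hypothetical values for a five-category item, not estimates from the MSWS-12 study.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be increasing; an item with K categories has K - 1
    thresholds. Returns a list of K probabilities that sum to 1.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0] + [logistic(slope * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical parameters for a five-category item.
SLOPE = 2.0
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]

# Evaluating the curves on a grid of theta values traces out the CRCs.
for theta in (-2.0, 0.0, 2.0):
    probs = grm_category_probs(theta, SLOPE, THRESHOLDS)
    print(theta, [round(p, 3) for p in probs])
```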
Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M
2013-03-04
Background Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Measuring the quality of life in hypertension according to Item Response Theory
Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia
2017-01-01
ABSTRACT OBJECTIVE To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. METHODS This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used the Gradual Response Model of Samejima. The analyses were conducted using the free software R with the aid of psych and mirt. RESULTS The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. CONCLUSIONS We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies. PMID:28492764
ERIC Educational Resources Information Center
Samejima, Fumiko
In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…
ERIC Educational Resources Information Center
Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.
2006-01-01
Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS[R]) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…
Honeybees Learn Odour Mixtures via a Selection of Key Odorants
Reinhard, Judith; Sinclair, Michael; Srinivasan, Mandyam V.; Claudianos, Charles
2010-01-01
Background The honeybee has to detect, process and learn numerous complex odours from her natural environment on a daily basis. Most of these odours are floral scents, which are mixtures of dozens of different odorants. To date, it is still unclear how the bee brain unravels the complex information contained in scent mixtures. Methodology/Principal Findings This study investigates learning of complex odour mixtures in honeybees using a simple olfactory conditioning procedure, the Proboscis-Extension-Reflex (PER) paradigm. Restrained honeybees were trained to three scent mixtures composed of 14 floral odorants each, and then tested with the individual odorants of each mixture. Bees did not respond to all odorants of a mixture equally: They responded well to a selection of key odorants, which were unique for each of the three scent mixtures. Bees showed less or very little response to the other odorants of the mixtures. The bees' response to mixtures composed of only the key odorants was as good as to the original mixtures of 14 odorants. A mixture composed of the other, non-key-odorants elicited a significantly lower response. Neither an odorant's volatility or molecular structure, nor learning efficiencies for individual odorants affected whether an odorant became a key odorant for a particular mixture. Odorant concentration had a positive effect, with odorants at high concentration likely to become key odorants. Conclusions/Significance Our study suggests that the brain processes complex scent mixtures by predominantly learning information from selected key odorants. Our observations on key odorant learning lend significant support to previous work on olfactory learning and mixture processing in honeybees. PMID:20161714
Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert
2011-01-01
We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
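The link between the number of items administered and the standard error of the latent trait estimate can be sketched with test information: each item contributes information at a given trait level, and the standard error is the reciprocal square root of the summed information. The example below uses a dichotomous two-parameter logistic model as a simplifying assumption (the instruments in the abstract use rating scales), and all item parameters are hypothetical.

```python
import math


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item with slope a and location b."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta: float, items) -> float:
    """Standard error of the trait estimate from the summed item information."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


# Hypothetical bank of 12 items: equal slopes, locations spread from -1.0 to 1.2.
items = [(1.5, -1.0 + 0.2 * i) for i in range(12)]

# The standard error shrinks as more items are administered.
for n in (4, 8, 12):
    print(n, round(standard_error(0.0, items[:n]), 3))
```

An adaptive test would pick, at each step, the unused item with the highest information at the current trait estimate, which is why a small well-targeted subset can approach the precision of the full pool.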
Validation of a clinical critical thinking skills test in nursing
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Of the initial 30 items, 11 were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T
2016-03-01
Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant cross-national assessment of youth. (c) 2016 APA, all rights reserved.
Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M
2014-07-01
The goal of this article was to investigate optimal functioning of using frequency vs. agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. A psychometric study comparing rating scale performance embedded in a cross-sectional survey used for developing a new instrument to measure behavioral health functioning among adults applying for disability benefits in the United States was performed. Within the sample of 1,017 respondents, the range of response category endorsement was similar for both frequency and agreement item types for both scales. There were fewer missing values in the frequency items than the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items demonstrated optimal effectiveness around the mean ± 1-2 standard deviation score range; the agreement items performed better at the extreme score ranges. Findings suggest an optimal response format requires a mix of both agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for those whose scores are more extreme and capture subjective content related to general attitudes, behaviors, or feelings of work-related behavioral health functioning. Copyright © 2014 Elsevier Inc. All rights reserved.
Maindal, Helle Terkildsen; Sokolowski, Ineta; Vedsted, Peter
2009-06-29
The Patient Activation Measure (PAM) is a measure that assesses patient knowledge, skill, and confidence for self-management. This study validates the Danish translation of the 13-item Patient Activation Measure (PAM13) in a Danish population with dysglycaemia. 358 people with screen-detected dysglycaemia participating in a primary care health education study responded to PAM13. The PAM13 was translated into Danish by a standardised forward-backward translation. Data quality was assessed by mean, median, item response, missing values, floor and ceiling effects, internal consistency (Cronbach's alpha and average inter-item correlation) and item-rest correlations. Scale properties were assessed by Rasch Rating Scale models. The item response was high with a small number of missing values (0.8-4.2%). Floor effect was small (range 0.6-3.6%), but the ceiling effect was above 15% for all items (range 18.6-62.7%). The alpha-coefficient was 0.89 and the average inter-item correlation 0.38. The Danish version formed a unidimensional, probabilistic Guttman-like scale explaining 43.2% of the variance. We did, however, find a different item sequence compared to the original scale. A Danish version of PAM13 with acceptable validity and reliability is now available. Further development should focus on single items, response categories in relation to ceiling effects and further validation of reproducibility and responsiveness.
Detection of Differential Item Functioning Using the Lasso Approach
ERIC Educational Resources Information Center
Magis, David; Tuerlinckx, Francis; De Boeck, Paul
2015-01-01
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C
2016-03-12
Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses and CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis but the existing CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.
Sequential Computerized Mastery Tests--Three Simulation Studies
ERIC Educational Resources Information Center
Wiberg, Marie
2006-01-01
A simulation study of a sequential computerized mastery test is carried out with items modeled with the 3 parameter logistic item response theory model. The examinees' responses are either identically distributed, not identically distributed, or not identically distributed together with estimation errors in the item characteristics. The…
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data.
Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L J; Maris, Gunter
2016-01-01
We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two 'one-process' models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a 'two-process' model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses.
What can we learn from PISA?: Investigating PISA's approach to scientific literacy
NASA Astrophysics Data System (ADS)
Schwab, Cheryl Jean
This dissertation is an investigation of the relationship between the multidimensional conception of scientific literacy and its assessment. The Programme for International Student Assessment (PISA), developed under the auspices of the Organization for Economic Cooperation and Development (OECD), offers a unique opportunity to evaluate the assessment of scientific literacy. PISA developed a continuum of performance for scientific literacy across three competencies (i.e., process, content, and situation). Foundational to the interpretation of PISA science assessment is PISA's definition of scientific literacy, which I argue incorporates three themes drawn from history: (a) scientific way of thinking, (b) everyday relevance of science, and (c) scientific literacy for all students. Three coordinated studies were conducted to investigate the validity of PISA science assessment and offer insight into the development of items to assess scientific literacy. Multidimensional models of the internal structure of the PISA 2003 science items were found not to reflect the complex character of PISA's definition of scientific literacy. Although the multidimensional models across the three competencies significantly decreased the G2 statistic from the unidimensional model, high correlations between the dimensions suggest that the dimensions are similar. A cognitive analysis of student verbal responses to PISA science items revealed that students were using competencies of scientific literacy, but the competencies were not elicited by the PISA science items at the depth required by PISA's definition of scientific literacy. Although student responses contained only knowledge of scientific facts and simple scientific concepts, students were using more complex skills to interpret and communicate their responses. Finally, the investigation of different scoring approaches and item response models illustrated different ways to interpret student responses to assessment items. These analyses highlighted the complexities of students' responses to the PISA science items and the use of the ordered partition model to accommodate different but equal item responses. The results of the three investigations are used to discuss ways to improve the development and interpretation of PISA's science items.
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-01-01
Background: The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives: To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants: We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results: Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion: Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica
2018-02-01
The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Ceglie, Francesco Giovanni; Bustamante, Maria Angeles; Ben Amara, Mouna; Tittarelli, Fabio
2015-01-01
Peat replacement is an increasing demand in containerized and transplant production, due to the environmental constraints associated with peat use. However, despite the wide information concerning the use of alternative materials as substrates, it is very complex to establish the best materials and mixtures. This work evaluates the use of mixture design and response surface methodology in a peat substitution experiment using two alternative materials (green compost and palm fibre trunk waste) for transplant production of tomato (Lycopersicon esculentum Mill.), melon (Cucumis melo L.), and lettuce (Lactuca sativa L.) in organic farming conditions. In general, the substrates showed suitable properties for their use in seedling production, with the best plant response obtained from the mixture of 20% green compost, 39% palm fibre and 31% peat. The mixture design and applied response surface methodology proved to be a useful approach to optimize substrate formulations in peat substitution experiments to standardize plant responses. PMID:26070163
ERIC Educational Resources Information Center
Arffman, Inga
2016-01-01
Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…
ERIC Educational Resources Information Center
Cao, Yi; Lu, Ru; Tao, Wei
2014-01-01
The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
ERIC Educational Resources Information Center
Ferrando, Pere J.
2004-01-01
This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…
ERIC Educational Resources Information Center
Watson, Kathy; Baranowski, Tom; Thompson, Debbe; Jago, Russell; Baranowski, Janice; Klesges, Lisa M.
2006-01-01
This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10-14 years). SocD (nine items), PASE (19 items) and FVSE (21…
The Structure of the Narcissistic Personality Inventory With Binary and Rating Scale Items.
Boldero, Jennifer M; Bell, Richard C; Davies, Richard C
2015-01-01
Narcissistic Personality Inventory (NPI) items typically have a forced-choice format, comprising a narcissistic and a nonnarcissistic statement. Recently, some have presented the narcissistic statements and asked individuals to either indicate whether they agree or disagree that the statements are self-descriptive (i.e., a binary response format) or to rate the extent to which they agree or disagree that these statements are self-descriptive on a Likert scale (i.e., a rating response format). The current research demonstrates that when NPI items have a binary or a rating response format, the scale has a bifactor structure (i.e., the items load on a general factor and on 6 specific group factors). Indexes of factor strength suggest that the data are unidimensional enough for the NPI's general factor to be considered a measure of a narcissism latent trait. However, the rating item general factor assessed more narcissism components than the binary item one. The positive correlations of the NPI's general factor, assessed when items have a rating response format, were moderate with self-esteem, strong with a measure of narcissistic grandiosity, and weak with 2 measures of narcissistic vulnerability. Together, the results suggest that using a rating format for items enhances the information provided by the NPI.
Accurate and scalable social recommendation using mixed-membership stochastic block models.
Godoy-Lorite, Antonia; Guimerà, Roger; Moore, Cristopher; Sales-Pardo, Marta
2016-12-13
With increasing amounts of information available, modeling and predicting user preferences-for books or articles, for example-are becoming more important. We present a collaborative filtering model, with an associated scalable algorithm, that makes accurate predictions of users' ratings. Like previous approaches, we assume that there are groups of users and of items and that the rating a user gives an item is determined by their respective group memberships. However, we allow each user and each item to belong simultaneously to mixtures of different groups and, unlike many popular approaches such as matrix factorization, we do not assume that users in each group prefer a single group of items. In particular, we do not assume that ratings depend linearly on a measure of similarity, but allow probability distributions of ratings to depend freely on the user's and item's groups. The resulting overlapping groups and predicted ratings can be inferred with an expectation-maximization algorithm whose running time scales linearly with the number of observed ratings. Our approach enables us to predict user preferences in large datasets and is considerably more accurate than the current algorithms for such large datasets.
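The prediction rule summarized in the abstract above can be illustrated with a small sketch. It assumes that a user's and an item's mixed memberships are probability vectors and that each (user group, item group) pair has its own distribution over ratings; the predicted rating distribution is then the membership-weighted average of those group-pair distributions. All names and numbers below are hypothetical and are not taken from the paper, and the expectation-maximization fitting step is not shown.

```python
def predict_rating_distribution(user_membership, item_membership, group_rating_probs):
    """Mix the group-pair rating distributions by user and item memberships.

    user_membership[k]          : weight of the user in user group k (sums to 1)
    item_membership[l]          : weight of the item in item group l (sums to 1)
    group_rating_probs[k][l][r] : probability of rating r for group pair (k, l)
    """
    n_ratings = len(group_rating_probs[0][0])
    distribution = [0.0] * n_ratings
    for k, user_weight in enumerate(user_membership):
        for l, item_weight in enumerate(item_membership):
            for r in range(n_ratings):
                distribution[r] += user_weight * item_weight * group_rating_probs[k][l][r]
    return distribution


# Hypothetical example: 2 user groups, 2 item groups, 3 possible ratings.
theta_user = [0.7, 0.3]
eta_item = [0.4, 0.6]
p_groups = [
    [[0.1, 0.2, 0.7], [0.5, 0.3, 0.2]],
    [[0.3, 0.3, 0.4], [0.2, 0.2, 0.6]],
]
predicted = predict_rating_distribution(theta_user, eta_item, p_groups)
```

Because the group-pair distributions are left free rather than tied to a similarity score, the sketch places no linearity assumption on how ratings depend on memberships, which is the point the abstract emphasizes.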
Dose addition is the most frequently-used component-based approach for predicting dose response for a mixture of toxicologically-similar chemicals and for statistical evaluation of whether the mixture response is consistent with dose additivity and therefore predictable from the ...
Chemical interactions have posed a big challenge in toxicity characterization and human health risk assessment of environmental mixtures. To characterize the impacts of chemical interactions on protein and cytotoxicity responses to environmental mixtures, we established a systems...
Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis
2015-01-01
Background: Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods: Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results: Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions: Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364
Practical Guide to Conducting an Item Response Theory Analysis
ERIC Educational Resources Information Center
Toland, Michael D.
2014-01-01
Item response theory (IRT) is a psychometric technique used in the development, evaluation, improvement, and scoring of multi-item scales. This pedagogical article provides the necessary information needed to understand how to conduct, interpret, and report results from two commonly used ordered polytomous IRT models (Samejima's graded…
Analyzing Longitudinal Item Response Data via the Pairwise Fitting Method
ERIC Educational Resources Information Center
Fu, Zhi-Hui; Tao, Jian; Shi, Ning-Zhong; Zhang, Ming; Lin, Nan
2011-01-01
Multidimensional item response theory (MIRT) models can be applied to longitudinal educational surveys where a group of individuals are administered different tests over time with some common items. However, computational problems typically arise as the dimension of the latent variables increases. This is especially true when the latent variable…
Item Construction and Psychometric Models Appropriate for Constructed Responses
1991-08-01
which involve only one attribute per item. This is especially true when we are dealing with constructed-response items, we have to measure much more...
Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory
ERIC Educational Resources Information Center
Lee, Won-Chan
2010-01-01
In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…
Robust Estimation of Latent Ability in Item Response Models
ERIC Educational Resources Information Center
Schuster, Christof; Yuan, Ke-Hai
2011-01-01
Because of response disturbances such as guessing, cheating, or carelessness, item response models often can only approximate the "true" individual response probabilities. As a consequence, maximum-likelihood estimates of ability will be biased. Typically, the nature and extent to which response disturbances are present is unknown, and, therefore,…
Theoretical and Empirical Comparisons between Two Models for Continuous Item Responses.
ERIC Educational Resources Information Center
Ferrando, Pere J.
2002-01-01
Analyzed the relations between two continuous response models intended for typical response items: the linear congeneric model and Samejima's continuous response model (CRM). Illustrated the relations described using an empirical example and assessed the relations through a simulation study. (SLD)
Rhodes, Matthew G; Jacoby, Larry L
2007-03-01
The authors examined whether participants can shift their criterion for recognition decisions in response to the probability that an item was previously studied. Participants in 3 experiments were given recognition tests in which the probability that an item was studied was correlated with its location during the test. Results from all 3 experiments indicated that participants' response criteria were sensitive to the probability that an item was previously studied and that shifts in criterion were robust. In addition, awareness of the bases for criterion shifts and feedback on performance were key factors contributing to the observed shifts in decision criteria. These data suggest that decision processes can operate in a dynamic fashion, shifting from item to item.
ERIC Educational Resources Information Center
Rudner, Lawrence
This digest discusses the advantages and disadvantages of using item banks, and it provides useful information for those who are considering implementing an item banking project in their school districts. The primary advantage of item banking is in test development. Using an item response theory method, such as the Rasch model, items from multiple…
Huang, Yueng-Hsiang; Lee, Jin; Chen, Zhuo; Perry, MacKenna; Cheung, Janelle H; Wang, Mo
2017-06-01
Zohar and Luria's (2005) safety climate (SC) scale, measuring organization- and group-level SC each with 16 items, is widely used in research and practice. To improve the utility of the SC scale, we shortened the original full-length SC scales. Item response theory (IRT) analysis was conducted using a sample of 29,179 frontline workers from various industries. Based on graded response models, we shortened the original scales in two ways: (1) selecting items with above-average discriminating ability (i.e. offering more than 6.25% of the original total scale information), resulting in 8-item organization-level and 11-item group-level SC scales; and (2) selecting the most informative items that together retain at least 30% of original scale information, resulting in 4-item organization-level and 4-item group-level SC scales. All four shortened scales had acceptable reliability (≥0.89) and high correlations (≥0.95) with the original scale scores. The shortened scales will be valuable for academic research and practical survey implementation in improving occupational safety. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
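The 6.25% cutoff in the abstract above is the average share of information per item on a 16-item scale (1/16 = 0.0625). As a minimal sketch of that first selection rule, assuming each item has a single summary information value, the code below keeps the items whose share of total information exceeds 1/n. The function name and the information values are hypothetical and do not come from the study.

```python
def select_above_average_items(item_information):
    """Return indices of items whose share of total information exceeds 1/n."""
    total_information = sum(item_information)
    average_share = 1.0 / len(item_information)  # 1/16 = 0.0625 for a 16-item scale
    return [
        index
        for index, information in enumerate(item_information)
        if information / total_information > average_share
    ]


# Hypothetical information values for a 16-item scale (illustrative only).
example_information = [2.0, 0.5, 1.5, 0.4, 1.8, 0.6, 1.2, 0.3,
                       1.6, 0.7, 1.4, 0.2, 1.1, 0.9, 1.3, 0.5]
selected_items = select_above_average_items(example_information)
```

With these made-up values, every second item clears the threshold, giving an 8-item short form; the actual study derived item information from fitted graded response models rather than fixed numbers.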
Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J
2011-09-01
In three experiments search termination decisions were examined as a function of response type (correct vs. incorrect) and confidence. It was found that the time between the last retrieved item and the decision to terminate search (exit latency) was related to the type of response and confidence in the last item retrieved. Participants were willing to search longer when the last retrieved item was a correct item vs. an incorrect item and when the confidence was high in the last retrieved item. It was also found that the number of errors retrieved during the recall period was related to search termination decisions such that the more errors retrieved, the more likely participants were to terminate the search. Finally, it was found that knowledge of overall search set size influenced the time needed to search for items, but did not influence search termination decisions. Copyright © 2011 Elsevier B.V. All rights reserved.
Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
2015-06-01
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
ERIC Educational Resources Information Center
Ding, Kele; Olds, R. Scott; Thombs, Dennis L.
2009-01-01
This retrospective case study assessed the influence of item non-response error on subsequent response to questionnaire items assessing adolescent alcohol and marijuana use. Post-hoc analyses were conducted on survey results obtained from 4,371 7th to 12th grade students in Ohio in 2005. A skip pattern design in a conventional questionnaire…
ERIC Educational Resources Information Center
Hsieh, Chueh-An; von Eye, Alexander A.; Maier, Kimberly S.
2010-01-01
The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the…
Applying mixed methods to pretest the Pressure Ulcer Quality of Life (PU-QOL) instrument.
Gorecki, C; Lamping, D L; Nixon, J; Brown, J M; Cano, S
2012-04-01
Pretesting is key in the development of patient-reported outcome (PRO) instruments. We describe a mixed-methods approach based on interviews and Rasch measurement methods in the pretesting of the Pressure Ulcer Quality of Life (PU-QOL) instrument. We used cognitive interviews to pretest the PU-QOL in 35 patients with pressure ulcers with the view to identifying problematic items, followed by Rasch analysis to examine response options, appropriateness of the item series and biases due to question ordering (item fit). We then compared findings in an interactive and iterative process to identify potential strengths and weaknesses of PU-QOL items, and guide decision-making about further revisions to items and design/layout. Although cognitive interviews largely supported items, they highlighted problems with layout, response options and comprehension. Findings from the Rasch analysis identified problems with response options through reversed thresholds. The use of a mixed-methods approach in pretesting the PU-QOL instrument proved beneficial for identifying problems with scale layout, response options and framing/wording of items. Rasch measurement methods are a useful addition to standard qualitative pretesting for evaluating strengths and weaknesses of early stage PRO instruments.
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.
Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland
2014-04-01
To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions
ERIC Educational Resources Information Center
Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.
2003-01-01
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Humphries, Mark D; Gurney, Kevin
2012-07-01
Deep brain stimulation (DBS) is a remarkably successful treatment for the motor symptoms of Parkinson's disease. High-frequency stimulation of the subthalamic nucleus (STN) within the basal ganglia is a main clinical target, but the physiological mechanisms of therapeutic STN DBS at the cellular and network level are unclear. We set out to begin to address the hypothesis that a mixture of responses in the basal ganglia output nuclei, combining regularized firing and inhibition, is a key contributor to the effectiveness of STN DBS. We used our computational model of the complete basal ganglia circuit to show how such a mixture of responses in basal ganglia output naturally arises from the network effects of STN DBS. We replicated the diversification of responses recorded in a primate STN DBS study to show that the model's predicted mixture of responses is consistent with therapeutic STN DBS. We then showed how this 'mixture of response' perspective suggests new ideas for DBS mechanisms: first, that the therapeutic frequency of STN DBS is above 100 Hz because the diversification of responses exhibits a step change above this frequency; and second, that optogenetic models of direct STN stimulation during DBS have proven therapeutically ineffective because they do not replicate the mixture of basal ganglia output responses evoked by electrical DBS. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.
Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra
2012-03-13
Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
Converging evidence for control of color-word Stroop interference at the item level.
Bugg, Julie M; Hutchison, Keith A
2013-04-01
Prior studies have shown that cognitive control is implemented at the list and context levels in the color-word Stroop task. At first blush, the finding that Stroop interference is reduced for mostly incongruent items as compared with mostly congruent items (i.e., the item-specific proportion congruence [ISPC] effect) appears to provide evidence for yet a third level of control, which modulates word reading at the item level. However, evidence to date favors the view that ISPC effects reflect the rapid prediction of high-contingency responses and not item-specific control. In Experiment 1, we first show that an ISPC effect is obtained when the relevant dimension (i.e., color) signals proportion congruency, a problematic pattern for theories based on differential response contingencies. In Experiment 2, we replicate and extend this pattern by showing that item-specific control settings transfer to new stimuli, ruling out alternative frequency-based accounts. In Experiment 3, we revert to the traditional design in which the irrelevant dimension (i.e., word) signals proportion congruency. Evidence for item-specific control, including transfer of the ISPC effect to new stimuli, is apparent when 4-item sets are employed but not when 2-item sets are employed. We attribute this pattern to the absence of high-contingency responses on incongruent trials in the 4-item set. These novel findings provide converging evidence for reactive control of color-word Stroop interference at the item level, reveal theoretically important factors that modulate reliance on item-specific control versus contingency learning, and suggest an update to the item-specific control account (Bugg, Jacoby, & Chanani, 2011).
Differential item functioning magnitude and impact measures from item response theory models.
Kleinman, Marjorie; Teresi, Jeanne A
2016-01-01
Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
ERIC Educational Resources Information Center
van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.
This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…
Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A
2018-04-01
A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in Spanish language. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Through item response theory approach, this study examined item observed distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj
2016-12-01
The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. © The Author(s), 2016.
Adebambo, Oluwadamilare A.; Ray, Paul D.; Shea, Damian; Fry, Rebecca C.
2016-01-01
Exposure to elevated levels of the toxic metals inorganic arsenic (iAs) and cadmium (Cd) represents a major global health problem. These metals often occur as mixtures in the environment, creating the potential for interactive or synergistic biological effects different from those observed in single exposure conditions. In the present study, environmental mixtures collected from two waste sites in China and comparable mixtures prepared in the laboratory were tested for toxicogenomic response in placental JEG-3 cells. These cells serve as a model for evaluating cellular responses to exposures during pregnancy. One of the mixtures was predominated by iAs and one by Cd. Six gene biomarkers were measured in order to evaluate the effects from the metals mixtures using dose and time-course experiments including: heme oxygenase 1 (HO-1) and metallothionein isoforms (MT1A, MT1F and MT1G) previously shown to be preferentially induced by exposure to either iAs or Cd, and metal transporter genes aquaporin-9 (AQP9) and ATPase, Cu2+ transporting, beta polypeptide (ATP7B). There was a significant increase in the mRNA expression levels of ATP7B, HO-1, MT1A, MT1F, and MT1G in mixture-treated cells compared to the iAs or Cd only-treated cells. Notably, the genomic responses were observed at concentrations significantly lower than levels found at the environmental collection sites. These data demonstrate that metal mixtures increase the expression of gene biomarkers in placental JEG-3 cells in a synergistic manner. Taken together, the data suggest that toxic metals that co-occur may induce detrimental health effects that are currently underestimated when analyzed as single metals. PMID:26472158
THE CARCINOGENIC RESPONSE TO A MIXTURE OF DRINKING WATER DISINFECTION BY-PRODUCTS (DBP) WAS LESS THAN ADDITIVE.
Current default risk assessments for chemical mixtures assume additivity of carcinogenic effects but this may under or over represent the actual biological res...
Chemical mixtures in the environment are often the result of a dynamic process. When dose-response data are available on random samples throughout the process, equivalence testing can be used to determine whether the mixtures are sufficiently similar based on a pre-specified biol...
Using Data Augmentation and Markov Chain Monte Carlo for the Estimation of Unfolding Response Models
ERIC Educational Resources Information Center
Johnson, Matthew S.; Junker, Brian W.
2003-01-01
Unfolding response models, a class of item response theory (IRT) models that assume a unimodal item response function (IRF), are often used for the measurement of attitudes. Verhelst and Verstralen (1993) and Andrich and Luo (1993) independently developed unfolding response models by relating the observed responses to a more common monotone IRT…
A Study of Bayesian Estimation and Comparison of Response Time Models in Item Response Theory
ERIC Educational Resources Information Center
Suh, Hongwook
2010-01-01
Response time has been regarded as an important source for investigating the relationship between human performance and response speed. It is important to examine the relationship between response time and item characteristics, especially in the perspective of the relationship between response time and various factors that affect examinee's…
Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju
2017-01-01
The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. 
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
Fitting measurement models to vocational interest data: are dominance models ideal?
Tay, Louis; Drasgow, Fritz; Rounds, James; Williams, Bruce A
2009-09-01
In this study, the authors examined the item response process underlying 3 vocational interest inventories: the Occupational Preference Inventory (C.-P. Deng, P. I. Armstrong, & J. Rounds, 2007), the Interest Profiler (J. Rounds, T. Smith, L. Hubert, P. Lewis, & D. Rivkin, 1999; J. Rounds, C. M. Walker, et al., 1999), and the Interest Finder (J. E. Wall & H. E. Baker, 1997; J. E. Wall, L. L. Wise, & H. E. Baker, 1996). Item response theory (IRT) dominance models, such as the 2-parameter and 3-parameter logistic models, assume that item response functions (IRFs) are monotonically increasing as the latent trait increases. In contrast, IRT ideal point models, such as the generalized graded unfolding model, have IRFs that peak where the latent trait matches the item. Ideal point models are expected to fit better because vocational interest inventories ask about typical behavior, as opposed to requiring maximal performance. Results show that across all 3 interest inventories, the ideal point model provided better descriptions of the response process. The importance of specifying the correct item response model for precise measurement is discussed. In particular, scores computed by a dominance model were shown to be sometimes illogical: individuals endorsing mostly realistic or mostly social items were given similar scores, whereas scores based on an ideal point model were sensitive to which type of items respondents endorsed.
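The contrast between dominance and ideal point models described in the abstract above can be sketched in a few lines. This is only an illustration: the 2PL function follows the standard logistic form, but the unimodal function is a simple squared-distance curve chosen for clarity, not the generalized graded unfolding model used by the authors. The function names, the `width` parameter, and all parameter values are assumptions made for this sketch.

```python
import math

def irf_2pl(theta, a, b):
    """Dominance (2PL) item response function: rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_ideal_point(theta, delta, width=1.0):
    """Illustrative unimodal IRF (NOT the GGUM): peaks where theta equals delta."""
    return math.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))

# Evaluate both curves on a small grid of latent trait values.
thetas = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
dominance = [irf_2pl(t, a=1.2, b=0.0) for t in thetas]
ideal = [irf_ideal_point(t, delta=0.0) for t in thetas]

for t, p_dom, p_ideal in zip(thetas, dominance, ideal):
    print(f"theta={t:+.1f}  dominance={p_dom:.3f}  ideal_point={p_ideal:.3f}")
```

Under the dominance curve, endorsement probability keeps increasing with the trait level; under the ideal point curve, it falls off on both sides of the item location, which is the behavior the abstract associates with typical-behavior measures such as interest inventories.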
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-05-25
The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang
2018-07-01
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya
2009-06-01
The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on the nominal categories model. Using data from an adjusted random sample from a nationally representative panel, the nominal categories modeling was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions on the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53%, women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption in accurate order with corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems]) cover most of the negative range of theta. Information function for all items combined peaked at -0.7 of theta (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can recognize more effectively persons with relatively poorer QOL than those with relatively better QOL.
Update on the Child's Challenging Behaviour Scale following evaluation using Rasch analysis.
Bourke-Taylor, H M; Pallant, J F; Law, M
2014-03-01
The Child's Challenging Behaviour Scale (CCBS) was designed to measure a mother's rating of her child's challenging behaviours. The CCBS was initially developed for mothers of school-aged children with developmental disability and has previously been shown to have good psychometric properties using classical test theory techniques. The aim of this study was to use Rasch analysis to fully evaluate all aspects of the scale, including response format, item fit, dimensionality and targeting. The sample consisted of 152 mothers of a school-aged child (aged 5-18 years) with a disability. Mothers were recruited via websites and mail-out newsletters through not-for-profit organizations that supported families with disabilities. Respondents completed a survey which included the 11 items of the CCBS. Rasch analysis was conducted on these responses using the RUMM2030 package. Rasch analysis of the CCBS revealed serious threshold disordering for nine of the 11 items, suggesting problems with the 5-point response format used for the scale. The neutral midpoint of the response format was subsequently removed to create a 4-point scale. High levels of local dependency were detected among two pairs of items, resulting in the removal of two items (item 7 and item 1). The final nine-item version of the scale (CCBS Version 2) was unidimensional, well targeted, showed good fit to the Rasch model, and strong internal consistency. To achieve fit to the Rasch model it was necessary to make two modifications to the CCBS scale. The resulting nine-item scale with a 4-point response format showed excellent psychometric properties, supporting its internal validity. © 2013 John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Chiu, Tina
This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
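For readers unfamiliar with the linear logistic test model mentioned in the last study, its usual textbook form can be written as below. The notation is an assumption for this note and is not taken from the dissertation: $\theta_p$ is the ability of person $p$, $\beta_i$ the difficulty of item $i$, $q_{ik}$ the known weight of item feature $k$ on item $i$, and $\eta_k$ the estimated effect of that feature.

```latex
% Rasch model for a dichotomous response X_{pi}
P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}

% LLTM constraint: item difficulty as a weighted sum of K item-feature effects
\beta_i = \sum_{k=1}^{K} q_{ik}\, \eta_k
```

In this form the item difficulties are no longer free parameters; they are explained by the item features, which is how such a model can test whether those features account for differences in difficulty.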
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift
ERIC Educational Resources Information Center
Guo, Rui; Zheng, Yi; Chang, Hua-Hua
2015-01-01
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Optimal Item Selection with Credentialing Examinations.
ERIC Educational Resources Information Center
Hambleton, Ronald K.; And Others
The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
ERIC Educational Resources Information Center
Aaro, Leif E.; Breivik, Kyrre; Klepp, Knut-Inge; Kaaya, Sylvia; Onya, Hans E.; Wubs, Annegreet; Helleve, Arnfinn; Flisher, Alan J.
2011-01-01
A 14-item human immunodeficiency virus/acquired immunodeficiency syndrome knowledge scale was used among school students in 80 schools in 3 sites in Sub-Saharan Africa (Cape Town and Mankweng, South Africa, and Dar es Salaam, Tanzania). For each item, an incorrect or don't know response was coded as 0 and correct response as 1. Exploratory factor…
Using SAS PROC MCMC for Item Response Theory Models
Samonte, Kelli
2014-01-01
Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian methods in the context of item response theory to serve as a useful guide for practitioners in estimating and interpreting item response theory (IRT) models. Included is a description of the estimation procedure used by SAS PROC MCMC. Syntax is provided for estimation of both dichotomous and polytomous IRT models, as well as a discussion on how to extend the syntax to accommodate more complex IRT models. PMID:29795834
Effects of Aging and IQ on Item and Associative Memory
ERIC Educational Resources Information Center
Ratcliff, Roger; Thapar, Anjali; McKoon, Gail
2011-01-01
The effects of aging and IQ on performance were examined in 4 memory tasks: item recognition, associative recognition, cued recall, and free recall. For item and associative recognition, accuracy and the response time (RT) distributions for correct and error responses were explained by Ratcliff's (1978) diffusion model at the level of individual…
A Bayesian Semiparametric Item Response Model with Dirichlet Process Priors
ERIC Educational Resources Information Center
Miyazaki, Kei; Hoshino, Takahiro
2009-01-01
In Item Response Theory (IRT), item characteristic curves (ICCs) are illustrated through logistic models or normal ogive models, and the probability that examinees give the correct answer is usually a monotonically increasing function of their ability parameters. However, since only limited patterns of shapes can be obtained from logistic models…
Item Response Theory Modeling of the Philadelphia Naming Test
ERIC Educational Resources Information Center
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.
2015-01-01
Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…
An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.
ERIC Educational Resources Information Center
Marco, Gary L.; And Others
Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…
Empirical Histograms in Item Response Theory with Ordinal Data
ERIC Educational Resources Information Center
Woods, Carol M.
2007-01-01
The purpose of this research is to describe, test, and illustrate a new implementation of the empirical histogram (EH) method for ordinal items. The EH method involves the estimation of item response model parameters simultaneously with the approximation of the distribution of the random latent variable (theta) as a histogram. Software for the EH…
Discussion of David Thissen's Bad Questions: An Essay Involving Item Response Theory
ERIC Educational Resources Information Center
Ackerman, Terry
2016-01-01
In this commentary, University of North Carolina's associate dean of research and assessment at the School of Education Terry Ackerman poses questions and shares his thoughts on David Thissen's essay, "Bad Questions: An Essay Involving Item Response Theory" (this issue). Ackerman begins by considering the two purposes of Item Response…
Data Visualization of Item-Total Correlation by Median Smoothing
ERIC Educational Resources Information Center
Yu, Chong Ho; Douglas, Samantha; Lee, Anna; An, Min
2016-01-01
This paper aims to illustrate how data visualization could be utilized to identify errors prior to modeling, using an example with multi-dimensional item response theory (MIRT). MIRT combines item response theory and factor analysis to identify a psychometric model that investigates two or more latent traits. While it may seem convenient to…
Some Issues in Item Response Theory: Dimensionality Assessment and Models for Guessing
ERIC Educational Resources Information Center
Smith, Jessalyn
2009-01-01
Currently, standardized tests are widely used as a method to measure how well schools and students meet academic standards. As a result, measurement issues have become an increasingly popular topic of study. Unidimensional item response models are used to model latent abilities and specific item characteristics. This class of models makes…
The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.
ERIC Educational Resources Information Center
Reckase, Mark D.; McKinley, Robert L.
A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…
Measurement Properties of Two Innovative Item Formats in a Computer-Based Test
ERIC Educational Resources Information Center
Wan, Lei; Henly, George A.
2012-01-01
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Semiparametric Item Response Functions in the Context of Guessing
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2016-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
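As a rough illustration of the functional form this abstract describes, the sketch below evaluates a logistic function of a polynomial in theta with a lower asymptote. It is not the authors' parameterization: the function name, the coefficient layout, and the example values are assumptions, and nothing in the code enforces monotonicity of the polynomial. With a linear polynomial the expression reduces to the familiar three-parameter logistic form.

```python
import math

def lower_asymptote_logistic(theta, coeffs, guessing):
    """Response probability from a logistic of a polynomial with a lower asymptote.

    coeffs holds hypothetical polynomial coefficients, lowest order first.
    Monotonicity of the polynomial is NOT checked here (assumption: the
    caller supplies coefficients that give a monotonic polynomial).
    """
    m = sum(c * theta ** k for k, c in enumerate(coeffs))
    return guessing + (1.0 - guessing) / (1.0 + math.exp(-m))

# Linear polynomial m(theta) = theta, lower asymptote 0.2 (illustrative values).
p_mid = lower_asymptote_logistic(0.0, [0.0, 1.0], 0.2)
p_low = lower_asymptote_logistic(-10.0, [0.0, 1.0], 0.2)
```

At very low theta the probability approaches the lower asymptote rather than zero, which is the behavior a guessing parameter is meant to capture.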
Nonparametric Item Response Curve Estimation with Correction for Measurement Error
ERIC Educational Resources Information Center
Guo, Hongwen; Sinharay, Sandip
2011-01-01
Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
A New Procedure for Detection of Students' Rapid Guessing Responses Using Response Time
ERIC Educational Resources Information Center
Guo, Hongwen; Rios, Joseph A.; Haberman, Shelby; Liu, Ou Lydia; Wang, Jing; Paek, Insu
2016-01-01
Unmotivated test takers using rapid guessing in item responses can affect validity studies and teacher and institution performance evaluation negatively, making it critical to identify these test takers. The authors propose a new nonparametric method for finding response-time thresholds for flagging item responses that result from rapid-guessing…
Barata, Carlos; Markich, Scott J; Baird, Donald J; Taylor, Graeme; Soares, Amadeu M V M
2002-10-02
To date, studies on genetic variability in the tolerance of aquatic biota to chemicals have focused on exposure to single chemicals. In the field, metals occur as elemental mixtures, and thus it is essential to study whether the genetic consequences of exposure to such mixtures differ from the response to single chemicals. This study determined the feeding responses of three Daphnia magna Straus clones exposed to Cd and Zn, both individually and as mixtures. Tolerance to mixtures of Cd and Zn was expressed as the proportional feeding depression of D. magna to Cd at increasing zinc concentrations. A quantitative genetic analysis revealed that genotype and genotype x environmental factors governed population responses to mixtures of both metals. More specifically, genetic variation in tolerance to sublethal levels of Cd decreased at those Zn concentrations where there were no effects on feeding, and increased again at Zn concentrations that affected feeding. The existence of genotype x environmental interactions indicated that the genetic consequences of exposing D. magna to mixtures of Cd and Zn cannot be predicted from the animals' response to single metals alone. Therefore, current ecological risk assessment methodologies for predicting the effects of chemical mixtures may wish to incorporate the concept of genetic variability. Furthermore, exposure to low and moderate concentrations of Zn increased the sublethal tolerance to Cd. This induction of tolerance to Cd by Zn was also observed for D. magna fed algae pre-loaded with both metals. Furthermore, in only one clone, physiological acclimatization to zinc also induced tolerance to cadmium. These results suggest that the feeding responses of D. magna may be related to gut poisoning induced by the release of metals from algae under low pH conditions. In particular, both induction of metallothionein synthesis by Zn and competition between Zn and Cd ions for uptake at target sites on the gut wall may be involved in determining sublethal responses to mixtures of both metals.
ERIC Educational Resources Information Center
Wang, Wen-Chung
2004-01-01
Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item functioning is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Extreme Response Style: Which Model Is Best?
ERIC Educational Resources Information Center
Leventhal, Brian
2017-01-01
More robust and rigorous psychometric models, such as multidimensional Item Response Theory models, have been advocated for survey applications. However, item responses may be influenced by construct-irrelevant variance factors such as preferences for extreme response options. Through empirical and simulation methods, this study evaluates the use…
2012-01-01
Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales). 
An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware. PMID:22686586
Stochl, Jan; Jones, Peter B; Croudace, Tim J
2012-06-11
Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). 
An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware.
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data
Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L. J.; Maris, Gunter
2016-01-01
We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two ‘one-process’ models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a ‘two-process’ model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses. PMID:27167518
An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).
Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul
2018-03-01
Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
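For readers unfamiliar with the Rasch (1PL) model named in this abstract, a minimal sketch of its dichotomous form follows. The function name and example values are hypothetical and are not taken from the EAT-10 analysis; the abstract does not specify how the EAT-10 response categories were handled, so this shows the two-category case only.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a positive response under the dichotomous Rasch (1PL) model.

    theta is the person location and b the item difficulty, both on the
    same logit scale (illustrative values only).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When person and item locations coincide, the probability is one half.
p_equal = rasch_probability(0.0, 0.0)
# A person located above the item difficulty is more likely to endorse it.
p_higher = rasch_probability(1.0, 0.0)
```

Because every item shares the same slope, items differ only in their location b, which is what distinguishes the 1PL model from two- and three-parameter models.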
Pilkonis, Paul A.; Choi, Seung W.; Reise, Steven P.; Stover, Angela M.; Riley, William T.; Cella, David
2011-01-01
The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately −1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items. PMID:21697139
Pilkonis, Paul A; Choi, Seung W; Reise, Steven P; Stover, Angela M; Riley, William T; Cella, David
2011-09-01
The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately -1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items.
Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
2015-12-01
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Can Item Keyword Feedback Help Remediate Knowledge Gaps?
Feinberg, Richard A; Clauser, Amanda L
2016-10-01
In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.
Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi
2014-01-01
Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W
2015-05-01
To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank. Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
Iwata, Noboru; Kikuchi, Kenichi; Fujihara, Yuya
2016-08-01
An innovative measurement system using a computerized adaptive testing (CAT) technique based on item response theory (IRT) has been expanding to measure mental health status. However, little is known about the details of its measurement properties based on empirical data. Moreover, response time (RT) data, which are not available from paper-and-pencil measurement but are available from computerized measurement, are worth investigating to explore response behavior. We aimed to construct a CAT to measure depressive symptomatology in a community population and to explore its measurement properties. We also examined the relationships between RTs, individual item responses, and depressive levels. To construct the CAT system, responses of 2061 workers and university students to 24 depression scale items plus four negatively revised positive affect items were subjected to a polytomous IRT analysis. The stopping rule was set at a standard error of estimation < 0.30 or a maximum of 15 items displayed. The CAT and a non-adaptive computer-based test (CBT) were administered to 209 undergraduates, and 168 of them completed the tests again after 1 week. On average, the CAT converged after 10.4 items. The θ values estimated by CAT and CBT were highly correlated (r = 0.94 and 0.95 for the 1st and 2nd measurements) and were also highly correlated with the traditional scoring procedures (r's > 0.90). The test-retest reliability was at a satisfactory level (r = 0.86). RTs to some items correlated significantly with the θ estimates. The mean RT varied by item content and wording; that is, positive affect items required an additional 2 s or longer than the other subscale items. The CAT would be a reliable and practical measurement tool for various purposes, including stress checks in the workplace.
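The stopping rule reported in this abstract (standard error below 0.30, or 15 items displayed) can be sketched as a simple predicate. The function and parameter names are hypothetical; how the standard error is estimated after each item is outside the scope of this sketch and is not described in the abstract.

```python
def should_stop(standard_error: float, items_administered: int,
                se_threshold: float = 0.30, max_items: int = 15) -> bool:
    """Return True when the adaptive test should end.

    Defaults mirror the values quoted in the abstract: stop once the
    standard error of the theta estimate drops below 0.30, or once
    15 items have been displayed, whichever comes first.
    """
    return standard_error < se_threshold or items_administered >= max_items

# Precise enough after 8 items: stop early.
stop_early = should_stop(0.28, 8)
# Still imprecise after 10 items: keep going.
keep_going = should_stop(0.35, 10)
```

A rule of this shape caps respondent burden while still letting most administrations end early, which is consistent with the reported average of 10.4 items.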
ERIC Educational Resources Information Center
Bowles, Ben; Harlow, Iain M.; Meeking, Melissa M.; Kohler, Stefan
2012-01-01
It is widely accepted that signal-detection mechanisms contribute to item-recognition memory decisions that involve discriminations between targets and lures based on a controlled laboratory study episode. Here, the authors employed mathematical modeling of receiver operating characteristics (ROC) to determine whether and how a signal-detection…
Tassé, Marc J; Schalock, Robert L; Thissen, David; Balboni, Giulia; Bersani, Henry Hank; Borthwick-Duffy, Sharon A; Spreat, Scott; Widaman, Keith F; Zhang, Dalun; Navas, Patricia
2016-03-01
The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT modeling and a nationally representative standardization sample, the item set was reduced to 75 items that provide the most precise adaptive behavior information at the cutoff area determining the presence or not of significant adaptive behavior deficits across conceptual, social, and practical skills. The standardization of the DABS is described and discussed.
Maloney, Erin M; Morrissey, Christy A; Headley, John V; Peru, Kerry M; Liber, Karsten
2017-11-01
Extensive agricultural use of neonicotinoid insecticide products has resulted in the presence of neonicotinoid mixtures in surface waters worldwide. Although many aquatic insect species are known to be sensitive to neonicotinoids, the impact of neonicotinoid mixtures is poorly understood. In the present study, the cumulative toxicities of binary and ternary mixtures of select neonicotinoids (imidacloprid, clothianidin, and thiamethoxam) were characterized under acute (96-h) exposure scenarios using the larval midge Chironomus dilutus as a representative aquatic insect species. Using the MIXTOX approach, predictive parametric models were fitted and statistically compared with observed toxicity in subsequent mixture tests. Single-compound toxicity tests yielded median lethal concentration (LC50) values of 4.63, 5.93, and 55.34 μg/L for imidacloprid, clothianidin, and thiamethoxam, respectively. Because of the similar modes of action of neonicotinoids, concentration-additive cumulative mixture toxicity was the predicted model. However, we found that imidacloprid-clothianidin mixtures demonstrated response-additive dose-level-dependent synergism, clothianidin-thiamethoxam mixtures demonstrated concentration-additive synergism, and imidacloprid-thiamethoxam mixtures demonstrated response-additive dose-ratio-dependent synergism, with toxicity shifting from antagonism to synergism as the relative concentration of thiamethoxam increased. Imidacloprid-clothianidin-thiamethoxam ternary mixtures demonstrated response-additive synergism. These results indicate that, under acute exposure scenarios, the toxicity of neonicotinoid mixtures to C. dilutus cannot be predicted using the common assumption of additive joint activity. 
Indeed, the overarching trend of synergistic deviation emphasizes the need for further research into the ecotoxicological effects of neonicotinoid insecticide mixtures in field settings, the development of better toxicity models for neonicotinoid mixture exposures, and the consideration of mixture effects when setting water quality guidelines for this class of pesticides. Environ Toxicol Chem 2017;36:3091-3101. © 2017 SETAC.
Hoffmann, Krista Callinan; Deanovic, Linda; Werner, Inge; Stillway, Marie; Fong, Stephanie; Teh, Swee
2016-10-01
A novel 2-tiered analytical approach was used to characterize and quantify interactions between type I and type II pyrethroids in Hyalella azteca using standardized water column toxicity tests. Bifenthrin, permethrin, cyfluthrin, and lambda-cyhalothrin were tested in all possible binary combinations across 6 experiments. All mixtures were analyzed for 4-d lethality, and 2 of the 6 mixtures (permethrin-bifenthrin and permethrin-cyfluthrin) were tested for subchronic 10-d lethality and sublethal effects on swimming motility and growth. Mixtures were initially analyzed for interactions using regression analyses, and subsequently compared with the additive models of concentration addition and independent action to further characterize mixture responses. Negative interactions (antagonistic) were significant in 2 of the 6 mixtures tested, including cyfluthrin-bifenthrin and cyfluthrin-permethrin, but only on the acute 4-d lethality endpoint. In both cases mixture responses fell between the additive models of concentration addition and independent action. All other mixtures were additive across 4-d lethality, and bifenthrin-permethrin and cyfluthrin-permethrin were also additive in terms of subchronic 10-d lethality and sublethal responses. Environ Toxicol Chem 2016;35:2542-2549. © 2016 SETAC.
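The two additive reference models named in this abstract, concentration addition and independent action, can be illustrated with their textbook forms. This is a minimal sketch and not the authors' analysis: the function names and numeric inputs are hypothetical, and no values from the study are used.

```python
def independent_action(effects):
    """Predicted mixture effect (fraction in [0, 1]) under independent action.

    Each component is assumed to act independently, so the unaffected
    fractions multiply: E_mix = 1 - prod(1 - E_i).
    """
    unaffected = 1.0
    for effect in effects:
        unaffected *= (1.0 - effect)
    return 1.0 - unaffected

def toxic_unit_sum(concentrations, ec50s):
    """Sum of toxic units, c_i / EC50_i, for the mixture components.

    Under concentration addition, a sum of 1 predicts a 50% effect.
    """
    return sum(c / ec for c, ec in zip(concentrations, ec50s))

# Two components that would each affect 50% on their own (illustrative).
ia_effect = independent_action([0.5, 0.5])
# Each component present at half of its own EC50 (illustrative).
tu_sum = toxic_unit_sum([1.0, 2.0], [2.0, 4.0])
```

Observed mixture responses are then compared against these predictions: a stronger effect than predicted suggests synergism, a weaker one antagonism.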
Modelling Mathematics Problem Solving Item Responses Using a Multidimensional IRT Model
ERIC Educational Resources Information Center
Wu, Margaret; Adams, Raymond
2006-01-01
This research examined students' responses to mathematics problem-solving tasks and applied a general multidimensional IRT model at the response category level. In doing so, cognitive processes were identified and modelled through item response modelling to extract more information than would be provided using conventional practices in scoring…
Bayesian Estimation of Multi-Unidimensional Graded Response IRT Models
ERIC Educational Resources Information Center
Kuo, Tzu-Chun
2015-01-01
Item response theory (IRT) has gained an increasing popularity in large-scale educational and psychological testing situations because of its theoretical advantages over classical test theory. Unidimensional graded response models (GRMs) are useful when polytomous response items are designed to measure a unified latent trait. They are limited in…
Mixture design procedure for flexible base.
DOT National Transportation Integrated Search
2013-04-01
This document provides information on mixture design requirements for a flexible base course. Sections: design requirements, job mix formula, contractor's responsibility, and engineer's responsibility. Tables: material requirements; requirements fo...
Toumi, Héla; Boumaiza, Moncef; Millet, Maurice; Radetski, Claudemir Marcos; Camara, Baba Issa; Felten, Vincent; Masfaraud, Jean-François; Férard, Jean-François
2018-04-19
We studied the combined acute effect (i.e., after 48 h) of deltamethrin (a pyrethroid insecticide) and malathion (an organophosphate insecticide) on Daphnia magna. Two approaches were used to examine the potential interaction effects of eight mixtures of deltamethrin and malathion: (i) calculation of the mixture toxicity index (MTI) and safety factor index (SFI) and (ii) response surface methodology coupled with an isobole-based statistical model (using a generalized linear model). According to the MTI and SFI calculations, one tested mixture was found to be additive while the two other tested mixtures were found to be non-additive (MTI) or antagonistic (SFI), but these differences between index responses are only due to differences in terminology related to these two indexes. Through the response surface approach and isobologram analysis, we concluded that binary mixtures of deltamethrin and malathion had a significant antagonistic effect on D. magna immobilization after 48 h of exposure. Index approaches and the response surface approach with isobologram analysis are complementary. Calculating the mixture toxicity index and safety factor index identifies the type of interaction for each individual tested mixture, while the response surface approach with isobologram analysis integrates all the data, providing a global outcome about the type of interactive effect. Only the response surface approach and isobologram analysis allowed the statistical assessment of the ecotoxicological interaction. Nevertheless, we recommend the use of both approaches (i) to identify the combined effects of contaminants and (ii) to improve risk assessment and environmental management.
Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.
2015-01-01
The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity, correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178
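For readers unfamiliar with the graded response model used in this calibration, the following is a minimal sketch of how it turns one discrimination parameter and a set of ordered thresholds into category probabilities. The parameter values are hypothetical and are not item estimates from the study; the function name is an assumption made for illustration.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination
    thresholds -- increasing threshold parameters; K thresholds give K + 1 categories
    """
    # Cumulative curves P(X >= k), padded with 1 below the lowest category
    # and 0 above the highest, so adjacent differences give P(X = k).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item evaluated for an average respondent (theta = 0).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

Because each category probability is a difference of adjacent cumulative curves, the thresholds must be ordered for all probabilities to stay non-negative.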
Development of a PROMIS item bank to measure pain interference.
Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei
2010-07-01
This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item function (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available. Copyright 2010 International Association for the Study of Pain. All rights reserved.
The aftermath of memory retrieval for recycling visual working memory representations.
Park, Hyung-Bum; Zhang, Weiwei; Hyun, Joo-Seok
2017-07-01
We examined the aftermath of accessing and retrieving a subset of information stored in visual working memory (VWM), namely whether detection of a mismatch between memory and perception can impair the original memory of an item while triggering recognition-induced forgetting for the remaining, untested items. For this purpose, we devised a consecutive-change detection task wherein two successive testing probes were displayed after a single set of memory items. Across two experiments utilizing different memory-testing methods (whole vs. single probe), we observed a reliable pattern of poor performance in change detection for the second test when the first test had exhibited a color change. The impairment after a color change was evident even when the same memory item was repeatedly probed; this suggests that an attention-driven, salient visual change made it difficult to reinstate the previously remembered item. The second change detection, for memory items untested during the first change detection, was also found to be inaccurate, indicating that recognition-induced forgetting had occurred for the unprobed items in VWM. In a third experiment, we conducted a task that involved change detection plus continuous recall, wherein a memory recall task was presented after the change detection task. The analyses of the distributions of recall errors with a probabilistic mixture model revealed that the memory impairments from both visual changes and recognition-induced forgetting are explained better by the stochastic loss of memory items than by their degraded resolution. These results indicate that attention-driven visual change and recognition-induced forgetting jointly influence the "recycling" of VWM representations.
Response Times to Gustatory–Olfactory Flavor Mixtures: Role of Congruence
Shepard, Timothy G.; Veldhuizen, Maria G.
2015-01-01
A mixture of perceptually congruent gustatory and olfactory flavorants (sucrose and citral) was previously shown to be detected faster than predicted by a model of probability summation that assumes stochastically independent processing of the individual gustatory and olfactory signals. This outcome suggests substantial integration of the signals. Does substantial integration also characterize responses to mixtures of incongruent flavorants? Here, we report simple response times (RTs) to detect brief pulses of 3 possible flavorants: monosodium glutamate, MSG (gustatory: “umami” quality), citral (olfactory: citrus quality), and a mixture of MSG and citral (gustatory–olfactory). Each stimulus (and, on a fraction of trials, water) was presented orally through a computer-operated, automated flow system, and subjects were instructed to press a button as soon as they detected any of the 3 non-water stimuli. Unlike responses previously found to the congruent mixture of sucrose and citral, responses here to the incongruent mixture of MSG and citral took significantly longer (RTs were greater) and showed lower detection rates than the values predicted by probability summation. This outcome suggests that the integration of gustatory and olfactory flavor signals is less extensive when the component flavors are perceptually incongruent rather than congruent, perhaps because incongruent flavors are less familiar. PMID:26304508
Defining an additivity framework for mixture research in inducible whole-cell biosensors
NASA Astrophysics Data System (ADS)
Martin-Betancor, K.; Ritz, C.; Fernández-Piñas, F.; Leganés, F.; Rodea-Palomares, I.
2015-11-01
A novel additivity framework for mixture effect modelling in the context of whole cell inducible biosensors has been mathematically developed and implemented in R. The proposed method is a multivariate extension of the effective dose (EDp) concept. Specifically, the extension accounts for differential maximal effects among analytes and response inhibition beyond the maximum permissive concentrations. This allows a multivariate extension of Loewe additivity, enabling direct application in a biphasic dose-response framework. The proposed additivity definition was validated, and its applicability illustrated by studying the response of the cyanobacterial biosensor Synechococcus elongatus PCC 7942 pBG2120 to binary mixtures of Zn, Cu, Cd, Ag, Co and Hg. The novel method allowed, for the first time, modelling of complete dose-response profiles of an inducible whole cell biosensor exposed to mixtures. In addition, the approach also allowed identification and quantification of departures from additivity (interactions) among analytes. The biosensor was found to respond in a near additive way to heavy metal mixtures except when Hg, Co and Ag were present, in which case strong interactions occurred. The method is a useful contribution for the whole cell biosensors discipline and related areas, allowing appropriate assessment of mixture effects in non-monotonic dose-response frameworks.
Item Response Theory Modeling of the Philadelphia Naming Test.
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D
2015-06-01
In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
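The 1- and 2-parameter logistic models compared in this abstract differ only in whether item discrimination is allowed to vary across items. A minimal sketch of the item response function and the item information that drives score precision follows; the parameter values are hypothetical, not PNT estimates, and the function names are illustrative.

```python
import math

def irf_2pl(theta, b, a=1.0):
    """Probability of a correct response under the 2PL model.

    With a fixed to the same value for every item this reduces to the 1PL model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, b, a=1.0):
    """Fisher information contributed by one item at ability theta."""
    p = irf_2pl(theta, b, a)
    return a * a * p * (1.0 - p)

# Hypothetical item: difficulty 0.5, discrimination 2.0, person ability 0.5.
p_correct = irf_2pl(theta=0.5, b=0.5, a=2.0)
info = item_information(theta=0.5, b=0.5, a=2.0)
```

An item is most informative where ability equals its difficulty, which is why an adaptive test of the kind described in the companion article would tend to select items near the current ability estimate.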
Developing an African youth psychosocial assessment: an application of item response theory.
Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise
2014-06-01
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.
Developing an African youth psychosocial assessment: an application of item response theory
BETANCOURT, THERESA S.; YANG, FRANCES; BOLTON, PAUL; NORMAND, SHARON-LISE
2014-01-01
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. PMID:24478113
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
A Note on Item-Restscore Association in Rasch Models
ERIC Educational Resources Information Center
Kreiner, Svend
2011-01-01
To rule out the need for a two-parameter item response theory (IRT) model during item analysis by Rasch models, it is important to check the Rasch model's assumption that all items have the same item discrimination. Biserial and polyserial correlation coefficients measuring the association between items and restscores are often used in an informal…
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?
ERIC Educational Resources Information Center
DeMars, Christine
Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Bornia, Antonio Cezar; de Andrade, Dalton Francisco; Campos, Lucila Maria de Souza
2012-01-01
Growing challenges with respect to preserving the environment have forced changes in company operational structures. Thus, the objective of this article is to measure the evidence of Environmental Management using the Item Response Theory, based on website analysis from Brazilian industrial companies from sectors defined through the scope of the research. This is a qualitative, exploratory, and descriptive study related to an information collection and analysis instrument. The general view of the research problem with respect to the phenomenon under study is based on multi-case studies, with the methodological outline based on the theoretical reference used. Primary data was gathered from 270 company websites from 7 different Brazilian sectors and led to the creation of 26 items approved by environmental specialists. The results were attained with the measuring of Environmental Management evidence via the Item Response Theory, providing a clear order of the items involved based on each item's level of difficulty, quality, and propriety. This permitted the measurement of each item's quality and propriety, as well as that of the respondents, placing them on the same analysis scale. Increasing the number of items and companies involved is suggested for future research in order to permit broader sector analysis.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert
2016-01-01
Background: The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims: The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design: A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results: Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion: Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education. PMID:25160692
Addante, Richard J.; Ranganath, Charan; Yonelinas, Andrew P.
2012-01-01
Recollection is typically associated with high recognition confidence and accurate source memory. However, subjects sometimes make accurate source memory judgments even for items that are not confidently recognized, and it is not known whether these responses are based on recollection or some other memory process. In the current study, we measured event related potentials (ERPs) while subjects made item and source memory confidence judgments in order to determine whether recollection supported accurate source recognition responses for items that were not confidently recognized. In line with previous studies, we found that recognition memory was associated with two ERP effects: an early on-setting FN400 effect, and a later parietal old-new effect [Late Positive Component (LPC)], which have been associated with familiarity and recollection, respectively. The FN400 increased gradually with item recognition confidence, whereas the LPC was only observed for highly confident recognition responses. The LPC was also related to source accuracy, but only for items that had received a high confidence item recognition response; accurate source judgments to items that were less confidently recognized did not exhibit the typical ERP correlate of recollection or familiarity, but rather showed a late, broadly distributed negative ERP difference. The results indicate that accurate source judgments of episodic context can occur even when recollection fails. PMID:22548808
ERIC Educational Resources Information Center
Roberts, James S.; Laughlin, James E.
1996-01-01
A parametric item response theory model for unfolding binary or graded responses is developed. The graded unfolding model (GUM) is a generalization of the hyperbolic cosine model for binary data of D. Andrich and G. Luo (1993). Applicability of the GUM to attitude testing is illustrated with real data. (SLD)
Conjunctive and Disjunctive Extensions of the Least Squares Distance Model of Cognitive Diagnosis
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.; Atanasov, Dimitar V.
2012-01-01
Many models of cognitive diagnosis, including the "least squares distance model" (LSDM), work under the "conjunctive" assumption that a correct item response occurs when all latent attributes required by the item are correctly performed. This article proposes a "disjunctive" version of the LSDM under which the correct item response occurs when "at…
ERIC Educational Resources Information Center
Yao, Lihua; Schwarz, Richard D.
2006-01-01
Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…
ERIC Educational Resources Information Center
Matlock Cole, Ki Lynn; Turner, Ronna C.; Gitchel, W. Dent
2018-01-01
The generalized partial credit model (GPCM) is often used for polytomous data; however, the nominal response model (NRM) allows for the investigation of how adjacent categories may discriminate differently when items are positively or negatively worded. Ten items from three different self-reported scales were used (anxiety, depression, and…
ERIC Educational Resources Information Center
Watson, Kathy; Baranowski, Tom; Thompson, Debbe
2006-01-01
Perceived self-efficacy (SE) for eating fruit and vegetables (FV) is a key variable mediating FV change in interventions. This study applies item response modeling (IRM) to a fruit, juice and vegetable self-efficacy questionnaire (FVSEQ) previously validated with classical test theory (CTT) procedures. The 24-item (five-point Likert scale) FVSEQ…
Characterizing Sources of Uncertainty in Item Response Theory Scale Scores
ERIC Educational Resources Information Center
Yang, Ji Seung; Hansen, Mark; Cai, Li
2012-01-01
Traditional estimators of item response theory scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, the authors review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical…
Semi-Parametric Item Response Functions in the Context of Guessing. CRESST Report 844
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2015-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Global, Local, and Graphical Person-Fit Analysis Using Person-Response Functions
ERIC Educational Resources Information Center
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R.
2005-01-01
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the…
ERIC Educational Resources Information Center
Bontempo, Robert
1993-01-01
Describes a method for assessing the quality of translations based on item response theory (IRT). Results from the IRT technique with French and Chinese versions of a scale measuring individualism-collectivism for samples of 250 U.S., 357 French, and 290 Chinese undergraduates show how several biased items are detected. (SLD)
Using the Item Response Theory (IRT) for Educational Evaluation through Games
ERIC Educational Resources Information Center
Euzébio Batista, Marcelo Henrique; Victória Barbosa, Jorge Luis; da Rosa Tavares, João Elison; Hackenhaar, Jonathan Luis
2013-01-01
This article shows the application of Item Response Theory (IRT) for educational evaluation using games. The article proposes a computational model to create user profiles, called Psychometric Profile Generator (PPG). PPG uses the IRT mathematical model for exploring the levels of skills and behaviors in the form of items and/or stimuli. The model…
A Person Fit Test for IRT Models for Polytomous Items
ERIC Educational Resources Information Center
Glas, C. A. W.; Dagohoy, Anna Villa T.
2007-01-01
A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier…
Observed Score and True Score Equating Procedures for Multidimensional Item Response Theory
ERIC Educational Resources Information Center
Brossman, Bradley Grant
2010-01-01
The purpose of this research was to develop observed score and true score equating procedures to be used in conjunction with the Multidimensional Item Response Theory (MIRT) framework. Currently, MIRT scale linking procedures exist to place item parameter estimates and ability estimates on the same scale after separate calibrations are conducted.…
Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28
ERIC Educational Resources Information Center
Guo, Hongwen; Sinharay, Sandip
2011-01-01
Nonparametric, or kernel, estimation of item response curve (IRC) is a concern theoretically and operationally. Accuracy of this estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…
ERIC Educational Resources Information Center
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
2010-01-01
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
A Note on the Reliability Coefficients for Item Response Model-Based Ability Estimates
ERIC Educational Resources Information Center
Kim, Seonghoon
2012-01-01
Assuming item parameters on a test are known constants, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true…
An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test
ERIC Educational Resources Information Center
Kahraman, Nilüfer
2014-01-01
Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…
Covariates of the Rating Process in Hierarchical Models for Multiple Ratings of Test Items
ERIC Educational Resources Information Center
Mariano, Louis T.; Junker, Brian W.
2007-01-01
When constructed response test items are scored by more than one rater, the repeated ratings allow for the consideration of individual rater bias and variability in estimating student proficiency. Several hierarchical models based on item response theory have been introduced to model such effects. In this article, the authors demonstrate how these…
The Effect of the Multiple-Choice Item Format on the Measurement of Knowledge of Language Structure
ERIC Educational Resources Information Center
Currie, Michael; Chiramanee, Thanyapa
2010-01-01
Noting the widespread use of multiple-choice items in tests in English language education in Thailand, this study compared their effect against that of constructed-response items. One hundred and fifty-two university undergraduates took a test of English structure first in constructed-response format, and later in three, stem-equivalent…
Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua
2018-03-08
A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
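The monotonicity claim can be checked numerically for one of the listed models. The following is a minimal sketch, not the authors' proof: it evaluates the generalized partial credit model on a grid of θ values and confirms that the test true score τ increases along the grid. The item parameters and the helper names (`gpcm_probs`, `expected_item_score`, `true_score`) are hypothetical and chosen only for illustration.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities for scores 0..m under the generalized
    partial credit model with discrimination a and step parameters steps."""
    # Exponent for score k is the sum of a * (theta - b_j) for j <= k;
    # score 0 has exponent 0.
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + a * (theta - b))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_item_score(theta, a, steps):
    """Expected item score: sum over categories of k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))


def true_score(theta, items):
    """Test true score tau(theta): sum of expected item scores."""
    return sum(expected_item_score(theta, a, steps) for a, steps in items)


# Hypothetical items as (discrimination, step parameters); the second
# item has disordered steps on purpose.
items = [(1.2, [-1.0, 0.0, 1.0]), (0.8, [0.5, -0.5]), (1.5, [0.0])]
grid = [x / 10 for x in range(-40, 41)]
scores = [true_score(t, items) for t in grid]
is_increasing = all(s1 < s2 for s1, s2 in zip(scores, scores[1:]))
print(is_increasing)
```

A grid check of this kind only illustrates the result for positive discriminations; it does not replace the analytical argument described in the abstract.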
Application of Item Response Theory to Tests of Substance-related Associative Memory
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
2015-01-01
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14- and 15-items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
Various selected vegetables, fruits, mushrooms and red wine residue inhibit bone resorption in rats.
Mühlbauer, Roman C; Lozano, Annemarie; Reinli, Andreas; Wetli, Herbert
2003-11-01
To make a broad survey of the effect of components of the human diet on bone resorption, a few items from the following categories were added to rat diets: vegetables, fruits, beans, nuts and seeds, mushrooms, carbohydrate sources and beverages. The effect on bone resorption was measured by the urinary excretion of tritium released from bones of 9-wk-old rats prelabeled with tritiated tetracycline from weeks 1 to 6. The number of rats per experiment was 26: 6, 5, 5, 5 and 5 in the untreated control group fed the plain semipurified diet, the positive control group fed onions and three groups fed one of the newly investigated items, respectively. New experiments were added until 10 rats were fed each item in each of two separate experiments. The results for each item were compared to those for the untreated control group (n = 12) investigated simultaneously. We found that feeding rats 1 g/d of dry fennel, celeriac, oranges, prunes, French beans and farmed and wild mushrooms (Agaricus hortensis and Boletus edulis) as well as the freeze-dried residue from red wine significantly (P < 0.05 or lower) inhibited bone resorption. Eighteen items had no significant effect. To date we have found 25/53 items that exhibit inhibitory activity. Activity appears to be restricted to the following categories: vegetables, salads, herbs, mushrooms, fruits and red wine residue (25/36 items effective). Furthermore, as assessed in a similar experimental design with various doses of a mixture of active items, we determined the minimum effective dose of the dry items to be 170 mg/d. These results open the possibility for targeted interventions in humans.
Metillo, Ephrime B; Ritz, David A
2003-02-01
Three mysid species showed differences in chemosensory feeding as judged from stereotyped food capturing responses to dissolved mixtures of feeding stimulant (either betaine-HCl or glycine) and suppressant (ammonium). The strongest responses were to 50:50 mixtures of both betaine-ammonium and glycine-ammonium solutions. In general, the response curve to the different mixtures tested was bell-shaped. Anisomysis mixta australis only showed the normal curve in response to the glycine-ammonium mixture. The platykurtic curve for Tenagomysis tasmaniae suggests a less optimal response to the betaine-HCl-ammonium solution. Paramesopodopsis rufa reacted more strongly to the betaine-ammonium than to the glycine-ammonium solutions, and more individuals of this species responded to both solutions than the other two species. It is suggested that these contrasting chemosensitivities of the three coexisting mysid species serve as a means of partitioning the feeding niche.
Unidimensional Interpretations for Multidimensional Test Items
ERIC Educational Resources Information Center
Kahraman, Nilufer
2013-01-01
This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…
Altenburger, Rolf; Scholze, Martin; Busch, Wibke; Escher, Beate I; Jakobs, Gianina; Krauss, Martin; Krüger, Janet; Neale, Peta A; Ait-Aissa, Selim; Almeida, Ana Catarina; Seiler, Thomas-Benjamin; Brion, François; Hilscherová, Klára; Hollert, Henner; Novák, Jiří; Schlichting, Rita; Serra, Hélène; Shao, Ying; Tindall, Andrew; Tolefsen, Knut-Erik; Umbuzeiro, Gisela; Williams, Tim D; Kortenkamp, Andreas
2018-05-01
Chemicals in the environment occur in mixtures rather than as individual entities. Environmental quality monitoring thus faces the challenge to comprehensively assess a multitude of contaminants and potential adverse effects. Effect-based methods have been suggested as complements to chemical analytical characterisation of complex pollution patterns. The regularly observed discrepancy between chemical and biological assessments of adverse effects due to contaminants in the field may be either due to unidentified contaminants or result from interactions of compounds in mixtures. Here, we present an interlaboratory study where individual compounds and their mixtures were investigated by extensive concentration-effect analysis using 19 different bioassays. The assay panel consisted of 5 whole organism assays measuring apical effects and 14 cell- and organism-based bioassays with more specific effect observations. Twelve organic water pollutants of diverse structure and unique known modes of action were studied individually and as mixtures mirroring exposure scenarios in freshwaters. We compared the observed mixture effects against component-based mixture effect predictions derived from additivity expectations (assumption of non-interaction). Most of the assays detected the mixture response of the active components as predicted even against a background of other inactive contaminants. When none of the mixture components showed any activity by themselves then the mixture also was without effects. The mixture effects observed using apical endpoints fell in the middle of a prediction window defined by the additivity predictions for concentration addition and independent action, reflecting well the diversity of the anticipated modes of action. In one case, an unexpectedly reduced solubility of one of the mixture components led to mixture responses that fell short of the predictions of both additivity mixture models. 
The majority of the specific cell- and organism-based endpoints produced mixture responses in agreement with the additivity expectation of concentration addition. Exceptionally, expected (additive) mixture response did not occur due to masking effects such as general toxicity from other compounds. Generally, deviations from an additivity expectation could be explained due to experimental factors, specific limitations of the effect endpoint or masking side effects such as cytotoxicity in in vitro assays. The majority of bioassays were able to quantitatively detect the predicted non-interactive, additive combined effect of the specifically bioactive compounds against a background of complex mixture of other chemicals in the sample. This supports the use of a combination of chemical and bioanalytical monitoring tools for the identification of chemicals that drive a specific mixture effect. Furthermore, we demonstrated that a panel of bioassays can provide a diverse profile of effect responses to a complex contaminated sample. This could be extended towards representing mixture adverse outcome pathways. Our findings support the ongoing development of bioanalytical tools for (i) compiling comprehensive effect-based batteries for water quality assessment, (ii) designing tailored surveillance methods to safeguard specific water uses, and (iii) devising strategies for effect-based diagnosis of complex contamination. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
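The two additivity expectations named in this abstract, concentration addition and independent action, can be sketched in a few lines. This is an illustrative sketch only: it assumes each component follows a Hill-type (log-logistic) concentration-effect curve, which the abstract does not specify, and the function names and parameter values are hypothetical.

```python
def hill_effect(conc, ec50, slope):
    """Fractional effect (0..1) of one compound under an assumed Hill curve."""
    if conc <= 0:
        return 0.0
    return 1.0 / (1.0 + (ec50 / conc) ** slope)


def independent_action(concs, ec50s, slopes):
    """Independent action: combine the unaffected fractions multiplicatively."""
    unaffected = 1.0
    for c, e, h in zip(concs, ec50s, slopes):
        unaffected *= 1.0 - hill_effect(c, e, h)
    return 1.0 - unaffected


def concentration_addition(concs, ec50s, slopes, tol=1e-10):
    """Concentration addition: find the effect level x at which the toxic
    units sum(c_i / EC_x,i) equal 1, using bisection on x in (0, 1)."""

    def toxic_units(x):
        # EC_x,i is the inverse of the assumed Hill curve at effect level x.
        return sum(
            c / (e * (x / (1.0 - x)) ** (1.0 / h))
            for c, e, h in zip(concs, ec50s, slopes)
        )

    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if toxic_units(mid) > 1.0:
            lo = mid  # mixture is more potent than effect level mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Sham mixture: the same hypothetical compound split into two half-EC50 doses.
concs, ec50s, slopes = [0.5, 0.5], [1.0, 1.0], [1.0, 1.0]
ca_effect = concentration_addition(concs, ec50s, slopes)
ia_effect = independent_action(concs, ec50s, slopes)
print(ca_effect, ia_effect)
```

In this sham-mixture case concentration addition recovers the 50% effect of the combined dose, while independent action gives a different value, so the two predictions bracket a window of the kind the abstract refers to.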
A validation study of public health knowledge, skills, social responsibility and applied learning.
Vackova, Dana; Chen, Coco K; Lui, Juliana N M; Johnston, Janice M
2018-06-22
To design and validate a questionnaire to measure medical students' Public Health (PH) knowledge, skills, social responsibility and applied learning as indicated in the four domains recommended by the Association of Schools & Programmes of Public Health (ASPPH). A cross-sectional study was conducted to develop an evaluation tool for PH undergraduate education through item generation, reduction, refinement and validation. The 74 preliminary items derived from the existing literature were reduced to 55 items based on expert panel review which included those with expertise in PH, psychometrics and medical education, as well as medical students. Psychometric properties of the preliminary questionnaire were assessed as follows: frequency of endorsement for item variance; principal component analysis (PCA) with varimax rotation for item reduction and factor estimation; Cronbach's Alpha, item-total correlation and test-retest validity for internal consistency and reliability. PCA yielded five factors: PH Learning Experience (6 items); PH Risk Assessment and Communication (5 items); Future Use of Evidence in Practice (6 items); Recognition of PH as a Scientific Discipline (4 items); and PH Skills Development (3 items), explaining 72.05% variance. Internal consistency and reliability tests were satisfactory (Cronbach's Alpha ranged from 0.87 to 0.90; item-total correlation > 0.59). Lower paired test-retest correlations reflected instability in a social science environment. An evaluation tool for community-centred PH education has been developed and validated. The tool measures PH knowledge, skills, social responsibilities and applied learning as recommended by the internationally recognised Association of Schools & Programmes of Public Health (ASPPH).
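Cronbach's alpha, the internal-consistency index reported in this abstract, is computed from the item variances and the variance of the total score. The sketch below shows the standard formula on small made-up response matrices; the data and the function name `cronbach_alpha` are hypothetical and unrelated to the study's questionnaire.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix (list of rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)


# Made-up data: three perfectly consistent items give alpha = 1,
# two uncorrelated items give alpha = 0.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
uncorrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]
alpha_consistent = cronbach_alpha(consistent)
alpha_uncorrelated = cronbach_alpha(uncorrelated)
print(alpha_consistent, alpha_uncorrelated)
```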
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)
Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel
2014-01-01
We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
Mechanisms of Choice Behavior Shift Using Cue-approach Training.
Bakkour, Akram; Leuker, Christina; Hover, Ashleigh M; Giles, Nathan; Poldrack, Russell A; Schonberg, Tom
2016-01-01
Cue-approach training has been shown to effectively shift choices for snack food items by associating a cued button-press motor response to particular food items. Furthermore, attention was biased toward previously cued items, even when the cued item is not chosen for real consumption during a choice phase. However, the exact mechanism by which preferences shift during cue-approach training is not entirely clear. In three experiments, we shed light on the possible underlying mechanisms at play during this novel paradigm: (1) Uncued, wholly predictable motor responses paired with particular food items were not sufficient to elicit a preference shift; (2) Cueing motor responses early - concurrently with food item onset - and thus eliminating the need for heightened top-down attention to the food stimulus in preparation for a motor response also eliminated the shift in food preferences. This finding reinforces our hypothesis that heightened attention at behaviorally relevant points in time is key to changing choice behavior in the cue-approach task; (3) Crucially, indicating choice using eye movements rather than manual button presses preserves the effect, thus demonstrating that the shift in preferences is not governed by a learned motor response but more likely via modulation of subjective value in higher associative regions, consistent with previous neuroimaging results. Cue-approach training drives attention at behaviorally relevant points in time to modulate the subjective value of individual items, providing a mechanism for behavior change that does not rely on external reinforcement and that holds great promise for developing real world behavioral interventions.
Predicting the response of olfactory sensory neurons to odor mixtures from single odor response
NASA Astrophysics Data System (ADS)
Marasco, Addolorata; de Paris, Alessandro; Migliore, Michele
2016-04-01
The response of olfactory receptor neurons to odor mixtures is not well understood. Here, using experimental constraints, we investigate the mathematical structure of the odor response space and its consequences. The analysis suggests that the odor response space is 3-dimensional, and predicts that the dose-response curve of an odor receptor can be obtained, in most cases, from three primary components with specific properties. This opens the way to an objective procedure to obtain specific olfactory receptor responses by manipulating mixtures in a mathematically predictable manner. This result is general and applies, independently of the number of odor components, to any olfactory sensory neuron type with a response curve that can be represented as a sigmoidal function of the odor concentration.
Cappelleri, J C; Althof, S E; Siegel, R L; Shpilsky, A; Bell, S S; Duttagupta, S
2004-02-01
Development and validation of a patient-reported measure of psychosocial variables in men with erectile dysfunction (ED) is described. Literature review, focus groups, and medical specialists identified 86 potential items. Redundant, ambiguous, or low item-to-total correlation items were removed. Data from 98 men reporting diagnosed ED and 94 controls assisted in final item selection and psychometric evaluation. Treatment responsiveness was evaluated in 93 men with ED in a 10-week open-label trial of sildenafil citrate (Viagra). The 14 chosen items resolved into two domains: Sexual Relationship (eight items) and Confidence (six items), the latter comprising Self-Esteem (four items) and Overall Relationship (two items) subscales. The resulting Self-Esteem And Relationship (SEAR) questionnaire demonstrated validity and reliability. The intervention study demonstrated responsiveness to beneficial treatment with significant improvement in scores (P=0.0001). The SEAR questionnaire possesses strong psychometric properties that support its validity and reliability for measuring sexual relationship, confidence, and particularly self-esteem.
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.
Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri
2017-03-01
Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls.
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
Three tests and three corrections: Comment on Koen and Yonelinas (2010)
Jang, Yoonhee; Mickes, Laura; Wixted, John T.
2012-01-01
The slope of the z-transformed receiver-operating characteristic (zROC) in recognition memory experiments is usually less than 1, which has long been interpreted to mean that the variance of the target distribution is greater than the variance of the lure distribution. The greater variance of the target distribution could arise because the different items on a list receive different increments in memory strength during study (the “encoding variability” hypothesis). In a test of that interpretation, J. Koen and A. Yonelinas (2010, K&Y) attempted to further increase encoding variability to see if it would further decrease the slope of the zROC. To do so, they presented items on a list for two different durations and then mixed the weak and strong targets together. After performing three tests on the mixed-strength data, K&Y concluded that encoding variability does not explain why the slope of the zROC is typically less than one. However, we show that their tests have no bearing on the encoding variability account. Instead, they bear on the mixture-UVSD model that corresponds to their experimental design. On the surface, the results reported by K&Y appear to be inconsistent with the predictions of the mixture-UVSD model (though they were taken to be inconsistent with the predictions of the encoding variability hypothesis). However, all three of the tests they performed contained errors. When those errors are corrected, the same three tests show that their data support, rather than contradict, the mixture-UVSD model (but they still have no bearing on the encoding variability hypothesis). PMID:22390323
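The link between the zROC slope and the variance of the target distribution can be illustrated with a toy unequal-variance signal detection (UVSD) model. The sketch below is not the analysis performed by either set of authors: lures are assumed to follow N(0, 1), targets follow a normal distribution or an equal mixture of two normals, and all parameter values and function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def zroc_points(target_components, criteria):
    """z-transformed (false-alarm, hit) pairs for each decision criterion.

    target_components: list of (weight, mean, sd) for the target
    distribution; lures are assumed to be N(0, 1).
    """
    points = []
    for c in criteria:
        false_alarm = 1.0 - STD_NORMAL.cdf(c)
        hit = sum(
            w * (1.0 - NormalDist(m, s).cdf(c)) for w, m, s in target_components
        )
        points.append((STD_NORMAL.inv_cdf(false_alarm), STD_NORMAL.inv_cdf(hit)))
    return points


def least_squares_slope(points):
    """Ordinary least-squares slope of z(hit) on z(false alarm)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]
# Single target distribution with sd 1.25: the zROC is a line with slope 1 / 1.25.
single_slope = least_squares_slope(zroc_points([(1.0, 1.5, 1.25)], criteria))
# Equal mixture of weak and strong targets with the same sd (mixture-UVSD toy case).
mixture_slope = least_squares_slope(
    zroc_points([(0.5, 1.0, 1.25), (0.5, 2.0, 1.25)], criteria)
)
print(single_slope, mixture_slope)
```

With these assumed parameters, mixing weak and strong targets widens the overall target distribution, so the fitted slope for the mixture comes out below that of the single distribution.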
Randomized Item Response Theory Models
ERIC Educational Resources Information Center
Fox, Jean-Paul
2005-01-01
The randomized response (RR) technique is often used to obtain answers on sensitive questions. A new method is developed to measure latent variables using the RR technique because direct questioning leads to biased results. Within the RR technique, the probability of the true response is modeled by an item response theory (IRT) model. The RR…
ERIC Educational Resources Information Center
Qian, Xiaoyu; Nandakumar, Ratna; Glutting, Joseph; Ford, Danielle; Fifield, Steve
2017-01-01
In this study, we investigated gender and minority achievement gaps on 8th-grade science items employing a multilevel item response methodology. Both gaps were wider on physics and earth science items than on biology and chemistry items. Larger gender gaps were found on items with specific topics favoring male students than other items, for…
Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar
2015-05-03
Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. To develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12 item trust in physician scale has a good construct validity and internal consistency. Published by the BMJ Publishing Group Limited.
Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar
2015-01-01
Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives: To develop and validate a new trust in physician scale for a developing country setting. Methods: Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Results: Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions: The final 12 item trust in physician scale has a good construct validity and internal consistency. PMID:25941182
Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen
2017-07-01
The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.
Wilmot, Michael P; Kostal, Jack W; Stillwell, David; Kosinski, Michal
2017-07-01
For the past 40 years, the conventional univariate model of self-monitoring has reigned as the dominant interpretative paradigm in the literature. However, recent findings associated with an alternative bivariate model challenge the conventional paradigm. In this study, item response theory is used to develop measures of the bivariate model of acquisitive and protective self-monitoring using original Self-Monitoring Scale (SMS) items, and data from two large, nonstudent samples (Ns = 13,563 and 709). Results indicate that the new acquisitive (six-item) and protective (seven-item) self-monitoring scales are reliable, unbiased in terms of gender and age, and demonstrate theoretically consistent relations to measures of personality traits and cognitive ability. Additionally, by virtue of using original SMS items, previously collected responses can be reanalyzed in accordance with the alternative bivariate model. Recommendations for the reanalysis of archival SMS data, as well as directions for future research, are provided.
Pheromone modulates plant odor responses in the antennal lobe of a moth.
Chaffiol, Antoine; Dupuy, Fabienne; Barrozo, Romina B; Kropf, Jan; Renou, Michel; Rospars, Jean-Pierre; Anton, Sylvia
2014-06-01
In nature, male moths are exposed to a complex plant odorant environment when they fly upwind to a sex pheromone source in their search for mates. Plant odors have been shown to affect responses to pheromone at various levels, but how does pheromone affect plant odor perception? We recorded responses from neurons within the non-pheromonal "ordinary glomeruli" of the primary olfactory center, the antennal lobe (AL), to single and pulsed stimulations with the plant odorant heptanal, the pheromone, and their mixture in the male moth Agrotis ipsilon. We identified 3 physiological types of neurons according to their activity patterns combining excitatory and inhibitory phases. Both local and projection neurons were identified in each physiological type. Neurons with excitatory responses to heptanal also frequently responded to the pheromone and showed additive responses to the mixture. Moreover, the neurons' ability to resolve successive pulses generally improved with the mixture. Only some neurons with combined excitatory/inhibitory, or purely inhibitory responses to heptanal, also responded to the pheromone. Although individual mixture responses were not significantly different from heptanal responses in these neurons, pulse resolution was improved with the mixture as compared with heptanal alone. These results demonstrate that the pheromone and the general odorant subsystems interact more intensely in the moth AL than previously thought. © The Author 2014. Published by Oxford University Press. All rights reserved.
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
2018-02-02
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Function System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Restricted interests and teacher presentation of items.
Stocco, Corey S; Thompson, Rachel H; Rodriguez, Nicole M
2011-01-01
Restricted and repetitive behavior (RRB) is more pervasive, prevalent, frequent, and severe in individuals with autism spectrum disorders (ASDs) than in their typical peers. One subtype of RRB is restricted interests in items or activities, which is evident in the manner in which individuals engage with items (e.g., repetitious wheel spinning), the types of items or activities they select (e.g., preoccupation with a phone book), or the range of items or activities they select (i.e., narrow range of items). We sought to describe the relation between restricted interests and teacher presentation of items. Overall, we observed 5 teachers interacting with 2 pairs of students diagnosed with an ASD. Each pair included 1 student with restricted interests. During these observations, teachers were free to present any items from an array of 4 stimuli selected by experimenters. We recorded student responses to teacher presentation of items and analyzed the data to determine the relation between teacher presentation of items and the consequences for presentation provided by the students. Teacher presentation of items corresponded with differential responses provided by students with ASD, and those with restricted preferences experienced a narrower array of items.
Gainer, Amy; Cousins, Mark; Hogan, Natacha; Siciliano, Steven D
2018-05-05
Although petroleum hydrocarbons (PHCs) released to the environment typically occur as mixtures, PHC remediation guidelines often reflect individual substance toxicity. It is well documented that groups of aliphatic PHCs act via the same mechanism of action, nonpolar narcosis and, theoretically, concentration addition mixture toxicity principles apply. To assess this theory, ten standardized acute and chronic soil invertebrate toxicity tests on a range of organisms (Eisenia fetida, Lumbricus terrestris, Enchytraeus crypticus, Folsomia candida, Oppia nitens and Hypoaspis aculeifer) were conducted with a refined PHC binary mixture. Reference models for concentration addition and independent action were applied to the mixture toxicity data with consideration of synergism, antagonism and dose level toxicity. Both concentration addition and independent action, without further interactions, provided the best fit with observed response to the mixture. Individual fraction effective concentration values were predicted from optimized, fitted reference models. Concentration addition provided a better estimate than independent action of individual fraction effective concentrations based on comparison with available literature and species trends observed in toxic responses to the mixture. Interspecies differences in standardized laboratory soil invertebrate species responses to PHC contaminated soil were reflected in unique traits. Diets that included soil, large body size, permeable cuticle, low lipid content, lack of ability to molt and no maternal transfer were traits linked to a sensitive survival response to PHC contaminated soil in laboratory tests. Traits linked to sensitive reproduction response in organisms tested were long life spans with small clutch sizes. By deriving single fraction toxicity endpoints considerate of mixtures, we reduce resources and time required in conducting site specific risk assessments for the protection of soil organism's exposure pathway.
This article is protected by copyright. All rights reserved.
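The two reference models named in the abstract above can be sketched in their textbook forms. Under concentration addition, the mixture effect concentration is ECx_mix = 1 / Σ(p_i / ECx_i), where p_i is the fraction of component i in the mixture; under independent action, the mixture effect is E_mix = 1 − Π(1 − E_i). The code below is a minimal sketch under those standard definitions, not the fitted models of the study (which also considered synergism, antagonism and dose level effects); the function names and numbers are hypothetical.

```python
def ca_mixture_ecx(fractions, ecx_values):
    """Concentration addition: ECx of a mixture.

    fractions: share of each component in the mixture (should sum to 1).
    ecx_values: ECx of each component tested alone, in the same units.
    """
    return 1.0 / sum(p / ec for p, ec in zip(fractions, ecx_values))


def ia_mixture_effect(effects):
    """Independent action: combined fractional effect (0 to 1).

    effects: fractional effect of each component at its concentration
    in the mixture, assuming the components act independently.
    """
    unaffected = 1.0
    for e in effects:
        unaffected *= 1.0 - e
    return 1.0 - unaffected


# Invented example: a 50:50 binary mixture with single-fraction EC50s of 10 and 40.
mix_ec50 = ca_mixture_ecx([0.5, 0.5], [10.0, 40.0])
# Invented example: two components each causing a 50% effect on their own.
mix_effect = ia_mixture_effect([0.5, 0.5])
```

Concentration addition works on the concentration scale while independent action works on the effect scale, which is why the two functions take different inputs.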
Chapman, Peter J; Vogt, Frank; Dutta, Pampa; Datskos, Panos G; Devault, Gerald L; Sepaniak, Michael J
2007-01-01
The very simple coupling of a standard, packed-column gas chromatograph with a microcantilever array (MCA) is demonstrated for enhanced selectivity and potential analyte identification in the analysis of volatile organic compounds (VOCs). The cantilevers in MCAs are differentially coated on one side with responsive phases (RPs) and produce bending responses of the cantilevers due to analyte-induced surface stresses. Generally, individual components are difficult to elucidate when introduced to MCA systems as mixtures, although pattern recognition techniques are helpful in identifying single components, binary mixtures, or composite responses of distinct mixtures (e.g., fragrances). In the present work, simple test VOC mixtures composed of acetone, ethanol, and trichloroethylene (TCE) in pentane and methanol and acetonitrile in pentane are first separated using a standard gas chromatograph and then introduced into an MCA flow cell. Significant amounts of response diversity to the analytes in the mixtures are demonstrated across the RP-coated cantilevers of the array. Principal component analysis is used to demonstrate that only three components of a four-component VOC mixture could be identified without mixture separation. Calibration studies are performed, demonstrating a good linear response over 2 orders of magnitude for each component in the primary study mixture. Studies of operational parameters including column temperature, column flow rate, and array cell temperature are conducted. Reproducibility studies of VOC peak areas and peak heights are also carried out showing RSDs of less than 4 and 3%, respectively, for intra-assay studies. Of practical significance is the facile manner by which the hyphenation of a mature separation technique and the burgeoning sensing approach is accomplished, and the potential to use pattern recognition techniques with MCAs as a new type of detector for chromatography with analyte-identifying capabilities.
Looking Closer at the Effects of Framing on Risky Choice: An Item Response Theory Analysis.
Zickar; Highhouse
1998-07-01
Item response theory (IRT) methodology allowed an in-depth examination of several issues that would be difficult to explore using traditional methodology. IRT models were estimated for 4 risky-choice items, answered by students under either a gain or loss frame. Results supported the typical framing finding of risk-aversion for gains and risk-seeking for losses but also suggested that a latent construct we label preference for risk was influential in predicting risky choice. Also, the Asian Disease item, most often used in framing research, was found to have anomalous statistical properties when compared to other framing items. Copyright 1998 Academic Press.
2012-01-01
Background: Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods: The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results: The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. Conclusions: This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
Optimal Linking Design for Response Model Parameters
ERIC Educational Resources Information Center
Barrett, Michelle D.; van der Linden, Wim J.
2017-01-01
Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…
Conceptualization and measurement of celebrity worship.
McCutcheon, Lynn E; Lange, Rense; Houran, James
2002-02-01
Celebrity worship has been conceptualized as having pathological and nonpathological forms. To avoid problems associated with item-level factor analysis, 'top-down purification' was used to test the validity of this conceptualization. The respondents (N = 249) completed items modelled after existing celebrity worship questionnaires. A subset of 17 unidimensional and Rasch scalable items was discovered (the local reliability ranged from .71 to .96), which showed no biases related to age and gender. This subset was dubbed the Celebrity Worship Scale (CWS). The items also showed no celebrity bias, indicating that CWS applies equally to acting, music, sports, and 'other' celebrities. The Rasch nature of the items defines celebrity worship as consisting of three qualitatively different stages. Low worship involves individualistic behaviours such as watching and reading about a celebrity. At slightly higher levels, celebrity worship takes on a social character. Lastly, the highest levels are characterized by a mixture of empathy with the celebrity's successes and failures, over-identification with the celebrity, compulsive behaviours, as well as obsession with details of the celebrity's life. Based on these findings, the authors propose a model of celebrity worship based on psychological absorption (leading to delusions of actual relationships with celebrities) and addiction (fostering the need for progressively stronger involvement to feel connected with the celebrity).
A Conditional Joint Modeling Approach for Locally Dependent Item Responses and Response Times
ERIC Educational Resources Information Center
Meng, Xiang-Bin; Tao, Jian; Chang, Hua-Hua
2015-01-01
The assumption of conditional independence between the responses and the response times (RTs) for a given person is common in RT modeling. However, when the speed of a test taker is not constant, this assumption will be violated. In this article we propose a conditional joint model for item responses and RTs, which incorporates a covariance…
The Psychometric Properties of Classroom Response System Data: A Case Study
NASA Astrophysics Data System (ADS)
Kortemeyer, Gerd
2016-08-01
Classroom response systems (often referred to as "clickers") have slowly gained adoption over the recent decade; however, critics frequently doubt their pedagogical value starting with the validity of the gathered responses: There is concern that students simply "click" random answers. This case study looks at different measures of response reliability, starting from a global look at correlations between formative clicker responses and summative examination performance to how clicker questions are used in context. It was found that clicker performance is a moderate indicator of course performance as a whole, and that while the psychometric properties of clicker items are more erratic than those of examination data, they still have acceptable internal consistency and include items with high discrimination. It was also found that clicker responses and item properties do provide highly meaningful feedback within a lecture context, i.e., when their position and function within lecture sessions are taken into consideration. Within this framework, conceptual questions provide measurably more meaningful feedback than items that require calculations.
ERIC Educational Resources Information Center
Turner, Brandon M.; Betz, Nancy E.; Edwards, Michael C.; Borgen, Fred H.
2010-01-01
The psychometric properties of measures of self-efficacy for the six themes of Holland's theory were examined using item response theory. Item and scale quality were compared across levels of the trait continuum; all the scales were highly reliable but differentiated better at some levels of the continuum than others. Applications for adaptive…
ERIC Educational Resources Information Center
Sass, D. A.; Schmitt, T. A.; Walker, C. M.
2008-01-01
Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal…
ERIC Educational Resources Information Center
Magno, Carlo
2009-01-01
The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. The CTT and IRT were compared across two samples and two forms of test on their item difficulty, internal consistency, and measurement errors. The specific…
ERIC Educational Resources Information Center
Marcoulides, Katerina M.
2018-01-01
This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods was examined. Overall results showed that…
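For readers unfamiliar with the model named in the abstract above, the two-parameter logistic (2PL) model gives the probability of a correct response as P(θ) = 1 / (1 + exp(−a(θ − b))), where a is the item discrimination and b the item difficulty. The sketch below only evaluates that standard formula; it does not reproduce the study's Bayesian estimation, and the function name `irf_2pl` and the parameter values are illustrative assumptions.

```python
import math


def irf_2pl(theta, a, b):
    """2PL item response function.

    theta: latent trait level of the respondent.
    a: item discrimination (slope); b: item difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Invented item parameters: discrimination 1.2, difficulty 0.0.
p_at_difficulty = irf_2pl(0.0, 1.2, 0.0)   # trait level equal to difficulty
p_above = irf_2pl(1.0, 1.2, 0.0)           # trait level above difficulty
```

When θ equals b the probability is exactly one half, and a larger a makes the curve steeper around that point.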
ERIC Educational Resources Information Center
Li, Ying; Jiao, Hong; Lissitz, Robert W.
2012-01-01
This study investigated the application of multidimensional item response theory (IRT) models to validate test structure and dimensionality. Multiple content areas or domains within a single subject often exist in large-scale achievement tests. Such areas or domains may cause multidimensionality or local item dependence, which both violate the…
An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models
ERIC Educational Resources Information Center
Ames, Allison J.; Penfield, Randall D.
2015-01-01
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing…
ERIC Educational Resources Information Center
Kohli, Nidhi; Koran, Jennifer; Henn, Lisa
2015-01-01
There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…
ERIC Educational Resources Information Center
Finch, Holmes
2011-01-01
Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…
ERIC Educational Resources Information Center
Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.
2017-01-01
Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
ERIC Educational Resources Information Center
Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun
2002-01-01
Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)
ERIC Educational Resources Information Center
Kim, Jee-Seon; Bolt, Daniel M.
2007-01-01
The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…
ERIC Educational Resources Information Center
Yamamoto, Kentaro; He, Qiwei; Shin, Hyo Jeong; von Davier, Mattias
2017-01-01
Approximately a third of the Programme for International Student Assessment (PISA) items in the core domains (math, reading, and science) are constructed-response items and require human coding (scoring). This process is time-consuming, expensive, and prone to error as often (a) humans code inconsistently, and (b) coding reliability in…
ERIC Educational Resources Information Center
Wei, Youhua; Thompson, Bruce; Cook, C. Colleen
2005-01-01
LibQUAL+[TM] data to date have not been subjected to the modern measurement theory called polytomous item response theory (IRT). The data interpreted here were collected from 42,090 participants who completed the "American English" version of the 22 core LibQUAL+[TM] items, and 12,552 participants from Australia and Europe who…
ERIC Educational Resources Information Center
Beevers, Christopher G.; Strong, David R.; Meyer, Bjorn; Pilkonis, Paul A.; Miller, Ivan R.
2007-01-01
Despite a central role for dysfunctional attitudes in cognitive theories of depression and the widespread use of the Dysfunctional Attitude Scale, form A (DAS-A; A. Weissman, 1979), the psychometric development of the DAS-A has been relatively limited. The authors used nonparametric item response theory methods to examine the DAS-A items and…
ERIC Educational Resources Information Center
Monroe, Scott; Cai, Li
2013-01-01
In Ramsay curve item response theory (RC-IRT, Woods & Thissen, 2006) modeling, the shape of the latent trait distribution is estimated simultaneously with the item parameters. In its original implementation, RC-IRT is estimated via Bock and Aitkin's (1981) EM algorithm, which yields maximum marginal likelihood estimates. This method, however,…
ERIC Educational Resources Information Center
Monroe, Scott; Cai, Li
2014-01-01
In Ramsay curve item response theory (RC-IRT) modeling, the shape of the latent trait distribution is estimated simultaneously with the item parameters. In its original implementation, RC-IRT is estimated via Bock and Aitkin's EM algorithm, which yields maximum marginal likelihood estimates. This method, however, does not produce the…
Can Item Keyword Feedback Help Remediate Knowledge Gaps?
Feinberg, Richard A.; Clauser, Amanda L.
2016-01-01
Background: In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. Objective: The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Methods: Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Results: Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Conclusions: Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation. PMID:27777664
Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D
2014-05-01
The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
Gust, M; Fortier, M; Garric, J; Fournier, M; Gagné, F
2013-02-15
Pharmaceuticals are pollutants of potential concern in the aquatic environment where they are commonly introduced as complex mixtures via municipal effluents. Many reports underline the effects of pharmaceuticals on the immune system of non-target species. Four drug mixtures were tested, with pharmaceuticals grouped by main therapeutic use: psychiatric (venlafaxine, carbamazepine, diazepam), antibiotic (ciprofloxacin, erythromycin, novobiocin, oxytetracycline, sulfamethoxazole, trimethoprim), hypolipemic (atorvastatin, gemfibrozil, bezafibrate) and antihypertensive (atenolol, furosemide, hydrochlorothiazide, lisinopril). Their effects on the immune response of Lymnaea stagnalis were then compared with those of a treated municipal effluent known for its contamination. Adult L. stagnalis were exposed for 3 days to an environmentally relevant concentration of the four mixtures individually and as a global mixture. Effects were assessed on immunocompetence (hemocyte viability and count, ROS and thiol levels, phagocytosis) and on the expression of genes related to the immune response and oxidative stress: catalase (CAT), superoxide dismutase (SOD), glutathione reductase (GR), Selenium-dependent glutathione peroxidase (SeGPx), two isoforms of the nitric oxide synthase gene (NOS1 and NOS2), molluscan defensive molecule (MDM), Toll-like receptor 4 (TLR4), allograft inflammatory factor-1 (AIF-1) and heat-shock protein 70 (HSP70). Immunocompetence was differently affected by the therapeutic class mixtures compared to the global mixture, which increased hemocyte count, ROS levels and phagocytosis, and decreased intracellular thiol levels. TLR4 gene expression was the most strongly increased, especially by the psychiatric mixture (19-fold), while AIF-1, GR and CAT genes were downregulated.
A decision tree analysis revealed that the immunotoxic responses caused by the municipal effluent were comparable to those obtained with the global pharmaceutical mixture, and the latter shared similarity with the antibiotic mixture. This suggests that pharmaceutical mixtures in municipal effluents represent a risk for gastropods at the immunocompetence levels and the antibiotic group could represent a model therapeutic class for municipal effluent toxicity studies in L. stagnalis. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Oliver, Joseph Steve; Hodges, Georgia W.; Moore, James N.; Cohen, Allan; Jang, Yoonsun; Brown, Scott A.; Kwon, Kyung A.; Jeong, Sophia; Raven, Sara P.; Jurkiewicz, Melissa; Robertson, Tom P.
2017-11-01
Research into the efficacy of modules featuring dynamic visualizations, case studies, and interactive learning environments is reported here. This quasi-experimental 2-year study examined the implementation of three interactive computer-based instructional modules within a curricular unit covering cellular biology concepts in an introductory high school biology course. The modules featured dynamic visualizations and focused on three processes that underlie much of cellular biology: diffusion, osmosis, and filtration. Pre-tests and post-tests were used to assess knowledge growth across the unit. A mixture Rasch model analysis of the post-test data revealed two groups of students. In both years of the study, a large proportion of the students were classified as low-achieving based on their pre-test scores. The use of the modules in the Cell Unit in year 2 was associated with a much larger proportion of the students having transitioned to the high-achieving group than in year 1. In year 2, the same teachers taught the same concepts as year 1 but incorporated the interactive computer-based modules into the cell biology unit of the curriculum. In year 2, 67% of students initially classified as low-achieving were classified as high-achieving at the end of the unit. Examination of responses to assessments embedded within the modules as well as post-test items linked transition to the high-achieving group with correct responses to items that both referenced the visualization and the contextualization of that visualization within the module. This study points to the importance of dynamic visualization within contextualized case studies as a means to support student knowledge acquisition in biology.
Dube, Blaire; Emrich, Stephen M; Al-Aidroos, Naseem
2017-10-01
Across 2 experiments we revisited the filter account of how feature-based attention regulates visual working memory (VWM). Originally drawing from discrete-capacity ("slot") models, the filter account proposes that attention operates like the "bouncer in the brain," preventing distracting information from being encoded so that VWM resources are reserved for relevant information. Given recent challenges to the assumptions of discrete-capacity models, we investigated whether feature-based attention plays a broader role in regulating memory. Both experiments used partial report tasks in which participants memorized the colors of circle and square stimuli, and we provided a feature-based goal by manipulating the likelihood that 1 shape would be probed over the other across a range of probabilities. By decomposing participants' responses using mixture and variable-precision models, we estimated the contributions of guesses, nontarget responses, and imprecise memory representations to their errors. Consistent with the filter account, participants were less likely to guess when the probed memory item matched the feature-based goal. Interestingly, this effect varied with goal strength, even across high probabilities where goal-matching information should always be prioritized, demonstrating strategic control over filter strength. Beyond this effect of attention on which stimuli were encoded, we also observed effects on how they were encoded: Estimates of both memory precision and nontarget errors varied continuously with feature-based attention. The results offer support for an extension to the filter account, where feature-based attention dynamically regulates the distribution of resources within working memory so that the most relevant items are encoded with the greatest precision. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Real and Artificial Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2015-01-01
Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model
Zheng, Yi
2016-01-01
Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed for the interaction effects of the included factors, and recommendations were made accordingly. PMID:29881063
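The generalized partial credit model named in this abstract can be written compactly. The following is a minimal Python sketch of its category probabilities under the usual parameterization (one slope and a list of step parameters per item). The function name `gpcm_probs` and the numeric values are illustrative assumptions, not taken from the study.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities for one item under the generalized partial credit model.

    theta: person location; a: item slope; steps: step parameters b_1..b_m.
    Returns probabilities for score categories 0..m.
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum (0.0).
    cum = [0.0]
    for b in steps:
        cum.append(cum[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cum)
    expd = [math.exp(c - top) for c in cum]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical three-category item with steps placed symmetrically around 0.
probs = gpcm_probs(theta=0.0, a=1.0, steps=[-1.0, 1.0])
```

With symmetric step parameters and a person located at their midpoint, the lowest and highest categories are equally likely, which gives a quick sanity check on the implementation.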
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 14 through 26 include: (14) calculating an average; (15) identifying parts of the scientific method; (16) reading a geological map; (17) identifying elements, mixtures and compounds; (18) using Ohm's law in calculation; (19) interpreting…
Code of Federal Regulations, 2011 CFR
2011-01-01
... come into direct contact with uranium metal vapor or liquid or with process gas consisting of UF6 or a mixture of UF6 and other gases: (1) Uranium vaporization systems (AVLIS). Especially designed or prepared... laser-based enrichment items, the materials resistant to corrosion by the vapor or liquid of uranium...
Code of Federal Regulations, 2014 CFR
2014-01-01
... come into direct contact with uranium metal vapor or liquid or with process gas consisting of UF6 or a mixture of UF6 and other gases: (1) Uranium vaporization systems (AVLIS). Especially designed or prepared... laser-based enrichment items, the materials resistant to corrosion by the vapor or liquid of uranium...
Code of Federal Regulations, 2013 CFR
2013-01-01
... come into direct contact with uranium metal vapor or liquid or with process gas consisting of UF6 or a mixture of UF6 and other gases: (1) Uranium vaporization systems (AVLIS). Especially designed or prepared... laser-based enrichment items, the materials resistant to corrosion by the vapor or liquid of uranium...
Code of Federal Regulations, 2012 CFR
2012-01-01
... come into direct contact with uranium metal vapor or liquid or with process gas consisting of UF6 or a mixture of UF6 and other gases: (1) Uranium vaporization systems (AVLIS). Especially designed or prepared... laser-based enrichment items, the materials resistant to corrosion by the vapor or liquid of uranium...
Code of Federal Regulations, 2010 CFR
2010-01-01
... come into direct contact with uranium metal vapor or liquid or with process gas consisting of UF6 or a mixture of UF6 and other gases: (1) Uranium vaporization systems (AVLIS). Especially designed or prepared... laser-based enrichment items, the materials resistant to corrosion by the vapor or liquid of uranium...
Weaker Ligands Can Dominate an Odor Blend due to Syntopic Interactions
2013-01-01
Most odors in natural environments are mixtures of several compounds. Perceptually, these can blend into a new “perfume,” or some components may dominate as elements of the mixture. In order to understand such mixture interactions, it is necessary to study the events at the olfactory periphery, down to the level of single-odorant receptor cells. Does a strong ligand present at a low concentration outweigh the effect of weak ligands present at high concentrations? We used the fruit fly receptor dOr22a and a banana-like odor mixture as a model system. We show that an intermediate ligand at an intermediate concentration alone elicits the neuron’s blend response, despite the presence of both weaker ligands at higher concentration, and of better ligands at lower concentration in the mixture. Because all of these components, when given alone, elicited significant responses, this reveals specific mixture processing already at the periphery. By measuring complete dose–response curves we show that these mixture effects can be fully explained by a model of syntopic interaction at a single-receptor binding site. Our data have important implications for how odor mixtures are processed in general, and what preprocessing occurs before the information reaches the brain. PMID:23315042
Larval aquatic insect responses to cadmium and zinc in experimental streams
Mebane, Christopher A.; Schmidt, Travis S.; Balistrieri, Laurie S.
2017-01-01
To evaluate the risks of metal mixture effects to natural stream communities under ecologically relevant conditions, the authors conducted 30-d tests with benthic macroinvertebrates exposed to cadmium (Cd) and zinc (Zn) in experimental streams. The simultaneous exposures were with Cd and Zn singly and with Cd+Zn mixtures at environmentally relevant ratios. The tests produced concentration–response patterns that for individual taxa were interpreted in the same manner as classic single-species toxicity tests and for community metrics such as taxa richness and mayfly (Ephemeroptera) abundance were interpreted in the same manner as with stream survey data. Effect concentrations from the experimental stream exposures were usually 2 to 3 orders of magnitude lower than those from classic single-species tests. Relative to a response addition model, which assumes that the joint toxicity of the mixtures can be predicted from the product of their responses to individual toxicants, the Cd+Zn mixtures generally showed slightly less than additive toxicity. The authors applied a modeling approach called Tox to explore the mixture toxicity results and to relate the experimental stream results to field data. The approach predicts the accumulation of toxicants (hydrogen, Cd, and Zn) on organisms using a 2-pKa bidentate model that defines interactions between dissolved cations and biological receptors (biotic ligands) and relates that accumulation through a logistic equation to biological response. The Tox modeling was able to predict Cd+Zn mixture responses from the single-metal exposures as well as responses from field data. The similarity of response patterns between the 30-d experimental stream tests and field data supports the environmental relevance of testing aquatic insects in experimental streams.
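Response addition (also called independent action) is commonly expressed through the product of the unaffected fractions: if toxicant i alone affects a fraction E_i of organisms, the predicted joint effect is 1 − Π(1 − E_i). The sketch below assumes this standard form; the abstract does not give the exact equation the authors used, and the function name and example values are hypothetical.

```python
def response_addition(effects):
    """Predicted joint effect under response addition (independent action).

    effects: fractional responses (0..1) to each toxicant applied alone.
    """
    unaffected = 1.0
    for e in effects:
        if not 0.0 <= e <= 1.0:
            raise ValueError("each effect must be a fraction between 0 and 1")
        # Organisms unaffected by the mixture must escape every toxicant.
        unaffected *= (1.0 - e)
    return 1.0 - unaffected

# Hypothetical single-metal effects: 20% for one metal, 50% for the other.
predicted = response_addition([0.2, 0.5])
```

An observed mixture effect below this prediction would correspond to the "less than additive" pattern the abstract reports for the Cd+Zn mixtures.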
Development of the Computer-Adaptive Version of the Late-Life Function and Disability Instrument
Tian, Feng; Kopits, Ilona M.; Moed, Richard; Pardasaney, Poonam K.; Jette, Alan M.
2012-01-01
Background. Having psychometrically strong disability measures that minimize response burden is important in assessing older adults. Methods. Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive tests (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects. Results. Confirmatory factor analysis revealed a bifactor structure, and multi-dimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.05). Correlations of the 10-item CATs with the full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction: r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer. Conclusions. The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision.
The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults. PMID:22546960
Petrillo, Jennifer; Bressler, Neil M; Lamoureux, Ecosse; Ferreira, Alberto; Cano, Stefan
2017-08-14
The NEI VFQ-25 has undergone psychometric evaluation in patients with varying ocular conditions and the general population. However, important limitations which may affect the interpretation of clinical trial results have been previously identified, such as concerns with reliability and validity. The purpose of this study was to evaluate the National Eye Institute Visual Functioning Questionnaire (NEI VFQ-25) and make recommendations for a revised scoring structure, with a view to improving its psychometric performance and interpretability. Rasch Measurement Theory analyses were conducted in two stages using pooled baseline NEI VFQ-25 data for 2487 participants with retinal diseases enrolled in six clinical trials. In stage 1, we examined: scale-to-sample targeting; thresholds for item response options; item fit statistics; stability; local dependence; and reliability. In stage 2, a post-hoc revision of the scoring structure (VFQ-28R) was created and psychometrically re-evaluated. In stage 1, we found that the NEI VFQ-25 was mis-targeted to the sample, and had disordered response thresholds (15/25 items) and mis-fitting items (8/25 items). However, items appeared to be stable (differential item functioning for three items), have minimal item dependency (one pair of items) and good reliability (person-separation index, 0.93). In stage 2, the modified Rasch-scored NEI VFQ-28-R was assessed. It comprised two broad domains: Activity Limitation (19 items) and Socio-Emotional Functioning (nine items). The NEI VFQ-28-R demonstrated improved performance with fewer disordered response thresholds (no items), less item misfit (three items) and improved population targeting (reduced ceiling effect) compared with the NEI VFQ-25. 
Compared with the original version, the proposed NEI VFQ-28-R, with Rasch-based scoring and a two-domain structure, appears to offer improved psychometric performance and interpretability of the vision-related quality of life scale for the population analysed.
Chansirinukor, Wunpen; Maher, Christopher G; Latimer, Jane; Hush, Julia
2005-01-01
Retrospective design. To compare the responsiveness and test-retest reliability of the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire in detecting change in disability in patients with work-related low back pain. Many low back pain-specific disability questionnaires are available, including the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire. No previous study has compared the responsiveness and reliability of these questionnaires. Files of patients who had been treated for work-related low back pain at a physical therapy clinic were reviewed, and those containing initial and follow-up Functional Rating Index and 18-item Roland-Morris Disability Questionnaires were selected. The responsiveness of both questionnaires was compared using two different methods. First, using the assumption that patients receiving treatment improve over time, various responsiveness coefficients were calculated. Second, using change in work status as an external criterion to identify improved and nonimproved patients, Spearman's rho and receiver operating characteristic curves were calculated. Reliability was estimated from the subset of patients who reported no change in their condition over this period and expressed with the intraclass correlation coefficient and the minimal detectable change. One hundred and forty-three patient files were retrieved. The responsiveness coefficients for the Functional Rating Index were greater than for the 18-item Roland-Morris Disability Questionnaire. The intraclass correlation coefficient values for both questionnaires calculated from 96 patient files were similar, but the minimal detectable change for the Functional Rating Index was less than for the 18-item Roland-Morris Disability Questionnaire. The Functional Rating Index seems preferable to the 18-item Roland-Morris Disability Questionnaire for use in clinical trials and clinical practice.
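The abstract does not state how the minimal detectable change was computed. A common convention, assumed here, derives it from the standard error of measurement: SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. The sketch below follows that convention; the function name and the input values are hypothetical.

```python
import math

def minimal_detectable_change(sd, icc, z=1.96):
    """MDC from the score standard deviation and the intraclass correlation.

    Assumes the common convention MDC = z * sqrt(2) * SEM,
    with SEM = sd * sqrt(1 - icc); z = 1.96 gives a 95% level.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    sem = sd * math.sqrt(1.0 - icc)
    return z * math.sqrt(2.0) * sem

# Hypothetical questionnaire: SD of 10 points and ICC of 0.75.
mdc95 = minimal_detectable_change(sd=10.0, icc=0.75)
```

Under this convention, two instruments with similar ICC values can still differ in minimal detectable change if their score standard deviations differ, which is consistent with the pattern the abstract describes.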
Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar
2015-01-01
Purpose: The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods: Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results: The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item functioning (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions: The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661
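For readers unfamiliar with the Graded Response Model, the sketch below shows how category probabilities follow from one slope and a set of ordered thresholds per item, using the standard logistic form. It is a minimal illustration; the function name `grm_probs` and the parameter values are assumptions, not CPIB estimates.

```python
import math

def grm_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: person location; a: item slope;
    thresholds: increasing thresholds b_1..b_m for an item with m + 1 categories.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k), bracketed by 1 (k = 0) and 0 (k = m + 1).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical four-category item.
probs = grm_probs(theta=0.3, a=1.8, thresholds=[-1.2, 0.1, 1.4])
```

Requiring increasing thresholds keeps every category probability non-negative, which is why the function rejects unordered input.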
Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W
2015-05-01
To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank. A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.
2015-01-01
Objective: To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design: Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Main Outcome Measures: SCI-QOL Stigma Item Bank. Results: A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions: The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973
Estimating the Nominal Response Model under Nonnormal Conditions
ERIC Educational Resources Information Center
Preston, Kathleen Suzanne Johnson; Reise, Steven Paul
2014-01-01
The nominal response model (NRM), a much understudied polytomous item response theory (IRT) model, provides researchers the unique opportunity to evaluate within-item category distinctions. Polytomous IRT models, such as the NRM, are frequently applied to psychological assessments representing constructs that are unlikely to be normally…
Lu, Cailing; Svoboda, Kurt R; Lenz, Kade A; Pattison, Claire; Ma, Hongbo
2018-06-01
Manganese (Mn) is considered an emerging metal contaminant in the environment. However, its potential interactions with accompanying toxic metals and the associated mixture effects are largely unknown. Here, we investigated the toxicity interactions between Mn and two commonly seen co-occurring toxic metals, Pb and Cd, in a model organism, the nematode Caenorhabditis elegans. The acute lethal toxicity of mixtures of Mn+Pb and Mn+Cd was first assessed using a toxic unit model. Multiple toxicity endpoints including reproduction, lifespan, stress response, and neurotoxicity were then examined to evaluate the mixture effects at sublethal concentrations. Stress response was assessed using a daf-16::GFP transgenic strain that expresses GFP under the control of the DAF-16 promoter. Neurotoxicity was assessed using a dat-1::GFP transgenic strain that expresses GFP in dopaminergic neurons. The mixture of Mn+Pb induced a more-than-additive (synergistic) lethal toxicity in the worm whereas the mixture of Mn+Cd induced a less-than-additive (antagonistic) toxicity. Mixture effects on sublethal toxicity showed more complex patterns and were dependent on the toxicity endpoints as well as the modes of toxic action of the metals. The mixture of Mn+Pb induced additive effects on both reproduction and lifespan, whereas the mixture of Mn+Cd induced additive effects on lifespan but not reproduction. Both mixtures seemed to induce additive effects on stress response and neurotoxicity, although a quantitative assessment was not possible due to the single concentrations used in mixture tests. Our findings demonstrate the complexity of metal interactions and the associated mixture effects. Assessment of metal mixture toxicity should take into consideration the unique property of individual metals, their potential toxicity mechanisms, and the toxicity endpoints examined.
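The abstract does not spell out the toxic unit calculation. In its conventional form, assumed here, each metal's concentration is divided by its own LC50 and the quotients are summed; a mixture that kills 50% of organisms at fewer than 1 toxic unit is read as more than additive, and at more than 1 toxic unit as less than additive. The function name and all concentrations below are hypothetical.

```python
def toxic_units(concentrations, lc50s):
    """Sum of toxic units: each concentration scaled by that metal's own LC50."""
    if len(concentrations) != len(lc50s):
        raise ValueError("need one LC50 per concentration")
    return sum(c / lc50 for c, lc50 in zip(concentrations, lc50s))

def classify_interaction(mixture_lc50_in_tu, tolerance=0.0):
    """Label the interaction from the mixture LC50 expressed in toxic units."""
    if mixture_lc50_in_tu < 1.0 - tolerance:
        return "more than additive"
    if mixture_lc50_in_tu > 1.0 + tolerance:
        return "less than additive"
    return "additive"

# Hypothetical mixture: each metal present at half of its single-metal LC50.
tu = toxic_units([5.0, 2.0], [10.0, 4.0])
```

In practice a tolerance band (for example, derived from confidence intervals on the mixture LC50) would replace the exact comparison with 1.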
Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.
Shou, Yiyun; Sellbom, Martin; Xu, Jing
2018-05-01
There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. samples. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Concise evaluation of decision aids.
Stalmeier, Peep F M; Roosmalen, Marielle S
2009-01-01
Decision aids purport to help patients make treatment related choices. Several instruments exist to evaluate decision aids. Our aim is to compare the responsiveness of several instruments. Two different decision aids were randomized in patients at high risk for breast and ovarian cancer. Treatment choices were between prophylactic surgery and screening. Effect sizes were calculated to compare the responsiveness of the measures. One decision aid was randomized in 390 women, the other in 91 ensuing mutation carriers. Three factors were identified related to Information, Well-being and Decision Making. Within each factor, single item measures were as responsive as multi-item measures. Four single items, 'the amount of information received for decision making,' 'strength of preference,' 'I weighed the pros and cons,' and 'General Health,' were adequately responsive to the decision aids. These items might be considered for inclusion in questionnaires to evaluate decision aids.
Paige, Samantha R; Krieger, Janice L; Stellefson, Michael; Alber, Julia M
2017-02-01
Chronic disease patients are affected by low computer and health literacy, which negatively affects their ability to benefit from access to online health information. To estimate reliability and confirm model specifications for eHealth Literacy Scale (eHEALS) scores among chronic disease patients using Classical Test Theory (CTT) and Item Response Theory techniques. A stratified sample of Black/African American (N=341) and Caucasian (N=343) adults with chronic disease completed an online survey including the eHEALS. Item discrimination was explored using bi-variate correlations and Cronbach's alpha for internal consistency. A categorical confirmatory factor analysis tested a one-factor structure of eHEALS scores. Item characteristic curves, in-fit/outfit statistics, omega coefficient, and item reliability and separation estimates were computed. A 1-factor structure of eHEALS was confirmed by statistically significant standardized item loadings, acceptable model fit indices (CFI/TLI>0.90), and 70% variance explained by the model. Item response categories increased with higher theta levels, and there was evidence of acceptable reliability (ω=0.94; item reliability=89; item separation=8.54). eHEALS scores are a valid and reliable measure of self-reported eHealth literacy among Internet-using chronic disease patients. Providers can use eHEALS to help identify patients' eHealth literacy skills. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
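Cronbach's alpha, used in this abstract for internal consistency, compares the sum of the item variances with the variance of the total score. The sketch below implements the textbook formula with the Python standard library; the function name and the toy response matrix are hypothetical and unrelated to the eHEALS data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Toy data: three respondents, two items that agree perfectly.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

When items move together perfectly the total-score variance is as large as it can be relative to the item variances and alpha reaches 1; when items are uncorrelated the two quantities coincide and alpha falls to 0.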
Waller, Niels G; Feuerstahler, Leah
2017-01-01
In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated [Formula: see text] code that shows how to estimate 4PM item and person parameters in [Formula: see text] (Chalmers, 2012).
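The four-parameter model extends the three-parameter logistic model with an upper asymptote below 1. A minimal Python sketch of its item response function follows, using the conventional symbols a (slope), b (location), c (lower asymptote), and d (upper asymptote). The function name and parameter values are illustrative assumptions and do not reproduce the authors' code.

```python
import math

def irf_4pm(theta, a, b, c, d):
    """Probability of a keyed response under the four-parameter logistic model.

    P(theta) = c + (d - c) / (1 + exp(-a * (theta - b)))
    """
    if not 0.0 <= c < d <= 1.0:
        raise ValueError("asymptotes must satisfy 0 <= c < d <= 1")
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: at theta == b the curve sits halfway between the asymptotes.
p_mid = irf_4pm(theta=0.0, a=1.5, b=0.0, c=0.2, d=0.9)
```

Setting d to 1 recovers the three-parameter model, and additionally setting c to 0 recovers the two-parameter model, which is how the simpler alternatives nest inside the 4PM.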
Moser, V C; Casey, M; Hamm, A; Carter, W H; Simmons, J E; Gennings, C
2005-07-01
Environmental exposures generally involve chemical mixtures instead of single chemicals. Statistical models such as the fixed-ratio ray design, wherein the mixing ratio (proportions) of the chemicals is fixed across increasing mixture doses, allow for the detection and characterization of interactions among the chemicals. In this study, we tested for interaction(s) in a mixture of five organophosphorus (OP) pesticides (chlorpyrifos, diazinon, dimethoate, acephate, and malathion). The ratio of the five pesticides (full ray) reflected the relative dietary exposure estimates of the general population as projected by the US EPA Dietary Exposure Evaluation Model (DEEM). A second mixture was tested using the same dose levels of all pesticides, but excluding malathion (reduced ray). The experimental approach first required characterization of dose-response curves for the individual OPs to build a dose-additivity model. A series of behavioral measures were evaluated in adult male Long-Evans rats at the time of peak effect following a single oral dose, and then tissues were collected for measurement of cholinesterase (ChE) activity. Neurochemical (blood and brain cholinesterase [ChE] activity) and behavioral (motor activity, gait score, tail-pinch response score) endpoints were evaluated statistically for evidence of additivity. The additivity model constructed from the single chemical data was used to predict the effects of the pesticide mixture along the full ray (10-450 mg/kg) and the reduced ray (1.75-78.8 mg/kg). The experimental mixture data were also modeled and statistically compared to the additivity models. Analysis of the 5-OP mixture (the full ray) revealed significant deviation from additivity for all endpoints except tail-pinch response. Greater-than-additive responses (synergism) were observed at the lower doses of the 5-OP mixture, which contained non-effective dose levels of each of the components.
The predicted effective doses (ED20, ED50) were about half that predicted by additivity, and for brain ChE and motor activity, there was a threshold shift in the dose-response curves. For the brain ChE and motor activity, there was no difference between the full (5-OP mixture) and reduced (4-OP mixture) rays, indicating that malathion did not influence the non-additivity. While the reduced ray for blood ChE showed greater deviation from additivity without malathion in the mixture, the non-additivity observed for the gait score was reversed when malathion was removed. Thus, greater-than-additive interactions were detected for both the full and reduced ray mixtures, and the role of malathion in the interactions varied depending on the endpoint. In all cases, the deviations from additivity occurred at the lower end of the dose-response curves.
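Under dose addition along a fixed-ratio ray, the total mixture dose expected to produce a given effect level is the reciprocal of the proportion-weighted sum of reciprocal single-chemical effective doses: ED_mix = 1 / Σ(a_i / ED_i). The sketch below assumes this standard relation; the abstract does not give the authors' exact model, and the function name, proportions, and doses are hypothetical.

```python
def additive_mixture_ed(proportions, component_eds):
    """Total mixture dose predicted under dose addition for one effect level.

    proportions: mixing fractions a_i along the ray (must sum to 1).
    component_eds: each chemical's own effective dose for the same effect level.
    """
    if len(proportions) != len(component_eds):
        raise ValueError("need one effective dose per component")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    # Each component contributes dose in units of its own potency.
    return 1.0 / sum(a / ed for a, ed in zip(proportions, component_eds))

# Hypothetical two-chemical ray mixed 50:50, with single-chemical ED50s of 10 and 40 mg/kg.
predicted_ed50 = additive_mixture_ed([0.5, 0.5], [10.0, 40.0])
```

An observed mixture ED50 of roughly half the value returned by such a calculation would correspond to the greater-than-additive pattern reported in the abstract.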
Response Times to Gustatory-Olfactory Flavor Mixtures: Role of Congruence.
Shepard, Timothy G; Veldhuizen, Maria G; Marks, Lawrence E
2015-10-01
A mixture of perceptually congruent gustatory and olfactory flavorants (sucrose and citral) was previously shown to be detected faster than predicted by a model of probability summation that assumes stochastically independent processing of the individual gustatory and olfactory signals. This outcome suggests substantial integration of the signals. Does substantial integration also characterize responses to mixtures of incongruent flavorants? Here, we report simple response times (RTs) to detect brief pulses of 3 possible flavorants: monosodium glutamate, MSG (gustatory: "umami" quality), citral (olfactory: citrus quality), and a mixture of MSG and citral (gustatory-olfactory). Each stimulus (and, on a fraction of trials, water) was presented orally through a computer-operated, automated flow system, and subjects were instructed to press a button as soon as they detected any of the 3 non-water stimuli. Unlike responses previously found to the congruent mixture of sucrose and citral, responses here to the incongruent mixture of MSG and citral took significantly longer (RTs were greater) and showed lower detection rates than the values predicted by probability summation. This outcome suggests that the integration of gustatory and olfactory flavor signals is less extensive when the component flavors are perceptually incongruent rather than congruent, perhaps because incongruent flavors are less familiar. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Validity of Multiprocess IRT Models for Separating Content and Response Styles
ERIC Educational Resources Information Center
Plieninger, Hansjörg; Meiser, Thorsten
2014-01-01
Response styles, the tendency to respond to Likert-type items irrespective of content, are a widely known threat to the reliability and validity of self-report measures. However, it is still debated how to measure and control for response styles such as extreme responding. Recently, multiprocess item response theory models have been proposed that…
Item Response Theory analysis of Fagerström Test for Cigarette Dependence.
Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl
2018-02-01
The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigation. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Graded Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
Wilks, Rainford; Younger, Novie; Mullings, Jasneth; Zohoori, Namvar; Figueroa, Peter; Tulloch-Reid, Marshall; Ferguson, Trevor; Walters, Christine; Bennett, Franklyn; Forrester, Terrence; Ward, Elizabeth; Ashley, Deanna
2007-02-28
Health surveys provide important information on the burden and secular trends of risk factors and disease. Several factors including survey and item non-response can affect data quality. There are few reports on efficiency, validity and the impact of item non-response, from developing countries. This report examines factors associated with item non-response and study efficiency in a national health survey in a developing Caribbean island. A national sample of participants aged 15-74 years was selected in a multi-stage sampling design accounting for 4 health regions and 14 parishes using enumeration districts as primary sampling units. Means and proportions of the variables of interest were compared between various categories. Non-response was defined as failure to provide an analyzable response. Linear and logistic regression models accounting for sample design and post-stratification weighting were used to identify independent correlates of recruitment efficiency and item non-response. We recruited 2012 15-74 year-olds (66.2% females) at a response rate of 87.6% with significant variation between regions (80.9% to 97.6%; p < 0.0001). Females outnumbered males in all parishes. The majority of subjects were recruited in a single visit, 39.1% required multiple visits varying significantly by region (27.0% to 49.8% [p < 0.0001]). Average interview time was 44.3 minutes with no variation between health regions, urban-rural residence, educational level, gender and SES; but increased significantly with older age category from 42.9 minutes in the youngest to 46.0 minutes in the oldest age category. Between 15.8% and 26.8% of persons did not provide responses for the number of sexual partners in the last year. Women and urban residents provided less data than their counterparts. Highest item non-response related to income at 30% with no gender difference but independently related to educational level, employment status, age group and health region. 
Characteristics of non-responders vary with types of questions. Informative health surveys are possible in developing countries. While survey response rates may be satisfactory, item non-response was high in respect of income and sexual practice. In contrast to developed countries, non-response to questions on income is higher and has different correlates. These findings can inform future surveys.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
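For readers unfamiliar with the three-parameter logistic (3PL) model named in this abstract, the item response function can be sketched as below. This uses the standard textbook form without the optional 1.7 scaling constant; the parameter values in the usage line are hypothetical and are not VETcar estimates.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: subject ability
    a:     item discrimination
    b:     item difficulty
    c:     lower asymptote (pseudo-guessing parameter)
    """
    # Guessing floor c plus the remaining probability mass scaled by a logistic curve.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: when ability equals difficulty, the probability
# lies halfway between the guessing floor and 1.
p = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
```

In a DIF analysis, item parameters such as a, b, and c are compared across groups (for example, age groups) after matching on ability.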
The Effects of Response Option Order and Question Order on Self-Rated Health
Garbarski, Dana; Schaeffer, Nora Cate; Dykema, Jennifer
2014-01-01
Objectives This study aims to assess the impact of response option order and question order on the distribution of responses to the self-rated health (SRH) question and the relationship between SRH and other health-related measures. Methods In an online panel survey, we implement a 2-by-2 between-subjects factorial experiment, manipulating the following levels of each factor: 1) order of response options (“excellent” to “poor” versus “poor” to “excellent”); and 2) order of SRH item (either preceding or following the administration of domain-specific health items). We use chi-square difference tests, polychoric correlations, and differences in means and proportions to evaluate the effect of the experimental treatments on SRH responses and the relationship between SRH and other health measures. Results Mean SRH is higher (better health) and proportion in “fair” or “poor” health lower when response options are ordered from “excellent” to “poor” and SRH is presented first compared to other experimental treatments. Presenting SRH after domain-specific health items increases its correlation with these items, particularly when response options are ordered “excellent” to “poor.” Among participants with the highest level of current health risks, SRH is worse when it is presented last versus first. Conclusion While more research on the presentation of SRH is needed across a range of surveys, we suggest that ordering response options from “poor” to “excellent” might reduce positive clustering. Given the question order effects found here, we suggest presenting SRH before domain-specific health items in order to increase inter-survey comparability, as domain-specific health items will vary across surveys. PMID:25409654
Davenport, Tracey A; Burns, Jane M; Hickie, Ian B
2017-01-01
Background Web-based self-report surveying has increased in popularity, as it can rapidly yield large samples at a low cost. Despite this increase in popularity, in the area of youth mental health, there is a distinct lack of research comparing the results of Web-based self-report surveys with the more traditional and widely accepted computer-assisted telephone interviewing (CATI). Objective The Second Australian Young and Well National Survey 2014 sought to compare differences in respondent response patterns using matched items on CATI versus a Web-based self-report survey. The aim of this study was to examine whether responses varied as a result of item sensitivity, that is, the item’s susceptibility to exaggeration or underreporting and to assess whether certain subgroups demonstrated this effect to a greater extent. Methods A subsample of young people aged 16 to 25 years (N=101), recruited through the Second Australian Young and Well National Survey 2014, completed the identical items on two occasions: via CATI and via Web-based self-report survey. Respondents also rated perceived item sensitivity. Results When comparing CATI with the Web-based self-report survey, a Wilcoxon signed-rank analysis showed that respondents answered 14 of the 42 matched items in a significantly different way. Significant variation in responses (CATI vs Web-based) was more frequent if the item was also rated by the respondents as highly sensitive in nature. Specifically, 63% (5/8) of the high sensitivity items, 43% (3/7) of the neutral sensitivity items, and 0% (0/4) of the low sensitivity items were answered in a significantly different manner by respondents when comparing their matched CATI and Web-based question responses. The items that were perceived as highly sensitive by respondents and demonstrated response variability included the following: sexting activities, body image concerns, experience of diagnosis, and suicidal ideation.
For high sensitivity items, a regression analysis showed respondents who were male (beta=−.19, P=.048) or who were not in employment, education, or training (NEET; beta=−.32, P=.001) were significantly more likely to provide different responses on matched items when responding in the CATI as compared with the Web-based self-report survey. The Web-based self-report survey, however, demonstrated some evidence of avidity and attrition bias. Conclusions Compared with CATI, Web-based self-report surveys are highly cost-effective and had higher rates of self-disclosure on sensitive items, particularly for respondents who identify as male and NEET. A drawback to Web-based surveying methodologies, however, includes the limited control over avidity bias and the greater incidence of attrition bias. These findings have important implications for further development of survey methods in the area of health and well-being, especially when considering research topics (in this case diagnosis, suicidal ideation, sexting, and body image) and groups that are being recruited (young people, males, and NEET). PMID:28951382
Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations
NASA Astrophysics Data System (ADS)
Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly
2014-09-01
A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.
NASA Astrophysics Data System (ADS)
Rahmani, B. D.
2018-01-01
The purpose of this paper is to evaluate Indonesian senior high school teachers’ pedagogical content knowledge and their perceptions of curriculum change in West Java, Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. Differential item functioning (DIF) analysis was used to test whether item responses differed by gender, educational background, or school location. The results showed no significant difference in item responses by gender or school location, but a significant difference by educational background. In conclusion, teachers’ educational background influenced their responses to the questionnaire. It is therefore suggested that future questionnaires be constructed with items that account for differences among participants, particularly in educational background.
Pilkonis, Paul A; Yu, Lan; Dodds, Nathan E; Johnston, Kelly L; Lawrence, Suzanne M; Hilton, Thomas F; Daley, Dennis C; Patkar, Ashwin A; McCarty, Dennis
2017-08-01
There is a need to monitor patients receiving prescription opioids to detect possible signs of abuse. To address this need, we developed and calibrated an item bank for severity of abuse of prescription pain medication as part of the Patient-Reported Outcomes Measurement Information System (PROMIS ® ). Comprehensive literature searches yielded an initial bank of 5,310 items relevant to substance use and abuse, including abuse of prescription pain medication, from over 80 unique instruments. After qualitative item analysis (i.e., focus groups, cognitive interviewing, expert review, and item revision), 25 items for abuse of prescribed pain medication were included in field testing. Items were written in a first-person, past-tense format, with a three-month time frame and five response options reflecting frequency or severity. The calibration sample included 448 respondents, 367 from the general population (ascertained through an internet panel) and 81 from community treatment programs participating in the National Drug Abuse Treatment Clinical Trials Network. A final bank of 22 items was calibrated using the two-parameter graded response model from item response theory. A seven-item static short form was also developed. The test information curve showed that the PROMIS ® item bank for abuse of prescription pain medication provided substantial information in a broad range of severity. The initial psychometric characteristics of the item bank support its use as a computerized adaptive test or short form, with either version providing a brief, precise, and efficient measure relevant to both clinical and community samples. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Huang, Wei Ying; Liu, Fei; Liu, Shu Shen; Ge, Hui Lin; Chen, Hong Han
2011-09-01
The predictions of mixture toxicity for chemicals are commonly based on two models: concentration addition (CA) and independent action (IA). Whether the CA and IA can predict mixture toxicity of phenolic compounds with similar and dissimilar action mechanisms was studied. The mixture toxicity was predicted on the basis of the concentration-response data of individual compounds. Test mixtures at different concentration ratios and concentration levels were designed using two methods. The results showed that the Weibull function fit well with the concentration-response data of all the components and their mixtures, with all relative coefficients (Rs) greater than 0.99 and root mean squared errors (RMSEs) less than 0.04. The predicted values from CA and IA models conformed to observed values of the mixtures. Therefore, it can be concluded that both CA and IA can predict reliable results for the mixture toxicity of the phenolic compounds with similar and dissimilar action mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.
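The two reference models in this abstract can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the common two-parameter Weibull form E = 1 - exp(-exp(a + b*log10(c))) for each component, and all function names and parameter values are hypothetical.

```python
import math


def weibull_effect(c: float, a: float, b: float) -> float:
    """Effect (0..1) at concentration c > 0 under the assumed Weibull form."""
    return 1.0 - math.exp(-math.exp(a + b * math.log10(c)))


def weibull_inverse(effect: float, a: float, b: float) -> float:
    """Concentration producing the given effect (0 < effect < 1)."""
    return 10 ** ((math.log(-math.log(1.0 - effect)) - a) / b)


def ca_ecx(effect, fractions, params):
    """Concentration addition: mixture ECx from component ECx values.

    fractions: share of each component in the mixture (sums to 1)
    params:    (a, b) Weibull parameters per component
    """
    return 1.0 / sum(p / weibull_inverse(effect, a, b)
                     for p, (a, b) in zip(fractions, params))


def ia_effect(c_mix, fractions, params):
    """Independent action: mixture effect at total concentration c_mix."""
    unaffected = 1.0
    for p, (a, b) in zip(fractions, params):
        # Each component acts independently at its own concentration p * c_mix.
        unaffected *= 1.0 - weibull_effect(p * c_mix, a, b)
    return 1.0 - unaffected


# Hypothetical two-component mixture at a 1:1 concentration ratio.
params = [(1.0, 2.0), (0.5, 1.5)]
ec50_ca = ca_ecx(0.5, [0.5, 0.5], params)
effect_ia = ia_effect(ec50_ca, [0.5, 0.5], params)
```

Comparing predictions like these against the observed mixture concentration-response curve is the kind of check the abstract describes.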
ERIC Educational Resources Information Center
Lu, Yi
2012-01-01
Cross-national comparisons of responses to survey items are often affected by response style, particularly extreme response style (ERS). ERS varies across cultures, and has the potential to bias inferences in cross-national comparisons. For example, in both PISA and TIMSS assessments, it has been documented that when examined within countries,…
ERIC Educational Resources Information Center
Haberman, Shelby J.; von Davier, Matthias; Lee, Yi-Hsuan
2008-01-01
Multidimensional item response models can be based on multivariate normal ability distributions or on multivariate polytomous ability distributions. For the case of simple structure in which each item corresponds to a unique dimension of the ability vector, some applications of the two-parameter logistic model to empirical data are employed to…
ERIC Educational Resources Information Center
Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.
2010-01-01
Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
ERIC Educational Resources Information Center
Jackson, Allen W.; Morrow, James R., Jr.; Bowles, Heather R.; FitzGerald, Shannon J.; Blair, Steven N.
2007-01-01
Valid measurement of physical activity is important for studying the risks for morbidity and mortality. The purpose of this study was to examine evidence of construct validity of two similar single-response items assessing physical activity via self-report. Both items are based on the stages of change model. The sample was 687 participants (men =…
A Combined IRT and SEM Approach for Individual-Level Assessment in Test-Retest Studies
ERIC Educational Resources Information Center
Ferrando, Pere J.
2015-01-01
The standard two-wave multiple-indicator model (2WMIM) commonly used to analyze test-retest data provides information at both the group and item level. Furthermore, when applied to binary and graded item responses, it is related to well-known item response theory (IRT) models. In this article the IRT-2WMIM relations are used to obtain additional…
On an Extension of the Rasch Model to the Case of Polychotomously Scored Items.
ERIC Educational Resources Information Center
Vogt, Dorothee K.
The Rasch model for the probability of a person's response to an item is extended to the case where this response depends on a set of scoring or category weights, in addition to person and item parameters. The maximum likelihood approach introduced by Wright for the dichotomous case is applicable here also, and it is shown to yield a unique…
ERIC Educational Resources Information Center
de Bruijn, Ellen R. A.; Dijkstra, Ton; Chwilla, Dorothee J.; Schriefers, Herbert J.
2001-01-01
Dutch-English bilinguals performed a generalized lexical decision task on triplets of items, responding with "yes" if all items were correct Dutch and/or English words, and with "no" if one or more of the items was not a word in either language. Semantic priming effects were found in on-line response times. Event-related…
ERIC Educational Resources Information Center
Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent
2011-01-01
The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a strong Mokken scale for outpatients and…
ERIC Educational Resources Information Center
Senarat, Somprasong; Tayraukham, Sombat; Piyapimonsit, Chatsiri; Tongkhambanjong, Sakesan
2013-01-01
The purpose of this research is to develop a multidimensional computerized adaptive test for diagnosing the cognitive process of grade 7 students in learning algebra by applying multidimensional item response theory. The research is divided into 4 steps: 1) the development of item bank of algebra, 2) the development of the multidimensional…
ROYAL, KENNETH D.; STOCKDALE, MYRAH R.
2017-01-01
Introduction: Research has asserted MCQ items using three response options (one correct answer with two distractors) is comparable to, and possibly preferable over, traditional MCQ item formats consisting of four response options (e.g., one correct answer with three distractors), or five response options (e.g., one correct answer with four distractors). Some medical educators have also adopted the practice of using 3-option responses on MCQ exams as a response to the difficulty experienced in generating additional plausible distractors. To date, however, little work has explored how 3-option responses might impact validity threats stemming from random guessing strategies, and what impact 3-option responses might have on cut-score determinations, particularly in the context of medical education classroom assessments. The purpose of this work is to further explore these critically important considerations that largely have gone ignored in the medical education literature to this point. Methods: A cumulative binomial distribution formula was used to calculate the probability that an examinee will answer at random a given number of items correctly on any exam (of any length). By way of a demonstration, a variety of scenarios were presented to illustrate how examination length and the number of response options impact examinees’ chances of passing a given examination, and how subsequent cut-score decisions may be impacted by these factors. Results: As a general rule, classroom assessments containing fewer items should utilize traditional 4-option or 5-option responses, whereas assessments of greater length are afforded greater flexibility in potentially utilizing 3-option responses. Conclusions: More research on items with 3-option responses is needed to better understand what value, if any, 3-option responses truly add to classroom assessments, and in what contexts potential benefits might be discernible. PMID:28367465
Royal, Kenneth D; Stockdale, Myrah R
2017-04-01
Research has asserted MCQ items using three response options (one correct answer with two distractors) is comparable to, and possibly preferable over, traditional MCQ item formats consisting of four response options (e.g., one correct answer with three distractors), or five response options (e.g., one correct answer with four distractors). Some medical educators have also adopted the practice of using 3-option responses on MCQ exams as a response to the difficulty experienced in generating additional plausible distractors. To date, however, little work has explored how 3-option responses might impact validity threats stemming from random guessing strategies, and what impact 3-option responses might have on cut-score determinations, particularly in the context of medical education classroom assessments. The purpose of this work is to further explore these critically important considerations that largely have gone ignored in the medical education literature to this point. A cumulative binomial distribution formula was used to calculate the probability that an examinee will answer at random a given number of items correctly on any exam (of any length). By way of a demonstration, a variety of scenarios were presented to illustrate how examination length and the number of response options impact examinees' chances of passing a given examination, and how subsequent cut-score decisions may be impacted by these factors. As a general rule, classroom assessments containing fewer items should utilize traditional 4-option or 5-option responses, whereas assessments of greater length are afforded greater flexibility in potentially utilizing 3-option responses. More research on items with 3-option responses is needed to better understand what value, if any, 3-option responses truly add to classroom assessments, and in what contexts potential benefits might be discernible.
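The cumulative binomial calculation described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' formula as published: it assumes every item has the same number of options, that guesses are independent, and that "a given number of items" means at least that many. The function name and the exam length and cut score in the usage lines are hypothetical.

```python
from math import comb


def prob_at_least(n_items: int, n_options: int, k_correct: int) -> float:
    """P(X >= k_correct) for X ~ Binomial(n_items, 1 / n_options).

    Models an examinee who guesses at random on every item.
    """
    p = 1.0 / n_options  # chance of guessing a single item correctly
    return sum(
        comb(n_items, k) * p**k * (1.0 - p) ** (n_items - k)
        for k in range(k_correct, n_items + 1)
    )


# Hypothetical 10-item quiz with a cut score of 6 correct:
# compare the chance of passing by pure guessing with 3 versus 5 options.
pass_3_option = prob_at_least(10, 3, 6)
pass_5_option = prob_at_least(10, 5, 6)
```

Running the comparison across exam lengths shows the pattern the abstract reports: the advantage that guessers gain from fewer options shrinks as the number of items grows.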
Handbook of Polytomous Item Response Theory Models
ERIC Educational Resources Information Center
Nering, Michael L., Ed.; Ostini, Remo, Ed.
2010-01-01
This comprehensive "Handbook" focuses on the most used polytomous item response theory (IRT) models. These models help us understand the interaction between examinees and test questions where the questions have various response categories. The book reviews all of the major models and includes discussions about how and where the models…
Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?
ERIC Educational Resources Information Center
Royal, Kenneth D.; Stockdale, Myrah R.
2015-01-01
The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…
Projective Item Response Model for Test-Independent Measurement
ERIC Educational Resources Information Center
Ip, Edward Hak-Sing; Chen, Shyh-Huei
2012-01-01
The problem of fitting unidimensional item-response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that contains a major dimension of interest but that may also contain minor nuisance dimensions. Because fitting a unidimensional model to multidimensional data results in…
Dental responsibility loadings and the relative value of dental services.
Teusner, D N; Ju, X; Brennan, D S
2017-09-01
To estimate responsibility loadings for a comprehensive list of dental services, providing a standardized unit of clinical work effort. Dentists (n = 2500) randomly sampled from the Australian Dental Association membership (2011) were randomly assigned to one of 25 panels. Panels were surveyed by questionnaires eliciting responsibility loadings for eight common dental services (core items) and approximately 12 other items unique to that questionnaire. In total, loadings were elicited for 299 items listed in the Australian Dental Schedule 9th Edition. Data were weighted to reflect the age and sex distribution of the workforce. To assess reliability, regression models assessed differences in core item loadings by panel assignment. Estimated loadings were described by reporting the median and mean. Response rate was 37%. Panel composition did not vary by practitioner characteristics. Core item loadings did not vary by panel assignment. Oral surgery and endodontic service areas had the highest proportion (91%) of services with median loadings ≥1.5, followed by prosthodontics (78%), periodontics (76%), orthodontics (63%), restorative (62%) and diagnostic services (31%). Preventive services had median loadings ≤1.25. Dental responsibility loadings estimated by this study can be applied in the development of relative value scales. © 2017 Australian Dental Association.
Polytomous Latent Scales for the Investigation of the Ordering of Items
ERIC Educational Resources Information Center
Ligtvoet, Rudy; van der Ark, L. Andries; Bergsma, Wicher P.; Sijtsma, Klaas
2011-01-01
We propose three latent scales within the framework of nonparametric item response theory for polytomously scored items. Latent scales are models that imply an invariant item ordering, meaning that the order of the items is the same for each measurement value on the latent scale. This ordering property may be important in, for example,…
Component Identification and Item Difficulty of Raven's Matrices Items.
ERIC Educational Resources Information Center
Green, Kathy E.; Kluever, Raymond C.
Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…
Fitting the Rasch Model to Account for Variation in Item Discrimination
ERIC Educational Resources Information Center
Weitzman, R. A.
2009-01-01
Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
ERIC Educational Resources Information Center
Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet
2012-01-01
Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…