Item Response Theory Modeling of the Philadelphia Naming Test.
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D
2015-06-01
In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
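As a rough illustration of the 1- and 2-parameter logistic models named in the abstract above, the sketch below evaluates the standard logistic item response function. The function name `irf_2pl` and the parameter values are assumptions made for illustration; they are not taken from the study.

```python
import math

def irf_2pl(theta, b, a=1.0):
    """Probability of a correct response under the 2-parameter logistic model.

    theta: person ability, b: item difficulty, a: item discrimination.
    Holding a at the same value for every item gives the 1-parameter form.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical values: one person of average ability, one easy and one hard item.
p_easy = irf_2pl(theta=0.0, b=-1.0)
p_hard = irf_2pl(theta=0.0, b=1.0)
```

When ability equals difficulty the probability is 0.5, and for a fixed ability the probability falls as difficulty rises.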
ERIC Educational Resources Information Center
Matlock, Ki Lynn; Turner, Ronna
2016-01-01
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.
ERIC Educational Resources Information Center
Reckase, Mark D.; McKinley, Robert L.
A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
ERIC Educational Resources Information Center
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-01-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.
2011-01-01
Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C
2017-02-01
The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
On Maximizing Item Information and Matching Difficulty with Ability.
ERIC Educational Resources Information Center
Bickel, Peter; Buyske, Steven; Chang, Huahua; Ying, Zhiliang
2001-01-01
Examined the assumption that matching difficulty levels of test items with an examinee's ability makes a test more efficient and challenged this assumption through a class of one-parameter item response theory models. Found the validity of the fundamental assumption to be closely related to the van Zwet tail ordering of symmetric distributions (W.…
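The conventional assumption mentioned in this abstract can be illustrated for the logistic (Rasch) special case only, where item information is P(1 - P) and peaks when ability equals difficulty. The sketch below does not reproduce the authors' analysis of other one-parameter models; the grid and the difficulty value 0.5 are made-up illustrative choices.

```python
import math

def rasch_prob(theta, b):
    """Rasch (1PL) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

# Scan a grid of abilities for a hypothetical item with difficulty 0.5.
thetas = [x / 10.0 for x in range(-30, 31)]
best_theta = max(thetas, key=lambda t: rasch_information(t, b=0.5))
```

On this grid the information is largest at the ability that matches the item difficulty, where it reaches its maximum of 0.25.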
Estimation of Item Response Theory Parameters in the Presence of Missing Data
ERIC Educational Resources Information Center
Finch, Holmes
2008-01-01
Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
Park, Jong Cook; Kim, Kwang Sig
2012-03-01
The reliability of a test is determined by the characteristics of each item. Item analysis can be carried out with classical test theory and with item response theory. The purpose of the study was to compare discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point biserial correlation coefficient (C(pbs)) was compared to the method of extreme group (D), the biserial correlation coefficient (C(bs)), the item-total correlation coefficient (C(it)), and the corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r²) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Items 9 and 23 had outfit ≥ 1.3. Students 1, 5, 7, 18, 26, 30, and 32 had fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee ability parameter with standard errors. The fit statistics can identify bad items and unpredictable examinee responses.
Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo
2017-10-30
The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because they had outlying parameters. Final parameter estimation was conducted using the remaining 11 items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale assesses higher engagement better than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scale should be modified to use fewer response categories.
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.
Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri
2017-03-01
Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
Item analysis of examinations in the Faculty of Medicine of Tunis.
Hermi, Amene; Achour, Wafa
2016-04-01
Introduction Item analysis is the process of collecting, summarizing and using information from students' responses to assess test items' quality. This study used this approach to evaluate the quality of items and examinations given in the Faculty of Medicine of Tunis (FMT). Methods This study concerned the examinations of 2012-2013 (principal session). It analyzed 3138 items from 66 examinations, of which, 46 were multidisciplinary (187 disciplines). A total of 2515 students took the examinations. "AnItem.xls" file was used for the analysis that focused on difficulty, discrimination and internal consistency. Results Mean difficulty for all examinations was optimum (mean difficulty index: 0.59). Majority of items (89.17%) were either easy or of acceptable difficulty. Mean discrimination for all examinations was moderate (mean item discrimination coefficient: 0.28) with poor discrimination in 23.62% of items. Maximal discrimination occurred with disciplines of difficulty index between 0.4-0.6. « Ideal » items represented 27.02%. Mean internal consistency for all examinations was acceptable (Cronbach's alpha: 0.79). Disciplines with nonacceptable internal consistency (68.45%) contained a maximum of 33 items (each one) and a positive correlation between their alpha and the number of their questions. Distributions were mostly (72.73%) platykurtic and negatively asymmetric (89.39%). First year of studies had the best parameters. Conclusion Our examinations had an acceptable internal consistency, and a good level of difficulty and discrimination. They tended to facility and discriminated basically students of medium level. Item analysis is useful as a guide to item writers to improve the overall quality of questions in the future.
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?
ERIC Educational Resources Information Center
DeMars, Christine
Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
Automatic Item Generation of Probability Word Problems
ERIC Educational Resources Information Center
Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina
2009-01-01
Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…
Fitting the Rasch Model to Account for Variation in Item Discrimination
ERIC Educational Resources Information Center
Weitzman, R. A.
2009-01-01
Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
An Investigation of the Impact of Guessing on Coefficient α and Reliability
2014-01-01
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
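For readers unfamiliar with coefficient α, the following is a minimal sketch of the usual computation from an examinee-by-item score matrix. It is not the IRT-based expression used in the study; the function name and the toy data are invented for illustration.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha for a list of examinee rows, each a list of item scores."""
    k = len(scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (sum) score across examinees.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Made-up 0/1 responses: four examinees, three items.
toy_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(toy_scores)
```

Population variances are used for both the items and the total score; using sample variances for both would give the same ratio.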
ERIC Educational Resources Information Center
Freund, Philipp Alexander; Hofer, Stefan; Holling, Heinz
2008-01-01
Figural matrix items are a popular task type for assessing general intelligence (Spearman's g). Items of this kind can be constructed rationally, allowing the implementation of computerized generation algorithms. In this study, the influence of different task parameters on the degree of difficulty in matrix items was investigated. A sample of N =…
ERIC Educational Resources Information Center
Engelen, Ron J. H.; And Others
Fisher's information measure for the item difficulty parameter in the Rasch model and its marginal and conditional formulations are investigated. It is shown that expected item information in the unconditional model equals information in the marginal model, provided the assumption of sampling examinees from an ability distribution is made. For the…
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
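The three-parameter logistic model referred to above adds a lower asymptote (guessing parameter) to the two-parameter form. The sketch below uses the common parameterization without the optional scaling constant (often written D = 1.7); the function name and the example values are assumptions, not output from the parscale analysis.

```python
import math

def irf_3pl(theta, a, b, c):
    """Three-parameter logistic item response function.

    a: discrimination, b: difficulty, c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical four-option item with a guessing parameter of 0.25.
p_at_difficulty = irf_3pl(theta=0.0, a=1.2, b=0.0, c=0.25)
p_very_low = irf_3pl(theta=-50.0, a=1.2, b=0.0, c=0.25)
```

At theta equal to b the probability is halfway between c and 1, and for very low ability it approaches c rather than 0.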
ERIC Educational Resources Information Center
Hamadneh, Iyad Mohammed
2015-01-01
This study aimed to investigate the impact of changing the position of the escape alternative in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination, and guessing), and on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple choice type achievement test…
Investigating the Performance of Omega Index According to Item Parameters and Ability Levels
ERIC Educational Resources Information Center
Sunbul, Onder; Yormaz, Seha
2018-01-01
Purpose: Several studies can be found in the literature that investigate the performance of ω under various conditions. However, no study of the effects of item difficulty, item discrimination, and ability restrictions on the performance of ω could be found. The current study aims to investigate the performance of ω for the conditions given below.…
ERIC Educational Resources Information Center
Sullins, Walter L.
Five-hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
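The KR-20 and KR-21 coefficients named in this abstract can be sketched from their textbook formulas as follows. The function names and the toy response matrix are illustrative assumptions, not the simulated data of the study.

```python
from statistics import mean, pvariance

def kr20(scores):
    """Kuder-Richardson formula 20 for 0/1 item scores (rows are examinees)."""
    k = len(scores[0])
    n = len(scores)
    # Proportion correct for each item.
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(pj * (1.0 - pj) for pj in p) / total_var)

def kr21(scores):
    """Kuder-Richardson formula 21, which treats all items as equally difficult."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m = mean(totals)
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1.0 - m * (k - m) / (k * total_var))

# Made-up 0/1 responses: four examinees, three items.
toy_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
r20 = kr20(toy_scores)
r21 = kr21(toy_scores)
```

KR-21 needs only the mean and variance of total scores, so it is never larger than KR-20 and equals it only when all items have the same difficulty.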
ERIC Educational Resources Information Center
Pawade, Yogesh R.; Diwase, Dipti S.
2016-01-01
Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
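A minimal sketch of two of the indices named above, assuming the common definitions: the difficulty index as the proportion of correct responses, and the discrimination index as the difference in proportion correct between upper and lower scoring groups. The 27% group fraction is a widely used convention adopted here as an assumption; the function names and data are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of examinees answering the item correctly (the p-value)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct on one item."""
    n = max(1, round(len(total_scores) * fraction))
    # Examinee indices ordered from lowest to highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    p_upper = sum(item_scores[i] for i in upper) / n
    p_lower = sum(item_scores[i] for i in lower) / n
    return p_upper - p_lower

# Made-up data: ten examinees, total test scores and 0/1 scores on one item.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_value = difficulty_index(item)
di = discrimination_index(item, totals)
```

In this toy case every high scorer answers correctly and every low scorer does not, so the discrimination index takes its maximum value.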
Do item-writing flaws reduce examinations psychometric quality?
Pais, João; Silva, Artur; Guimarães, Bruno; Povo, Ana; Coelho, Elisabete; Silva-Pereira, Fernanda; Lourinho, Isabel; Ferreira, Maria Amélia; Severo, Milton
2016-08-11
The psychometric characteristics of multiple-choice questions (MCQ) changed when taking into account their anatomical sites and the presence of item-writing flaws (IWF). The aim is to understand the impact of the anatomical sites and the presence of IWF on the psychometric qualities of the MCQ. 800 Clinical Anatomy MCQ from eight examinations were classified as standard or flawed items and according to one of the eight anatomical sites. An item was classified as flawed if it violated at least one of the principles of item writing. The difficulty and discrimination indices of each item were obtained. 55.8% of the MCQ were flawed items. The anatomical site of the items explained 6.2% and 3.2% of the difficulty and discrimination parameters, and the IWF explained 2.8% and 0.8%, respectively. The impact of the IWF was heterogeneous: the Writing the Stem and Writing the Choices categories had a negative impact (higher difficulty and lower discrimination), while the other categories did not have any impact. The anatomical site effect was higher than the IWF effect on the psychometric characteristics of the examination. When constructing MCQ, the focus should be first on the topic/area of the items and only afterwards on the presence of IWF.
Item selection via Bayesian IRT models.
Arima, Serena
2015-02-10
With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
On Interpreting the Model Parameters for the Three Parameter Logistic Model
ERIC Educational Resources Information Center
Maris, Gunter; Bechger, Timo
2009-01-01
This paper addresses two problems relating to the interpretability of the model parameters in the three parameter logistic model. First, it is shown that if the values of the discrimination parameters are all the same, the remaining parameters are nonidentifiable in a nontrivial way that involves not only ability and item difficulty, but also the…
Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju
2017-01-01
The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. 
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
An Application of the Rasch Model.
ERIC Educational Resources Information Center
Veitch, William R.
The one parameter latent trait theory of Georg Rasch has two assumptions: that student abilities can be measured on an equal interval scale, and that the success of a student with a given item is a function of student achievement and item difficulty. The grade four Michigan Educational Assessment Program reading test was designed to measure…
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.
Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland
2014-04-01
To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
Some Considerations on the Partial Credit Model
ERIC Educational Resources Information Center
Verhelst, N. D.; Verstralen, H. H. F. M.
2008-01-01
The Partial Credit Model (PCM) is sometimes interpreted as a model for stepwise solution of polytomously scored items, where the item parameters are interpreted as difficulties of the steps. It is argued that this interpretation is not justified. A model for stepwise solution is discussed. It is shown that the PCM is suited to model sums of binary…
Item response theory analysis of the mechanics baseline test
NASA Astrophysics Data System (ADS)
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
2012-02-01
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study
ERIC Educational Resources Information Center
Sydorenko, Tetyana
2011-01-01
This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…
A new item response theory model to adjust data allowing examinee choice
Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo
2018-01-01
In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996
A Method of Q-Matrix Validation for the Linear Logistic Test Model
Baghaei, Purya; Hohensinn, Christine
2017-01-01
The linear logistic test model (LLTM) is a well-recognized psychometric model for examining the components of difficulty in cognitive tests and validating construct theories. The plausibility of the construct model, summarized in a matrix of weights, known as the Q-matrix or weight matrix, is tested by (1) comparing the fit of LLTM with the fit of the Rasch model (RM) using the likelihood ratio (LR) test and (2) by examining the correlation between the Rasch model item parameters and LLTM reconstructed item parameters. The problem with the LR test is that it is almost always significant and, consequently, LLTM is rejected. The drawback of examining the correlation coefficient is that there is no cut-off value or lower bound for the magnitude of the correlation coefficient. In this article we suggest a simulation method to set a minimum benchmark for the correlation between item parameters from the Rasch model and those reconstructed by the LLTM. If the cognitive model is valid then the correlation coefficient between the RM-based item parameters and the LLTM-reconstructed item parameters derived from the theoretical weight matrix should be greater than those derived from the simulated matrices. PMID:28611721
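The second check described above, correlating Rasch item parameters with LLTM-reconstructed ones, can be sketched as follows. In the LLTM, each item difficulty is a weighted sum of basic parameters with weights taken from the Q-matrix (any normalization constant is omitted here). The Q-matrix, basic parameters, and Rasch estimates below are made up for illustration, and the simulation benchmark proposed in the article is not reproduced.

```python
import math

def lltm_difficulties(q_matrix, eta):
    """Reconstruct item difficulties as Q-matrix rows times basic parameters eta."""
    return [sum(q * e for q, e in zip(row, eta)) for row in q_matrix]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weight matrix: four items, two cognitive operations.
Q = [[1, 0], [0, 1], [1, 1], [2, 1]]
eta = [0.5, 1.0]                   # made-up basic parameters
rasch_b = [0.4, 1.1, 1.4, 2.1]     # made-up Rasch difficulty estimates

reconstructed = lltm_difficulties(Q, eta)
r = pearson(rasch_b, reconstructed)
```

Under the article's proposal, a correlation such as `r` would be judged against the correlations obtained from simulated weight matrices rather than against a fixed cut-off.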
Ietsugu, Tetsuji; Sukigara, Masune; Furukawa, Toshiaki A
2007-12-01
The dichotomous diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases (ICD) lose much important information concerning what each symptom can offer. This study explored the characteristics and performances of DSM-IV and ICD-10 diagnostic criteria items for panic attack using modern item response theory (IRT). The National Comorbidity Survey used the Composite International Diagnostic Interview to assess 14 DSM-IV and ICD-10 panic attack diagnostic criteria items in the general population in the USA. The dimensionality and measurement properties of these items were evaluated using dichotomous factor analysis and the two-parameter IRT model. A total of 1213 respondents reported at least one subsyndromal or syndromal panic attack in their lifetime. Factor analysis indicated that all items constitute a unidimensional construct. The two-parameter IRT model produced meaningful and interpretable results. Among items with high discrimination parameters, the difficulty parameter for "palpitation" was relatively low, while those for "choking," "fear of dying" and "paresthesia" were relatively high. Several items including "dry mouth" and "fear of losing control" had low discrimination parameters. The item characteristics of diagnostic criteria among help-seeking clinical populations may be different from those that we observed in the general population and deserve further examination. "Paresthesia," "choking" and "fear of dying" can be thought to be good indicators of severe panic attacks, while "palpitation" can discriminate well between cases and non-cases at low level of panic attack severity. Items such as "dry mouth" would contribute less to the discrimination.
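To make the roles of the two parameters concrete, the following sketch evaluates the two-parameter logistic item response function at a few severity levels. The parameter values are hypothetical and were chosen only to mirror the qualitative pattern reported above; they are not the study's estimates.

```python
import math

def p_endorse_2pl(theta, a, b):
    """2PL probability of endorsing an item at latent severity theta.

    a is the discrimination parameter, b the difficulty (severity) parameter.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical (a, b) pairs, for illustration only.
items = {
    "palpitation": (2.0, -1.0),  # high discrimination, low difficulty
    "choking": (2.0, 1.5),       # high discrimination, high difficulty
    "dry mouth": (0.5, 0.0),     # low discrimination
}

for name, (a, b) in items.items():
    probs = [round(p_endorse_2pl(theta, a, b), 2) for theta in (-2.0, 0.0, 2.0)]
    print(name, probs)
```

A highly discriminating item moves from rarely to almost always endorsed over a narrow severity range centred on its difficulty, whereas a weakly discriminating item changes little across the same range.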
Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder and bowel-related concerns were developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Seven hundred fifty-seven adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.
Choi, Bongsam
2018-01-01
[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred and thirty-eight community-dwelling elderly individuals, 32 male and 106 female, participated in the study. All participants were asked to fill out a fifty-one item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. A one-parameter item response theory model (Rasch analysis) was applied to determine the construct validity and to inspect item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After deletion of the misfitting items, 35 ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map indicated that item difficulty was well matched to elderly individuals with moderate and low ability, but not to those with high ability, for whom a ceiling effect was present. [Conclusion] The cross-culturally adapted K-PAM showed sufficient construct validity and stable psychometric properties, as confirmed by person separation reliability and fit statistics.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?
Schweizer, Karl; Troche, Stefan
2018-02-01
In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, the statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of items with various difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor now identified as item-position factor, was observed in virtually all simulated datasets.
Echeverri, Margarita; Anderson, David; Nápoles, Anna María
2016-01-01
This article describes the adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. A cross-sectional field test of the Spanish version of the CHLT (CHLT-30-DKspa) was conducted among healthy Latinos in Louisiana. Diagonally weighted least squares was used to confirm the factor structure. Item response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. The mean CHLT-30-DKspa score (N = 400) was 17.13 (range = 0-30, SD = 6.65). Results confirmed a unidimensional structure, χ²(405) = 461.55, p = .027, comparative fit index = .993, Tucker-Lewis index = .992, root mean square error of approximation = .0180. Cronbach's alpha was .88. Items Q1-High Calorie and Q15-Tumor Spread had the lowest item-scale correlations (.148 and .288, respectively) and standardized factor loadings (.152 and .302, respectively). Items Q19-Smoking Risk, Q8-Palliative Care, and Q1-High Calorie had the highest item difficulty parameters (difficulty = 1.12, 1.21, and 2.40, respectively). Results generally support the applicability of the CHLT-30-DKspa for healthy Spanish-speaking populations, with the exception of 4 items that need to be deleted or revised and further studied: Q1, Q8, Q15, and Q19.
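For readers unfamiliar with the internal consistency coefficient reported above, here is a minimal sketch of how Cronbach's alpha is computed from an item-score matrix. The function name and the toy response matrix are hypothetical and unrelated to the CHLT data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_vars = scores.var(axis=0, ddof=1)           # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of the sum score
    return float(k / (k - 1) * (1.0 - item_vars.sum() / total_var))

# Toy data: 6 respondents answering 4 dichotomous items (1 = correct).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(toy), 3))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances; items that all rank respondents identically give exactly 1.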
ERIC Educational Resources Information Center
Kostin, Irene
2004-01-01
The purpose of this study is to explore the relationship between a set of item characteristics and the difficulty of TOEFL® dialogue items. Identifying characteristics that are related to item difficulty has the potential to improve the efficiency of the item-writing process. The study employed 365 TOEFL dialogue items, which were coded on 49…
Statistical Approaches to the Study of Item Difficulty.
ERIC Educational Resources Information Center
Olson, John F.; And Others
Traditionally, item difficulty has been defined in terms of the performance of examinees. For test development purposes, a more useful concept would be some kind of intrinsic item difficulty, defined in terms of the item's content, context, or characteristics and the task demands set by the item. In this investigation, the measurement literature…
Modelling Question Difficulty in an A Level Physics Examination
ERIC Educational Resources Information Center
Crisp, Victoria; Grayson, Rebecca
2013-01-01
"Item difficulty modelling" is a technique used for a number of purposes such as to support future item development, to explore validity in relation to the constructs that influence difficulty and to predict the difficulty of items. This research attempted to explore the factors influencing question difficulty in a general qualification…
Understanding Orgasmic Difficulty in Women.
Rowland, David L; Kolba, Tiffany N
2016-08-01
Women's primary issue with the orgasmic phase is usually difficulty reaching orgasm. To identify predictors of orgasmic difficulty in women within the context of a partnered sexual experience; to assess the relation between orgasmic difficulty and self-reported levels of sexual desire or interest and arousal in women; and to assess the interrelations among three dimensions of orgasmic response during partnered sex: self-reported time to reach orgasm, general difficulty or ease of reaching orgasm, and level of distress or concern. Drawing from a community-based sample using the Internet, 866 women were queried on a 26-item survey regarding their difficulty reaching orgasm during partnered sex. Four hundred sixteen women who indicated difficulty also responded to items assessing arousal and desire difficulties, level of distress about their condition, and their estimated time to reach orgasm. Answers to a 26-item survey on surveyed women's difficulty reaching orgasm during partnered sex. Age, arousal difficulty, and lubrication difficulty predicted difficulty reaching orgasm in the overall sample. In the subsample of women reporting difficulty, approximately half reported issues with arousal. Women with arousal problems reported greater difficulty reaching orgasm but did not differ from those without arousal problems on measurements of orgasm latency or levels of distress. Slightly more than half the women experiencing difficulty reaching orgasm were distressed by their condition; distressed women reported greater difficulty reaching orgasm and longer latencies to orgasm than non-distressed counterparts. They also reported lower satisfaction with their sexual relationship. This study indicates the importance of assessing multiple parameters when investigating orgasmic problems in women, including arousal issues, levels of distress, and latency to orgasm. 
Results also clarify that women with arousal problems do not differ substantially from those without arousal problems; in contrast, women distressed by their condition differ from non-distressed women along some critical dimensions. Although orgasmic problems decreased with age, the overall relation of this variable to distress, arousal, and latency to orgasm was essentially unchanged across age groups. Copyright © 2016 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments
ERIC Educational Resources Information Center
El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne
2017-01-01
Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…
Validation of a clinical critical thinking skills test in nursing.
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Of the initial 30 items, 11 were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing
2015-01-01
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Of the initial 30 items, 11 were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
An Improved Internal Consistency Reliability Estimate.
ERIC Educational Resources Information Center
Cliff, Norman
1984-01-01
The proposed coefficient is derived by assuming that the average Goodman-Kruskal gamma between items of identical difficulty would be the same for items of different difficulty. An estimate of covariance between items of identical difficulty leads to an estimate of the correlation between two tests with identical distributions of difficulty.…
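The building block named above, the Goodman-Kruskal gamma between two items, can be computed as in the sketch below. It shows only the standard gamma statistic, (concordant - discordant) / (concordant + discordant) over all pairs of respondents; it does not implement the proposed reliability coefficient, whose details are not given in the truncated abstract. The function name and toy data are hypothetical.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma between two ordinal (e.g. 0/1) item-score vectors."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:          # the pair is ordered the same way on both items
            concordant += 1
        elif sign < 0:        # the pair is ordered oppositely on the two items
            discordant += 1
        # pairs tied on either item are ignored
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Toy responses of six examinees to two dichotomous items.
item_a = [0, 0, 1, 1, 1, 0]
item_b = [0, 1, 1, 1, 0, 0]
print(goodman_kruskal_gamma(item_a, item_b))
```

Because tied pairs are dropped, gamma for two dichotomous items reduces to Yule's Q for the corresponding 2 x 2 table.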
ERIC Educational Resources Information Center
Matlock, Ki Lynn
2013-01-01
When test forms that have equal total test difficulty and number of items vary in difficulty and length within sub-content areas, an examinee's estimated score may vary across equivalent forms, depending on how well his or her true ability in each sub-content area aligns with the difficulty of items and number of items within these areas.…
Development and Psychometric Evaluation of the Gay Male Sexual Difficulties Scale.
McDonagh, Lorraine K; Stewart, Ian; Morrison, Melanie A; Morrison, Todd G
2016-08-01
Sexual difficulties (i.e., disturbances in normal sexual responding) have the potential to significantly and negatively affect men's social and psychological well-being. However, a review of published measurement tools indicates that most have limited applicability to gay men, and none offer a nuanced understanding of sexual difficulties, as experienced by members of this population. To address this omission, the Gay Male Sexual Difficulties Scale (GMSDS) was developed using a sequential mixed-methods approach. The 25-item GMSDS uses a 6-point frequency Likert-type response format and examines: difficulties with receptive and insertive anal intercourse (5 items each); erectile difficulties (4 items); foreskin difficulties (4 items); body embarrassment (4 items); and seminal fluid concerns (3 items). The measure's scale score dimensionality, assessed using both exploratory and confirmatory factor analyses, as well as scale score reliability and validity (e.g., known-groups and convergent) was tested and deemed to be satisfactory. Limitations of the current series of studies and directions for future research are discussed.
Optimal pricing and marketing planning for deteriorating items.
Moosavi Tabatabaei, Seyed Reza; Sadjadi, Seyed Jafar; Makui, Ahmad
2017-01-01
Optimal pricing and marketing planning plays an essential role in production decisions on deteriorating items. This paper presents a mathematical model for a three-level supply chain, which includes one producer, one distributor and one retailer. The proposed study considers the production of a deteriorating item where demand is influenced by price, marketing expenditure, quality of product and after-sales service expenditures. The proposed model is formulated as a geometric programming problem with 5 degrees of difficulty and is solved using recent advances in optimization techniques. The study is supported by several numerical examples, and sensitivity analysis is performed to analyze the effects of changes in different parameters on the optimal solution. The preliminary results indicate that when the parameters influencing demand change, the inventory holding, inventory deterioration, and set-up costs change as well, and total revenue is significantly affected.
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.
Lebedeva, Elena; Huang, Mei; Koski, Lisa
2016-03-01
The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time.
ERIC Educational Resources Information Center
Nissan, Susan; And Others
One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Sources of difficulty in assessment: example of PISA science items
NASA Astrophysics Data System (ADS)
Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie
2017-03-01
The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses to what degree 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research question is to identify PISA science item characteristics that could influence the item's proficiency level. It is based on an a-priori item analysis and a statistical analysis. Results show that only the cognitive complexity and the format out of the different characteristics of PISA science items determined in our a-priori analysis have an explanatory power on an item's proficiency levels. The proficiency level cannot be explained by the dependence/independence of the information provided in the unit and/or item introduction and the competence. We conclude that in PISA, it appears possible to anticipate a high proficiency level, that is, students' low scores for items displaying a high cognitive complexity. In the case of a middle or low cognitive complexity level item, the cognitive complexity level is not sufficient to predict item difficulty. Other characteristics play a crucial role in item difficulty. We discuss anticipating the difficulties in assessment in a broader perspective.
Identifying predictors of physics item difficulty: A linear regression approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. 
It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
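The final step described above, regressing item difficulty on coded item characteristics and reporting the share of variance explained, can be sketched as follows. The predictor names and all numbers are hypothetical and are not taken from the study; only the generic ordinary least squares computation is shown.

```python
import numpy as np

def fit_difficulty_regression(X, y):
    """OLS fit of item difficulty on item characteristics.

    Returns the coefficients (intercept first) and R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])       # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = float(resid @ resid)                     # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())       # total variation
    return coef, 1.0 - ss_res / ss_tot

# Hypothetical coding of 8 items: (complexity level, requires formal knowledge 0/1).
features = [[1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1], [1, 0], [3, 1]]
# Hypothetical Rasch difficulties (logits) for the same items.
difficulty = [-1.2, -0.3, -0.4, 0.5, 0.6, 1.4, -0.9, 1.1]

coef, r2 = fit_difficulty_regression(features, difficulty)
print("coefficients:", np.round(coef, 2), "R^2:", round(r2, 3))
```

An R^2 computed this way is the quantity that a statement such as "61.2% of item difficulty variance can be explained" refers to, although the study's own model and predictors may differ from this sketch.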
Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J
2016-11-01
To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. 
It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs. © 2016, Copyright the Authors Journal compilation © 2016, The American Geriatrics Society.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?
ERIC Educational Resources Information Center
Schweizer, Karl; Troche, Stefan
2018-01-01
In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of…
Rasch Measurement of Collaborative Problem Solving in an Online Environment.
Harding, Susan-Marie E; Griffin, Patrick E
2016-01-01
This paper describes an approach to the assessment of human to human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within the dyad, roles were nominated as either A or B and students selected their own roles. The question as to whether role selection affected individual student performance measures is addressed. Process stream data was captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor and collaborating with their partner through a chat box window. Process stream data were explored to identify behavioural indicators that represented elements of a conceptual framework. These indicative behaviours were coded into a series of dichotomous items. These items represented actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty. Then given a measure of item difficulty, student ability could be estimated using the difficulty estimates of the range of items demonstrated by the student. The Rasch simple logistic model was used to review the indicators to identify those that were consistent with the assumptions of the model and were invariant across national samples, language, curriculum and age of the student. The data were analysed using a one and two dimension, one parameter model. Rasch separation reliability, fit to the model, distribution of students and items on the underpinning construct, estimates for each country and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human to human interaction using behavioural indicators shown to have a consistent relationship between the estimate of student ability, and the probability of demonstrating the behaviour.
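The two-step logic described above (frequency of a behaviour as a proxy for item difficulty, then ability estimated from those difficulties) can be sketched under the Rasch model as follows. The logit transform of the frequencies, the Newton-Raphson routine, and all numbers are assumptions made for illustration; the study's actual calibration may differ.

```python
import math

def proxy_difficulties(freqs):
    """Centred logit difficulties from the proportion of students showing each behaviour."""
    raw = [math.log((1.0 - p) / p) for p in freqs]   # rarer behaviour -> harder item
    mean = sum(raw) / len(raw)
    return [d - mean for d in raw]

def rasch_p(theta, b):
    """Rasch probability that a student of ability theta shows an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, max_iter=50, tol=1e-10):
    """Maximum-likelihood ability for fixed difficulties (Newton-Raphson)."""
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("ML estimate is not finite for all-0 or all-1 patterns")
    theta = 0.0
    for _ in range(max_iter):
        ps = [rasch_p(theta, b) for b in difficulties]
        gradient = score - sum(ps)                    # observed minus expected score
        information = sum(p * (1.0 - p) for p in ps)  # test information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical frequencies of five coded behaviours and one student's 0/1 pattern.
freqs = [0.9, 0.7, 0.5, 0.3, 0.1]
difficulties = proxy_difficulties(freqs)
theta_hat = estimate_ability([1, 1, 1, 0, 0], difficulties)
print(round(theta_hat, 3))
```

At the estimate, the expected number of demonstrated behaviours equals the observed count, which is the defining property of the Rasch maximum-likelihood ability.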
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment
Lebedeva, Elena; Huang, Mei; Koski, Lisa
2016-01-01
Background: The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods: Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results: None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions: This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
Optimal pricing and marketing planning for deteriorating items
Moosavi Tabatabaei, Seyed Reza; Sadjadi, Seyed Jafar; Makui, Ahmad
2017-01-01
Optimal pricing and marketing planning plays an essential role in production decisions on deteriorating items. This paper presents a mathematical model for a three-level supply chain, which includes one producer, one distributor and one retailer. The proposed study considers the production of a deteriorating item where demand is influenced by price, marketing expenditure, quality of product and after-sales service expenditures. The proposed model is formulated as a geometric programming problem with 5 degrees of difficulty and is solved using recent advances in optimization techniques. The study is supported by several numerical examples, and sensitivity analysis is performed to analyze the effects of changes in different parameters on the optimal solution. The preliminary results indicate that when the parameters influencing demand change, the inventory holding, inventory deterioration, and set-up costs change as well, and total revenue is significantly affected. PMID:28306750
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Extending item response theory to online homework
NASA Astrophysics Data System (ADS)
Kortemeyer, Gerd
2014-06-01
Item response theory (IRT) is becoming an increasingly important tool for analyzing "big data" gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are infringed upon when deployed in the online realm. For a large-enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data, and then proceeds to investigate the effects of each confounding factor introduced in the online realm. It is found that IRT yields the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. It is also found that learner ability and item discrimination are robust over a wide range with respect to model assumptions and introduced noise. Item difficulty is also robust, but over a narrower range.
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd; Gerritz, Kalle
1990-01-01
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Item Structural Properties as Predictors of Item Difficulty and Item Association.
ERIC Educational Resources Information Center
Solano-Flores, Guillermo
1993-01-01
Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
An Evaluation of Different Statistical Targets for Assembling Parallel Forms in Item Response Theory
Ali, Usama S.; van Rijn, Peter W.
2015-01-01
Assembly of parallel forms is an important step in the test development process. Therefore, choosing a suitable theoretical framework to generate well-defined test specifications is critical. The performance of different statistical targets of test specifications using the test characteristic curve (TCC) and the test information function (TIF) was investigated. Test length, the number of test forms, and content specifications are considered as well. The TCC target results in forms that are parallel in difficulty, but not necessarily in terms of precision. Vice versa, test forms created using a TIF target are parallel in terms of precision, but not necessarily in terms of difficulty. As sometimes the focus is either on TIF or TCC, differences in either difficulty or precision can arise. Differences in difficulty can be mitigated by equating, but differences in precision cannot. In a series of simulations using a real item bank, the two-parameter logistic model, and mixed integer linear programming for automated test assembly, these differences were found to be quite substantial. When both TIF and TCC are combined into one target with manipulation to relative importance, these differences can be made to disappear.
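The distinction between a TCC target and a TIF target can be sketched with the two-parameter logistic model named in the abstract. The two "forms" below are hypothetical (discrimination, difficulty) pairs chosen for illustration, not items from the study's bank, and the comparison is made at a single ability level only.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at ability theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def tif(theta, items):
    """Test information function: sum of item informations a^2 * P * (1 - P)."""
    total = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total

# Hypothetical forms: same difficulties, different discriminations.
form_a = [(1.0, -1.0), (1.0, 1.0)]
form_b = [(2.0, -1.0), (2.0, 1.0)]

theta = 0.0
print("TCC:", tcc(theta, form_a), tcc(theta, form_b))
print("TIF:", tif(theta, form_a), tif(theta, form_b))
```

In this toy case the two forms give the same expected score at theta = 0 but carry different amounts of information there, which mirrors the abstract's point that matching on the TCC does not guarantee matching on precision.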
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.
ERIC Educational Resources Information Center
Perkins, Kyle; And Others
1995-01-01
This article reports the results of using a three-layer back propagation artificial neural network to predict item difficulty in a reading comprehension test. Three classes of variables were examined: text structure, propositional analysis, and cognitive demand. Results demonstrate that the networks can consistently predict item difficulty. (JL)
Multiple choice questions can be designed or revised to challenge learners' critical thinking.
Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A
2013-12-01
Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level, without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.
The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.
ERIC Educational Resources Information Center
Finney, Sara J.; Smith, Russell W.; Wise, Steven L.
Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…
Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A
2013-12-01
A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.
ERIC Educational Resources Information Center
Rubin, Lois S.; Mott, David E. W.
An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
Using the Nudge and Shove Methods to Adjust Item Difficulty Values.
Royal, Kenneth D
2015-01-01
In any examination, it is important that a sufficient mix of items with varying degrees of difficulty be present to produce desirable psychometric properties and increase instructors' ability to make appropriate and accurate inferences about what a student knows and/or can do. The purpose of this "teaching tip" is to demonstrate how examination items can be affected by the quality of distractors, and to present a simple method for adjusting items to meet difficulty specifications.
Shen, Minxue; Hu, Ming; Sun, Zhenqiu
2017-01-01
Objectives To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting Middle schools in Hunan province. Participants 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Component Identification and Item Difficulty of Raven's Matrices Items.
ERIC Educational Resources Information Center
Green, Kathy E.; Kluever, Raymond C.
Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…
Analyzing force concept inventory with item response theory
NASA Astrophysics Data System (ADS)
Wang, Jing; Bao, Lei
2010-10-01
Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
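The three quantities mentioned at the end of this abstract (difficulty, discrimination, and probability of a correct guess) correspond to what is commonly written as the three-parameter logistic model. The abstract does not give the exact parameterization used, so the sketch below assumes the textbook form, and the parameter values are hypothetical.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic item response function.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

# Hypothetical item: moderate discrimination, average difficulty, 20% guessing floor.
a, b, c = 1.2, 0.0, 0.2

for theta in (-3.0, 0.0, 3.0):
    print(f"theta={theta:+.1f}  P(correct)={p_3pl(theta, a, b, c):.3f}")
```

With this form, a respondent whose ability equals the item difficulty succeeds with probability c + (1 - c) / 2, and very low-ability respondents approach the guessing floor c rather than zero.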
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.
Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R
2018-05-01
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
ERIC Educational Resources Information Center
Ali, Usama S.; Walker, Michael E.
2014-01-01
Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
ERIC Educational Resources Information Center
Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.
2013-01-01
Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…
Outcome-based self-assessment on a team-teaching subject in the medical school
Cho, Sa Sun
2014-01-01
We investigated why students received worse grades in gross anatomy and how the teaching method could be improved, given the gaps between teaching and learning under the recently changed integrated curriculum. General characteristics of students and exploratory factors used to test validity were compared between 2011 and 2012. Students were asked to complete a short survey with a Likert scale. The results were as follows: although the percentage of acceptable items was similar between professors, professor C preferred questions with adequate item discrimination but inappropriate item difficulty, whereas professor Y preferred questions with adequate item discrimination and appropriate item difficulty, a statistically significant difference (P<0.01). The survey revealed that 26.5% of all students gave up on professor Y's gross anatomy exam, irrespective of year. These results suggest that students' academic achievement was affected by the corrected item difficulty rather than by item discrimination. Therefore, professors in a team-taught subject should reach a consensus on item difficulty together with proper teaching methods. PMID:25548724
A Comparison of Three Test Formats to Assess Word Difficulty
ERIC Educational Resources Information Center
Culligan, Brent
2015-01-01
This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
Sim, Si-Mui; Rasiah, Raja Isaiah
2006-02-01
This paper reports the relationship between the difficulty level and the discrimination power of true/false-type multiple-choice questions (MCQs) in a multidisciplinary paper for the para-clinical year of an undergraduate medical programme. MCQ items in papers taken from Year II Parts A, B and C examinations for Sessions 2001/02, and Part B examinations for 2002/03 and 2003/04, were analysed to obtain their difficulty indices and discrimination indices. Each paper consisted of 250 true/false items (50 questions of 5 items each) on topics drawn from different disciplines. The questions were first constructed and vetted by the individual departments before being submitted to a central committee, where the final selection of the MCQs was made, based purely on the academic judgement of the committee. There was a wide distribution of item difficulty indices in all the MCQ papers analysed. Furthermore, the relationship between the difficulty index (P) and discrimination index (D) of the MCQ items in a paper was not linear, but more dome-shaped. Maximal discrimination (D = 51% to 71%) occurred with moderately easy/difficult items (P = 40% to 74%). On average, about 38% of the MCQ items in each paper were "very easy" (P ≥ 75%), while about 9% were "very difficult" (P < 25%). About two-thirds of these very easy/difficult items had "very poor" or even negative discrimination (D ≤ 20%). MCQ items that demonstrate good discriminating potential tend to be moderately difficult items, and the moderately-to-very difficult items are more likely to show negative discrimination. There is a need to evaluate the effectiveness of our MCQ items.
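A minimal sketch of how a difficulty index P and a discrimination index D can be computed from scored responses follows. The abstract does not say how D was obtained, so this assumes the common upper-group minus lower-group definition; the function name, the `fraction` parameter, and the demo data are hypothetical, and values are returned as proportions rather than percentages.

```python
def item_indices(responses, fraction=0.27):
    """Classical item analysis on 0/1 scores.

    responses: list of per-examinee lists of item scores (0 or 1).
    Returns one (P, D) tuple per item, where P is the proportion correct
    and D is proportion correct in the top group minus the bottom group.
    """
    n = len(responses)
    n_items = len(responses[0])
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for j in range(n_items):
        p = sum(r[j] for r in responses) / n
        d = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        results.append((p, d))
    return results

# Tiny made-up data set: 4 examinees, 3 items.
demo = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]
for item, (p, d) in enumerate(item_indices(demo, fraction=0.25), start=1):
    print(f"item {item}: P={p:.2f}  D={d:.2f}")
```

In the demo, the item that everyone answers correctly has D = 0, which is the kind of pattern behind the abstract's observation that very easy items tend to discriminate poorly.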
ERIC Educational Resources Information Center
Kahraman, Nilufer; De Champlain, Andre; Raymond, Mark
2012-01-01
Item-level information, such as difficulty and discrimination, is invaluable to test assembly, equating, and scoring practices. Estimating these parameters within the context of large-scale performance assessments is often hindered by the use of unbalanced designs for assigning examinees to tasks and raters because such designs result in very…
Mamikonian-Zarpas, Ani; Laganá, Luciana
2016-01-01
Functional status is often defined by cumulative scores across indices of independence in performing basic and instrumental activities of daily living (ADL/IADL), but little is known about the unique relationship of each daily activity item with the fall outcome. The purpose of this retrospective study was to examine the level of relative risk for a future fall associated with difficulty with performing various tasks of normal daily functioning among older adults who had fallen at least once in the past 12 months. The sample was comprised of community-dwelling individuals 70 years and older from the 1984–1990 Longitudinal Study of Aging by Kovar, Fitti, and Chyba (1992). Risk analysis was performed on individual items quantifying 6 ADLs and 7 IADLs, as well as 10 items related to mobility limitations. Within a subsample of 1,675 older adults with a history of at least one fall within the past year, the responses of individuals who reported multiple falls were compared to the responses of participants who had a single fall and reported 1) difficulty with walking and/or balance (FRAIL group, n = 413) vs. 2) no difficulty with walking or dizziness (NDW+ND group, n = 415). The items that had the strongest relationships and highest risk ratios for the FRAIL group (which had the highest probabilities for a future fall) included difficulty with: eating (73%); managing money (70%); biting or chewing food (66%); walking a quarter of a mile (65%); using fingers to grasp (65%); and dressing without help (65%). For the NDW+ND group, the most noteworthy items included difficulty with: bathing or showering (79%); managing money (77%); shopping for personal items (75%); walking up 10 steps without rest (72%); difficulty with walking a quarter of a mile (72%); and stooping/crouching/kneeling (70%). 
These findings suggest that individual items quantifying specific ADLs and IADLs have substantive relationships with the fall outcome among older adults who have difficulty with walking and balance, as well as among older individuals without dizziness or difficulty with walking. Furthermore, the examination of the relationships between items that are related to more challenging activities and the fall outcome revealed that higher functioning older adults who reported difficulty with the 6 items that yielded the highest risk ratios may also be at elevated risk for a fall. PMID:27200366
ERIC Educational Resources Information Center
Golino, Hudson F.; Gomes, Cristiano M. A.
2016-01-01
This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…
Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun
2016-01-01
The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14).
Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi
2015-01-01
Nurses and other healthcare workers frequently experience belief conflict, one of the most important, new stress-related problems in both academic and clinical fields. In this study, using a sample of 1,683 nursing practitioners, we developed The Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach's alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 was suggested by evidence from high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice.
Factors Affecting Item Difficulty in English Listening Comprehension Tests
ERIC Educational Resources Information Center
Sung, Pei-Ju; Lin, Su-Wei; Hung, Pi-Hsia
2015-01-01
Task difficulty is a critical issue affecting test developers. Controlling or balancing the item difficulty of an assessment improves its validity and discrimination. Test developers construct tests from the cognitive perspective, by making the test constructing process more scientific and efficient; thus, the scores obtained more precisely…
Comparison of university students' understanding of graphs in different contexts
NASA Astrophysics Data System (ADS)
Planinic, Maja; Ivanjek, Lana; Susac, Ana; Milin-Sipus, Zeljka
2013-12-01
This study investigates university students’ understanding of graphs in three different domains: mathematics, physics (kinematics), and contexts other than physics. Eight sets of parallel mathematics, physics, and other context questions about graphs were developed. A test consisting of these eight sets of questions (24 questions in all) was administered to 385 first year students at University of Zagreb who were either prospective physics or mathematics teachers or prospective physicists or mathematicians. Rasch analysis of data was conducted and linear measures for item difficulties were obtained. Average difficulties of items in three domains (mathematics, physics, and other contexts) and over two concepts (graph slope, area under the graph) were computed and compared. Analysis suggests that the variation of average difficulty among the three domains is much smaller for the concept of graph slope than for the concept of area under the graph. Most of the slope items are very close in difficulty, suggesting that students who have developed sufficient understanding of graph slope in mathematics are generally able to transfer it almost equally successfully to other contexts. A large difference was found between the difficulty of the concept of area under the graph in physics and other contexts on one side and mathematics on the other side. Comparison of average difficulty of the three domains suggests that mathematics without context is the easiest domain for students. Adding either physics or other context to mathematical items generally seems to increase item difficulty. No significant difference was found between the average item difficulty in physics and contexts other than physics, suggesting that physics (kinematics) remains a difficult context for most students despite the received instruction on kinematics in high school.
A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.
ERIC Educational Resources Information Center
Marzano, Robert J.
To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…
Bayesian inference in an item response theory model with a generalized student t link function
NASA Astrophysics Data System (ADS)
Azevedo, Caio L. N.; Migon, Helio S.
2012-10-01
In this paper we introduce a new item response theory (IRT) model with a generalized Student t-link function with unknown degrees of freedom (df), named the generalized t-link (GtL) IRT model. In this model we consider only the difficulty parameter in the item response function. GtL is an alternative to the two-parameter logit and probit models, since the degrees of freedom (df) play a similar role to the discrimination parameter. However, the behavior of the curves of the GtL is different from those of the two-parameter models and the usual Student t link, since in GtL the curves obtained from different df's can cross the probit curves at more than one latent trait level. The GtL model has similar properties to the generalized linear mixed models, such as the existence of sufficient statistics and easy parameter interpretation. Also, many techniques of parameter estimation, model fit assessment and residual analysis developed for those models can be used for the GtL model. We develop fully Bayesian estimation and model fit assessment tools through a Metropolis-Hastings step within a Gibbs sampling algorithm. We consider a prior sensitivity choice concerning the degrees of freedom. The simulation study indicates that the algorithm recovers all parameters properly. In addition, some Bayesian model fit assessment tools are considered. Finally, a real data set is analyzed using our approach and other usual models. The results indicate that our model fits the data better than the two-parameter models.
Iwashita, Yukio; Hibi, Taizo; Ohyama, Tetsuji; Honda, Goro; Yoshida, Masahiro; Miura, Fumihiko; Takada, Tadahiro; Han, Ho-Seong; Hwang, Tsann-Long; Shinya, Satoshi; Suzuki, Kenji; Umezawa, Akiko; Yoon, Yoo-Seok; Choi, In-Seok; Huang, Wayne Shih-Wei; Chen, Kuo-Hsin; Watanabe, Manabu; Abe, Yuta; Misawa, Takeyuki; Nagakawa, Yuichi; Yoon, Dong-Sup; Jang, Jin-Young; Yu, Hee Chul; Ahn, Keun Soo; Kim, Song Cheol; Song, In Sang; Kim, Ji Hoon; Yun, Sung Su; Choi, Seong Ho; Jan, Yi-Yin; Shan, Yan-Shen; Ker, Chen-Guo; Chan, De-Chuan; Wu, Cheng-Chung; Lee, King-Teh; Toyota, Naoyuki; Higuchi, Ryota; Nakamura, Yoshiharu; Mizuguchi, Yoshiaki; Takeda, Yutaka; Ito, Masahiro; Norimizu, Shinji; Yamada, Shigetoshi; Matsumura, Naoki; Shindoh, Junichi; Sunagawa, Hiroki; Gocho, Takeshi; Hasegawa, Hiroshi; Rikiyama, Toshiki; Sata, Naohiro; Kano, Nobuyasu; Kitano, Seigo; Tokumura, Hiromi; Yamashita, Yuichi; Watanabe, Goro; Nakagawa, Kunitoshi; Kimura, Taizo; Yamakawa, Tatsuo; Wakabayashi, Go; Mori, Rintaro; Endo, Itaru; Miyazaki, Masaru; Yamamoto, Masakazu
2017-04-01
We previously identified 25 intraoperative findings during laparoscopic cholecystectomy (LC) as potential indicators of surgical difficulty per nominal group technique. This study aimed to build a consensus among expert LC surgeons on the impact of each item on surgical difficulty. Surgeons from Japan, Korea, and Taiwan (n = 554) participated in a Delphi process and graded the 25 items on a seven-stage scale (range, 0-6). Consensus was defined as (1) the interquartile range (IQR) of overall responses ≤2 and (2) ≥66% of the responses concentrated within a median ± 1 after stratification by workplace and LC experience level. Response rates for the first and the second-round Delphi were 92.6% and 90.3%, respectively. Final consensus was reached for all the 25 items. 'Diffuse scarring in the Calot's triangle area' in the 'Factors related to inflammation of the gallbladder' category had the strongest impact on surgical difficulty (median, 5; IQR, 1). Surgeons agreed that the surgical difficulty increases as more fibrotic change and scarring develop. The median point for each item was set as the difficulty score. A Delphi consensus was reached among expert LC surgeons on the impact of intraoperative findings on surgical difficulty. © 2017 Japanese Society of Hepato-Biliary-Pancreatic Surgery.
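The two-part consensus rule described in this abstract can be sketched as a small check. This is one possible reading, not the authors' procedure: it assumes the IQR is taken over all responses pooled, that the "median ± 1" share is evaluated within each stratum against that stratum's own median, and that quartiles use the inclusive method of Python's `statistics.quantiles`. The stratum names and ratings are hypothetical.

```python
import statistics

def reaches_consensus(ratings_by_stratum, max_iqr=2.0, min_share=0.66):
    """Check a Delphi-style consensus rule on 0-6 ratings for one item.

    ratings_by_stratum: dict mapping a stratum label to a list of ratings.
    """
    # Criterion 1: interquartile range of the pooled responses.
    overall = [r for group in ratings_by_stratum.values() for r in group]
    q1, _, q3 = statistics.quantiles(overall, n=4, method="inclusive")
    if q3 - q1 > max_iqr:
        return False
    # Criterion 2: share of responses within median +/- 1 in every stratum.
    for group in ratings_by_stratum.values():
        med = statistics.median(group)
        share = sum(1 for r in group if abs(r - med) <= 1) / len(group)
        if share < min_share:
            return False
    return True

# Made-up ratings for a single item, split into two hypothetical strata.
agreeing = {"less_experienced": [5, 5, 4, 6, 5], "more_experienced": [5, 4, 5, 6, 3]}
divided = {"less_experienced": [0, 6, 0, 6], "more_experienced": [3, 1, 5, 2]}

print(reaches_consensus(agreeing))
print(reaches_consensus(divided))
```

Other quartile conventions or a pooled median would be equally defensible readings of the abstract and could change borderline cases.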
Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos
2018-02-23
The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.
2016-01-01
Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
ERIC Educational Resources Information Center
Kim, Sooyeon; Livingston, Samuel A.
2017-01-01
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.
ERIC Educational Resources Information Center
Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne
Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam
2014-01-01
The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) was used to detect gender-related DIF. A random sample of 1400 eleventh…
ERIC Educational Resources Information Center
Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey
2016-01-01
We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Programme for International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
ERIC Educational Resources Information Center
Kramer, Gene A.; Smith, Richard M.
2001-01-01
Examined the role that gender differences play in the determination of the components influencing the difficulty of spatial ability items. Results for 2,245 examinees taking a spatial ability test that is part of the Dental School Admission Battery show that component difficulties show little variation across gender. (SLD)
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14)
Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi
2015-01-01
Purpose: Nurses and other healthcare workers frequently experience belief conflict, one of the most important, new stress-related problems in both academic and clinical fields. Methods: In this study, using a sample of 1,683 nursing practitioners, we developed the Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach’s alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). Results: The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 were suggested by evidence from high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. Conclusion: The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice. PMID:26247356
Conditional statistical inference with multistage testing designs.
Zwitser, Robert J; Maris, Gunter
2015-03-01
In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
ERIC Educational Resources Information Center
Retnawati, Heri; Kartowagiran, Badrun; Arlinwibowo, Janu; Sulistyaningsih, Eny
2017-01-01
The quality of national examination items plays an enormous role in identifying students' competencies mastery and their difficulties. This study aims to identify the difficult items in the Junior High School Mathematics National Examination, to find the factors that cause students' difficulty and to reveal the strategies that the teachers and the…
ERIC Educational Resources Information Center
Dodonova, Yulia A.; Dodonov, Yury S.
2013-01-01
Using more complex items than those commonly employed within the information-processing approach, but still easier than those used in intelligence tests, this study analyzed how the association between processing speed and accuracy level changes as the difficulty of the items increases. The study involved measuring cognitive ability using Raven's…
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.
ERIC Educational Resources Information Center
Perkins, Kyle; And Others
This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…
Detecting unexpected variables in the MMPI-2 Social Introversion scale.
Chang, C H; Wright, B D
2001-01-01
The standard scoring structure of the revised Minnesota Multiphasic Personality Inventory (MMPI-2) Social Introversion (Si) scale was reexamined with Rasch Measurement. The 69-item Si scale split into two distinct dimensions when their standardized residuals were factor analyzed. Items keyed "true" to Si defined one dimension and items keyed "false" defined another. Relationships between Lexile values (an index of reading difficulty and comprehension) and item difficulties were also explored. The article shows how to use Rasch Measurement to understand and improve personality assessment.
An Alternate Definition of the ETS Delta Scale of Item Difficulty. Program Statistics Research.
ERIC Educational Resources Information Center
Holland, Paul W.; Thayer, Dorothy T.
An alternative definition has been developed of the delta scale of item difficulty used at Educational Testing Service. The traditional delta scale uses an inverse normal transformation based on normal ogive models developed years ago. However, no use is made of this fact in typical uses of item deltas. It is simply one way to make the probability…
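For readers unfamiliar with the traditional delta scale mentioned above, the following is a minimal sketch of the inverse normal transformation as it is commonly described: delta = 13 - 4·Φ⁻¹(p), where p is the proportion of examinees answering correctly. The constants 13 and 4 are not given in this abstract and are stated here as the commonly cited convention; the function name is hypothetical, and the alternative definition proposed by the authors is not shown.

```python
from statistics import NormalDist

def traditional_delta(p_correct):
    """Map a proportion correct (0 < p < 1) to the traditional delta scale.

    Assumes the commonly cited form delta = 13 - 4 * z, where z is the
    standard normal quantile of p_correct. Harder items get larger deltas.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(p_correct)  # inverse normal transformation
    return 13.0 - 4.0 * z
```

Under this convention an item answered correctly by half of the examinees sits at the scale midpoint, and each standard normal unit of difficulty shifts the delta by four points.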
The role of difficulty and gender in numbers, algebra, geometry and mathematics achievement
NASA Astrophysics Data System (ADS)
Rabab'h, Belal Sadiq Hamed; Veloo, Arsaythamby; Perumal, Selvan
2015-05-01
This study aims to identify the role of difficulty and gender in numbers, algebra, geometry and mathematics achievement among secondary school students in Jordan. The respondents were 337 students from eight public secondary schools in Alkoura district, selected using stratified random sampling. The sample comprised 179 (53%) male and 158 (47%) female students. The mathematics test comprised 30 items: eight items for numbers, 14 items for algebra and eight items for geometry. Based on difficulties among male and female students, the findings showed that item 4 (fractions - 0.34) was most difficult for male students and item 6 (square roots - 0.39) for female students in numbers. For algebra, item 11 (inequality - 0.23) was most difficult for male students and item 6 (algebraic expressions - 0.35) for female students. In geometry, item 3 (reflection - 0.34) was most difficult for male students and item 8 (volume - 0.33) for female students. Based on gender differences, female students showed higher achievement in numbers and algebra compared to male students. On the other hand, there was no difference between male and female students' achievement in the geometry test. This study suggests that teachers need to give more attention to numbers and algebra when teaching mathematics.
Keller, Johannes
2007-06-01
Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths performance. The study was designed to test theoretical ideas derived from stereotype threat theory and assumptions outlined in the Yerkes-Dodson law proposing a nonlinear relationship between arousal, task difficulty and performance. Participants were 108 high school students attending secondary schools. Participants worked on a test comprising maths problems of different difficulty levels. Half of the participants learned that the test had been shown to produce gender differences (stereotype threat). The other half learned that the test had been shown not to produce gender differences (no threat). The degree to which participants identify with the domain of maths was included as a quasi-experimental factor. Maths-identified female students showed performance decrements under conditions of stereotype threat. Moreover, the stereotype threat manipulation had different effects on low and high domain identifiers' performance depending on test item difficulty. On difficult items, low identifiers showed higher performance under threat (vs. no threat) whereas the reverse was true in high identifiers. This interaction effect did not emerge on easy items. Domain identification and test item difficulty are two important factors that need to be considered in the attempt to understand the impact of stereotype threat on performance.
Ali, Amira Mohammed; Ahmed, Anwar; Sharaf, Amira; Kawakami, Norito; Abdeldayem, Samia M; Green, Joseph
2017-12-01
This study aimed to examine the validity of the Arabic version of the Depression Anxiety Stress Scale-21 (DASS-21) in 149 illicit drug users. We calculated the α coefficient, inter-item and item-total correlations, coefficients of reproducibility and scalability (CR and CS), and item difficulty and discrimination indices. The DASS-21 had acceptable reliability, but values of the CR and the CS were less than acceptable. Items varied in difficulty and discrimination; some items are candidates for elimination. The DASS-21 is a probabilistic and not a deterministic measure of distress; it has problematic items and needs further investigation. Copyright © 2017 Elsevier B.V. All rights reserved.
Selecting Items for Criterion-Referenced Tests.
ERIC Educational Resources Information Center
Mellenbergh, Gideon J.; van der Linden, Wim J.
1982-01-01
Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
Item Difficulty Modeling of Paragraph Comprehension Items
ERIC Educational Resources Information Center
Gorin, Joanna S.; Embretson, Susan E.
2006-01-01
Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…
Item difficulty and item validity for the Children's Group Embedded Figures Test.
Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S
1994-02-01
The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
North American Veterinary Licensing Examination pacing study.
Subhiyah, Raja G; Boyce, John R
2010-01-01
The National Board of Veterinary Medical Examiners was interested in the possible effects of word count on the outcomes of the North American Veterinary Licensing Examination. In this study, the authors investigated the effects of increasing word count on the pacing of examinees during each section of the examination and on the performance of examinees on the items. Specifically, the authors analyzed the effect of item word count on the average time spent on each item within a section of the examination, the average number of items omitted at the end of a section, and the average difficulty of items as a function of presentation order. The average word count per item increased from 2001 to 2008. As expected, there was a relationship between word count and time spent on the item. No significant relationship was found between word count and item difficulty, and an analysis of omitted items and pacing patterns showed no indication of overall pacing problems.
Difficulty and Discriminability of Introductory Psychology Test Items.
ERIC Educational Resources Information Center
Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis
2001-01-01
Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
Working memory capacity and fluid abilities: the more difficult the item, the more more is better.
Little, Daniel R; Lewandowsky, Stephan; Craig, Stewart
2014-01-01
The relationship between fluid intelligence and working memory is of fundamental importance to understanding how capacity-limited structures such as working memory interact with inference abilities to determine intelligent behavior. Recent evidence has suggested that the relationship between a fluid abilities test, Raven's Progressive Matrices, and working memory capacity (WMC) may be invariant across difficulty levels of the Raven's items. We show that this invariance can only be observed if the overall correlation between Raven's and WMC is low. Simulations of Raven's performance revealed that as the overall correlation between Raven's and WMC increases, the item-wise point bi-serial correlations involving WMC are no longer constant but increase considerably with item difficulty. The simulation results were confirmed by two studies that used a composite measure of WMC, which yielded a higher correlation between WMC and Raven's than reported in previous studies. As expected, with the higher overall correlation, there was a significant positive relationship between Raven's item difficulty and the extent of the item-wise correlation with WMC.
Yost, Kathleen J; Webster, Kimberly; Baker, David W; Choi, Seung W; Bode, Rita K; Hahn, Elizabeth A
2009-06-01
Current health literacy measures are too long, imprecise, or have questionable equivalence of English and Spanish versions. The purpose of this paper is to describe the development and pilot testing of a new bilingual computer-based health literacy assessment tool. We analyzed literacy data from three large studies. Using a working definition of health literacy, we developed new prose, document and quantitative items in English and Spanish. Items were pilot tested on 97 English- and 134 Spanish-speaking participants to assess item difficulty. Items covered topics relevant to primary care patients and providers. English- and Spanish-speaking participants understood the tasks involved in answering each type of question. The English Talking Touchscreen was easy to use and the English and Spanish items provided good coverage of the difficulty continuum. Qualitative and quantitative results provided useful information on computer acceptability and initial item difficulty. After the items have been administered on the Talking Touchscreen (la Pantalla Parlanchina) to 600 English-speaking (and 600 Spanish-speaking) primary care patients, we will develop a computer adaptive test. This health literacy tool will enable clinicians and researchers to more precisely determine the level at which low health literacy adversely affects health and healthcare utilization.
Psychometric analyses to improve the Dutch ICF Activity Inventory.
Bruijning, Janna E; van Rens, Ger; Knol, Dirk; van Nispen, Ruth
2013-08-01
In the past, rehabilitation centers for the visually impaired used unstructured or semistructured methods to assess rehabilitation needs of their patients. Recently, an extensive instrument, the Dutch ICF Activity Inventory (D-AI), was developed to systematically investigate rehabilitation needs of visually impaired adults and to evaluate rehabilitation outcomes. The purpose of this study was to investigate the underlying factor structure and other psychometric properties to shorten and improve the D-AI. The D-AI was administered to 241 visually impaired persons who recently enrolled in a multidisciplinary rehabilitation center. The D-AI uses graded scores to assess the importance and difficulty of 65 rehabilitation goals. For high-priority goals (e.g., daily meal preparation), the difficulty of underlying tasks (e.g., read recipes, cut vegetables) was assessed. To reduce underlying task items (>950), descriptive statistics were investigated and factor analyses were performed for several goals. The internal consistency reliability and test-retest reliability of the D-AI were investigated by calculating Cronbach α and Cohen (weighted) κ. Finally, consensus-based discussions were used to shorten and improve the D-AI. Except for one goal, factor analysis model parameters were at least reasonable. Internal consistency reliability was satisfactory (range, 0.74 to 0.93). In total, 60% of the 65 goal importance items and 84.4% of the goal difficulty items showed moderate to almost perfect κ values (≥0.40). After consensus-based discussions, a new D-AI was produced, containing 48 goals and fewer than 500 tasks. The analyses were an important step in the validation process of the D-AI and in developing a more feasible assessment tool to investigate rehabilitation needs of visually impaired persons in a systematic way. The D-AI is currently implemented in all Dutch rehabilitation centers serving all visually impaired adults with various rehabilitation needs.
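Several records in this listing, including the one above, report Cronbach's α as an index of internal consistency. As a reminder of what that coefficient computes, here is a minimal sketch of the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name and data layout are hypothetical, and this is not the software used in any of the cited studies.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances throughout; requires at least two respondents,
    at least two items, and a non-zero variance of the total scores.
    """
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

When every item ranks respondents identically, the coefficient reaches 1; when items are unrelated, the summed item variances approach the total variance and the coefficient falls toward 0.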
Menon, Chloe; Westervelt, Holly James; Jahn, Danielle R.; Dressel, Jeffrey A.; O’Bryant, Sid E.
2013-01-01
The Brief Smell Identification Test (BSIT) is a commonly used measure of olfactory functioning in elderly populations. Few studies have provided normative data for this measure, and minimal data are available regarding the impact of sociodemographic factors on test scores. This study presents normative data for the BSIT in a sample of English- and Spanish-speaking Hispanic and non-Hispanic Whites. A Rasch analysis was also conducted to identify the items that best discriminated between varying levels of olfactory functioning, as measured by the BSIT. The total sample included 302 older adults seen as part of an ongoing study of rural cognitive aging, Project FRONTIER. Hierarchical regression analyses revealed that BSIT scores require adjustment by age and gender, but years of education, ethnicity, and language did not significantly influence BSIT performance. Four items best discriminated between varying levels of smell identification, accounting for 59.44% of total information provided by the measure. However, items did not represent a continuum of difficulty on the BSIT. The results of this study indicate that the BSIT appears to be well-suited for assessing odor identification deficits in older adults of diverse backgrounds, but that fine-tuning of this instrument may be recommended in light of its items’ difficulty and discrimination parameters. Clinical and empirical implications are discussed. PMID:23634698
Mokken scaling of the Myocardial Infarction Dimensional Assessment Scale (MIDAS).
Thompson, David R; Watson, Roger
2011-02-01
The purpose of this study was to examine the hierarchical and cumulative nature of the 35 items of the Myocardial Infarction Dimensional Assessment Scale (MIDAS), a disease-specific health-related quality of life measure. Data from 668 participants who completed the MIDAS were analysed using the Mokken Scaling Procedure, which is a computer program that searches polychotomous data for hierarchical and cumulative scales on the basis of a range of diagnostic criteria. Fourteen MIDAS items were retained in a Mokken scale and these items included physical activity, insecurity, emotional reaction and dependency items but excluded items related to diet, medication or side-effects. Item difficulty, in item response theory terms, ran from physical activity items (low difficulty) to insecurity, suggesting that the most severe quality of life effect of myocardial infarction is loneliness and isolation. Items from the MIDAS form a strong and reliable Mokken scale, which provides new insight into the relationship between items in the MIDAS and the measurement of quality of life after myocardial infarction. © 2010 Blackwell Publishing Ltd.
Interpretation of the Rasch Ability and Difficulty Scales for Educational Purposes.
ERIC Educational Resources Information Center
Woodcock, Richard W.
Though many test developers have utilized item response theory in their work, few have taken advantage of the potential of item response theory for providing new interpretation procedures that accentuate the educational implications to be drawn from test scores. This paper describes several features, based upon the Rasch difficulty and ability…
The Effect of Anchor Test Construction on Scale Drift
ERIC Educational Resources Information Center
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J.
2014-01-01
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Using Classical Test Theory and Item Response Theory to Evaluate the LSCI
NASA Astrophysics Data System (ADS)
Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS
2011-01-01
Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
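An IRT model with parameters for ability, difficulty, and guessing is often written as a three-parameter logistic (3PL) item response function. The abstract above does not name the exact model used, so the sketch below is an assumption for illustration only; the function and parameter names are hypothetical.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under an assumed 3PL model.

    theta: examinee ability
    a:     item discrimination (slope)
    b:     item difficulty (location)
    c:     pseudo-guessing parameter (lower asymptote, 0 <= c < 1)
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

# Setting c = 0 gives the 2-parameter logistic model; additionally fixing
# a = 1 gives a Rasch-type 1-parameter model.
```

With this form, an examinee whose ability equals the item difficulty answers correctly with probability halfway between the guessing floor c and 1, and the probability never drops below c however low the ability.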
Simple mental addition in children with and without mild mental retardation.
Janssen, R; De Boeck, P; Viaene, M; Vallaeys, L
1999-11-01
The speeded performance on simple mental addition problems of 6- and 7-year-old children with and without mild mental retardation is modeled from a person perspective and an item perspective. On the person side, it was found that a single cognitive dimension spanned the performance differences between the two ability groups. However, a discontinuity, or "jump," was observed in the performance of the normal ability group on the easier items. On the item side, the addition problems were almost perfectly ordered in difficulty according to their problem size. Differences in difficulty were explained by factors related to the difficulty of executing nonretrieval strategies. All findings were interpreted within the framework of Siegler's (e.g., R. S. Siegler & C. Shipley, 1995) model of children's strategy choices in arithmetic. Models from item response theory were used to test the hypotheses. Copyright 1999 Academic Press.
ERIC Educational Resources Information Center
Hewitt, Margaret A.; Homan, Susan P.
2004-01-01
Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Developing Item Response Theory-Based Short Forms to Measure the Social Impact of Burn Injuries.
Marino, Molly E; Dore, Emily C; Ni, Pengsheng; Ryan, Colleen M; Schneider, Jeffrey C; Acton, Amy; Jette, Alan M; Kazis, Lewis E
2018-03-01
Objective: To develop self-reported short forms for the Life Impact Burn Recovery Evaluation (LIBRE) Profile. Design: Short forms based on the item parameters of discrimination and average difficulty. Setting: A support network for burn survivors, peer support networks, social media, and mailings. Participants: Burn survivors (N=601) older than 18 years. Interventions: Not applicable. Main Outcome Measures: The LIBRE Profile. Results: Ten-item short forms were developed to cover the 6 LIBRE Profile scales: Relationships with Family & Friends, Social Interactions, Social Activities, Work & Employment, Romantic Relationships, and Sexual Relationships. Ceiling effects were ≤15% for all scales; floor effects were <1% for all scales. The marginal reliability of the short forms ranged from .85 to .89. Conclusions: The LIBRE Profile-Short Forms demonstrated credible psychometric properties. The short form version provides a viable alternative to administering the LIBRE Profile when resources do not allow computer or Internet access. The full item bank, computerized adaptive test, and short forms are all scored along the same metric, and therefore scores are comparable regardless of the mode of administration. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Bond, Kathy S; Chalmers, Kathryn J; Jorm, Anthony F; Kitchener, Betty A; Reavley, Nicola J
2015-06-03
There is a strong association between mental health problems and financial difficulties. Therefore, people who work with those who have financial difficulties (financial counsellors and financial institution staff) need to have knowledge and helping skills relevant to mental health problems. Conversely, people who support those with mental health problems (mental health professionals and carers) may need to have knowledge and helping skills relevant to financial difficulties. The Delphi expert consensus method was used to develop guidelines for people who work with or support those with mental health problems and financial difficulties. A systematic review of websites, books and journal articles was conducted to develop a questionnaire containing items about the knowledge, skills and actions relevant to working with or supporting someone with mental health problems and financial difficulties. These items were rated over three rounds by five Australian expert panels comprising financial counsellors (n = 33), financial institution staff (n = 54), mental health professionals (n = 31), consumers (n = 20) and carers (n = 24). A total of 897 items were rated, with 462 items endorsed by at least 80% of members of each of the expert panels. These endorsed statements were used to develop a set of guidelines for financial counsellors, financial institution staff, mental health professionals and carers about how to assist someone with mental health problems and financial difficulties. A diverse group of expert panel members was able to reach substantial consensus on the knowledge, skills and actions needed to work with and support people with mental health problems and financial difficulties. These guidelines can be used to inform policy and practice in the financial and mental health sectors.
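The endorsement rule described above (an item is kept only if at least 80% of the members of each panel endorse it) can be sketched as follows. The panel sizes come from the abstract; the function name, the panel labels, and the idea of passing raw endorsement counts are assumptions for illustration, and the abstract does not describe how multi-round re-rating was handled.

```python
# Panel sizes as reported in the abstract.
PANEL_SIZES = {
    "financial_counsellors": 33,
    "financial_institution_staff": 54,
    "mental_health_professionals": 31,
    "consumers": 20,
    "carers": 24,
}

def endorsed_by_every_panel(endorsement_counts, panel_sizes=PANEL_SIZES,
                            threshold=0.80):
    """Return True if every panel's endorsement share meets the threshold.

    endorsement_counts maps each panel label to the number of its members
    who endorsed the item (hypothetical input format).
    """
    return all(
        endorsement_counts[panel] / size >= threshold
        for panel, size in panel_sizes.items()
    )
```

Because the rule is applied per panel rather than to the pooled ratings, a single panel falling below the threshold is enough to exclude an item even when overall endorsement is high.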
Echeverri, Margarita; Anderson, David; Nápoles, Anna María
2016-01-01
Objective: Describe adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. Methods: Cross-sectional field test of the CHLT Spanish version (CHLT-30-DKspa) among healthy Latinos in Louisiana. Diagonally weighted least squares was used to confirm the factor structure. Item response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. Results: Mean CHLT-30-DKspa score (N=400) was 17.13 (range 0 to 30; SD 6.65). Results confirmed a unidimensional structure (χ²[405]=461.55, p=.027, CFI=.993, TLI=.992, RMSEA=.0180). Cronbach's alpha was 0.88. Items Q1-High calorie and Q15-Tumor spread had the lowest item-scale correlations (.148 and .288) and standardized factor loadings (.152 and .302). Items Q1-High Calories, Q8-Palliative Care, and Q19-Smoking Risk had the highest item-difficulty parameters (diff=1.12, 1.21, and 2.40). Conclusions: Results generally supported the applicability of the CHLT-30-DKspa for Spanish-speaking healthy populations, with the exception of four items that need to be deleted or revised and further studied (Q1, Q8, Q15, and Q19). Practical Implications: The CHLT-30-DKspa can be used to assess cancer health literacy among Spanish-speaking populations to advance research on cancer health literacy and outcomes. PMID:27043760
ERIC Educational Resources Information Center
Shulruf, Boaz; Jones, Phil; Turner, Rolf
2015-01-01
The determination of Pass/Fail decisions over Borderline grades, (i.e., grades which do not clearly distinguish between the competent and incompetent examinees) has been an ongoing challenge for academic institutions. This study utilises the Objective Borderline Method (OBM) to determine examinee ability and item difficulty, and from that…
ERIC Educational Resources Information Center
Wu, Pei-Chen; Chang, Lily
2008-01-01
The authors investigated the Chinese version of the Beck Depression Inventory-II (BDI-II-C; Chinese Behavioral Science Corporation, 2000) within the Rasch framework in terms of dimensionality, item difficulty, and category functioning. Two underlying scale dimensions, relatively high item difficulties, and a need for collapsing 2 response…
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
ERIC Educational Resources Information Center
Benson, Jeri; Wilson, Michael
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
ERIC Educational Resources Information Center
Wang, Wen-Chung
2004-01-01
Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2011-01-01
A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…
Fraundorf, Scott H; Benjamin, Aaron S
2016-09-01
Information about others' success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent's accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent's performance and once afterwards. Participants reconsidered their responses least often when the opponent's accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent's accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent's performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others' knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall.
Intervention for children with word-finding difficulties: a parallel group randomised control trial.
Best, Wendy; Hughes, Lucy Mari; Masterson, Jackie; Thomas, Michael; Fedor, Anna; Roncoli, Silvia; Fern-Pollak, Liory; Shepherd, Donna-Lynn; Howard, David; Shobbrook, Kate; Kapikian, Anna
2017-07-31
The study investigated the outcome of a word-web intervention for children diagnosed with word-finding difficulties (WFDs). Twenty children age 6-8 years with WFDs confirmed by a discrepancy between comprehension and production on the Test of Word Finding-2, were randomly assigned to intervention (n = 11) and waiting control (n = 9) groups. The intervention group had six sessions of intervention which used word-webs and targeted children's meta-cognitive awareness and word-retrieval. On the treated experimental set (n = 25 items) the intervention group gained on average four times as many items as the waiting control group (d = 2.30). There were also gains on personally chosen items for the intervention group. There was little change on untreated items for either group. The study is the first randomised control trial to demonstrate an effect of word-finding therapy with children with language difficulties in mainstream school. The improvement in word-finding for treated items was obtained following a clinically realistic intervention in terms of approach, intensity and duration.
Fayyaz Khan, Humaira; Farooq Danish, Khalid; Saeed Awan, Azra; Anwar, Masood
2013-05-01
The purpose of the study was to identify technical item flaws in the multiple choice questions (MCQs) submitted for the final exams for the years 2009, 2010 and 2011. This descriptive analytical study was carried out at Islamic International Medical College (IIMC). The data were collected from the MCQs submitted by the faculty for the final exams for the years 2009, 2010 and 2011. The data were compiled and evaluated by a three-member assessment committee. The data were analyzed for frequencies and percentages; the categorical data were analyzed by chi-square test. The overall percentage of flawed items was 67% for the year 2009, of which 21% were for testwiseness and 40% were for irrelevant difficulty. In 2010, total item flaws were 36%, of which 11% were for testwiseness and 22% were for irrelevant difficulty. The 2011 data showed decreased overall flaws of 21%; flaws of testwiseness were 7% and flaws of irrelevant difficulty were 11%. Technical item flaws are frequently encountered during MCQ construction, and the identification of flaws leads to improved quality of single best MCQs.
A Review of Classical Methods of Item Analysis.
ERIC Educational Resources Information Center
French, Christine L.
Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah
2011-01-01
Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with their demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
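As a rough illustration of the "transformed item difficulty" idea named in the title above, the sketch below maps a proportion correct onto the delta scale, assuming the usual convention delta = 13 - 4z, where z is the standard normal deviate of the proportion correct. The function name and the two group proportions are hypothetical. A full delta-plot DIF analysis would go on to compare delta pairs across groups for every item, which is not shown here.

```python
from statistics import NormalDist


def delta_index(p_correct):
    """Map a proportion correct (0 < p < 1) onto the delta scale: 13 - 4 * z."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must be strictly between 0 and 1")
    # z is the standard normal quantile of the proportion correct.
    z = NormalDist().inv_cdf(p_correct)
    # Larger delta means a harder item.
    return 13.0 - 4.0 * z


# Hypothetical proportions correct for the same item in two groups.
delta_group_a = delta_index(0.50)
delta_group_b = delta_index(0.30)
```

A large gap between the two deltas for one item, relative to the gaps seen for the other items, is what would flag the item for closer review under this approach.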
Adaptable Learning Assistant for Item Bank Management
ERIC Educational Resources Information Center
Nuntiyagul, Atorn; Naruedomkul, Kanlaya; Cercone, Nick; Wongsawang, Damras
2008-01-01
We present PKIP, an adaptable learning assistant tool for managing question items in item banks. PKIP is not only able to automatically assist educational users to categorize the question items into predefined categories by their contents but also to correctly retrieve the items by specifying the category and/or the difficulty level. PKIP adapts…
Two-item same/different discrimination in rhesus monkeys (Macaca mulatta).
Basile, Benjamin M; Moylan, Emily J; Charles, David P; Murray, Elisabeth A
2015-11-01
Almost all nonhuman animals can recognize when one item is the same as another item. It is less clear whether nonhuman animals possess abstract concepts of "same" and "different" that can be divorced from perceptual similarity. Pigeons and monkeys show inconsistent performance, and often surprising difficulty, in laboratory tests of same/different learning that involve only two items. Previous results from tests using multi-item arrays suggest that nonhumans compute sameness along a continuous scale of perceptual variability, which would explain the difficulty of making two-item same/different judgments. Here, we provide evidence that rhesus monkeys can learn a two-item same/different discrimination similar to those on which monkeys and pigeons have previously failed. Monkeys' performance transferred to novel stimuli and was not affected by perceptual variations in stimulus size, rotation, view, or luminance. Success without the use of multi-item arrays, and the lack of effect of perceptual variability, suggests a computation of sameness that is more categorical, and perhaps more abstract, than previously thought.
Odukoya, Jonathan A; Adekeye, Olajide; Igbinoba, Angie O; Afolabi, A
2018-01-01
Teachers and students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done rightly. One of the tools for 'doing it rightly' is item analysis. The core objectives of this study, therefore, were to ascertain the item difficulty and distractive indices of the university-wide courses. Between 112 and 1956 undergraduate students participated in this study. With the use of secondary data, the ex-post facto design was adopted for this project. In virtually all cases, the majority of the items (ranging between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standards in terms of difficulty and distractive indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analyses when developing these tests was emphasized.
ERIC Educational Resources Information Center
Marie, S. Maria Josephine Arokia; Edannur, Sreekala
2015-01-01
This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.
ERIC Educational Resources Information Center
Hertz, Norman R.; Chinn, Roberta N.
This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.
ERIC Educational Resources Information Center
Rudner, Lawrence M.
Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The Psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
ERIC Educational Resources Information Center
Chauvin, Bruno; Leonova, Tamara
2016-01-01
Key concerns about the psychometric properties of the 25-item version of the Strengths and Difficulties Questionnaire (SDQ) have consistently been raised in the literature. The present study aimed at examining the meaningfulness of an alternative model to the SDQ in which 7 problematic items are excluded. French-speaking parents of 262 boys and…
Smolen, Tomasz; Chuderski, Adam
2015-01-01
Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
ERIC Educational Resources Information Center
Jones, Andrew T.
2011-01-01
Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
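Of the statistics listed above, the point-biserial correlation is the simplest to sketch. The example below is a minimal illustration, assuming a 0/1 scored item and total scores that still include the item itself (an uncorrected item-total correlation); the function name and toy data are hypothetical.

```python
from math import sqrt


def point_biserial(item, total):
    """Point-biserial correlation between a 0/1 item and total scores."""
    n = len(item)
    if n == 0 or n != len(total):
        raise ValueError("item and total must be non-empty and of equal length")
    right = [t for i, t in zip(item, total) if i == 1]
    wrong = [t for i, t in zip(item, total) if i == 0]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect responses")
    p = len(right) / n
    mean_all = sum(total) / n
    # Population standard deviation of the total scores.
    sd = sqrt(sum((t - mean_all) ** 2 for t in total) / n)
    if sd == 0:
        raise ValueError("total scores have zero variance")
    mean_diff = sum(right) / len(right) - sum(wrong) / len(wrong)
    return mean_diff / sd * sqrt(p * (1 - p))


# Toy data: eight examinees, one item, and their total test scores.
item = [1, 1, 1, 0, 0, 1, 0, 1]
total = [9, 8, 7, 3, 4, 6, 5, 8]
r_pb = point_biserial(item, total)
```

This value equals the ordinary Pearson correlation between the 0/1 item scores and the totals, so a general-purpose correlation routine would give the same result.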
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.
Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro
2013-01-01
The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of the questionnaire were investigated through Rasch analysis. Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 [SD=9.5]; 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and showed unidimensionality; the response categories were collapsed from 7 to 3 levels, and the items fit the model well, except for the following/leading item, for which the infit and outfit values were above 1.4; this item also showed Differential Item Functioning (DIF) according to the participant's role. Item reliability was 0.99 and person reliability ranged from 0.80 to 0.84 across the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items has to be further investigated.
Lac, Andrew; Handren, Lindsay; Crano, William D.
2016-10-01
Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of “dry” weekdays and “wet” weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns. PMID:27488456
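The two-parameter (difficulty and discrimination) model mentioned above has a compact closed form, and a short sketch may make the item characteristic curves easier to picture. It assumes the standard logistic 2PL without a scaling constant; the function name and all parameter values are hypothetical and are not estimates from the study.

```python
from math import exp


def p_endorse_2pl(theta, a, b):
    """2PL probability of endorsing an item.

    theta: person's latent trait level
    a: item discrimination (slope)
    b: item difficulty (location)
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))


# Hypothetical items: an "easy" day (low b) and a "hard" day (high b),
# both with the same discrimination.
theta = 0.0
p_easy = p_endorse_2pl(theta, a=1.2, b=-0.5)
p_hard = p_endorse_2pl(theta, a=1.2, b=1.5)

# At theta == b the curve passes through 0.5 regardless of a.
p_at_difficulty = p_endorse_2pl(1.5, a=2.0, b=1.5)
```

Items whose curves nearly coincide have similar a and b values, which is the sense in which some days of the week can be described as behaving alike.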
Hagman, Brett T
2017-11-01
The Diagnostic and Statistical Manual of Mental Disorders (5th edition) Alcohol Use Disorder (DSM-5 AUD) criteria have been modified to reflect a single, continuous disorder. It is critical that we develop brief assessment measures that can accurately assess DSM-5 AUD criteria in college students to assist in screening, referral, and brief intervention services implemented on college campuses. The present study sought to develop and assess the psychometric properties of a brief 13-item measure designed to capture the full spectrum of the DSM-5 AUD criteria in a sample of college students. Participants were past-year drinkers (N = 923) between the ages of 18 and 30 enrolled at 3 universities. Respondents completed a 30-min anonymous battery of questionnaires online. The Brief DSM-5 AUD Assessment consisted of 13 items designed to reflect the DSM-5 AUD criteria. Results indicated a high degree of internal consistency reliability with high item-to-scale correlations. Confirmatory factor analyses indicated that a dominant single factor emerged with good model fit. The Item Response Theory (IRT) analyses indicated that the difficulty parameters for each criterion were intermixed along the upper portion of the underlying AUD severity continuum, and the discrimination parameters were all high. Additional analysis indicated that those with a DSM-5 AUD had greater levels of alcohol and other drug use and problem severity in comparison to those without a DSM-5 AUD. Study findings provide empirical support for the reliability and validity of the Brief 13-item DSM-5 Assessment. It should be routinely included in research and clinical practice efforts.
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?
ERIC Educational Resources Information Center
Jackson, Evelyn W.; And Others
1994-01-01
Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
Measuring Student Learning with Item Response Theory
ERIC Educational Resources Information Center
Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.
2008-01-01
We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Combining the Best of Two Standard Setting Methods: The Ordered Item Booklet Angoff
ERIC Educational Resources Information Center
Smith, Russell W.; Davis-Becker, Susan L.; O'Leary, Lisa S.
2014-01-01
This article describes a hybrid standard setting method that combines characteristics of the Angoff (1971) and Bookmark (Mitzel, Lewis, Patz & Green, 2001) methods. The proposed approach utilizes strengths of each method while addressing weaknesses. An ordered item booklet, with items sorted based on item difficulty, is used in combination…
Comparative Racial Analysis of Enlisted Advancement Exams: Item-Difficulty.
1975-07-01
Keywords: Item analysis; Promotion; Racial comparison; Equal opportunity. Abstract: ...improving equal opportunity in career growth for minority groups. The study of exam item-difficulty levels is the first of a series of technical reports...under Exploratory Development Task Area PF55.521.032 (Contemporary Social Issues). J. J. Clarkin, Commanding Officer. Summary, Purpose: A number of...
Psychometric Evaluation of a Cultural Competency Assessment Instrument for Health Professionals
Haywood, Sonja H.; Goode, Tawara; Gao, Yong; Smith, Kristyn; Bronheim, Suzanne; Flocke, Susan A; Zyzanski, Steve
2012-01-01
Background: Few valid and reliable measures exist for health care professionals interested in determining their levels of cultural and linguistic competence. Objective: To evaluate the measurement properties of the Cultural Competence Health Practitioner Assessment (CCHPA-129). Methods: The CCHPA-129 is a 129-item web-based instrument developed by the National Center for Cultural Competence (NCCC). Responses on the CCHPA-129 were examined using factor analysis; Rasch modeling; and Differential Item Functioning (DIF) across race, ethnicity, gender, and profession. Subjects: 2504 practitioners, including 1864 nurses (RN/LPN/BSN); 341 clinicians (PA/NP); and 299 physicians (MD/DO), who completed the CCHPA-129 online between 2005 and 2008. Results: Three factors representing domains of knowledge, adapting practice, and promoting health for culturally and linguistically diverse populations accounted for 46% of the variance. Among Knowledge factor items, 53% (23/43) fit the Rasch model, item difficulties ranged from −1.01 logits (least difficult) to +1.11 logits (most difficult), separation index (SI) 13.82, and Cronbach’s α 0.92. Forty-seven percent (21/44) of Adapting Practice factor items fit the model, item difficulties −0.07 to +1.11 logits, SI 11.59, Cronbach’s α 0.88; and 58% (23/39) of Promoting Health factor items fit the model, item difficulties −1.01 to +1.38 logits, SI 22.64, Cronbach’s α 0.92. Early evidence of validity was established by known groups having statistically different scores. Conclusion: The 67-item CCHPA-67 is psychometrically sound. This shortened instrument can be used to establish associations between practitioners’ cultural and linguistic competence and health outcomes as well as to evaluate interventions to increase practitioners’ cultural and linguistic competence. PMID:22437625
ERIC Educational Resources Information Center
Quaigrain, Kennedy; Arhin, Ato Kwamina
2017-01-01
Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
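The difficulty index (p-value) and discrimination index (DI) discussed above can be computed in a few lines. The sketch below assumes the conventional upper/lower-group definition of DI with 27% groups; the function name, the 27% default, and the toy response matrix are illustrative assumptions, not details from the study.

```python
def difficulty_and_discrimination(responses, fraction=0.27):
    """Return a list of (p, D) tuples, one per item.

    responses: one list of 0/1 item scores per examinee.
    p: proportion of all examinees answering the item correctly.
    D: proportion correct in the top group minus the bottom group.
    """
    n = len(responses)
    n_items = len(responses[0])
    k = max(1, round(n * fraction))  # size of each extreme group
    # Rank examinees by total score, highest first (ties keep input order).
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    stats = []
    for j in range(n_items):
        p = sum(r[j] for r in responses) / n
        d = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        stats.append((p, d))
    return stats


# Toy data: six examinees, three items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
stats = difficulty_and_discrimination(toy)
```

With so few examinees the extreme groups hold only two people each, so the DI values here are only meant to show the arithmetic.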
Estimating the Number of Examinees Who Did Not Reach the Last Item of a Section.
ERIC Educational Resources Information Center
Wainer, Howard
It is important to estimate the number of examinees who reached a test item, because item difficulty is defined by the number who answered correctly divided by the number who reached the item. A new method is presented and compared to the previously used definition of three categories of response to an item: (1) answered; (2) omitted--a…
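The definition of item difficulty given above (number correct divided by number who reached the item) can be sketched as follows. This is not the new estimation method the abstract refers to. It assumes one simple convention, namely that an examinee's trailing run of blank responses counts as "not reached" while blanks followed by a later response count as omitted; the encoding (1, 0, None) and function names are hypothetical.

```python
def not_reached_start(row):
    """Index of the first not-reached item: start of the trailing run of None."""
    idx = len(row)
    while idx > 0 and row[idx - 1] is None:
        idx -= 1
    return idx


def difficulty_among_reached(data, item):
    """Proportion correct on `item` among examinees who reached it."""
    reached = [row for row in data if item < not_reached_start(row)]
    if not reached:
        raise ValueError("no examinee reached this item")
    correct = sum(1 for row in reached if row[item] == 1)
    return correct / len(reached)


# Toy data: 1 = correct, 0 = incorrect, None = blank.
data = [
    [1, 0, 1, 1],
    [1, None, 0, None],     # omitted item 1, did not reach item 3
    [0, 1, None, None],     # did not reach items 2 and 3
    [1, 1, 1, 0],
]
p_item1 = difficulty_among_reached(data, 1)
p_item3 = difficulty_among_reached(data, 3)
```

Under this convention an omitted item still counts in the denominator, whereas a not-reached item does not, which is why the two kinds of blank need to be told apart.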
ERIC Educational Resources Information Center
Masters, James S.
2010-01-01
With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
Anders, Royce; Riès, Stéphanie; Van Maanen, Leendert; Alario, F-Xavier
Patients with lesions in the left prefrontal cortex (PFC) have been shown to be impaired in lexical selection, especially when interference between semantically related alternatives is increased. To more deeply investigate which computational mechanisms may be impaired following left PFC damage due to stroke, a psychometric modelling approach is employed in which we assess the cognitive parameters of the patients from an evidence accumulation (sequential information sampling) modelling of their response data. We also compare the results to healthy speakers. Analysis of the cognitive parameters indicates an impairment of the PFC patients to appropriately adjust their decision threshold, in order to handle the increased item difficulty that is introduced by semantic interference. Also, the modelling contributes to other topics in psycholinguistic theory, in which specific effects are observed on the cognitive parameters according to item familiarization, and the opposing effects of priming (lower threshold) and semantic interference (lower drift) which are found to depend on repetition. These results are developed for the blocked-cyclic picture naming paradigm, in which pictures are presented within semantically homogeneous (HOM) or heterogeneous (HET) blocks, and are repeated several times per block. Overall, the results are in agreement with a role of the left PFC in adjusting the decision threshold for lexical selection in language production.
A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.
Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily
2018-02-23
The validity of the CAGE using item response theory (IRT) has not yet been examined in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity, and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA) and to fit dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed that the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2 items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and on the precision of the cut-off scores in the older adult population.
A Comparison of Alternate-Choice and True-False Item Forms Used in Classroom Examinations.
ERIC Educational Resources Information Center
Maihoff, N. A.; Mehrens, Wm. A.
A comparison is presented of alternate-choice and true-false item forms used in an undergraduate natural science course. The alternate-choice item is a modified two-choice multiple-choice item in which the two responses are included within the question stem. This study (1) compared the difficulty level, discrimination level, reliability, and…
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.
ERIC Educational Resources Information Center
Brutten, Sheila R.; And Others
A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy
ERIC Educational Resources Information Center
Chariker, Julia H.; Naaz, Farah; Pani, John R.
2012-01-01
This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of…
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift
ERIC Educational Resources Information Center
Guo, Rui; Zheng, Yi; Chang, Hua-Hua
2015-01-01
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Francis, Wendy S; Tokowicz, Natasha; Kroll, Judith F
2014-01-01
Repetition priming was used to assess how proficiency and the ease or difficulty of lexical access influence bilingual translation. Two experiments, conducted at different universities with different Spanish-English bilingual populations and materials, showed repetition priming in word translation for same-direction and different-direction repetitions. Experiment 1, conducted in an English-dominant environment, revealed an effect of translation direction but not of direction match, whereas Experiment 2, conducted in a more balanced bilingual environment, showed an effect of direction match but not of translation direction. A combined analysis on the items common to both studies revealed that bilingual proficiency was negatively associated with response time (RT), priming, and the degree of translation asymmetry in RTs and priming. An item analysis showed that item difficulty was positively associated with RTs, priming, and the benefit of same-direction over different-direction repetition. Thus, although both participant accuracy and item accuracy are indices of learning, they have distinct effects on translation RTs and on the learning that is captured by the repetition-priming paradigm.
The second version of the L. V. Prasad-functional vision questionnaire.
Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K
2012-11-01
The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of the LVP-FVQ (LVP-FVQ II). Development of the LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and focus group discussions with children with VI and their parents, which resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire underwent pilot testing in 25 such children, following which a 27-item LVP-FVQ II emerged, and this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. The rating scale was used by participants as intended. Four mobility-related items required deletion, as these did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it is effectively able to discriminate between levels of visual disability in school children in India, and is valid across age, gender, duration of VI, and location of residence. Given the superior measurement properties and the interval-level scores, the LVP-FVQ II appears to offer advantages over the LVP-FVQ in assessment of difficulties in performing daily tasks in this population. It can be adapted for use in other developing countries.
Maximum Likelihood Item Easiness Models for Test Theory Without an Answer Key
Batchelder, William H.
2014-01-01
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce two extensions to the basic model in order to account for item rating easiness/difficulty. The first extension is a multiplicative model and the second is an additive model. We show how the multiplicative model is related to the Rasch model. We describe several maximum-likelihood estimation procedures for the models and discuss issues of model fit and identifiability. We describe how the CCT models could be used to give alternative consensus-based measures of reliability. We demonstrate the utility of both the basic and extended models on a set of essay rating data and give ideas for future research. PMID:29795812
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy
Chariker, Julia H.; Naaz, Farah; Pani, John R.
2012-01-01
This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present paper demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. PMID:22231801
Item difficulty in the evaluation of computer-based instruction: an example from neuroanatomy.
Chariker, Julia H; Naaz, Farah; Pani, John R
2012-01-01
This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present article demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. Copyright © 2011 American Association of Anatomists.
ERIC Educational Resources Information Center
Zickar, Michael J.; Ury, Karen L.
2002-01-01
Attempted to relate content features of personality items to item parameter estimates from the partial credit model of E. Muraki (1990) by administering the Adjective Checklist (L. Goldberg, 1992) to 329 undergraduates. As predicted, the discrimination parameter was related to the item subtlety ratings of personality items but the level of word…
The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models
ERIC Educational Resources Information Center
Lee, Wooyeol; Cho, Sun-Joo
2017-01-01
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
2011-01-01
Background The quality of data in national health information systems has been questionable in most developing countries. However, the mechanisms of errors in the case identification process are not fully understood. This study aimed to investigate the mechanisms of errors in the case identification process in the existing routine health information system (RHIS) in the Philippines by measuring the risk of committing errors for health program indicators used in the Field Health Services Information System (FHSIS 1996), and characterizing those indicators accordingly. Methods A structured questionnaire on the definitions of 12 selected indicators in the FHSIS was administered to 132 health workers in 14 selected municipalities in the province of Palawan. A proportion of correct answers (difficulty index) and a disparity of two proportions of correct answers between higher and lower scored groups (discrimination index) were calculated, and the patterns of wrong answers for each of the 12 items were abstracted from 113 valid responses. Results None of the 12 items reached a difficulty index of 1.00. The average difficulty index of the 12 items was 0.266 and the discrimination index that showed a significant difference was 0.216 and above. Compared with these two cut-offs, six items showed non-discrimination against lower difficulty indices of 0.035 (4/113) to 0.195 (22/113), two items showed a positive discrimination against lower difficulty indices of 0.142 (16/113) and 0.248 (28/113), and four items showed a positive discrimination against higher difficulty indices of 0.469 (53/113) to 0.673 (76/113). Conclusions The results suggest three characteristics of the definitions of indicators: those that are (1) unsupported by the current conditions in the health system, i.e., (a) data are required from a facility that cannot directly generate the data and (b) definitions of indicators are not consistent with their corresponding programs; (2) incomplete or ambiguous, which allows several interpretations; and (3) complete yet easily misunderstood by health workers. Taking systemic factors into account, the case identification step needs to be reviewed and designed to generate intended data in health information systems. PMID:21995369
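The two classical indices defined in the Methods of this abstract can be sketched in a few lines. The difficulty index is the proportion of correct answers, and the discrimination index is the difference between the proportions correct in the higher and lower scored groups. The function names are assumptions for illustration, and the group counts in the last example are hypothetical because the abstract does not report them; the difficulty values are recomputed from the counts the abstract does give out of 113 valid responses.

```python
def difficulty_index(n_correct: int, n_respondents: int) -> float:
    """Proportion of respondents who answered the item correctly."""
    return n_correct / n_respondents


def discrimination_index(upper_correct: int, upper_n: int,
                         lower_correct: int, lower_n: int) -> float:
    """Difference in proportion correct between higher and lower scored groups."""
    return upper_correct / upper_n - lower_correct / lower_n


# Counts quoted in the abstract, each out of 113 valid responses.
for n_correct in (4, 22, 16, 28, 53, 76):
    print(f"{n_correct}/113 -> {difficulty_index(n_correct, 113):.3f}")

# Hypothetical group counts, for illustration only (not from the study).
d = discrimination_index(upper_correct=20, upper_n=30,
                         lower_correct=10, lower_n=30)
print(f"hypothetical discrimination index: {d:.3f}")
```

Note that with this definition a higher difficulty index means an easier item, since it counts correct answers rather than errors.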
Redintegration, task difficulty, and immediate serial recall tasks.
Ritchie, Gabrielle; Tolan, Georgina Anne; Tehan, Gerald
2015-03-01
While current theoretical models remain somewhat inconclusive in their explanation of short-term memory (STM), many theories suggest at least a contribution of long-term memory (LTM) to the short-term system. A number of researchers refer to this process as redintegration (e.g., Schweickert, 1993). Under short-term recall conditions, the current study investigated the effects of redintegration and task difficulty in order to extend research conducted by Neale and Tehan (2007). Thirty participants in Experiment 1 and 26 participants in Experiment 2 completed a serial recall task in which retention interval, presentation rate, and articulatory suppression were used to modify task difficulty. Redintegration was examined by manipulating the characteristics of the to-be-remembered items; lexicality in Experiment 1 and wordlikeness in Experiment 2. Responses were scored based on correct-in-position recall, item scoring, and order accuracy scoring. In line with the Neale and Tehan results, as the difficulty of the task increased so did the effects of redintegration. This was evident in that the advantage for words in Experiment 1 and wordlikeness in Experiment 2 decreased as task difficulty increased. This relationship was observed for item but not order memory, and findings were discussed in relation to the theory of redintegration. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
ERIC Educational Resources Information Center
Alberta Dept. of Education, Edmonton.
This document outlines the use of machine-scorable open-ended questions for the evaluation of Physics 30 in Alberta. Contents include: (1) an introduction to the questions; (2) sample instruction sheet; (3) fifteen sample items; (4) item information including the key, difficulty, and source of each item; (5) solutions to items having multiple…
2017-01-01
Background Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in the final stages of their life. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. The Palliative Care Quiz for Nurses (PCQN) is a questionnaire that evaluates nurses' basic knowledge about palliative care, but its adaptation into the Spanish language and an analysis of its effectiveness and utility for Spanish culture are lacking. Purpose To report the adaptation into the Spanish language and the psychometric analysis of the Palliative Care Quiz for Nurses. Method The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained from a process including translation, back-translation, comparison with versions in other languages, revision by experts, and a pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Findings Adequate internal consistency was found (S-CVI = 0.83); Cronbach's alpha coefficient of 0.67 and KR-20 test result of 0.72 reflected the reliability of the PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items which could be considered difficult or very difficult, and five items which could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents. Discussion Although it shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of the PCQN in other languages, reformulating the items with the lowest content validity or discrimination indexes, and those that respondents had difficulty comprehending, should be considered in order to improve the PCQN-SV. Conclusion The PCQN-SV is a useful Spanish language instrument for measuring Spanish nurses’ knowledge in palliative care and it is adequate to establish international comparisons. PMID:28545037
Chover-Sierra, Elena; Martínez-Sabater, Antonio; Lapeña-Moñux, Yolanda Raquel
2017-01-01
Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in the final stages of their life. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. The Palliative Care Quiz for Nurses (PCQN) is a questionnaire that evaluates nurses' basic knowledge about palliative care, but its adaptation into the Spanish language and an analysis of its effectiveness and utility for Spanish culture are lacking. The purpose of this study was to report the adaptation into the Spanish language and the psychometric analysis of the Palliative Care Quiz for Nurses. The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained from a process including translation, back-translation, comparison with versions in other languages, revision by experts, and a pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Adequate internal consistency was found (S-CVI = 0.83); Cronbach's alpha coefficient of 0.67 and KR-20 test result of 0.72 reflected the reliability of the PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items which could be considered difficult or very difficult, and five items which could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents. Although it shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of the PCQN in other languages, reformulating the items with the lowest content validity or discrimination indexes, and those that respondents had difficulty comprehending, should be considered in order to improve the PCQN-SV. The PCQN-SV is a useful Spanish language instrument for measuring Spanish nurses' knowledge in palliative care and it is adequate to establish international comparisons.
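The KR-20 coefficient reported in this abstract is a reliability estimate for dichotomously scored items. A minimal sketch of the standard formula, KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores), is given below. The function name and the small response matrix are invented for illustration and are not the study's data; the sketch also assumes population variance is used for the total scores, which is one common convention.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson 20 for a respondents-by-items matrix of 0/1 scores.

    Assumes population variance for the total scores, matching the
    p * (1 - p) item variances in the numerator.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    total_variance = pvariance([sum(row) for row in responses])
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_variance)


# Invented toy data: five respondents, four items, a perfectly ordered pattern.
toy = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
print(round(kr20(toy), 3))
```

The function is undefined when all respondents have the same total score, because the total-score variance is then zero; a production version would need to handle that case explicitly.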
Fractionating the Neural Substrates of Incidental Recognition Memory
ERIC Educational Resources Information Center
Greene, Ciara M.; Vidaki, Kleio; Soto, David
2015-01-01
Familiar stimuli are typically accompanied by decreases in neural response relative to the presentation of novel items, but these studies often include explicit instructions to discriminate old and new items; this creates difficulties in partialling out the contribution of top-down intentional orientation to the items based on recognition goals.…
ERIC Educational Resources Information Center
Gaitas, Sérgio; Alves Martins, Margarida
2017-01-01
This study analyses teacher perceived difficulty in implementing differentiated instructional strategies in regular classes. The participants were 273 Portuguese primary school teachers with teaching experience ranging from 1 to 33 years. A 39-item questionnaire was used to evaluate teacher perceived difficulty in relation to different…
Measuring and Predicting Graded Reader Difficulty
ERIC Educational Resources Information Center
Holster, Trevor A.; Lake, J. W.; Pellowe, William R.
2017-01-01
This study used many-faceted Rasch measurement to investigate the difficulty of graded readers using a 3-item survey. Book difficulty was compared with Kyoto Level, Yomiyasusa Level, Lexile Level, book length, mean sentence length, and mean word frequency. Word frequency and Kyoto Level were found to be ineffective in predicting students'…
Critical success factors in awareness of and choice towards low vision rehabilitation.
Fraser, Sarah A; Johnson, Aaron P; Wittich, Walter; Overbury, Olga
2015-01-01
The goal of the current study was to examine the critical factors indicative of an individual's choice to access low vision rehabilitation services. Seven hundred and forty-nine visually impaired individuals, from the Montreal Barriers Study, completed a structured interview and questionnaires (on visual function, coping, depression, satisfaction with life). Seventy-five factors from the interview and questionnaires were entered into a data-driven Classification and Regression Tree Analysis in order to determine the best predictors of awareness group: positive personal choice (I knew and I went), negative personal choice (I knew and did not go), and lack of information (Nobody told me, and I did not know). Having a response of moderate to no difficulty on item 6 (reading signs) of the Visual Function Index 14 (VF-14) indicated that the person had made a positive personal choice to seek rehabilitation, whereas reporting a great deal of difficulty on this item was associated with a lack of information on low vision rehabilitation. In addition to this factor, symptom duration of under nine years, moderate difficulty or less on item 5 (seeing steps or curbs) of the VF-14, and an indication of little difficulty or less on item 3 (reading large print) of the VF-14 further identified those who were more likely to have made a positive personal choice. Individuals in the lack of information group also reported greater difficulty on items 3 and 5 of the VF-14 and were more likely to be male. The duration-of-symptoms factor suggests that, even in the positive choice group, it may be best to offer rehabilitation services early. Being male and responding moderate difficulty or greater to the VF-14 questions about far, medium-distance and near situations involving vision was associated with individuals that lack information. Consequently, these individuals may need additional education about the benefits of low vision services in order to make a positive personal choice. 
© 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.
Rosneck, James S; Hughes, Joel; Gunstad, John; Josephson, Richard; Noe, Donald A; Waechter, Donna
2014-01-01
This article describes the systematic construction and psychometric analysis of a knowledge assessment instrument for phase II cardiac rehabilitation (CR) patients measuring risk modification disease management knowledge and behavioral outcomes derived from national standards relevant to secondary prevention and management of cardiovascular disease. First, using adult curriculum based on disease-specific learning outcomes and competencies, a systematic test item development process was completed by clinical staff. Second, a panel of educational and clinical experts used an iterative process to identify test content domain and arrive at consensus in selecting items meeting criteria. Third, the resulting 31-question instrument, the Cardiac Knowledge Assessment Tool (CKAT), was piloted in CR patients to ensure use of application. Validity and reliability analyses were performed on 3638 adults before test administrations with additional focused analyses on 1999 individuals completing both pretreatment and posttreatment administrations within 6 months. Evidence of CKAT content validity was substantiated, with 85% agreement among content experts. Evidence of construct validity was demonstrated via factor analysis identifying key underlying factors. Estimates of internal consistency, for example, Cronbach's α = .852 and Spearman-Brown split-half reliability = 0.817 on pretesting, support test reliability. Item analysis, using point biserial correlation, measured relationships between performance on single items and total score (P < .01). Analyses using item difficulty and item discrimination indices further verified item stability and validity of the CKAT. A knowledge instrument specifically designed for an adult CR population was systematically developed and tested in a large representative patient population, satisfying psychometric parameters, including validity and reliability.
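The point-biserial correlation used for item analysis in this abstract relates a dichotomous item score to the total test score. A minimal sketch follows, using the textbook form r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean total scores of those who answered the item correctly and incorrectly, s is the population standard deviation of the total scores, and p is the proportion correct. The function name and data are hypothetical, and the sketch assumes an uncorrected total (one that still includes the item itself); the abstract does not say which variant the authors used.

```python
import math
from statistics import mean, pstdev


def point_biserial(item_scores: list[int], total_scores: list[float]) -> float:
    """Point-biserial correlation between a 0/1 item and total scores."""
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    incorrect = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(correct) / len(item_scores)  # proportion answering correctly
    spread = pstdev(total_scores)        # population SD of total scores
    return (mean(correct) - mean(incorrect)) / spread * math.sqrt(p * (1.0 - p))


# Hypothetical data: four respondents, one item, and their total scores.
item = [0, 0, 1, 1]
totals = [1, 2, 3, 4]
print(round(point_biserial(item, totals), 3))
```

This value is numerically identical to the Pearson correlation between the 0/1 item vector and the total scores, so either computation can be used to check the other.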
Increased susceptibility to proactive interference in adults with dyslexia?
Bogaerts, Louisa; Szmalec, Arnaud; Hachmann, Wibke M; Page, Mike P A; Woumans, Evy; Duyck, Wouter
2015-01-01
Recent findings show that people with dyslexia have an impairment in serial-order memory. Based on these findings, the present study aimed to test the hypothesis that people with dyslexia have difficulties dealing with proactive interference (PI) in recognition memory. A group of 25 adults with dyslexia and a group of matched controls were subjected to a 2-back recognition task, which required participants to indicate whether an item (mis)matched the item that had been presented 2 trials before. PI was elicited using lure trials in which the item matched the item in the 3-back position instead of the targeted 2-back position. Our results demonstrate that the introduction of lure trials affected 2-back recognition performance more severely in the dyslexic group than in the control group, suggesting greater difficulty in resisting PI in dyslexia.
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10
ERIC Educational Resources Information Center
Livingston, Samuel A.; Dorans, Neil J.
2004-01-01
This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
ERIC Educational Resources Information Center
Lee, Young-Sun; Krishnan, Anita; Park, Yoon Soo
2012-01-01
The purpose of this study was to investigate psychometric properties of the Children's Depression Inventory within a nonclinical and longitudinal sample (8th and 12th grades). Using the Rasch rating scale, most items represented one dimension. There was adequate separation among items and no overlap between ranges of item difficulties with latent…
ERIC Educational Resources Information Center
Atalmis, Erkan Hasan
2016-01-01
Multiple-choice (MC) items are commonly used in high-stake tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…
Huprich, Steven K; Paggeot, Amy V; Samuel, Douglas B
2015-01-01
One-hundred sixty-nine psychiatric outpatients and 171 undergraduate students were assessed with the Personality Disorder Interview-IV (PDI-IV; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995) and the Structured Clinical Interview for DSM-IV Axis II disorders (SCID-II; First, Gibbon, Spitzer, Williams, & Benjamin, 1997) for borderline personality disorder (BPD). Eighty individuals met PDI-IV BPD criteria, whereas 34 met SCID-II BPD criteria. Dimensional ratings of both measures were highly intercorrelated (rs = .78, .75), and item-level interrater reliability fell in the good to excellent range. An item-response theory analysis was performed to investigate whether properties of the items from each interview could help understand these differences. The limited agreement seemed to be explained by differences in the response options across the two interviews. We found that suicidal behavior was among the most discriminating criteria on both instruments, whereas dissociation and difficulty controlling anger had the 2 lowest alpha parameter values. Finally, those meeting BPD criteria on both interviews had higher levels of anxiety, depression, and more impairments in object relations than those meeting criteria on just the PDI-IV. These findings suggest that the choice of measure has a notable effect on the obtained diagnostic prevalence and the level of BPD severity that is detected.
Mackus, Marlou; Kruijff, Deborah de; Otten, Leila S; Kraneveld, Aletta D; Garssen, Johan; Verster, Joris C
2017-04-12
Altered immune functioning has been demonstrated in individuals with autism spectrum disorder (ASD). The current study explores the relationship between perceived immune functioning and experiencing ASD traits in healthy young adults. N = 410 students from Utrecht University completed a survey on immune functioning and autistic traits. In addition to a 1-item perceived immune functioning rating, the Immune Function Questionnaire (IFQ) was completed to assess perceived immune functioning. The Dutch translation of the Autism-Spectrum Quotient (AQ) was completed to examine variation in autistic traits, including the domains "social insights and behavior", "difficulties with change", "communication", "phantasy and imagination", and "detail orientation". The 1-item perceived immune functioning score did not significantly correlate with the total AQ score. However, a significant negative correlation was found between perceived immune functioning and the AQ subscale "difficulties with change" (r = -0.119, p = 0.019). In women, 1-item perceived immune functioning correlated significantly with the AQ subscales "difficulties with change" (r = -0.149, p = 0.029) and "communication" (r = -0.145, p = 0.032). In men, none of the AQ subscales significantly correlated with 1-item perceived immune functioning. In conclusion, a modest relationship between perceived immune functioning and several autistic traits was found.
Assessing the Conceptual Understanding about Heat and Thermodynamics at Undergraduate Level
ERIC Educational Resources Information Center
Kulkarni, Vasudeo Digambar; Tambade, Popat Savaleram
2013-01-01
In this study, a Thermodynamic Concept Test (TCT) was designed to assess students' conceptual understanding of heat and thermodynamics at the undergraduate level. Different statistical measures, such as the item difficulty index, item discrimination index, and point biserial coefficient, were used for assessing the TCT. For each item of the test these indices were…
Modeling Booklet Effects for Nonequivalent Group Designs in Large-Scale Assessment
ERIC Educational Resources Information Center
Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas
2015-01-01
Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden of individual students, using various booklets allows aligning the difficulty of the presented items to the assumed…
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties
ERIC Educational Resources Information Center
Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.
2010-01-01
This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…
Regression Effects in Angoff Ratings: Examples from Credentialing Exams
ERIC Educational Resources Information Center
Wyse, Adam E.
2018-01-01
This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
A Five-Year Evaluation of Examination Structure in a Cardiovascular Pharmacotherapy Course
Kolar, Claire; Janke, Kristin K.
2015-01-01
Objective. To evaluate the composition and effectiveness as an assessment tool of a criterion-referenced examination comprised of clinical cases tied to practice decisions, to examine the effect of varying audience response system (ARS) questions on student examination preparation, and to articulate guidelines for structuring examinations to maximize evaluation of student learning. Design. Multiple-choice items developed over 5 years were evaluated using Bloom’s Taxonomy classification, point biserial correlation, item difficulty, and grade distribution. In addition, examination items were classified into categories based on similarity to items used in ARS preparation. Assessment. As the number of items directly tied to clinical practice rose, Bloom’s Taxonomy level and item difficulty also rose. In examination years where Bloom’s levels were high but preparation was minimal, average grade distribution was lower compared with years in which student preparation was higher. Conclusion. Criterion-referenced examinations can benefit from systematic evaluation of their composition and effectiveness as assessment tools. Calculated design and delivery of classroom preparation is an asset in improving examination performance on rigorous, practice-relevant examinations. PMID:27168611
Kılıç, Aslı; Hoyer, William J; Howard, Marc W
2013-01-01
BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.
ERIC Educational Resources Information Center
Carroll, H. C. M.
2013-01-01
Two complementary studies of poor and better attenders are presented. To measure emotional and behavioural difficulties (EBD) different teacher-completed rating scales were employed, and to determine social difficulties, the studies used sociometry and some items from the scales. One study had a longitudinal design. It revealed that, after…
Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items
ERIC Educational Resources Information Center
Chen, Cheng-Te; Wang, Wen-Chung
2007-01-01
This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q3 and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring…
ERIC Educational Resources Information Center
Kim, Kyung Yong; Lee, Won-Chan
2017-01-01
This article provides a detailed description of three factors (specification of the ability distribution, numerical integration, and frame of reference for the item parameter estimates) that might affect the item parameter estimation of the three-parameter logistic model, and compares five item calibration methods, which are combinations of the…
Middle school students' reading comprehension of mathematical texts and algebraic equations
NASA Astrophysics Data System (ADS)
Duru, Adem; Koklu, Onder
2011-06-01
In this study, middle school students' abilities to translate mathematical texts into algebraic representations and vice versa were investigated. In addition, students' difficulties in making such translations and the potential sources of these difficulties were also explored. Both qualitative and quantitative methods were used to collect data for this study: a questionnaire and clinical interviews. The questionnaire consisted of two general types of items: (1) selected-response (multiple-choice) items for which the respondent selects from multiple options and (2) open-ended items for which the respondent constructs a response. In order to further investigate the students' strategies while they were translating the given mathematical texts to algebraic equations and vice versa, five randomly chosen (n = 5) students were interviewed. Data were collected in the 2007-2008 school year from 185 middle-school students in five teachers' classrooms in three different schools in the city of Adıyaman, Turkey. After the analysis of data, it was found that students who participated in this study had difficulties in translating the mathematical texts into algebraic equations by using symbols. It was also observed that these students had difficulties in translating the symbolic representations into mathematical texts because of their weak reading comprehension. In addition, the findings of this research revealed that students' difficulties in translating the given mathematical texts into symbolic representations or vice versa come from different sources.
Waller, Niels G; Feuerstahler, Leah
2017-01-01
In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ≥ 5,000) and person parameters can be accurately estimated in smaller samples (N ≥ 1,000). In the supplemental files, we report annotated [Formula: see text] code that shows how to estimate 4PM item and person parameters in [Formula: see text] (Chalmers, 2012).
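For readers unfamiliar with the 4PM, the sketch below shows its item response function in the standard textbook parameterization: discrimination a, difficulty b, lower asymptote c, and upper asymptote d. The function name and the example values are illustrative assumptions, not taken from the study.

```python
from math import exp

def irf_4pm(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic item response function.

    P(endorse | theta) = c + (d - c) / (1 + exp(-a * (theta - b)))
    Setting d = 1 gives the 3PL; c = 0 and d = 1 give the 2PL.
    """
    return c + (d - c) / (1.0 + exp(-a * (theta - b)))

# At theta == b the probability sits midway between the two asymptotes.
p_mid = irf_4pm(theta=0.0, a=1.5, b=0.0, c=0.2, d=0.9)
```

The upper asymptote d below 1 is what distinguishes the 4PM: even respondents very high on the trait do not endorse the item with certainty.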
Gabay, Yafit; Karni, Avi; Banai, Karen
2017-01-01
Speech perception can improve substantially with practice (perceptual learning) even in adults. Here we compared the effects of four training protocols that differed in whether and how task difficulty was changed during a training session, in terms of the gains attained and the ability to apply (transfer) these gains to previously un-encountered items (tokens) and to different talkers. Participants trained in judging the semantic plausibility of sentences presented as time-compressed speech and were tested on their ability to reproduce, in writing, the target sentences; trial-by-trial feedback was afforded in all training conditions. In two conditions task difficulty (low or high compression) was kept constant throughout the training session, whereas in the other two conditions task difficulty was changed in an adaptive manner (incrementally from easy to difficult, or using a staircase procedure). Compared to a control group (no training), all four protocols resulted in significant post-training improvement in the ability to reproduce the trained sentences accurately. However, training in the constant-high-compression protocol elicited the smallest gains in deciphering and reproducing trained items and in reproducing novel, untrained, items after training. Overall, these results suggest that training procedures that start off with relatively little signal distortion (“easy” items, not far removed from standard speech) may be advantageous compared to conditions wherein severe distortions are presented to participants from the very beginning of the training session. PMID:28545039
Adaptive Mental Testing: The State of the Art
1979-11-01
typically vary in their psychometric properties, particularly in their difficulty, the test designer must decide what configuration of these item ... psychometric properties best suits the test’s purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are ... development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
ERIC Educational Resources Information Center
van der Linden, Wim J.; Eggen, Theo J. H. M.
A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayes approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is indicated how a paired-comparisons design…
Assessment of item-writing flaws in multiple-choice questions.
Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John
2013-01-01
This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.
Narimoto, Tadamasa; Matsuura, Naomi; Takezawa, Tomohiro; Mitsuhashi, Yoshinori; Hiratani, Michio
2013-01-01
The authors investigated whether impaired spatial short-term memory exhibited by children with nonverbal learning disabilities is due to a problem in the encoding process. Children with or without nonverbal learning disabilities performed a simple spatial test that required them to remember 3, 5, or 7 spatial items presented simultaneously in random positions (i.e., spatial configuration) and to decide if a target item was changed or all items including the target were in the same position. The results showed that, even when the spatial positions in the encoding and probe phases were similar, the mean proportion correct of children with nonverbal learning disabilities was 0.58 while that of children without nonverbal learning disabilities was 0.84. On the basis of these results, the authors argue that children with nonverbal learning disabilities have difficulty encoding relational information between spatial items, and that this difficulty is responsible for their impaired spatial short-term memory.
Application of Computerized Adaptive Testing to Entrance Examination for Graduate Studies in Turkey
ERIC Educational Resources Information Center
Bulut, Okan; Kan, Adnan
2012-01-01
Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
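The item-selection idea described in this abstract can be sketched as follows. The sketch assumes a Rasch (one-parameter) model and a maximum-information selection rule, neither of which is stated in the abstract; the item bank, its difficulty values, and the function names are hypothetical.

```python
from math import exp

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Pick the unused item that is most informative at the current estimate.

    bank: dict mapping item id -> difficulty b (hypothetical values).
    administered: set of item ids already shown to the examinee.
    """
    candidates = [i for i in bank if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, bank[i]))

bank = {"easy": -1.5, "medium": 0.1, "hard": 2.0}
first = next_item(0.0, bank, set())
second = next_item(0.0, bank, {"medium"})
```

Under the Rasch model, information peaks where item difficulty equals ability, so this rule amounts to choosing the unused item whose difficulty is closest to the current ability estimate. A full CAT would also re-estimate ability after each response and apply a stopping rule, which the sketch omits.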
Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.
ERIC Educational Resources Information Center
O'Neill, Thomas R.; Lunz, Mary E.
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
Rasch Based Analysis of Oral Proficiency Test Data.
ERIC Educational Resources Information Center
Nakamura, Yuji
2001-01-01
This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…
ERIC Educational Resources Information Center
Parish, Jane A.; Karisch, Brandi B.
2013-01-01
Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
Exploring the Manifestations of Anxiety in Children with Autism Spectrum Disorders
ERIC Educational Resources Information Center
Hallett, Victoria; Lecavalier, Luc; Sukhodolsky, Denis G.; Cipriano, Noreen; Aman, Michael G.; McCracken, James T.; McDougle, Christopher J.; Tierney, Elaine; King, Bryan H.; Hollander, Eric; Sikich, Linmarie; Bregman, Joel; Anagnostou, Evdokia; Donnelly, Craig; Katsovich, Lily; Dukes, Kimberly; Vitiello, Benedetto; Gadow, Kenneth; Scahill, Lawrence
2013-01-01
This study explores the manifestation and measurement of anxiety symptoms in 415 children with ASDs on a 20-item, parent-rated, DSM-IV referenced anxiety scale. In both high and low-functioning children (IQ above vs. below 70), commonly endorsed items assessed restlessness, tension and sleep difficulties. Items requiring verbal expression of worry…
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items
ERIC Educational Resources Information Center
Michaelides, Michalis P.
2010-01-01
The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
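A minimal sketch of the delta-plot computation, under commonly described conventions that are assumed here rather than taken from this report: proportions correct are mapped to the delta scale (13 + 4z, with z the normal deviate for the proportion incorrect), a principal-axis line is fitted to the paired deltas, and each item's signed perpendicular distance from that line is used to flag outliers. The function names and sample proportions are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def delta(p):
    """Delta-scale difficulty: 13 + 4z; harder items get larger values."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)

def delta_plot_distances(p_old, p_new):
    """Signed perpendicular distance of each common item from the principal axis.

    p_old, p_new: proportions correct (strictly between 0 and 1) for the same
    items on two administrations. Assumes the deltas covary (sxy != 0).
    """
    x = [delta(p) for p in p_old]
    y = [delta(p) for p in p_new]
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x) / (n - 1)
    syy = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    # Principal (major) axis slope and intercept.
    slope = (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return [(slope * u - v + intercept) / sqrt(slope ** 2 + 1)
            for u, v in zip(x, y)]

stable = delta_plot_distances([0.3, 0.5, 0.7, 0.9], [0.3, 0.5, 0.7, 0.9])
shifted = delta_plot_distances([0.3, 0.5, 0.7, 0.9, 0.6],
                               [0.3, 0.5, 0.7, 0.9, 0.2])
```

Items whose distance exceeds a chosen cutoff would be treated as aberrant; the cutoff itself is a judgment call and is not fixed by the sketch.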
Rapp, B; Caramazza, A
1997-02-01
We describe the case of a brain-damaged individual whose speech is characterized by difficulty with practically all words except for elements of the closed class vocabulary. In contrast, his written sentence production exhibits a complementary impairment involving the omission of closed class vocabulary items and the relative sparing of nouns. On the basis of these differences we argue: (1) that grammatical categories constitute an organizing parameter of representation and/or processing for each of the independent, modality-specific lexicons, and (2) that these observations contribute to the growing evidence that access to the orthographic and phonological forms of words can occur independently.
de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang
2018-07-01
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptoms in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty levels were compared through the CTT and IRT approaches. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students.
Benaïm, C; Perennou, D-A; Pelissier, J-Y; Daures, J-P
2010-02-01
Many clinical scales contain items that are scored separately prior to being compiled into a single score. However, if the items have different degrees of importance, they should be weighted differently before being compiled. The principal aims of this study were to show how the "analytic hierarchy process" (AHP), which has never been used for this purpose, can be applied to weighting the six items of the "London handicap scale", and to compare the AHP to the "conjoint analysis" (CA), which was previously implemented by Harwood et al. (1994) [1]. In order to assess the relative importance of the six items, we submitted AHP and CA to a group of 10 physiatrists. We compared the methods in terms of item ranking according to importance, assessment of fictitious patients based on weights determined by each method, and perceived difficulty by the physiatrist. For both techniques, "Physical independence" (PHY) was the best-weighted item, but other ranks varied depending on the technique. AHP was better than CA in terms of accuracy (global assessment of the clinical status) and perceived difficulty. AHP may be used to reveal the importance that experts assign to the items of a multidimensional scale, and to calculate the appropriate weights for specific items. For this purpose, AHP seems to be more accurate than CA.
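To illustrate how AHP turns pairwise importance judgments into item weights, the sketch below uses the standard principal-eigenvector approach, approximated by power iteration, together with the usual consistency index. The comparison matrix is invented for illustration and is not data from the study; the function name is hypothetical.

```python
def ahp_weights(matrix, iterations=100):
    """Weights and consistency index from a reciprocal pairwise-comparison matrix.

    matrix[i][j] states how many times more important item i is than item j,
    with matrix[j][i] == 1 / matrix[i][j]. Requires at least two items.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):                      # power iteration
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]                   # keep weights summing to 1
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)   # 0 for perfectly consistent judgments
    return w, consistency_index

# Hypothetical, perfectly consistent judgments for three items.
judgments = [
    [1.0,     2.0,     6.0],
    [1 / 2.0, 1.0,     3.0],
    [1 / 6.0, 1 / 3.0, 1.0],
]
weights, ci = ahp_weights(judgments)
```

With real expert judgments the matrix is rarely perfectly consistent, and the consistency index gives a rough check on whether the judgments should be revisited.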
Validity of a Protocol for Adult Self-Report of Dyslexia and Related Difficulties
ERIC Educational Resources Information Center
Snowling, Margaret; Dawes, Piers; Nash, Hannah; Hulme, Charles
2012-01-01
Background: There is an increased prevalence of reading and related difficulties in children of dyslexic parents. In order to understand the causes of these difficulties, it is important to quantify the risk factors passed from parents to their offspring. Method: 417 adults completed a protocol comprising a 15-item questionnaire rating reading and…
Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E
2015-08-01
The Quality-of-life (QOL) Disease Impact Scale (QDIS®) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
Psychometric Consequences of Subpopulation Item Parameter Drift
ERIC Educational Resources Information Center
Huggins-Manley, Anne Corinne
2017-01-01
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
ERIC Educational Resources Information Center
Brese, Falk, Ed.
2012-01-01
The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…
Gerrard, Paul
2013-01-01
Nursing facility patients are a population that has not previously been well studied with regard to functional status and independence. As such, the manner in which activities of daily living (ADL) relate to one another is not well understood in this population. An understanding of ADL difficulty ordering has helped to devise systems of functional independence grading in other populations, which have value in understanding patients' global levels of independence and providing expectations regarding changes in function. This study seeks to examine the hierarchy of ADL in the nursing facility population. Data were analyzed from the 2004 National Nursing Home Survey, a cross-sectional data set of 13,507 skilled nursing facility subjects with functional independence items. The ADL difficulty hierarchy was determined using Rasch analysis. Item fit values for the Rasch model using Mean-Square infit statistics were also determined. The robustness of the hierarchy was tested for each ADL. Two grading systems were devised from the results of the item difficulty ordering. One assigned a grade based on the most difficult item that a subject could perform, and the other assigned a grade based on the least difficult item that a subject could not perform. A total of 13,113 patients were included in this analysis, the majority of whom were female and white. They had an average age of 81 years. An ordered hierarchy of ADL was found with eating being the easiest and bathing the most difficult. All items in the Katz index fit the Rasch model adequately well. The majority of patients able to perform any particular ADL were also able to perform all easier ADL. Cohen's κ for the 2 grading systems was 0.73. This study is the first to show the expected hierarchy of difficulty of the 6 activities of daily living proposed in the Katz index in the nursing facility population. The hierarchy found in this population matches the original hierarchy found in older adults in the community and acute care settings. It is also similar to the hierarchy found in the inpatient rehabilitation setting. Patients would be expected to lose or gain function based on the order of difficulty, but this remains to be confirmed. Among the 6 activities of daily living tested here, their order from easiest to most difficult is eating, maintaining continence, transferring, toileting, dressing, and bathing. In addition, the index formed by these 6 items has construct validity in the nursing facility population.
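The two grading rules described in this abstract can be sketched as follows, using the reported easiest-to-hardest ordering. The numeric grade values (0 to 6) and the function names are assumptions made for illustration; the abstract does not say how grades were numbered.

```python
# Reported order from easiest to most difficult.
HIERARCHY = ["eating", "continence", "transferring",
             "toileting", "dressing", "bathing"]

def grade_by_hardest_performed(independent):
    """Grade = rank (1-6) of the most difficult ADL performed; 0 if none."""
    ranks = [i + 1 for i, adl in enumerate(HIERARCHY) if adl in independent]
    return max(ranks, default=0)

def grade_by_easiest_failed(independent):
    """Grade = number of ADL easier than the least difficult one not performed."""
    for i, adl in enumerate(HIERARCHY):
        if adl not in independent:
            return i
    return len(HIERARCHY)

consistent = {"eating", "continence", "transferring"}
inconsistent = {"eating", "toileting"}  # performs a harder ADL but not an easier one
grades_consistent = (grade_by_hardest_performed(consistent),
                     grade_by_easiest_failed(consistent))
grades_inconsistent = (grade_by_hardest_performed(inconsistent),
                       grade_by_easiest_failed(inconsistent))
```

The two rules agree whenever a patient's pattern follows the hierarchy and diverge otherwise, which is why an agreement statistic such as Cohen's κ is informative about how well the hierarchy holds.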
Yuen, Eva; Knight, Tess; Dodson, Sarity; Chirgwin, Jacqueline; Busija, Lucy; Ricciardelli, Lina A; Burney, Susan; Parente, Phillip; Livingston, Patricia M
2018-05-01
Caregivers have been largely neglected in health literacy measurement. We assess the construct validity and internal consistency of the Health Literacy of Caregivers Scale-Cancer (HLCS-C), and present a revised, psychometrically robust scale. Using data from 297 cancer caregivers (12.4% response rate) recruited from Melbourne, Australia between January-July 2014, confirmatory factor analysis (CFA) was conducted to evaluate the HLCS-C's proposed factor structure. Items were evaluated for: item difficulty, unidimensionality and overall item fit within their domain. Item-threshold-ordering was examined through one-parameter Item Response Theory models. Internal consistency was assessed using Raykov's reliability coefficient. CFA results identified 42 poorly performing/redundant items which were subsequently removed. A 10-factor model was fitted to 46 acceptable items with no correlated residuals or factor cross-loadings accepted. Adequate fit was revealed (χ² (WLSMV) = 1463.807 [df = 944], p < .001, RMSEA = 0.043, CFI = 0.980, TLI = 0.978, WRMR = 1.00). Ten domains were identified: Proactivity and determination to seek information; Adequate information about cancer and cancer management; Supported by healthcare providers (HCP) to understand information; Social support; Cancer-related communication with the care recipient (CR); Understanding CR needs and preferences; Self-care; Understanding the healthcare system; Capacity to process health information; and Active engagement with HCP. Internal consistency was adequate across domains (0.78-0.92). The revised HLCS-C demonstrated good structural, convergent, and discriminant validity, and high internal consistency. The scale may be useful for the development and evaluation of caregiver interventions.
Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly
ERIC Educational Resources Information Center
Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.
2013-01-01
Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…
Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
ERIC Educational Resources Information Center
Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet
2012-01-01
Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…
Michaelides, Michalis P.
2010-01-01
Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230
ERIC Educational Resources Information Center
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan
2014-01-01
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
ERIC Educational Resources Information Center
Magno, Carlo
2009-01-01
The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. CTT and IRT were compared across two samples and two forms of the test on their item difficulty, internal consistency, and measurement errors. The specific…
The Accuracy of Estimated Total Test Statistics. Final Report.
ERIC Educational Resources Information Center
Kleinke, David J.
In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
ERIC Educational Resources Information Center
Kibble, Jonathan D.; Johnson, Teresa
2011-01-01
The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The…
ERIC Educational Resources Information Center
Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.
2017-01-01
Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
ERIC Educational Resources Information Center
Planinic, Maja; Boone, William J.; Krsnik, Rudolf; Beilfuss, Meredith L.
2006-01-01
Croatian 1st-year and 3rd-year high-school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear…
A measure of early physical functioning (EPF) post-stroke.
Finch, Lois E; Higgins, Johanne; Wood-Dauphinee, Sharon; Mayo, Nancy E
2008-07-01
To develop a comprehensive measure of Early Physical Functioning (EPF) post-stroke quantified through Rasch analysis and conceptualized using the International Classification of Functioning Disability and Health (ICF). An observational cohort study. A cohort of 262 subjects (mean age 71.6 (standard deviation 12.5) years) hospitalized post-acute stroke. Functional assessments were made within 3 days of stroke with items from valid and reliable indices commonly utilized to evaluate stroke survivors. Information on important variables was also collected. Principal component and Rasch analysis confirmed the factor structure, and dimensionality of the measure. Rasch analysis combined items across ICF components to develop the measure. Items were deleted iteratively, those retained fit the model and were related to the construct; reliability and validity were assessed. A 38-item unidimensional measure of the EPF met all Rasch model requirements. The item difficulty matched the person ability (mean person measure: -0.31; standard error 0.37 logits), reliability of the person-item-hierarchy was excellent at 0.97. Initial validity was adequate. The 38-item EPF measure was developed. It expands the range of assessment post acute stroke; it covers a broad spectrum of difficulty with good initial psychometric properties that, once revalidated, can assist in planning and evaluating early interventions.
Hällgren, Monica; Nygård, Louise; Kottorp, Anders
2014-05-01
While the development and possibilities of technology today are commonly regarded to be unlimited, knowledge regarding the technological needs of people with mental retardation is fairly limited. The aim of this study was to enhance knowledge of perceived relevance and difficulty in using everyday technology (ET) such as stoves, cell phones, and elevators in adults with mental retardation. 120 participants with different levels of mental retardation were interviewed with the Everyday Technology Use Questionnaire (ETUQ) about their use of such technologies in their everyday life. Analyses of variance, post hoc tests, and regression analyses were used to explore the data. Participants with moderate and severe mental retardation differed in mean perceived difficulty from those with mild mental retardation, suggesting that increased perceived difficulty in ET use is related to the level of mental retardation. Differences between groups were also found in the proportion of items that were relevant for each person. The variables Level of Mental Retardation, Additional Disabilities, and Proportional Relevance of ET Items could together predict 67.2% of the variation in perceived difficulty in technology use. The findings also indicate that age, housing, gender, and geographical district do not covary with perceived difficulty in ET use.
CTTITEM: SAS macro and SPSS syntax for classical item analysis.
Lei, Pui-Wa; Wu, Qiong
2007-08-01
This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
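The statistics named in this abstract can be illustrated with a short Python sketch. It is not a port of the CTTITEM SAS macro or SPSS syntax; the function name `classical_item_analysis` and the toy response matrix `demo` are hypothetical, and the point-biserial shown is the corrected (item-rest) variant, which is an assumption about which variant a reader would want.

```python
import numpy as np

def classical_item_analysis(responses):
    """Classical item statistics for dichotomous (0/1) items.

    responses: 2-D array-like, rows = examinees, columns = items.
    Returns (difficulty, point_biserial, alpha).
    """
    x = np.asarray(responses, dtype=float)
    n_items = x.shape[1]
    total = x.sum(axis=1)

    # Item difficulty index: proportion correct (the item mean).
    difficulty = x.mean(axis=0)

    # Corrected point-biserial: correlate each item with the total
    # score of the remaining items, so the item does not inflate
    # its own correlation.
    point_biserial = np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1]
        for j in range(n_items)
    ])

    # Cronbach's alpha from item variances and total-score variance.
    item_var = x.var(axis=0, ddof=1)
    total_var = total.var(ddof=1)
    alpha = n_items / (n_items - 1) * (1.0 - item_var.sum() / total_var)
    return difficulty, point_biserial, alpha

# Hypothetical toy data: 6 examinees, 4 items.
demo = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
])
difficulty, point_biserial, alpha = classical_item_analysis(demo)
```

An item answered correctly by every examinee (or by none) has zero variance, so its point-biserial is undefined; a production routine would need to handle that case explicitly.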
Computerized Adaptive Testing with Item Clones. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; van der Linden, Wim J.
To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…
Applicability of the Newtonian gravity concept inventory to introductory college physics classes
NASA Astrophysics Data System (ADS)
Williamson, Kathryn; Prather, Edward E.; Willoughby, Shannon
2016-06-01
The study described here extends the applicability of the Newtonian Gravity Concept Inventory (NGCI) to college algebra-based physics classes, beyond the general education astronomy courses for which it was originally developed. The four conceptual domains probed by the NGCI (Directionality, Force Law, Independence of Other Forces, and Threshold) are well suited for investigating students' reasoning about gravity in both populations, making the NGCI a highly versatile instrument. Classical test theory statistical analysis with physics student responses pre-instruction (N = 1,392) and post-instruction (N = 929) from eight colleges and universities across the United States indicates that the NGCI is composed of items with appropriate difficulty and discrimination and is reliable for this population. Also, expert review and student interviews support the NGCI's validity for the physics population. Emergent similarities and differences in how physics students reason about gravity compared to astronomy students are discussed, as well as future directions for analyzing the instrument's item parameters across both populations.
Parent outcome expectancies for purchasing fruit and vegetables: a validation.
Baranowski, Tom; Watson, Kathy; Missaghian, Mariam; Broadfoot, Alison; Baranowski, Janice; Cullen, Karen; Nicklas, Theresa; Fisher, Jennifer; O'Donnell, Sharon
2007-03-01
To validate four scales -- outcome expectancies for purchasing fruit and for purchasing vegetables, and comparative outcome expectancies for purchasing fresh fruit and for purchasing fresh vegetables versus other forms of fruit and vegetables (F&V). Survey instruments were administered twice, separated by 6 weeks. Recruited in front of supermarkets and grocery stores; interviews conducted by telephone. One hundred and sixty-one food shoppers with children (18 years or younger). Single dimension scales were specified for fruit and for vegetable purchasing outcome expectancies, and for comparative (fresh vs. other) fruit and vegetable purchasing outcome expectancies. Item Response Theory parameter estimates revealed easily interpreted patterns in the sequence of items by difficulty of response. Fruit and vegetable purchasing and fresh fruit comparative purchasing outcome expectancy scales were significantly correlated with home F&V availability, after controlling for social desirability of response. Comparative fresh vegetable outcome expectancy scale was significantly bivariately correlated with home vegetable availability, but not after controlling for social desirability. These scales are available to help better understand family F&V purchasing decisions.
ERIC Educational Resources Information Center
Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.
2011-01-01
Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…
Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan
2016-01-01
This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability. PMID:26941699
ERIC Educational Resources Information Center
Arce-Ferrer, Alvaro J.; Bulut, Okan
2017-01-01
This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…
ERIC Educational Resources Information Center
Tian, Wei; Cai, Li; Thissen, David; Xin, Tao
2013-01-01
In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…
Building an Evaluation Scale using Item Response Theory.
Lalor, John P; Wu, Hao; Yu, Hong
2016-11-01
Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
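The closing point of this abstract, that equal accuracy need not mean equal IRT score, can be illustrated with a minimal Python sketch. It assumes a two-parameter logistic (2PL) model with known item parameters and a grid-search maximum likelihood estimate of ability; the item parameters, the function names `p_2pl` and `ability_mle`, and the response patterns are all hypothetical and are not taken from the paper.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ability_mle(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Grid-search maximum likelihood estimate of ability theta."""
    u = np.asarray(responses, dtype=float)
    # Probabilities for every (theta, item) pair: shape (len(grid), n_items).
    p = p_2pl(grid[:, None], a[None, :], b[None, :])
    loglik = (u * np.log(p) + (1.0 - u) * np.log(1.0 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

# Hypothetical items: two easy, weakly discriminating items and
# two hard, strongly discriminating items.
a = np.array([0.5, 0.5, 2.0, 2.0])
b = np.array([-1.0, -1.0, 1.0, 1.0])

# Both patterns have 2 of 4 correct (the same accuracy).
theta_easy = ability_mle([1, 1, 0, 0], a, b)  # only the easy items correct
theta_hard = ability_mle([0, 0, 1, 1], a, b)  # only the hard items correct
```

Under these assumed parameters the pattern with the hard, discriminating items correct receives the higher ability estimate, even though raw accuracy is identical.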
A signal detection-item response theory model for evaluating neuropsychological measures.
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
2018-02-05
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data.
Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
Development of the Sexual Minority Adolescent Stress Inventory
Schrager, Sheree M.; Goldbach, Jeremy T.; Mamey, Mary Rose
2018-01-01
Although construct measurement is critical to explanatory research and intervention efforts, rigorous measure development remains a notable challenge. For example, though the primary theoretical model for understanding health disparities among sexual minority (e.g., lesbian, gay, bisexual) adolescents is minority stress theory, nearly all published studies of this population rely on minority stress measures with poor psychometric properties and development procedures. In response, we developed the Sexual Minority Adolescent Stress Inventory (SMASI) with N = 346 diverse adolescents ages 14–17, using a comprehensive approach to de novo measure development designed to produce a measure with desirable psychometric properties. After exploratory factor analysis on 102 candidate items informed by a modified Delphi process, we applied item response theory techniques to the remaining 72 items. Discrimination and difficulty parameters and item characteristic curves were estimated overall, within each of 12 initially derived factors, and across demographic subgroups. Two items were removed for excessive discrimination and three were removed following reliability analysis. The measure demonstrated configural and scalar invariance for gender and age; a three-item factor was excluded for demonstrating substantial differences by sexual identity and race/ethnicity. The final 64-item measure comprised 11 subscales and demonstrated excellent overall (α = 0.98), subscale (α range 0.75–0.96), and test–retest (scale r > 0.99; subscale r range 0.89–0.99) reliabilities. Subscales represented a mix of proximal and distal stressors, including domains of internalized homonegativity, identity management, intersectionality, and negative expectancies (proximal) and social marginalization, family rejection, homonegative climate, homonegative communication, negative disclosure experiences, religion, and work domains (distal). 
Thus, the SMASI development process illustrates a method to incorporate information from multiple sources, including item response theory models, to guide item selection in building a psychometrically sound measure. We posit that similar methods can be used to improve construct measurement across all areas of psychological research, particularly in areas where a strong theoretical framework exists but existing measures are limited. PMID:29599737
Linking Item Parameters to a Base Scale
ERIC Educational Resources Information Center
Kang, Taehoon; Petersen, Nancy S.
2012-01-01
This paper compares three methods of item calibration--concurrent calibration, separate calibration with linking, and fixed item parameter calibration--that are frequently used for linking item parameters to a base scale. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord in "Appl Psychol Measure"…
A Note on the Item Information Function of the Four-Parameter Logistic Model
ERIC Educational Resources Information Center
Magis, David
2013-01-01
This article focuses on four-parameter logistic (4PL) model as an extension of the usual three-parameter logistic (3PL) model with an upper asymptote possibly different from 1. For a given item with fixed item parameters, Lord derived the value of the latent ability level that maximizes the item information function under the 3PL model. The…
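As a rough illustration of the model this abstract describes, the following Python sketch evaluates a 4PL item response function and its item information on a grid of ability values. The symbols a (discrimination), b (difficulty), c (lower asymptote), and d (upper asymptote), the function names, and the parameter values are assumptions for illustration; the grid search stands in for the closed-form maximizer that the article derives and is not the article's method.

```python
import numpy as np

def p_4pl(theta, a, b, c, d):
    """4PL response probability with lower asymptote c and upper asymptote d."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

def info_4pl(theta, a, b, c, d):
    """Item information I(theta) = P'(theta)^2 / (P(theta) * (1 - P(theta)))."""
    p = p_4pl(theta, a, b, c, d)
    # Derivative of the 4PL curve with respect to theta.
    dp = a * (p - c) * (d - p) / (d - c)
    return dp ** 2 / (p * (1.0 - p))

# Hypothetical item: guessing floor 0.2, upper asymptote 0.95.
theta_grid = np.linspace(-4, 4, 801)
info = info_4pl(theta_grid, a=1.5, b=0.0, c=0.2, d=0.95)

# Numerical location of maximum information on the grid.
theta_max = theta_grid[np.argmax(info)]
```

Setting c = 0 and d = 1 reduces the information function to the familiar 2PL form a² P (1 − P), which gives a quick sanity check on the implementation.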
ERIC Educational Resources Information Center
Karkee, Thakur B.; Wright, Karen R.
2004-01-01
Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…
Haggerty, Jeannie L; Levesque, Jean-Frédéric
2017-04-01
Patients are the most valid source for evaluating the accessibility of services, but a previous study observed differential psychometric performance of instruments in rural and urban respondents. To validate a measure of organizational accessibility free of differential rural-urban performance that predicts consequences of difficult access for patient-initiated care. Sequential qualitative-quantitative study. Qualitative findings used to adapt or develop evaluative and reporting items. Quantitative validation study. Primary data by telephone from 750 urban, rural and remote respondents in Quebec, Canada; follow-up mailed questionnaire to a subset of 316. Items were developed for barriers along the care trajectory. We used common factor and confirmatory factor analysis to identify constructs and compare models. We used item response theory analysis to test for differential rural-urban performance; examine individual item performance; adjust response options; and exclude redundant or non-discriminatory items. We used logistic regression to examine predictive validity of the subscale on access difficulty (outcome). Initial factor resolution suggested geographic and organizational dimensions, plus consequences of access difficulty. After second administration, organizational accommodation and geographic indicators were integrated into a 6-item subscale of Effective Availability and Accommodation, which demonstrates good variability and internal consistency (α = 0.84) and no differential functioning by geographic area. Each unit increase predicts decreased likelihood of consequences of access difficulties (unmet need and problem aggravation). The new subscale is a practical, valid and reliable measure for patients to evaluate first-contact health services accessibility, yielding valid comparisons between urban and rural contexts. © 2016 The Authors. Health Expectations published by John Wiley & Sons Ltd.
O'Brien, Kelly K; Bayoumi, Ahmed M; Stratford, Paul; Solomon, Patricia
2015-01-01
To assess the dimensions of disability measured by the HIV Disability Questionnaire (HDQ), a newly developed 72-item self-administered questionnaire that describes the presence, severity and episodic nature of disability experienced by people living with HIV. We recruited adults living with HIV from hospital clinics, AIDS service organizations and a specialty hospital and administered the HDQ followed by a demographic questionnaire. We conducted an exploratory factor analysis using disability severity scores to determine the domains of disability in the HDQ. We used the following steps: (a) ensured correlations between items were >0.30 and <0.80; (b) conducted a principal components analysis to extract factors; (c) used the Scree Test and eigenvalue threshold >1.5 to determine the number of factors to retain; and (d) used oblique rotation to simplify the factor loading matrix. We assigned items to factors based on factor loadings of >0.30. Of the 361 participants, 80% were men and 77% reported living with at least two concurrent health conditions in addition to HIV. The exploratory factor analysis suggested retaining six factors. Items related to symptoms and impairments loaded on three factors (physical [20 items], cognitive [3 items], and mental and emotional health [11 items]) and items related to worrying about the future, daily activities, and personal relationships loaded on three additional factors (uncertainty [14 items], difficulties with day-to-day activities [9 items], social inclusion [12 items]). The HDQ has six domains: physical symptoms and impairments; cognitive symptoms and impairments; mental and emotional health symptoms and impairments; uncertainty; difficulties with day-to-day activities and challenges to social inclusion. These domains establish the scoring structure for the dimensions of disability measured by the HDQ.
Implications for Rehabilitation: As individuals live longer and age with HIV, they may be living with the health-related consequences of HIV and concurrent health conditions, a concept that may be termed disability. Measuring disability is important to understand the impact of HIV and its comorbidities. The HIV Disability Questionnaire (HDQ) is a self-administered questionnaire developed to describe the presence, severity and episodic nature of disability experienced by people living with HIV. The HDQ is comprised of six domains of disability including: physical symptoms and impairments (20 items); cognitive symptoms and impairments (3 items); mental and emotional health symptoms and impairments (11 items); uncertainty (14 items); difficulties with day-to-day activities (9 items) and challenges to social inclusion (12 items). These domains represent the dimensions of disability measured by the HDQ. The HDQ is the first known HIV-specific disability measure for adults living with HIV. The HDQ may be used by clinicians and researchers to assess disability experienced by adults living with HIV.
Coons, Stephen Joel; Chongpison, Yuda; Wendel, Christopher S; Grant, Marcia; Krouse, Robert S
2007-09-01
To explore whether there was a significant relationship between difficulty paying for ostomy supplies and overall quality of life among a sample of ostomates receiving care from the Veterans Health Administration (VHA). The data were collected as part of the Veterans Affairs (VA) Ostomy Health-Related Quality of Life Study, in which 511 respondents (239 cases, 272 controls) completed a survey instrument that included the modified City of Hope Quality of Life (mCOH-QOL) Ostomy questionnaire, SF-36V, and sociodemographic items. Responses from the 239 cases (ie, patients with intestinal stomas) were used in this analysis. The modified City of Hope Quality of Life Ostomy questionnaire item, "How good is your overall quality of life?," was the dependent variable for this analysis. The primary independent variable was the response (yes/no) to the item, "If you pay for any of the (ostomy) costs, is it difficult for you?" A hierarchical regression model was used to examine whether difficulty paying was significantly related to overall quality of life after adjusting for age, income, race/ethnicity, and physical health. After accounting for the proportion of variance explained by age, income, race/ethnicity, and physical health, the additional proportion of variance explained by difficulty paying was statistically significant. Individuals reporting difficulty paying had a roughly 1 point lower (ie, beta-coefficient = -1.052; SE = 0.481) overall quality of life score on the 11-point scale. We found a significant association between difficulty paying for ostomy supplies and overall quality of life. Although the cross-sectional study design does not allow causal inference, the results suggest a relationship that merits further examination.
ERIC Educational Resources Information Center
Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.
2010-01-01
Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
ERIC Educational Resources Information Center
Semino, Sara; Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.
2018-01-01
Autism Spectrum Disorder (ASD) is generally associated with difficulties in contextual source memory but not single item memory. There are surprising inconsistencies in the literature, however, that the current study seeks to address by examining item and source memory in age and ability matched groups of 22 ASD and 21 comparison adults. Results…
Arnould, Carlyne; Vandervelde, Laure; Batcho, Charles Sèbiyo; Penta, Massimo; Thonnard, Jean-Louis
2012-01-01
Objectives: Several ABILHAND Rasch-built manual ability scales were previously developed for chronic stroke (CS), cerebral palsy (CP), rheumatoid arthritis (RA), systemic sclerosis (SSc) and neuromuscular disorders (NMD). The present study aimed to explore the applicability of a generic manual ability scale unbiased by diagnosis and to study the nature of manual ability across diagnoses. Design: Cross-sectional study. Setting: Outpatient clinic homes (CS, CP, RA), specialised centres (CP), reference centres (CP, NMD) and university hospitals (SSc). Participants: 762 patients from six diagnostic groups: 103 CS adults, 113 CP children, 112 RA adults, 156 SSc adults, 124 NMD children and 124 NMD adults. Primary and secondary outcome measures: Manual ability as measured by the ABILHAND disease-specific questionnaires, diagnosis and nature (ie, uni-manual or bi-manual involvement and proximal or distal joints involvement) of the ABILHAND manual activities. Results: The difficulties of most manual activities were diagnosis dependent. A principal component analysis highlighted that 57% of the variance in the item difficulty between diagnoses was explained by the symmetric or asymmetric nature of the disorders. A generic scale was constructed, from a metric point of view, with 11 items sharing a common difficulty among diagnoses and 41 items displaying a category-specific location (asymmetric: CS, CP; and symmetric: RA, SSc, NMD). This generic scale showed that CP and NMD children had significantly less manual ability than RA patients, who had significantly less manual ability than CS, SSc and NMD adults. However, the generic scale was less discriminative and responsive to small deficits than disease-specific instruments. Conclusions: Our finding that most of the manual item difficulties were disease-dependent emphasises the danger of using generic scales without prior investigation of item invariance across diagnostic groups.
Nevertheless, a generic manual ability scale could be developed by adjusting and accounting for activities perceived differently in various disorders. PMID:23117570
ERIC Educational Resources Information Center
Sinharay, Sandip
2015-01-01
The maximum likelihood estimate (MLE) of the ability parameter of an item response theory model with known item parameters was proved to be asymptotically normally distributed under a set of regularity conditions for tests involving dichotomous items and a unidimensional ability parameter (Klauer, 1990; Lord, 1983). This article first considers…
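To make the setting of this abstract concrete, the following is a minimal Python sketch of maximum likelihood ability estimation with known item parameters. It assumes a two-parameter logistic model and a plain Newton-Raphson update; the function names and the example item parameters are hypothetical and are not taken from the article.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mle_ability(responses, a_params, b_params, theta=0.0, tol=1e-8, max_iter=100):
    """Newton-Raphson MLE of ability for 0/1 responses and known item parameters.

    Returns (theta_hat, asymptotic_standard_error).
    """
    if all(responses) or not any(responses):
        # The likelihood has no finite maximum for perfect or zero scores.
        raise ValueError("MLE is not finite for all-correct or all-incorrect patterns")
    for _ in range(max_iter):
        probs = [p_2pl(theta, a, b) for a, b in zip(a_params, b_params)]
        # First derivative of the log-likelihood with respect to theta.
        score = sum(a * (u - p) for a, u, p in zip(a_params, responses, probs))
        # Test information (negative second derivative) at the current theta.
        info = sum(a * a * p * (1.0 - p) for a, p in zip(a_params, probs))
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    # Recompute the information at the final estimate for the standard error.
    probs = [p_2pl(theta, a, b) for a, b in zip(a_params, b_params)]
    info = sum(a * a * p * (1.0 - p) for a, p in zip(a_params, probs))
    return theta, 1.0 / math.sqrt(info)


# Hypothetical three-item test with unit discriminations.
theta_hat, se = mle_ability([1, 1, 0], [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0])
```

The standard error returned here is the usual inverse square root of the test information, which is the quantity whose asymptotic normality results of this kind concern.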
Linking Item Parameters to a Base Scale. ACT Research Report Series, 2009-2
ERIC Educational Resources Information Center
Kang, Taehoon; Petersen, Nancy S.
2009-01-01
This paper compares three methods of item calibration--concurrent calibration, separate calibration with linking, and fixed item parameter calibration--that are frequently used for linking item parameters to a base scale. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord (1983) characteristic curve method…
ASCAL: A Microcomputer Program for Estimating Logistic IRT Item Parameters.
ERIC Educational Resources Information Center
Vale, C. David; Gialluca, Kathleen A.
ASCAL is a microcomputer-based program for calibrating items according to the three-parameter logistic model of item response theory. It uses a modified multivariate Newton-Raphson procedure for estimating item parameters. This study evaluated this procedure using Monte Carlo Simulation Techniques. The current version of ASCAL was then compared to…
Lawton IADL scale in dementia: can item response theory make it more informative?
McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M
2014-07-01
Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. The objective was to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society.
Agency attributions of mental effort during self-regulated learning.
Koriat, Asher
2018-04-01
Previous results suggest that the monitoring of one's own performance during self-regulated learning is mediated by self-agency attributions and that these attributions can be influenced by poststudy effort-framing instructions. These results pose a challenge to the study of issues of self-agency in metacognition when the objects of self-regulation are mental operations rather than motor actions that have observable outcomes. When participants studied items in Experiment 1 under time pressure, they invested greater study effort in the easier items in the list. However, the effects of effort framing were the same as when learners typically invest more study effort in the more difficult items: Judgments of learning (JOLs) decreased with effort when instructions biased the attribution of effort to nonagentic sources but increased when they biased attribution to agentic sources. However, the effects of effort framing were constrained by parameters of the study task: Interitem differences in difficulty constrained the attribution of effort to agentic regulation (Experiment 2) whereas interitem differences in the incentive for recall constrained the attribution of effort to nonagentic sources (Experiment 3). The results suggest that the regulation and attribution of effort during self-regulated learning occur within a module that is dissociated from the learner's superordinate agenda but is sensitive to parameters of the task. A model specifies the stage at which effort framing affects the effort-JOL relationship by biasing the attribution of effort to agentic or nonagentic sources. The potentialities that exist in metacognition for the investigation of issues of self-agency are discussed.
Sample Size and Item Parameter Estimation Precision When Utilizing the One-Parameter "Rasch" Model
ERIC Educational Resources Information Center
Custer, Michael
2015-01-01
This study examines the relationship between sample size and item parameter estimation precision when utilizing the one-parameter model. Item parameter estimates are examined relative to "true" values by evaluating the decline in root mean squared deviation (RMSD) and the number of outliers as sample size increases. This occurs across…
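The precision criterion named in this abstract, root mean squared deviation (RMSD) between estimated and "true" item parameters, can be sketched in a few lines of Python. The difficulty values below are invented for illustration and do not come from the study.

```python
import math


def rmsd(estimates, true_values):
    """Root mean squared deviation between estimated and true item parameters."""
    if len(estimates) != len(true_values) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical Rasch difficulties (in logits) and two sets of estimates.
true_b = [-1.0, 0.0, 1.0]
small_sample_b = [-1.4, 0.3, 1.5]   # noisier estimates, as from a small sample
large_sample_b = [-1.1, 0.0, 1.1]   # tighter estimates, as from a large sample

rmsd_small = rmsd(small_sample_b, true_b)
rmsd_large = rmsd(large_sample_b, true_b)
```

Under these made-up numbers the RMSD shrinks as the estimates tighten, which is the kind of decline with sample size that the study tracks.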
ERIC Educational Resources Information Center
Ackerman, Brian P.; And Others
1990-01-01
Results of four experiments show that developmental differences in elaborative conceptual processing at acquisition and retrieval contribute independently to developmental increases in recall. Item identification processes for both words and pictures constrain children's elaborative processing. The constraints are time limited. (RH)
Treatment of Not-Administered Items on Individually Administered Intelligence Tests
ERIC Educational Resources Information Center
He, Wei; Wolfe, Edward W.
2012-01-01
In administration of individually administered intelligence tests, items are commonly presented in a sequence of increasing difficulty, and test administration is terminated after a predetermined number of incorrect answers. This practice produces stochastically censored data, a form of nonignorable missing data. By manipulating four factors…
ERIC Educational Resources Information Center
DeMars, Christine E.
2012-01-01
In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…
ERIC Educational Resources Information Center
Finch, Holmes
2010-01-01
The accuracy of item parameter estimates in the multidimensional item response theory (MIRT) context has not been researched in great detail. This study examines the ability of two confirmatory factor analysis models specifically for dichotomous data to properly estimate item parameters using common formulae for converting factor…
ERIC Educational Resources Information Center
Yen, Wendy M.
The extent, causes, and importance of context effects on item parameters for one- and three-parameter latent-trait models were examined. Items were taken from the California Achievement Tests Reading Comprehension and Mathematics Concepts and Applications subtests. The reading items were administered to 1,678 fourth-grade students, and the…
ERIC Educational Resources Information Center
Matthews-Lopez, Joy L.; Hombo, Catherine M.
The purpose of this study was to examine the recovery of item parameters in simulated Automatic Item Generation (AIG) conditions, using Markov chain Monte Carlo (MCMC) estimation methods to attempt to recover the generating distributions. To do this, variability in item and ability parameters was manipulated. Realistic AIG conditions were…
Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri
2017-05-13
The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
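Internal consistency in this abstract is summarised with Cronbach's alpha. A minimal Python sketch of the standard formula follows, assuming complete data and sample variances; the score matrix is hypothetical and unrelated to the NDI sample.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses: three respondents, two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha approaches 1 when the items rise and fall together across respondents, and falls as item variances account for more of the total score variance.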
Anesthesiology Journal club assessment by means of semantic changes.
Vieira, Joaquim Edson; Torres, Marcelo Luís Abramides; Pose, Regina Albanese; Auler, José Otávio Costa Junior
2014-01-01
The interactive approach of a journal club has been described in the medical education literature. The aim of this investigation is to present an assessment of the journal club as a tool to address the question of whether residents read more, and more critically. This study reports the performance of medical residents in anesthesiology from the Clinics Hospital - University of São Paulo Medical School. All medical residents were invited to answer five questions derived from discussed papers. The answer sheet consisted of affirmative statements with a Likert-type scale (totally disagree-disagree-not sure-agree-totally agree), each related to one of the chosen articles. The results were evaluated by means of item analysis: difficulty index and discrimination power. Residents completed 173 evaluations in the months of December 2011 (n=51), July 2012 (n=66) and December 2012 (n=56). The first exam presented all items as straight statements; the second and third exams presented mixed items. Separating "totally agree" from "agree" increased the difficulty indices, but did not improve the discrimination power. The use of a journal club assessment with straight and inverted statements and a five-point agreement scale has been shown to increase its item difficulty and discrimination power. This may reflect involvement either with the reading or the discussion during the journal meeting. Copyright © 2013 Sociedade Brasileira de Anestesiologia. Published by Elsevier Editora Ltda.
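The two item-analysis statistics named in this abstract can be illustrated with a short Python sketch. It assumes the classical definitions (difficulty index as the proportion of correct responses, discrimination as the difference between upper and lower scoring groups) and responses already scored 0/1; the abstract does not state the exact computation the authors used, and the data are invented.

```python
def item_analysis(responses, fraction=0.27):
    """Classical item analysis for 0/1 scored responses.

    responses: list of examinees, each a list of 0/1 item scores.
    fraction: share of examinees placed in the upper and lower groups
              (0.27 is a common convention, assumed here).
    Returns (difficulty, discrimination), one value per item.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    group_size = max(1, round(n_examinees * fraction))
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    difficulty = [sum(r[i] for r in responses) / n_examinees for i in range(n_items)]
    discrimination = [
        (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / group_size
        for i in range(n_items)
    ]
    return difficulty, discrimination


# Hypothetical scored answers: four residents, two items, half in each group.
difficulty, discrimination = item_analysis(
    [[1, 1], [1, 0], [0, 1], [0, 0]], fraction=0.5
)
```

Note that with this definition a higher difficulty index means an easier item, so stricter scoring rules lower the proportion correct.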
Constructing three emotion knowledge tests from the invariant measurement approach
Prieto, Gerardo; Burin, Debora I.
2017-01-01
Background: Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods: Three EK tests, emotion vocabulary (EV), close emotional situations (CES) and far emotional situations (FES), were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results: It was corroborated that data-RM fit was sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between “happiness” and the remaining emotions; neither type of test nor interaction effects on EK item difficulty were statistically significant. The testing of gender differences was carried out after corroborating that differential item functioning (DIF) would not be a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants were consistently better than male ones, a result that will be of interest for future meta-analyses. Discussion: The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
ERIC Educational Resources Information Center
Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.
2012-01-01
This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
Item analysis of three Spanish naming tests: a cross-cultural investigation.
Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely that patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. The TNT demonstrated internal consistency superior to that of its counterparts, an item difficulty pattern superior to that of the CERAD naming test, and an item discrimination pattern superior to that of the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.
Examination of the item structure of the Alberta infant motor scale.
Liao, Pai-Jun M; Campbell, Suzann K
2004-01-01
The Alberta Infant Motor Scale (AIMS) is a screening tool for identifying delayed motor development from birth to 18 months of age. The purpose of this study was to examine the psychometric structure of the AIMS, including the hierarchical scale of items and the precision for measuring infant ability at different ages. Ninety-seven infants with varying degrees of risk of developmental disability were recruited from three hospitals or from the community in the Chicago metropolitan area. Infants were tested on the AIMS at three, six, nine, and 12 months of age. The hierarchical structure and the range and distribution of item difficulty on the AIMS were analyzed using Rasch psychometric analysis. The Rasch analysis confirmed that items for each of the four testing positions (supine, prone, sitting, and standing) were arranged in increasing order of difficulty, but a ceiling effect was present. Gaps exist at six ability levels, indicating low precision of measurement for differentiating among infants after about nine months of age. The AIMS shows a ceiling effect, measures infant ability best from three to nine months of age, and has few items available for discriminating among infants after they pass the controlled lowering through standing item. Clinical impressions should be drawn with caution at ages when the precision of measurement is low.
Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra
2012-03-13
Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.
ERIC Educational Resources Information Center
Marco, Gary L.; And Others
Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…
A Rasch measure of teachers' views of teacher-student relationships in the primary school.
Leitao, Natalie; Waugh, Russell F
2012-01-01
This study investigated teacher-student relationships from the teachers' point of view at Perth metropolitan schools in Western Australia. The study identified three key social and emotional aspects that affect teacher-student relationships, namely, Connectedness, Availability and Communication. Data were collected by questionnaire (N = 139) with stem-items answered in three perspectives: (1) Idealistic: this is what I would like to happen; (2) Capability: this is what I am capable of; and (3) Behaviour: this is what actually happens, using four ordered response categories: not at all (score 1), some of the time (score 2), most of the time (score 3), and almost always (score 4). Data were analysed with a Rasch measurement model and a uni-dimensional, linear scale with 24 items, ordered from easy to hard, was created. The data were shown to be highly reliable, so that valid inferences could be made from the scale. The Person Separation Index (akin to a reliability index) was 0.93; there was good global teacher and item fit to the measurement model; there was good item fit; the targeting of the item difficulties against the teacher measures was good, and the response categories were answered consistently and logically. Teachers said that the ideal items were all easier than their corresponding capability items which were in turn easier than the behaviour items (where the items fitted the model), as conceptualized. The easiest ideal items were: I like this child and This child and I get along well together. The hardest ideal item (but still easy) was: I am available for this child. The easiest behaviour item (but still hard) was: This child and I get along well together. The hardest behaviour item (and very hard) was: I am interested to learn about this child's personal thoughts, feelings and experiences. The difficulties of the items supported the conceptual structure of the variable.
ERIC Educational Resources Information Center
Cacchione, Trix; Indino, Marcello; Fujita, Kazuo; Itakura, Shoji; Matsuno, Toyomi; Schaub, Simone; Amici, Federica
2014-01-01
Previous research has demonstrated that adults are successful at visually tracking rigidly moving items, but experience great difficulties when tracking substance-like "pouring" items. Using a comparative approach, we investigated whether the presence/absence of the grammatical count-mass distinction influences adults' and children's…
The Handling of Missing Binary Data in Language Research
ERIC Educational Resources Information Center
Pichette, François; Béland, Sébastien; Jolani, Shahab; Lesniewska, Justyna
2015-01-01
Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data…
Decimal Fraction Arithmetic: Logical Error Analysis and Its Validation.
ERIC Educational Resources Information Center
Standiford, Sally N.; And Others
This report illustrates procedures of item construction for addition and subtraction examples involving decimal fractions. Using a procedural network of skills required to solve such examples, an item characteristic matrix of skills analysis was developed to describe the characteristics of the content domain by projected student difficulties. Then…
Mutual Information Item Selection in Adaptive Classification Testing
ERIC Educational Resources Information Center
Weissman, Alexander
2007-01-01
A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local-…
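The relationship stated in this abstract, mutual information as a special case of the Kullback-Leibler distance, can be shown with a small Python sketch: MI is the KL divergence between a joint distribution and the product of its marginals. The function names and the example joint tables are hypothetical; in an adaptive classification setting the rows might index proficiency categories and the columns item responses, but that mapping is an assumption made here for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """Mutual information of a joint table p(x, y), given as a list of rows."""
    px = [sum(row) for row in joint]            # marginal over rows
    py = [sum(col) for col in zip(*joint)]      # marginal over columns
    flat_joint = [p for row in joint for p in row]
    flat_product = [a * b for a in px for b in py]
    # MI is the KL divergence from the independence model to the joint.
    return kl_divergence(flat_joint, flat_product)


# Hypothetical joint distributions of category (rows) and item response (columns).
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
mi_dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

An item whose response is independent of the category carries zero mutual information, while a response that fully determines a binary category carries log 2 nats.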
Bootstrap Standard Errors for Maximum Likelihood Ability Estimates When Item Parameters Are Unknown
ERIC Educational Resources Information Center
Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi
2014-01-01
When item parameter estimates are used to estimate the ability parameter in item response models, the standard error (SE) of the ability estimate must be corrected to reflect the error carried over from item calibration. For maximum likelihood (ML) ability estimates, a corrected asymptotic SE is available, but it requires a long test and the…
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.
ERIC Educational Resources Information Center
Kaskowitz, Gary S.; De Ayala, R. J.
2001-01-01
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John
2013-01-01
We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402
Equating with Miditests Using IRT
ERIC Educational Resources Information Center
Fitzpatrick, Joseph; Skorupski, William P.
2016-01-01
The equating performance of two internal anchor test structures--miditests and minitests--is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…
A Test of the Similar Sequence Hypothesis.
ERIC Educational Resources Information Center
Silverstein, A. B.; And Others
1982-01-01
Scales for object permanence and spatial relationships were administered to 98 severely and profoundly mentally retarded children (mean age 13 years) on three occasions, 6 months apart. Differences in the difficulty of the items were quite stable, but their order of difficulty differed appreciably from that for nonretarded infants. (Author/SB)
Reproduction of Inflectional Markers in French-Speaking Children with Reading Impairment
ERIC Educational Resources Information Center
St-Pierre, Marie-Catherine; Beland, Renee
2010-01-01
Purpose: Children with reading impairment (RI) experience difficulties in oral and written production of inflectional markers. The origin of these difficulties is not well documented in French. According to some authors, acquisition of irregular items by typically developing children is predicted by token frequency, whereas acquisition of regular…
Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI)
Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur
2016-01-01
We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non–expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI’s robustness and sensitivity in capturing useful data relating to the students’ conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. PMID:26903497
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
Influence of Context on Item Parameters in Forced-Choice Personality Assessments
ERIC Educational Resources Information Center
Lin, Yin; Brown, Anna
2017-01-01
A fundamental assumption in computerized adaptive testing is that item parameters are invariant with respect to context--items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the…
Item Response Theory Equating Using Bayesian Informative Priors.
ERIC Educational Resources Information Center
de la Torre, Jimmy; Patz, Richard J.
This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Influence of Fallible Item Parameters on Test Information During Adaptive Testing.
ERIC Educational Resources Information Center
Wetzel, C. Douglas; McBride, James R.
Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…
ERIC Educational Resources Information Center
Palmer, D. G.
This publication presents an organized collection of biology questions, designed for use in evaluation at the secondary level in Tasmania. Each item has been tried for quality and is accompanied by its difficulty percentage as well as by its content area and the mental processes required to answer it. The content areas include: Diversity,…
Development and assessment of floor and ceiling items for the PROMIS physical function item bank
2013-01-01
Introduction: Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods: We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results: In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions: These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. They will help improve instrument performance in populations with broad functional ranges and those concentrated at one or the other extreme ends of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring item administration to appropriate individuals. PMID:24286166
Selivanova, Alexandra; Shin, Hyun Joon; Miller, Joan W.; Jackson, Mary Lou
2018-01-01
Purpose: Vision loss from age-related macular degeneration (AMD) has a profound effect on vision-related quality of life (VRQoL). The purpose of this study is to identify clinical factors associated with VRQoL using the Rasch-calibrated NEI VFQ-25 scales in bilateral advanced AMD patients. Methods: We retrospectively reviewed 47 patients (mean age 83.2 years) with bilateral advanced AMD. Clinical assessment included age, gender, type of AMD, high contrast visual acuity (VA), history of medical conditions, contrast sensitivity (CS), central visual field loss, report of Charles Bonnet Syndrome, current treatment for AMD and Rasch-calibrated NEI VFQ-25 visual function and socioemotional function scales. The NEI VFQ visual function scale includes items of general vision, peripheral vision, distance vision and near vision-related activity while the socioemotional function scale includes items of vision-related social functioning, role difficulties, dependency, and mental health. Multiple regression analysis (structural regression model) was performed using fixed item parameters obtained from the one-parameter item response theory model. Results: Multivariate analysis showed that high contrast VA and CS were two factors influencing the VRQoL visual function scale (β = -0.25, 95% CI -0.37 to -0.12, p<0.001 and β = 0.35, 95% CI 0.25 to 0.46, p<0.001) and socioemotional functioning scale (β = -0.2, 95% CI -0.37 to -0.03, p = 0.023, and β = 0.3, 95% CI 0.18 to 0.43, p = 0.001). Central visual field loss was not associated with either the VRQoL visual or socioemotional functioning scale (β = -0.08, 95% CI -0.28 to 0.12, p = 0.44 and β = -0.09, 95% CI -0.03 to 0.16, p = 0.50, respectively). Conclusion: In patients with vision impairment secondary to bilateral advanced AMD, high contrast VA and CS are two important factors affecting VRQoL. PMID:29746512
Roh, Miin; Selivanova, Alexandra; Shin, Hyun Joon; Miller, Joan W; Jackson, Mary Lou
2018-01-01
Vision loss from age-related macular degeneration (AMD) has a profound effect on vision-related quality of life (VRQoL). The purpose of this study is to identify clinical factors associated with VRQoL using the Rasch-calibrated NEI VFQ-25 scales in bilateral advanced AMD patients. We retrospectively reviewed 47 patients (mean age 83.2 years) with bilateral advanced AMD. Clinical assessment included age, gender, type of AMD, high contrast visual acuity (VA), history of medical conditions, contrast sensitivity (CS), central visual field loss, report of Charles Bonnet Syndrome, current treatment for AMD and Rasch-calibrated NEI VFQ-25 visual function and socioemotional function scales. The NEI VFQ visual function scale includes items of general vision, peripheral vision, distance vision and near vision-related activity while the socioemotional function scale includes items of vision-related social functioning, role difficulties, dependency, and mental health. Multiple regression analysis (structural regression model) was performed using fixed item parameters obtained from the one-parameter item response theory model. Multivariate analysis showed that high contrast VA and CS were two factors influencing the VRQoL visual function scale (β = -0.25, 95% CI -0.37 to -0.12, p<0.001 and β = 0.35, 95% CI 0.25 to 0.46, p<0.001) and socioemotional functioning scale (β = -0.2, 95% CI -0.37 to -0.03, p = 0.023, and β = 0.3, 95% CI 0.18 to 0.43, p = 0.001). Central visual field loss was not associated with either the VRQoL visual or socioemotional functioning scale (β = -0.08, 95% CI -0.28 to 0.12, p = 0.44 and β = -0.09, 95% CI -0.03 to 0.16, p = 0.50, respectively). In patients with vision impairment secondary to bilateral advanced AMD, high contrast VA and CS are two important factors affecting VRQoL.
How Task Features Impact Evidence from Assessments Embedded in Simulations and Games
ERIC Educational Resources Information Center
Almond, Russell G.; Kim, Yoon Jeon; Velasquez, Gertrudes; Shute, Valerie J.
2014-01-01
One of the key ideas of evidence-centered assessment design (ECD) is that task features can be deliberately manipulated to change the psychometric properties of items. ECD identifies a number of roles that task-feature variables can play, including determining the focus of evidence, guiding form creation, determining item difficulty and…
An Eye-Movement Study of Relational Memory in Adults with Autism Spectrum Disorder
ERIC Educational Resources Information Center
Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.
2017-01-01
Persons with Autism Spectrum Disorder (ASD) demonstrate good memory for single items but difficulties remembering contextual information related to these items. Recently, we found compromised explicit but intact implicit retrieval of object-location information in ASD (Ring et al. "Autism Res" 8(5):609-619, 2015). Eye-movement data…
Probing University Students' Pre-Knowledge in Quantum Physics with QPCS Survey
ERIC Educational Resources Information Center
Asikainen, Mervi A.
2017-01-01
The study investigated the use of the Quantum Physics Conceptual Survey (QPCS) in probing student understanding of quantum physics. Altogether 103 Finnish university students responded to QPCS. The mean scores of the student responses were calculated and the test was evaluated using five common indices: Item difficulty index, Item discrimination…
Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach
ERIC Educational Resources Information Center
Mesic, Vanes; Muratovic, Hasnija
2011-01-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
Cognitive Complexity in the Remote Association Test--Chinese Version
ERIC Educational Resources Information Center
Hung, Su-Pin; Huang, Po-Sheng; Chen, Hsueh-Chih
2016-01-01
The remote association test (RAT) has been applied in various fields; however, evidence of construct validity for the original version and subsequent extensions of the RAT remains limited. This study aimed to elucidate the dimensionality and the relationship between item features and item difficulties for the RAT--Chinese Version (RAT-C) using the…
Analysis of Open-Ended Statistics Questions with Many Facet Rasch Model
ERIC Educational Resources Information Center
Güler, Nese
2014-01-01
Problem Statement: The most significant disadvantage of open-ended items that allow the valid measurement of upper level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…
Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.
ERIC Educational Resources Information Center
Braun, Henry I.; And Others
The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…
A Comparison between Element Salience versus Context as Item Difficulty Factors in Raven's Matrices
ERIC Educational Resources Information Center
Perez-Salas, Claudia P.; Streiner, David L.; Roberts, Maxwell J.
2012-01-01
The nature of contextual facilitation effects for items derived from Raven's Progressive Matrices was investigated in two experiments. For these, the original matrices were modified, creating either abstract versions with high element salience, or versions which comprised realistic entities set in familiar contexts. In order to replicate and…
ERIC Educational Resources Information Center
Marcoulides, Katerina M.
2018-01-01
This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods was examined. Overall results showed that…
ERIC Educational Resources Information Center
Finch, Holmes; Edwards, Julianne M.
2016-01-01
Standard approaches for estimating item response theory (IRT) model parameters generally work under the assumption that the latent trait being measured by a set of items follows the normal distribution. Estimation of IRT parameters in the presence of nonnormal latent traits has been shown to generate biased person and item parameter estimates. A…
Interactions Between Item Content And Group Membership on Achievement Test Items.
ERIC Educational Resources Information Center
Linn, Robert L.; Harnisch, Delwyn L.
The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…
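For readers unfamiliar with the three-parameter logistic model mentioned in the abstract above, the following is a minimal sketch of its item response function. The parameter names a (discrimination), b (difficulty) and c (lower asymptote, often called the guessing parameter) follow common textbook notation and are assumptions here, not taken from the study itself. Some presentations also include a scaling constant of about 1.7, which this sketch omits.

```python
import math


def three_pl_probability(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: person ability; a: discrimination; b: difficulty;
    c: lower asymptote (chance of success for very low ability).
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic
```

Setting c = 0 gives the two-parameter logistic model, and additionally fixing a at a common value for all items gives the one-parameter model.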
Variability in Parameter Estimates and Model Fit across Repeated Allocations of Items to Parcels
ERIC Educational Resources Information Center
Sterba, Sonya K.; MacCallum, Robert C.
2010-01-01
Different random or purposive allocations of items to parcels within a single sample are thought not to alter structural parameter estimates as long as items are unidimensional and congeneric. If, additionally, numbers of items per parcel and parcels per factor are held fixed across allocations, different allocations of items to parcels within a…
Shalev, Anat; Shor, Ron
2016-12-01
Limited research attention has been given to the needs of family caregivers of persons with mental illness in psychiatric hospitals despite the stressors and difficulties they experience. In light of the recognition of the significance of helping family caregivers, a new model of consultation and support centers for family caregivers, called Meital, has been developed. This study examined the needs of family caregivers who receive help in Meital at the Beer Sheva Mental Health Center. Eighty-five family caregivers participated in the research. They completed a structured questionnaire constructed for this research two weeks after they started receiving services from Meital. The questionnaire included four areas of needs for help. These areas examined the extent of the need for help with respect to each of the items in the instrument. The mean of the extent of need for help of the items in the 'information and knowledge' subscale was the highest. Average to high means of the items of the subscales were found in the subscales relating to 'difficulties stemming from the impact of the situation of the person with mental illness on the function of the family caregiver receiving help,' 'on the function of other family members' and 'difficulties coping with the person with mental illness.' The mean of the items of the subscale 'relationships with professionals and informal systems' was the lowest. An examination of the items within the subscales indicated that items relating to the 'impact of the situation of the person with mental illness on the family caregiver who receives help' were ranked higher than the items relating to the 'impact on the function of other family caregivers.' Items relating to 'relationships with professionals' were ranked higher than items relating to 'relationships with informal systems.' This research emphasizes the importance of implementing the family-centered approach, the basis of the Meital Model, in psychiatric institutions. The focus of this approach is on the need for help of family caregivers beyond the help needed for them to function as a resource of help for the ill person. The findings also illuminate the importance of making information and knowledge accessible for family caregivers.
2015-01-01
Purpose: The situational judgment test (SJT) shows promise for assessing the non-cognitive skills of medical school applicants, but has only been used in Europe. Since the admissions processes and education levels of applicants to medical school are different in the United States and in Europe, it is necessary to obtain validity evidence of the SJT based on a sample of United States applicants. Methods: Ninety SJT items were developed and Kane’s validity framework was used to create a test blueprint. A total of 489 applicants selected for assessment/interview day at the University of Utah School of Medicine during the 2014-2015 admissions cycle completed one of five SJTs, which assessed professionalism, coping with pressure, communication, patient focus, and teamwork. Item difficulty, each item’s discrimination index, internal consistency, and the categorization of items by two experts were used to create the test blueprint. Results: The majority of item scores were within an acceptable range of difficulty, as measured by the difficulty index (0.50-0.85) and had fair to good discrimination. However, internal consistency was low for each domain, and 63% of items appeared to assess multiple domains. The concordance of categorization between the two educational experts ranged from 24% to 76% across the five domains. Conclusion: The results of this study will help medical school admissions departments determine how to begin constructing a SJT. Further testing with a more representative sample is needed to determine if the SJT is a useful assessment tool for measuring the non-cognitive skills of medical school applicants. PMID:26582629
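The classical statistics named in the abstract above (difficulty index, discrimination index, internal consistency) can be sketched as follows. This is an illustrative sketch under stated assumptions: items are scored 0/1, the discrimination index is the upper-lower group variant with a 27% cut (the abstract does not say which variant was used), and internal consistency is taken to be Cronbach's alpha. All function names are hypothetical.

```python
from statistics import pvariance


def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower index: proportion correct in the top-scoring group
    minus proportion correct in the bottom-scoring group."""
    n = len(item_scores)
    k = max(1, round(n * fraction))  # group size (assumed 27% cut)
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of per-examinee score rows."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Toy data: four examinees, three items (made up for illustration only).
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]
```

With the toy data, `difficulty_index(first_item)` is 0.75, which would fall inside the 0.50-0.85 range the abstract reports as acceptable.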
ERIC Educational Resources Information Center
Zhang, Dake; Ding, Yi; Stegall, Joanna; Mo, Lei
2012-01-01
Students who struggle with learning mathematics often have difficulties with geometry problem solving, which requires strong visual imagery skills. These difficulties have been correlated with deficiencies in visual working memory. Cognitive psychology has shown that chunking of visual items accommodates students' working memory deficits. This…
ERIC Educational Resources Information Center
Dickey, Wayne C.; Blumberg, Stephen J.
2004-01-01
Objective: The Strengths and Difficulties Questionnaire is a 25-item instrument developed to assess emotional and behavioral problems. The current study attempted to replicate previous European structural analyses and to describe the latent dimensions that underlie responses to the parent-reported version of the Strengths and Difficulties…
Eye Movements Reveal How Task Difficulty Moulds Visual Search
ERIC Educational Resources Information Center
Young, Angela H.; Hulleman, Johan
2013-01-01
In two experiments we investigated the relationship between eye movements and performance in visual search tasks of varying difficulty. Experiment 1 provided evidence that a single process is used for search among static and moving items. Moreover, we estimated the functional visual field (FVF) from the gaze coordinates and found that its size…
Comparison of Difficulties and Reliabilities of Math-Completion and Multiple-Choice Item Formats.
ERIC Educational Resources Information Center
Oosterhof, Albert C.; Coats, Pamela K.
Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…
Belief-bias reasoning in non-clinical delusion-prone individuals.
Anandakumar, T; Connaughton, E; Coltheart, M; Langdon, R
2017-03-01
It has been proposed that people with delusions have difficulty inhibiting beliefs (i.e., "doxastic inhibition") so as to reason about them as if they might not be true. We used a continuity approach to test this proposal in non-clinical adults scoring high and low in psychometrically assessed delusion-proneness. High delusion-prone individuals were expected to show greater difficulty than low delusion-prone individuals on "conflict" items of a "belief-bias" reasoning task (i.e. when required to reason logically about statements that conflicted with reality), but not on "non-conflict" items. Twenty high delusion-prone and twenty low delusion-prone participants (according to the Peters et al. Delusions Inventory) completed a belief-bias reasoning task and tests of IQ, working memory and general inhibition (Excluded Letter Fluency, Stroop and Hayling Sentence Completion). High delusion-prone individuals showed greater difficulty than low delusion-prone individuals on the Stroop and Excluded Letter Fluency tests of inhibition, but no greater difficulty on the conflict versus non-conflict items of the belief-bias task. They did, however, make significantly more errors overall on the belief-bias task, despite controlling for IQ, working memory and general inhibitory control. The study had a relatively small sample size and used non-clinical participants to test a theory of cognitive processing in individuals with clinically diagnosed delusions. Results failed to support a role for doxastic inhibitory failure in non-clinical delusion-prone individuals. These individuals did, however, show difficulty with conditional reasoning about statements that may or may not conflict with reality, independent of any general cognitive or inhibitory deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
Cross-cultural comparisons of the Mini-mental State Examination between Japanese and U.S. cohorts
Meguro, Kenichi; Ishii, Hiroshi; Yamaguchi, Satoshi; Saxton, Judith A.; Ganguli, Mary
2009-01-01
Background: The Mini-mental State Examination (MMSE) is widely used in Japan and the U.S.A. for cognitive screening in the clinical setting and in epidemiological studies. A previous Japanese community study reported distributions of the MMSE total score very similar to that of the U.S.A. Methods: Data were obtained from the Monongahela Valley Independent Elder's Study (MoVIES), a representative sample of community-dwelling elderly people aged 65 and older living near Pittsburgh, U.S.A., and from the Tajiri Project, with similar aims in Tajiri, Japan. We examined item-by-item distributions of the MMSE between the two cohorts, comparing (1) percentage of correct answers for each item within each cohort, and (2) relative difficulty of each item measured by Item Characteristic Curve analysis (ICC), which estimates log odds of obtaining a correct answer adjusted for the remaining MMSE items, demographic variables (age, gender, education) and interactions of demographic variables and cohort. Results: Median MMSE scores were very similar between the two samples within the same education groups. However, the relative difficulty of each item differed substantially between the two cohorts. Specifically, recall and auditory comprehension were easier for the Tajiri group, but reading comprehension and sentence construction were easier for the MoVIES group. Conclusions: Our results reaffirm the importance of validation and examination of thresholds in each cohort to be studied when a common instrument is used as a dementia screening tool or for defining cognitive impairment. PMID:18925977
Braun, J
1994-02-01
In more than one respect, visual search for the most salient or the least salient item in a display are different kinds of visual tasks. The present work investigated whether this difference is primarily one of perceptual difficulty, or whether it is more fundamental and relates to visual attention. Display items of different salience were produced by varying either size, contrast, color saturation, or pattern. Perceptual masking was employed and, on average, mask onset was delayed longer in search for the least salient item than in search for the most salient item. As a result, the two types of visual search presented comparable perceptual difficulty, as judged by psychophysical measures of performance, effective stimulus contrast, and stability of decision criterion. To investigate the role of attention in the two types of search, observers attempted to carry out a letter discrimination and a search task concurrently. To discriminate the letters, observers had to direct visual attention at the center of the display and, thus, leave unattended the periphery, which contained target and distractors of the search task. In this situation, visual search for the least salient item was severely impaired while visual search for the most salient item was only moderately affected, demonstrating a fundamental difference with respect to visual attention. A qualitatively identical pattern of results was encountered by Schiller and Lee (1991), who used similar visual search tasks to assess the effect of a lesion in extrastriate area V4 of the macaque.
Wang, Xiaoli; Xuan, Yifu; Jarrold, Christopher
2016-01-01
Previous studies have examined whether difficulties in short-term memory for verbal information, that might be associated with dyslexia, are driven by problems in retaining either information about to-be-remembered items or the order in which these items were presented. However, such studies have not used process-pure measures of short-term memory for item or order information. In this work we adapt a process dissociation procedure to properly distinguish the contributions of item and order processes to verbal short-term memory in a group of 28 adults with a self-reported diagnosis of dyslexia and a comparison sample of 29 adults without a dyslexia diagnosis. In contrast to previous work that has suggested that individuals with dyslexia experience item deficits resulting from inefficient phonological representation and language-independent order memory deficits, the results showed no evidence of specific problems in short-term retention of either item or order information among the individuals with a self-reported diagnosis of dyslexia, despite this group showing expected difficulties on separate measures of word and non-word reading. However, there was some suggestive evidence of a link between order memory for verbal material and individual differences in non-word reading, consistent with other claims for a role of order memory in phonologically mediated reading. The data from the current study therefore provide empirical evidence to question the extent to which item and order short-term memory are necessarily impaired in dyslexia. PMID:26941679
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION
de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
ERIC Educational Resources Information Center
Sahin, Alper; Anil, Duygu
2017-01-01
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
Item-focussed Trees for the Identification of Items in Differential Item Functioning.
Tutz, Gerhard; Berger, Moritz
2016-09-01
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
Investigating the Impact of Uncertainty about Item Parameters on Ability Estimation
ERIC Educational Resources Information Center
Zhang, Jinming; Xie, Minge; Song, Xiaolan; Lu, Ting
2011-01-01
Asymptotic expansions of the maximum likelihood estimator (MLE) and weighted likelihood estimator (WLE) of an examinee's ability are derived while item parameter estimators are treated as covariates measured with error. The asymptotic formulae present the amount of bias of the ability estimators due to the uncertainty of item parameter estimators.…
NON-SPECIFIC SYMPTOMS AND SCREENING OF NON-PSYCHOTIC MORBIDITY IN PRIMARY CARE
Srinivasan, T.N.; Suresh, T.R.
1990-01-01
SUMMARY Much of the non-psychotic mental morbidity in primary care goes undetected by the primary care health personnel. This is often because of the non-specific somatic nature of the presenting complaints of these patients and the difficulty the primary care physician has in eliciting specific emotional symptoms to screen for psychiatric problems. This paper describes the development of the 7-item Primary care Psychiatric Questionnaire (PPQ) which, by requiring only the non-specific symptoms to be elicited, could overcome this practical difficulty. This new screening method has been standardised against the Self Report Questionnaire—20-item version which is commonly used in primary care. PMID:21927432
Rodríguez-Díez, María Cristina; Alegre, Manuel; Díez, Nieves; Arbea, Leire; Ferrer, Marta
2016-02-03
The main factor that determines the selection of a medical specialty in Spain after obtaining a medical degree is the MIR ("médico interno residente", internal medical resident) exam. This exam consists of 235 multiple-choice questions with five options, some of which include images provided in a separate booklet. The aim of this study was to analyze the technical quality of the multiple-choice questions included in the MIR exam over the last five years. All the questions included in the exams from 2009 to 2013 were analyzed. We studied the proportion of questions including clinical vignettes, the number of items related to an image and the presence of technical flaws in the questions. For the analysis of technical flaws, we adapted the National Board of Medical Examiners (NBME) guidelines. We looked for 18 different issues included in the manual, grouped into two categories: issues related to testwiseness and issues related to irrelevant difficulties. The final number of questions analyzed was 1,143. The percentage of items based on clinical vignettes increased from 50% in 2009 to 56-58% in the following years (2010-2013). The percentage of items based on an image increased progressively from 10% in 2009 to 15% in 2012 and 2013. The percentage of items with at least one technical flaw varied between 68 and 72%. We observed a decrease in the percentage of items with flaws related to testwiseness, from 30% in 2009 to 20% in 2012 and 2013. While most of these issues decreased dramatically or even disappeared (such as the imbalance in the correct option numbers), the presence of non-plausible options remained frequent. With regard to technical flaws related to irrelevant difficulties, no improvement was observed; this is especially true with respect to negative stem questions and "hinged" questions. The formal quality of the MIR exam items has improved over the last five years with regard to testwiseness. 
A more detailed revision of the items submitted, checking systematically for the presence of technical flaws, could improve the validity and discriminatory power of the exam, without increasing its difficulty.
Refining a self-assessment of informatics competency scale using Mokken scaling analysis.
Yoon, Sunmoo; Shaffer, Jonathan A; Bakken, Suzanne
2015-01-01
Healthcare environments are increasingly implementing health information technology (HIT) and those from various professions must be competent to use HIT in meaningful ways. In addition, HIT has been shown to enable interprofessional approaches to health care. The purpose of this article is to describe the refinement of the Self-Assessment of Nursing Informatics Competencies Scale (SANICS) using analytic techniques based upon item response theory (IRT) and discuss its relevance to interprofessional education and practice. In a sample of 604 nursing students, the 93-item version of SANICS was examined using non-parametric IRT. The iterative modeling procedure included 31 steps comprising: (1) assessing scalability, (2) assessing monotonicity, (3) assessing invariant item ordering, and (4) expert input. SANICS was reduced to an 18-item hierarchical scale with excellent reliability. Fundamental skills for team functioning and shared decision making among team members (e.g. "using monitoring systems appropriately," "describing general systems to support clinical care") had the highest level of difficulty, and "demonstrating basic technology skills" had the lowest difficulty level. Most items reflect informatics competencies relevant to all health professionals. Further, the approaches can be applied to construct a new hierarchical scale or refine an existing scale related to informatics attitudes or competencies for various health professions.
Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.
2011-01-01
Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Liegl, Gregor; Wahl, Inka; Berghöfer, Anne; Nolte, Sandra; Pieh, Christoph; Rose, Matthias; Fischer, Felix
2016-03-01
To investigate the validity of a common depression metric in independent samples. We applied a common metrics approach based on item-response theory for measuring depression to four German-speaking samples that completed the Patient Health Questionnaire (PHQ-9). We compared the PHQ item parameters reported for this common metric to reestimated item parameters that derived from fitting a generalized partial credit model solely to the PHQ-9 items. We calibrated the new model on the same scale as the common metric using two approaches (estimation with shifted prior and Stocking-Lord linking). By fitting a mixed-effects model and using Bland-Altman plots, we investigated the agreement between latent depression scores resulting from the different estimation models. We found different item parameters across samples and estimation methods. Although differences in latent depression scores between different estimation methods were statistically significant, these were clinically irrelevant. Our findings provide evidence that it is possible to estimate latent depression scores by using the item parameters from a common metric instead of reestimating and linking a model. The use of common metric parameters is simple, for example, using a Web application (http://www.common-metrics.org) and offers a long-term perspective to improve the comparability of patient-reported outcome measures.
Tong, Fang; Fu, Tong
2013-01-01
Objective: To evaluate the differences in fluid intelligence tests between normal children and children with learning difficulties in China. Method: PubMed, MD Consult, and other Chinese journal databases were searched from their establishment to November 2012. After finding comparative studies of Raven measurements of normal children and children with learning difficulties, full intelligence quotient (FIQ) values and the original values of the sub-measurement were extracted. The corresponding effect model was selected based on the results of heterogeneity and parallel sub-group analysis was performed. Results: Twelve documents were included in the meta-analysis, and the studies were all performed in mainland China. Among these, two studies were performed at child health clinics, the other ten sites were schools and control children were schoolmates or classmates. FIQ was evaluated using a random effects model. WMD was −13.18 (95% CI: −16.50 to −9.85). Children with learning difficulties showed significantly lower FIQ scores than controls (P<0.00001). Type of learning difficulty and gender differences were evaluated using a fixed-effects model (I² = 0%). The sites and purposes of the studies evaluated here were taken into account, but the reasons for heterogeneity could not be eliminated. The sum IQ of all the subgroups showed considerable heterogeneity (I² = 76.5%). The sub-measurement score of document A showed moderate heterogeneity among all documents, and AB, B, and E showed considerable heterogeneity, so a random effects model was used. Individuals with learning difficulties showed heterogeneity as well. There was a moderate delay in the first three items (−0.5 to −0.9), and a much more pronounced delay in the latter three items (−1.4 to −1.6). Conclusion: In the Chinese mainland, the level of fluid intelligence of children with learning difficulties was lower than that of normal children. Delayed development in sub-items of C, D, and E was more obvious.
PMID:24236016
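For readers unfamiliar with the heterogeneity statistic quoted in the abstract above, the following is the commonly used definition of I². It is offered as an assumption, since the abstract does not state how the statistic was computed. Q denotes Cochran's heterogeneity statistic and k the number of pooled studies (symbols chosen here for illustration).

```latex
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

Under this definition, a value of 0% means the observed spread of study effects is no larger than sampling error alone would produce, while a value such as 76.5% attributes roughly three quarters of the observed variation to between-study differences.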
The Development of a Post Separation/Post Divorce Problems and Stress Scale.
ERIC Educational Resources Information Center
Raschke, Helen J.
Factors associated with the speed and level of difficulty with which individuals adjust to separation and divorce were investigated. A scale was developed to analyze these factors, and included items dealing with the subdimensions of stress and the perception of the persons involved. Factor analysis of the scale items as well as additional tests…
ERIC Educational Resources Information Center
Trace, Jonathan; Brown, James Dean; Janssen, Gerriet; Kozhevnikova, Liudmila
2017-01-01
Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level.…
Language Effects in International Testing: The Case of PISA 2006 Science Items
ERIC Educational Resources Information Center
El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art
2016-01-01
We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
Pick-N Multiple Choice-Exams: A Comparison of Scoring Algorithms
ERIC Educational Resources Information Center
Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R.
2011-01-01
To compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination and item difficulty. Data from six 3rd year medical students' end of term exams in internal medicine from 2005 to 2008 at Munich University were analysed (1,255 students,…
Item Mass and Complexity and the Arithmetic Computation of Students with Learning Disabilities.
ERIC Educational Resources Information Center
Cawley, John F.; Shepard, Teri; Smith, Maureen; Parmar, Rene S.
1997-01-01
The performance of 76 students (ages 10 to 15) with learning disabilities on four tasks of arithmetic computation within each of the four basic operations was examined. Tasks varied in difficulty level and number of strokes needed to complete all items. Intercorrelations between task sets and operations were examined as was the use of…
The Golden Rule Agreement is Psychometrically Defensible.
ERIC Educational Resources Information Center
Gonzalez-Tamayo, Eulogio
The agreement between the Educational Testing Service (ETS) and the Golden Rule Insurance Company of Illinois is interpreted as setting the general principles on which items must be selected to be included in a licensure test. These principles put a limit to the difficulty level of any item, and they also limit the size of the difference in…
ERIC Educational Resources Information Center
Dai, Yunyun
2013-01-01
Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
Generic ABILHAND Questionnaire Can Measure Manual Ability across a Variety of Motor Impairments
ERIC Educational Resources Information Center
Simone, Anna; Rota, Viviana; Tesio, Luigi; Perucca, Laura
2011-01-01
ABILHAND is, in its original version, a 46-item, 4-level questionnaire. It measures the difficulty perceived by patients with rheumatoid arthritis as they do various daily manual tasks. ABILHAND was originally built through Rasch analysis. In a later study, it was simplified to a generic 23-item, three-level questionnaire, showing both…
Solving Graphics Problems: Student Performance in Junior Grades
ERIC Educational Resources Information Center
Lowrie, Tom; Diezmann, Carmel M.
2007-01-01
The authors investigated the performance of 172 Grade 4 students (9 to 10 years) over 12 months on a 36-item test that comprised items from 6 distinct graphical languages (e.g., maps) commonly used to convey mathematical information. Results revealed (a) difficulties in Grade 4 students' capacity to decode a variety of graphics, (b) significant…
ERIC Educational Resources Information Center
Sweller, Naomi
2015-01-01
Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…
ERIC Educational Resources Information Center
Eignor, Daniel R.; Douglass, James B.
This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…
ERIC Educational Resources Information Center
Kersten, Paula; Czuba, Karol; McPherson, Kathryn; Dudley, Margaret; Elder, Hinemoa; Tauroa, Robyn; Vandal, Alain
2016-01-01
This article synthesized evidence for the validity and reliability of the Strengths and Difficulties Questionnaire in children aged 3-5 years. A systematic review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement guidelines was carried out. Study quality was rated using the Consensus-based Standards for the…
ERIC Educational Resources Information Center
Keller, Johannes
2007-01-01
Background: Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths…
Item Vector Plots for the Multidimensional Three-Parameter Logistic Model
ERIC Educational Resources Information Center
Bryant, Damon; Davis, Larry
2011-01-01
This brief technical note describes how to construct item vector plots for dichotomously scored items fitting the multidimensional three-parameter logistic model (M3PLM). As multidimensional item response theory (MIRT) shows promise of being a very useful framework in the test development life cycle, graphical tools that facilitate understanding…
ERIC Educational Resources Information Center
De Boeck, Paul
2008-01-01
It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters…
An analysis of the masking of speech by competing speech using self-report data.
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
2009-01-01
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Yao, Shih-Ying; Bull, Rebecca; Khng, Kiat Hui; Rahim, Anisa
2018-01-01
Understanding a child's ability to decode emotion expressions is important to allow early interventions for potential difficulties in social and emotional functioning. This study applied the Rasch model to investigate the psychometric properties of the NEPSY-II Affect Recognition subtest, a U.S. normed measure for 3-16 year olds which assesses the ability to recognize facial expressions of emotion. Data were collected from 1222 children attending preschools in Singapore. We first performed the Rasch analysis with the raw item data, and examined the technical qualities and difficulty pattern of the studied items. We subsequently investigated the relation of the estimated affect recognition ability from the Rasch analysis to a teacher-reported measure of a child's behaviors, emotions, and relationships. Potential gender differences were also examined. The Rasch model fits our data well. Also, the NEPSY-II Affect Recognition subtest was found to have reasonable technical qualities, expected item difficulty pattern, and desired association with the external measure of children's behaviors, emotions, and relationships for both boys and girls. Overall, findings from this study suggest that the NEPSY-II Affect Recognition subtest is a promising measure of young children's affect recognition ability. Suggestions for future test improvement and research were discussed.
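As a point of reference for the abstract above, the dichotomous Rasch model is usually written as follows. This is the textbook form, given here as an assumption; the abstract does not specify the parameterization or software used. The symbols are illustrative: \theta_p is the ability of child p and b_i is the difficulty of item i.

```latex
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
```

When ability equals item difficulty the probability of a correct response is 0.5, which is why person and item estimates can be placed on one common scale and compared directly.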
A Comparison of the One-and Three-Parameter Logistic Models on Measures of Test Efficiency.
ERIC Educational Resources Information Center
Benson, Jeri
Two methods of item selection were used to select sets of 40 items from a 50-item verbal analogies test, and the resulting item sets were compared for relative efficiency. The BICAL program was used to select the 40 items having the best mean square fit to the one parameter logistic (Rasch) model. The LOGIST program was used to select the 40 items…
Dalton, Megan; Davidson, Megan; Keating, Jenny
2011-01-01
Is the Assessment of Physiotherapy Practice (APP) a valid instrument for the assessment of entry-level competence in physiotherapy students? Cross-sectional study with Rasch analysis of initial (n=326) and validation samples (n=318). Students were assessed on completion of 4, 5, or 6-week clinical placements across one university semester. 298 clinical educators and 456 physiotherapy students at nine universities in Australia and New Zealand provided 644 completed APP instruments. APP data in both samples showed overall fit to a Rasch model of expected item functioning for interval scale measurement. Item 6 (Written communication) exhibited misfit in both samples, but was retained as an important element of competence. The hierarchy of item difficulty was the same in both samples with items related to professional behaviour and communication the easiest to achieve and items related to clinical reasoning the most difficult. Item difficulty was well targeted to person ability. No Differential Item Functioning was identified, indicating that the scale performed in a comparable way regardless of the student's age, gender or amount of prior clinical experience, and the educator's age, gender, or experience as an educator, or the type of facility, university, or clinical area. The instrument demonstrated unidimensionality confirming the appropriateness of summing the scale scores on each item to provide an overall score of clinical competence and was able to discriminate four levels of professional competence (Person Separation Index=0.96). Person ability and raw APP scores had a linear relationship (r²=0.99). Rasch analysis supports the interpretation that a student's APP score is an indication of their underlying level of professional competence in workplace practice.
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
2018-06-01
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Investigation of IRT-Based Equating Methods in the Presence of Outlier Common Items
ERIC Educational Resources Information Center
Hu, Huiqin; Rogers, W. Todd; Vukmirovic, Zarko
2008-01-01
Common items with inconsistent b-parameter estimates may have a serious impact on item response theory (IRT)--based equating results. To find a better way to deal with the outlier common items with inconsistent b-parameters, the current study investigated the comparability of 10 variations of four IRT-based equating methods (i.e., concurrent…
Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model
ERIC Educational Resources Information Center
Wang, Wen-Chung; Wilson, Mark
2005-01-01
This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…
Examination of Different Item Response Theory Models on Tests Composed of Testlets
ERIC Educational Resources Information Center
Kogar, Esin Yilmaz; Kelecioglu, Hülya
2017-01-01
The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…
The Effect of Including or Excluding Students with Testing Accommodations on IRT Calibrations.
ERIC Educational Resources Information Center
Karkee, Thakur; Lewis, Dan M.; Barton, Karen; Haug, Carolyn
This study aimed to determine the degree to which the inclusion of accommodated students with disabilities in the calibration sample affects the characteristics of item parameters and the test results. Investigated were effects on test reliability, item fit to the applicable item response theory (IRT) model, item parameter estimates, and students'…
Lynskey, M T; Agrawal, A
2007-09-01
DSM-IV criteria for illicit drug abuse and dependence are largely based on criteria developed for alcohol use disorders and there is a lack of research evidence on the psychometric properties of these symptoms when applied to illicit drugs. This study utilizes data on abuse/dependence criteria for cannabis, cocaine, stimulants, sedatives, tranquilizers, opiates, hallucinogens and inhalants from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC, n=43 093). Analyses included factor analysis to explore the dimensionality of illicit drug abuse and dependence criteria, calculation of item difficulty and discrimination within an item response framework and a descriptive analysis of 'diagnostic orphans': individuals meeting criteria for 1-2 dependence symptoms but not abuse. Rates of psychiatric disorders were compared across groups. Results favor a uni-dimensional construct for abuse/dependence on each of the eight drug classes. Factor loadings, item difficulty and discrimination were remarkably consistent across drug categories. For each drug category, between 29% and 51% of all individuals meeting criteria for at least one symptom did not receive a formal diagnosis of either abuse or dependence and were therefore classified as 'orphans'. Mean rates of disorder in these individuals suggested that illicit drug use disorders may be more adequately described along a spectrum of severity. While there were remarkable similarities across categories of illicit drugs, consideration of item difficulty suggested that some alterations to DSM regarding the relevant severity of specific abuse and dependence criteria may be warranted.
ERIC Educational Resources Information Center
Han, Kyung T.; Wells, Craig S.; Sireci, Stephen G.
2012-01-01
Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some…
A Primer on the 2- and 3-Parameter Item Response Theory Models.
ERIC Educational Resources Information Center
Thornton, Artist
Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…
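The logistic models named in the entry above can be illustrated with a minimal sketch. It assumes the standard textbook three-parameter logistic form, of which the two- and one-parameter models are special cases; the truncated abstract does not give the paper's own notation, so the function name `irf_3pl` and the parameter names `a` (discrimination), `b` (difficulty), and `c` (lower asymptote, or guessing) are hypothetical.

```python
import math


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response at ability `theta`.

    Textbook 3PL form (an assumption, not taken from the paper):
        P = c + (1 - c) / (1 + exp(-a * (theta - b)))
    Setting c = 0 gives the 2PL model; additionally fixing a = 1
    gives the 1PL model.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must lie in [0, 1)")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Compare the three models for one item across a few ability levels.
    for theta in (-2.0, 0.0, 2.0):
        p1 = irf_3pl(theta, b=0.5)                # 1PL: difficulty only
        p2 = irf_3pl(theta, a=1.8, b=0.5)         # 2PL: adds discrimination
        p3 = irf_3pl(theta, a=1.8, b=0.5, c=0.2)  # 3PL: adds guessing floor
        print(f"theta={theta:+.1f}  1PL={p1:.3f}  2PL={p2:.3f}  3PL={p3:.3f}")
```

Writing the simpler models as defaults of one function makes the nesting explicit: each added parameter relaxes a constraint of the model before it.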
ERIC Educational Resources Information Center
Wollack, James A.; Bolt, Daniel M.; Cohen, Allan S.; Lee, Young-Sun
2002-01-01
Compared the quality of item parameter estimates for marginal maximum likelihood (MML) and Markov Chain Monte Carlo (MCMC) with the nominal response model using simulation. The quality of item parameter recovery was nearly identical for MML and MCMC, and both methods tended to produce good estimates. (SLD)
ERIC Educational Resources Information Center
Tsutakawa, Robert K.
This paper presents a method for estimating certain characteristics of test items which are designed to measure ability, or knowledge, in a particular area. Under the assumption that ability parameters are sampled from a normal distribution, the EM algorithm is used to derive maximum likelihood estimates to item parameters of the two-parameter…
An Ethical Issue Scale for Community Pharmacy Setting (EISP): Development and Validation.
Crnjanski, Tatjana; Krajnovic, Dusanka; Tadic, Ivana; Stojkov, Svetlana; Savic, Mirko
2016-04-01
Many problems that arise when providing pharmacy services contain ethical components. The aims of this study were to develop and validate a scale that could assess the difficulty of ethical issues, as well as the frequency of their occurrence, in the everyday practice of community pharmacists. Development and validation of the scale was conducted in three phases: (1) generating items for the initial survey instrument after qualitative analysis; (2) defining the design and format of the instrument; (3) validation of the instrument. The constructed Ethical Issue scale for community pharmacy setting has two parts containing the same 16 items for assessing the difficulty and frequency thereof. The results of the 171 completely filled out scales were analyzed (response rate 74.89%). The Cronbach's α value of the part of the instrument that examines difficulties of the ethical situations was 0.83 and for the part of the instrument that examined frequency of the ethical situations was 0.84. Test-retest reliability for both parts of the instrument was satisfactory, with all intraclass correlation coefficient (ICC) values above 0.6 (for the part that examines severity ICC = 0.809, for the part that examines frequency ICC = 0.929). The 16-item scale, as a self-assessment tool, demonstrated a high degree of content, criterion, and construct validity and test-retest reliability. The results support its use as a research tool to assess the difficulty and frequency of ethical issues in the community pharmacy setting. The validated scale needs to be further employed on a larger sample of pharmacists.
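The internal-consistency figures reported above are Cronbach's α values. A minimal sketch of the standard formula follows, assuming complete data arranged as one list of item scores per respondent; the function name and the toy data are hypothetical and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Toy data: three items that rise and fall together perfectly
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(toy)
```

Because the toy items are perfectly correlated with equal variances, α reaches its upper bound of 1; real scales such as the one described here fall below that.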
The Utility of the Family Empowerment Scale With Custodial Grandmothers
Hayslip, Bert; Smith, Gregory C.; Montoro-Rodriguez, Julian; Streider, Frederick H.; Merchant, William
2016-01-01
The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three-hundred forty-three (M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers’ psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest the newly identified FES factors to be valuable in understanding empowerment among grandmother caregivers. PMID:26452627
The Utility of the Family Empowerment Scale With Custodial Grandmothers.
Hayslip, Bert; Smith, Gregory C; Montoro-Rodriguez, Julian; Streider, Frederick H; Merchant, William
2017-03-01
The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three-hundred forty-three (M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers' psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest the newly identified FES factors to be valuable in understanding empowerment among grandmother caregivers.
Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
2015-12-01
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items.
Hogge, Michaël; Adam, Stéphane; Collette, Fabienne
2008-07-01
The directed forgetting effect obtained with the item method is supposed to depend on both selective rehearsal of to-be-remembered (TBR) items and attentional inhibition of to-be-forgotten (TBF) items. In this study, we investigated the locus of the directed forgetting deficit in older adults by exploring the influence of recollection and familiarity-based retrieval processes on age-related differences in directed forgetting. Moreover, we explored the influence of processing speed, short-term memory capacity, thought suppression tendencies, and sensitivity to proactive interference on performance. The results indicated that older adults' directed forgetting difficulties are due to decreased recollection of TBR items, associated with increased automatic retrieval of TBF items. Moreover, processing speed and proactive interference appeared to be responsible for the decreased recall of TBR items.
Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan
2016-01-01
Background: Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on properly validated instruments used to measure levels of heart disease knowledge in the Malaysian context. Methods: A cross-sectional survey was conducted to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was included. Results: Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained with respect to both the difficulty and discrimination indices in the item analysis results. The difficulty index ranged from 0.12 to 0.91, and a discrimination index of ≥ 0.20 was reported for the final retained 23 items. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Conclusion: Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ. PMID:27660543
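The difficulty and discrimination indices mentioned in this abstract come from classical item analysis. The sketch below is one common way to compute them, assuming dichotomous (0/1) scoring and an upper-versus-lower group discrimination index; the abstract does not state which discrimination index the authors used, so the method, the function name, the `group_fraction` parameter, and the toy data are all assumptions.

```python
def item_analysis(responses, group_fraction=0.27):
    """Classical item statistics for 0/1 scored responses.

    responses: list of respondents, each a list of 0/1 item scores.
    Returns one (difficulty, discrimination) tuple per item, where
    difficulty is the proportion answering correctly and discrimination
    is the difference in proportion correct between the top and bottom
    scoring groups (each holding group_fraction of respondents).
    """
    n = len(responses)
    k = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    g = max(1, round(n * group_fraction))
    upper, lower = ranked[:g], ranked[-g:]
    results = []
    for j in range(k):
        difficulty = sum(r[j] for r in responses) / n
        discrimination = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / g
        results.append((difficulty, discrimination))
    return results

# Toy data: four respondents, two items, groups formed from halves
toy = [[1, 1], [1, 0], [1, 0], [0, 0]]
stats = item_analysis(toy, group_fraction=0.5)
```

Under this convention a higher difficulty index means an easier item, and a threshold such as ≥ 0.20 on the discrimination index is the kind of retention rule the abstract describes.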
Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan
2016-07-01
Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on properly validated instruments used to measure levels of heart disease knowledge in the Malaysian context. A cross-sectional survey was conducted to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was included. Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained with respect to both the difficulty and discrimination indices in the item analysis results. The difficulty index ranged from 0.12 to 0.91, and a discrimination index of ≥ 0.20 was reported for the final retained 23 items. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ.
ERIC Educational Resources Information Center
Liao, Chi-Wen; Livingston, Samuel A.
2008-01-01
Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
An Information Analysis of 2-, 3-, and 4-Word Verbal Discrimination Learning.
ERIC Educational Resources Information Center
Arima, James K.; Gray, Francis D.
Information theory was used to quantify the difficulty of verbal discrimination (VD) learning tasks and to measure VD performance. Words for VD items were selected with high background frequency and equal a priori probabilities of being selected as a first response. Three VD lists containing only 2-, 3-, or 4-word items were created and equated for…
ERIC Educational Resources Information Center
Chen, Chieh-Yu; Chen, Ching-I; Squires, Jane; Bian, Xiaoyan; Heo, Kay H.; Filgueiras, Alberto; Kalinina, Svetlana; Samarina, Larissa; Ermolaeva, Evgeniya; Xie, Huichao; Yu, Ting-Ying; Wu, Pei-Fang; Landeira-Fernandez, Jesus
2017-01-01
Ages & Stages Questionnaires: Social-Emotional (ASQ:SE) is a widely used screening instrument for detecting social-emotional difficulties in infants and young children. To use a screening instrument across cultures and countries, it is necessary to identify potential item-level biases and ensure item equivalence. This study investigated the…
ERIC Educational Resources Information Center
Goldhammer, Frank
2015-01-01
The main challenge of ability tests relates to the difficulty of items, whereas speed tests demand that test takers complete very easy items quickly. This article proposes a conceptual framework to represent how performance depends on both between-person differences in speed and ability and the speed-ability compromise within persons. Related…
ERIC Educational Resources Information Center
Chan, David W.
2010-01-01
Data of item responses to the Impossible Figures Task (IFT) from 492 Chinese primary, secondary, and university students were analyzed using the dichotomous Rasch measurement model. Item difficulty estimates and person ability estimates located on the same logit scale revealed that the pooled sample of Chinese students, who were relatively highly…
Leonard, Laurence B; Deevy, Patricia; Fey, Marc E; Bredin-Oja, Shelley L
2013-04-01
This study examined sentence comprehension in children with specific language impairment (SLI) in a manner designed to separate the contribution of cognitive capacity from the effects of syntactic structure. Nineteen children with SLI, 19 typically developing children matched for age (TD-A), and 19 younger typically developing children (TD-Y) matched according to sentence comprehension test scores responded to sentence comprehension items that varied in either length or their demands on cognitive capacity, based on the nature of the foils competing with the target picture. The TD-A children were accurate across all item types. The SLI and TD-Y groups were less accurate than the TD-A group on items with greater length and, especially, on items with the greatest demands on cognitive capacity. The types of errors were consistent with failure to retain details of the sentence apart from syntactic structure. The difficulty in the more demanding conditions seemed attributable to interference. Specifically, the children with SLI and the TD-Y children appeared to have difficulty retaining details of the target sentence when the information reflected in the foils closely resembled the information in the target sentence.
Both younger and older adults have difficulty updating emotional memories.
Nashiro, Kaoru; Sakaki, Michiko; Huffman, Derek; Mather, Mara
2013-03-01
The main purpose of the study was to examine whether emotion impairs associative memory for previously seen items in older adults, as previously observed in younger adults. Thirty-two younger adults and 32 older adults participated. The experiment consisted of 2 parts. In Part 1, participants learned picture-object associations for negative and neutral pictures. In Part 2, they learned picture-location associations for negative and neutral pictures; half of these pictures were seen in Part 1 whereas the other half were new. The dependent measure was how many locations of negative versus neutral items in the new versus old categories participants remembered in Part 2. Both groups had more difficulty learning the locations of old negative pictures than of new negative pictures. However, this pattern was not observed for neutral items. Despite the fact that older adults showed overall decline in associative memory, the impairing effect of emotion on updating associative memory was similar between younger and older adults.
ERIC Educational Resources Information Center
Kearns, Devin M.; Steacy, Laura M.; Compton, Donald L.; Gilbert, Jennifer K.; Goodwin, Amanda P.; Cho, Eunsoo; Lindstrom, Esther R.; Collins, Alyson A.
2016-01-01
Comprehensive models of derived polymorphemic word recognition skill in developing readers, with an emphasis on children with reading difficulty (RD), have not been developed. The purpose of the present study was to model individual differences in polymorphemic word recognition ability at the item level among 5th-grade children (N = 173)…
ERIC Educational Resources Information Center
Palmieri, Patrick A.; Smith, Gregory C.
2007-01-01
The authors examined the structural validity of the parent informant version of the Strengths and Difficulties Questionnaire (SDQ) with a sample of 733 custodial grandparents. Three models of the SDQ's factor structure were evaluated with confirmatory factor analysis based on the item covariance matrix. Although indices of fit were good across all…
ERIC Educational Resources Information Center
Oruç Ertürk, Nesrin; Mumford, Simon E.
2017-01-01
This study, conducted by two researchers who were also multiple-choice question (MCQ) test item writers at a private English-medium university in an English as a foreign language (EFL) context, was designed to shed light on the factors that influence test-takers' perceptions of difficulty in English for academic purposes (EAP) vocabulary, with the…
Kraft, Pål; Rise, Jostein; Sutton, Stephen; Røysamb, Espen
2005-09-01
A study was conducted to explore (a) the dimensional structure of perceived behavioural control (PBC), (b) the conceptual basis of perceived difficulty items, and (c) how PBC components and instrumental and affective attitudes, respectively, relate to intention and behaviour. The material stemmed from a two-wave study of Norwegian graduate students (N = 227 for the prediction of intention and N = 110 for the prediction of behaviour). Data were analysed using confirmatory factor analysis (CFA) and multiple regression by the application of structural equation modelling (SEM). CFA suggested that PBC could be conceived of as consisting of three separate but interrelated factors (perceived control, perceived confidence and perceived difficulty), or as two separate but interrelated factors representing self-efficacy (measured by perceived difficulty and perceived confidence or by just perceived confidence) and perceived control. However, the perceived difficulty items also overlapped substantially with affective attitude. Perceived confidence was a strong predictor of exercise intention but not of recycling intention. Perceived control, however, was a strong predictor of recycling intention but not exercise intention. Affective attitudes but not instrumental attitudes were identified as substantial predictors of intentions. The findings suggest that at least under some circumstances it may be inadequate to measure PBC by means of perceived difficulty. One possible consequence may be that the role of PBC as a predictor of intention is somewhat overestimated, whereas the role of (affective) attitude may be similarly underestimated.
Andersson, Helle Wessel; Bjørngaard, Johan Håkon; Kaspersen, Silje Lill; Wang, Catharina E A; Skre, Ingunn; Dahl, Thomas
2010-05-01
The aim was to examine the prevalence of mental health difficulties and prejudices toward mental illness among adolescents, and to analyze possible school and school class effects on these issues. The sample comprised 4,046 pupils (16-19 years) in 257 school classes from 45 Norwegian upper secondary schools. The estimated response rate among the pupils was about 96%. Self-reported mental health difficulties were measured with a four-item scale that covered emotional and behavioral difficulties. Prejudiced attitudes toward mental illness were assessed using a nine-item scale. Multilevel regression analysis was used to estimate the contribution of factors at the individual level, and at the school and class levels. Most of the variance in self-reported mental health difficulties and prejudices was accounted for by individual level factors (92-94%). However, there were statistically significant school and class level effects (P < 0.01), confounded by socioeconomic factors. Mental health difficulties were commonly reported, more often by females than males (P < 0.01). Difficulties with emotions and attention were the two main problem areas, with definite to severe difficulties being reported by 19 and 21% of the females, and by 9 and 16% of the males, respectively. Prejudices were reported more often by males than females (P < 0.01). Both self-reported mental health difficulties and prejudiced attitudes were related to educational program, living situation, and parental education (P < 0.01). The relatively high prevalences of mental health difficulties and prejudiced attitudes toward mental illness among adolescents indicate a need for effective mental health intervention programs. Targeted intervention strategies should be considered when there is evidence of a high number of risk factors in schools and school classes. Furthermore, the gender differences found in self-reported mental health difficulties and prejudices suggest a need for gender-differentiated programs.
2012-01-01
Background: Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods: The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results: The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. Conclusions: This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
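To make the rating scale model in this abstract concrete, the sketch below computes category probabilities for one item under the Andrich rating scale formulation, in which every item shares the same set of thresholds. The function name, the person and item locations, and the four threshold values (for five response categories) are hypothetical and are not estimates from the study.

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Rasch rating scale model.

    theta: person location, delta: item difficulty,
    taus: thresholds shared by all items (len(taus) + 1 categories).
    P(X = x) is proportional to exp(sum over k <= x of (theta - delta - tau_k)),
    with an empty sum (zero) for the lowest category.
    """
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical ordered thresholds for a five-category item
probs = rsm_category_probs(theta=0.0, delta=0.0, taus=[-1.5, -0.5, 0.5, 1.5])
```

With ordered, symmetric thresholds and a person located at the item's difficulty, the middle category is the most probable. Disordered threshold estimates, as reported in the abstract, would mean that some category is never the most likely response at any point on the scale.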
A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Doong, Shing H.
2009-01-01
The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…
ERIC Educational Resources Information Center
Wu, Yi-Fang
2015-01-01
Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…
Cheng, Su-Fen; Lee-Hsieh, Jane; Turton, Michael A; Lin, Kuan-Chia
2014-06-01
Little research has investigated the establishment of norms for nursing students' self-directed learning (SDL) ability, recognized as an important capability for professional nurses. An item response theory (IRT) approach was used to establish norms for SDL abilities valid for the different nursing programs in Taiwan. The purposes of this study were (a) to use IRT with a graded response model to reexamine the SDL instrument, or the SDLI, originally developed by this research team using confirmatory factor analysis and (b) to establish SDL ability norms for the four different nursing education programs in Taiwan. Stratified random sampling with probability proportional to size was used. A minimum of 15% of students from the four different nursing education degree programs across Taiwan was selected. A total of 7,879 nursing students from 13 schools were recruited. The research instrument was the 20-item SDLI developed by Cheng, Kuo, Lin, and Lee-Hsieh (2010). IRT with the graded response model was used with a two-parameter logistic model (discrimination and difficulty) for the data analysis, calculated using MULTILOG. Norms were established using percentile rank. Analysis of item information and test information functions revealed that 18 items exhibited very high discrimination and two items had high discrimination. The test information function was higher in this range of scores, indicating greater precision in the estimate of nursing student SDL. Reliability fell between .80 and .94 for each domain and the SDLI as a whole. The total information function shows that the SDLI is appropriate for all nursing students, except for the top 2.5%. SDL ability norms were established for each nursing education program and for the nation as a whole. IRT is shown to be a potent and useful methodology for scale evaluation. The norms for SDL established in this research will provide practical standards for nursing educators and students in Taiwan.
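The graded response model used in this study assigns each item a discrimination parameter and a set of ordered difficulty thresholds. The following is a minimal sketch of how category probabilities arise from those parameters; the function name and parameter values are hypothetical, and the number of response categories shown is an illustration rather than the SDLI's actual format.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: person ability, a: item discrimination,
    thresholds: ordered difficulties b_1 < b_2 < ... (one per category boundary).
    The probability of responding in category k or higher follows a 2PL curve;
    each category probability is the difference of adjacent cumulative curves.
    """
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical four-category item
grm_probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

A larger discrimination value steepens the cumulative curves, which is what concentrates item information near the thresholds.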
Evaluation of the IRT Parameter Invariance Property for the MCAT.
ERIC Educational Resources Information Center
Kelkar, Vinaya; Wightman, Linda F.; Luecht, Richard M.
The purpose of this study was to investigate the viability of the property of parameter invariance for the one-parameter (1P), two-parameter (2P), and three-parameter (3P) item response theory (IRT) models for the Medical College Admissions Tests (MCAT). Invariance of item parameters across different gender, ethnic, and language groups and the…
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2008-01-01
The method of maximum-likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Durning, Steven J; Dong, Ting; Artino, Anthony R; van der Vleuten, Cees; Holmboe, Eric; Schuwirth, Lambert
2015-08-01
An ongoing debate exists in the medical education literature regarding the potential benefits of pattern recognition (non-analytic reasoning), actively comparing and contrasting diagnostic options (analytic reasoning) or using a combination approach. Studies have not, however, explicitly explored faculty's thought processes while tackling clinical problems through the lens of dual process theory to inform this debate. Further, these thought processes have not been studied in relation to the difficulty of the task or other potential mediating influences such as personal factors and fatigue, which could also be influenced by personal factors such as sleep deprivation. We therefore sought to determine which reasoning process(es) were used with answering clinically oriented multiple-choice questions (MCQs) and if these processes differed based on the dual process theory characteristics: accuracy, reading time and answering time as well as psychometrically determined item difficulty and sleep deprivation. We performed a think-aloud procedure to explore faculty's thought processes while taking these MCQs, coding think-aloud data based on reasoning process (analytic, nonanalytic, guessing or combination of processes) as well as word count, number of stated concepts, reading time, answering time, and accuracy. We also included questions regarding amount of work in the recent past. We then conducted statistical analyses to examine the associations between these measures such as correlations between frequencies of reasoning processes and item accuracy and difficulty. We also observed the total frequencies of different reasoning processes in the situations of getting answers correctly and incorrectly. Regardless of whether the questions were classified as 'hard' or 'easy', non-analytical reasoning led to the correct answer more often than to an incorrect answer. 
Significant correlations were found between the self-reported number of hours recently worked and both think-aloud word count and the number of concepts used in the reasoning, but not item accuracy. When all MCQs were included, 19% of the variance of correctness could be explained by the frequency of expression of these three think-aloud processes (analytic, nonanalytic, or combined). We found evidence to support the notion that the difficulty of an item in a test is not a systematic feature of the item itself but is always a result of the interaction between the item and the candidate. Use of analytic reasoning did not appear to improve accuracy. Our data suggest that individuals do not apply either System 1 or System 2 but instead fall along a continuum with some individuals falling at one end of the spectrum.
Optimal Linking Design for Response Model Parameters
ERIC Educational Resources Information Center
Barrett, Michelle D.; van der Linden, Wim J.
2017-01-01
Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…
Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar
2013-11-01
To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. Not applicable. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, compare separate to combined calibrations, and produce subdomain scores. After eliminating items because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). Upper-extremity and mobility subdomains shared about 35% of the variance in common, and produced comparable scores whether calibrated separately or together. The identification of the subset of items tapping these 2 aspects of physical functioning and scored using the existing PROMIS parameters provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
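As a quick check on the figures in this abstract, the proportion of variance two scores share is the square of their correlation. The following is a minimal sketch only; the function name is an assumption for illustration and is not part of the PROMIS analysis. It shows that r = .59 and r = .60 both correspond to roughly 35% shared variance.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two variables share, given their Pearson correlation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return r * r


# The two correlations reported in the abstract above.
for r in (0.59, 0.60):
    print(f"r = {r:.2f} -> shared variance = {shared_variance(r):.1%}")
```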
Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei
2013-07-01
To introduce the Item Function Analysis (IFA) of the Chinese version of the Quality of Life-Alzheimer's Disease (QOL-AD) scale and to explore the feasibility of its application to Chinese patients with AD. Two hundred AD patients, selected through stratified cluster sampling, were interviewed and assessed with the QOL-AD. Multilog 7.03 was used for the Item Function Analysis. The discrimination parameter (a), difficulty parameter (b), and Item Characteristic Curve (ICC) of each item of the QOL-AD were obtained. The discrimination parameters of items 1 and 7 were below 0.6, while all the others were above 0.6. As for the ICCs, except for items 1 and 7, the first and last curves of each item were monotonic, while the two curves in between had an inverted-V shape with very steep slopes. Results from the IFA showed that the QOL-AD is applicable to Chinese patients with AD.
ERIC Educational Resources Information Center
Ye, Meng; Xin, Tao
2014-01-01
The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…
ERIC Educational Resources Information Center
Schlingman, Wayne M.; Prather, Edward E.; Wallace, Colin S.; Brissenden, Gina; Rudolph, Alexander L.
2012-01-01
This paper is the first in a series of investigations into the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI). In this paper, we use classical test theory to form a framework of results that will be used to evaluate individual item difficulties, item discriminations, and the overall reliability of the…
2018-01-01
Objective: To investigate the psychometric properties of the activities of daily living (ADL) instrument used in the analysis of the Korean Longitudinal Study of Ageing (KLoSA) dataset. Methods: A retrospective study was carried out involving 2006 KLoSA records of community-dwelling adults diagnosed with stroke. The ADL instrument used for the analysis of KLoSA included 17 items, which were analyzed using Rasch modeling to develop a robust outcome measure. The unidimensionality of the ADL instrument was examined based on confirmatory factor analysis with a one-factor model. Item-level psychometric analysis of the ADL instrument included fit statistics, internal consistency, precision, and the item difficulty hierarchy. Results: The study sample included a total of 201 community-dwelling adults (1.5% of the Korean population with an age over 45 years; mean age=70.0 years, SD=9.7) having a history of stroke. The ADL instrument demonstrated a unidimensional construct. Two misfit items, money management (mean square [MnSq]=1.56, standardized Z-statistics [ZSTD]=2.3) and phone use (MnSq=1.78, ZSTD=2.3) were removed from the analysis. The remaining 15 items demonstrated good item fit, high internal consistency (person reliability=0.91), and good precision (person strata=3.48). The instrument precisely estimated person measures within a wide range of theta (−4.75 logits < θ < 3.97 logits) and a reliability of 0.9, with a conceptual hierarchy of item difficulty. Conclusion: The findings indicate that the 15 ADL items met Rasch expectations of unidimensionality and demonstrated good psychometric properties. It is proposed that the validated ADL instrument can be used as a primary outcome measure for assessing longitudinal disability trajectories in the Korean adult population and can be employed for comparative analysis of international disability across national aging studies. PMID:29765888
Psychometrics of the self-report safe driving behavior measure for older adults.
Classen, Sherrilene; Wen, Pey-Shan; Velozo, Craig A; Bédard, Michel; Winter, Sandra M; Brumback, Babette; Lanford, Desiree N
2012-01-01
We investigated the psychometric properties of the 68-item Safe Driving Behavior Measure (SDBM) with 80 older drivers, 80 caregivers, and 2 evaluators from two sites. Using Rasch analysis, we examined unidimensionality and local dependence; rating scale; item- and person-level psychometrics; and item hierarchy of older drivers, caregivers, and driving evaluators who had completed the SDBM. The evidence suggested the SDBM is unidimensional, but pairs of items showed local dependency. Across the three rater groups, the data showed good person (≥3.4) and item (≥3.6) separation as well as good person (≥.93) and item reliability (≥.92). Cronbach's α was ≥.96, and few items were misfitting. Some of the items did not follow the hypothesized order of item difficulty. The SDBM classified the older drivers into six ability levels, but to fully calibrate the instrument it must be refined in terms of its items (e.g., item exclusion) and then tested among participants of lesser ability. Copyright © 2012 by the American Occupational Therapy Association, Inc.
Land, Stephanie R; Warren, Graham W; Crafts, Jennifer L; Hatsukami, Dorothy K; Ostroff, Jamie S; Willis, Gordon B; Chollette, Veronica Y; Mitchell, Sandra A; Folz, Jasmine N M; Gulley, James L; Szabo, Eva; Brandon, Thomas H; Duffy, Sonia A; Toll, Benjamin A
2016-06-01
To the authors' knowledge, there are currently no standardized measures of tobacco use and secondhand smoke exposure in patients diagnosed with cancer, and this gap hinders the conduct of studies examining the impact of tobacco on cancer treatment outcomes. The objective of the current study was to evaluate and refine questionnaire items proposed by an expert task force to assess tobacco use. Trained interviewers conducted cognitive testing with cancer patients aged ≥21 years with a history of tobacco use and a cancer diagnosis of any stage and organ site who were recruited at the National Institutes of Health Clinical Center in Bethesda, Maryland. Iterative rounds of testing and item modification were conducted to identify and resolve cognitive issues (comprehension, memory retrieval, decision/judgment, and response mapping) and instrument navigation issues until no items warranted further significant modification. Thirty participants (6 current cigarette smokers, 1 current cigar smoker, and 23 former cigarette smokers) were enrolled from September 2014 to February 2015. The majority of items functioned well. However, qualitative testing identified wording ambiguities related to cancer diagnosis and treatment trajectory, such as "treatment" and "surgery"; difficulties with lifetime recall; errors in estimating quantities; and difficulties with instrument navigation. Revisions to item wording, format, order, response options, and instructions resulted in a questionnaire that demonstrated navigational ease as well as good question comprehension and response accuracy. The Cancer Patient Tobacco Use Questionnaire (C-TUQ) can be used as a standardized item set to accelerate the investigation of tobacco use in the cancer setting. Cancer 2016;122:1728-34. © 2016 American Cancer Society.
Kashiwagi, Mitsuru; Suzuki, Shuhei
2009-09-01
Many children with developmental disorders are known to have motor impairment such as clumsiness and poor physical ability; however, the objective evaluation of such difficulties is not easy in routine clinical practice. In this study, we aimed to establish a simple method for evaluating motor difficulty in childhood. This method employs a scored interview and examination for detecting soft neurological signs (SNSs). After a preliminary survey with 22 normal children, we set the items and the cutoffs for the interview and SNSs. The interview consisted of questions pertaining to 12 items related to a child's motor skills in his/her past and current life, such as skipping, jumping rope, ball sports, origami, and using chopsticks. The SNS evaluation included 5 tests, namely, standing on one leg with eyes closed, diadochokinesia, associated movements during diadochokinesia, finger opposition test, and laterally fixed gaze. We applied this method to 43 children, including 25 cases of developmental disorders. Children showing significantly high scores in both the interview and SNS were assigned to the "with motor difficulty" group, while those with low scores in both the tests were assigned to the "without motor difficulty" group. The remaining children were assigned to the "with suspicious motor difficulty" group. More than 90% of the children in the "with motor difficulty" group had high impairment scores in Movement Assessment Battery for Children (M-ABC), a standardized motor test, whereas 82% of the children in the "without motor difficulty" group revealed no motor impairment. Thus, we conclude that our simple method and criteria would be useful for the evaluation of motor difficulty in childhood. Further, we have discussed the diagnostic process for developmental coordination disorder using our evaluation method.
ERIC Educational Resources Information Center
Casabianca, Jodi M.; Lewis, Charles
2015-01-01
Loglinear smoothing (LLS) estimates the latent trait distribution while making fewer assumptions about its form and maintaining parsimony, thus leading to more precise item response theory (IRT) item parameter estimates than standard marginal maximum likelihood (MML). This article provides the expectation-maximization algorithm for MML estimation…
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.
Bernhofer, Esther I; St Marie, Barbara; Bena, James F
2017-08-01
All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been around since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time and part time August 2015 to April 2016, aged M = 43.25 years; time as RN, M = 16.13 years. Prospective survey design using e-mail to invite nurses to take an electronic multiple choice pain knowledge test. Content validity of initial 36-item test "very good" (95.1% agreement). Completed tests that met analysis criteria, N = 747. Mean initial test score, 69.4% correct (range 27.8-97.2). After revision/removal of 13 unacceptable questions, mean test score was 50.4% correct (range 8.7-82.6). Initial test item percent difficulty range was 15.2%-98.1%; discrimination values range, 0.03-0.50; final test item percent difficulty range, 17.6%-91.1%, discrimination values range, -0.04 to 1.04. Split-half reliability final test was 0.66. A high decision consistency reliability was identified, with test cut-score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
Aritake, Sayaka; Asaoka, Shoichi; Kagimura, Tatsuo; Shimura, Akiyoshi; Futenma, Kunihiro; Komada, Yoko; Inoue, Yuichi
2015-04-01
This study was conducted to determine what symptom components or conditions of insomnia are related to subjective feelings of insomnia, low health-related quality of life (HRQOL), or depression. Data from 7,027 Japanese adults obtained using an Internet-based questionnaire survey was analyzed to examine associations between demographic variables and each sleep difficulty symptom item on the Pittsburgh Sleep Quality Index (PSQI) with the presence/absence of subjective insomnia and scores on the Short Form-8 (SF-8) and Center for Epidemiologic Studies Depression Scale (CES-D). Prevalence of subjective insomnia was 12.2% (n = 860). Discriminant function analysis revealed that item scores for sleep quality, sleep latency, and sleep medication use on the PSQI and CES-D showed relatively high discriminant function coefficients for identifying positivity for the subjective feeling of insomnia. Among respondents with subjective insomnia, a low SF-8 physical component summary score was associated with higher age, depressive state, and PSQI items for sleep difficulty and daytime dysfunction, whereas a low SF-8 mental component summary score was associated with depressive state, PSQI sleep latency, sleeping medication use, and daytime dysfunction. Depressive state was significantly associated with sleep latency, sleeping medication use, and daytime dysfunction. Among insomnia symptom components, disturbed sleep quality and sleep onset insomnia may be specifically associated with subjective feelings of the disorder. The existence of a depressive state could be significantly associated with not only subjective insomnia but also mental and physical QOL. Our results also suggest that different components of sleep difficulty, as measured by the PSQI, might be associated with mental and physical QOL and depressive status.
Dima, Alexandra Lelia; Schulz, Peter Johannes
2017-01-01
Background: The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective: The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods: Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results: CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions: The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356
Jeong, Eunju; Lesiuk, Teresa L
2011-01-01
Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests were found to have an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, and (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were found to be associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
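Several records in this listing report internal consistency as Cronbach's alpha (here, .95 for the MAA). For readers unfamiliar with the statistic, the following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy score matrix are assumptions for illustration; they are not data from any study listed here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix (list of equal-length rows)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: three respondents, three perfectly consistent items.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(toy_scores))
```

With perfectly consistent items, as in the toy matrix, alpha reaches its maximum of 1; real instruments fall below that.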
... of items, gradual buildup of clutter in living spaces and difficulty discarding things are usually the first ... for which there is no immediate need or space. By middle age, symptoms are often severe and ...
A Diagnostic Assessment for Introductory Molecular and Cell Biology
Wood, William B.; Martin, Jennifer M.; Guild, Nancy A.; Vicens, Quentin; Knight, Jennifer K.
2010-01-01
We have developed and validated a tool for assessing understanding of a selection of fundamental concepts and basic knowledge in undergraduate introductory molecular and cell biology, focusing on areas in which students often have misconceptions. This multiple-choice Introductory Molecular and Cell Biology Assessment (IMCA) instrument is designed for use as a pre- and posttest to measure student learning gains. To develop the assessment, we first worked with faculty to create a set of learning goals that targeted important concepts in the field and seemed likely to be emphasized by most instructors teaching these subjects. We interviewed students using open-ended questions to identify commonly held misconceptions, formulated multiple-choice questions that included these ideas as distracters, and reinterviewed students to establish validity of the instrument. The assessment was then evaluated by 25 biology experts and modified based on their suggestions. The complete revised assessment was administered to more than 1300 students at three institutions. Analysis of statistical parameters including item difficulty, item discrimination, and reliability provides evidence that the IMCA is a valid and reliable instrument with several potential uses in gauging student learning of key concepts in molecular and cell biology. PMID:21123692
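The item statistics named in this abstract (item difficulty and item discrimination) are classical test theory quantities. The sketch below shows one common way to compute them from a matrix of 0/1 responses: difficulty as the proportion correct, and discrimination as the correlation between an item and the rest score. The function names and toy data are assumptions for illustration; the abstract does not state which discrimination index the IMCA authors used.

```python
from statistics import mean, pstdev


def item_difficulty(responses, item):
    """Proportion of examinees answering the item correctly (higher means easier)."""
    return mean(row[item] for row in responses)


def item_discrimination(responses, item):
    """Pearson correlation between the item score and the rest score (total minus item)."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    sx, sy = pstdev(item_scores), pstdev(rest_scores)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either score has no variation
    mx, my = mean(item_scores), mean(rest_scores)
    cov = mean((x - mx) * (y - my) for x, y in zip(item_scores, rest_scores))
    return cov / (sx * sy)


# Hypothetical toy data: four examinees (rows) by three items (columns), 1 = correct.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(toy, 0), item_discrimination(toy, 0))
```

Using the rest score rather than the total score keeps an item from being correlated with itself, which would otherwise inflate discrimination on short tests.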
Arias, Victor B.; Nuñez, Daniel E.; Martínez-Molina, Agustín; Ponce, Fernando P.; Arias, Benito
2016-01-01
The Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnostic criteria assume that the 18 symptoms carry the same weight in an Attention Deficit with Hyperactivity Disorder (ADHD) diagnosis and bear the same discriminatory capacity. However, it is reasonable to think that symptoms may differ in terms of severity and even in the reliability with which they represent the disorder. To test this hypothesis, the aim of this study was to calibrate in a sample of Spanish children (age 4–7; n = 784) a scale for assessing the symptoms of ADHD proposed by Diagnostic and Statistical Manual of Mental Disorders, IV-TR within the framework of Item Response Theory. Samejima’s Graded Response Model was used as a method for estimating the item difficulty and discrimination parameters. The results showed that ADHD subscales (Attention Deficit and Hyperactivity / Impulsivity) had good psychometric properties and had also a good fit to the model. However, relevant differences between symptoms were observed at the level of severity, informativeness and reliability for the assessment of ADHD. This finding suggests that it would be useful to identify the symptoms that are more important than the others with regard to diagnosing ADHD. PMID:27736911
Arias, Victor B; Nuñez, Daniel E; Martínez-Molina, Agustín; Ponce, Fernando P; Arias, Benito
2016-01-01
The Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnostic criteria assume that the 18 symptoms carry the same weight in an Attention Deficit with Hyperactivity Disorder (ADHD) diagnosis and bear the same discriminatory capacity. However, it is reasonable to think that symptoms may differ in terms of severity and even in the reliability with which they represent the disorder. To test this hypothesis, the aim of this study was to calibrate in a sample of Spanish children (age 4-7; n = 784) a scale for assessing the symptoms of ADHD proposed by Diagnostic and Statistical Manual of Mental Disorders, IV-TR within the framework of Item Response Theory. Samejima's Graded Response Model was used as a method for estimating the item difficulty and discrimination parameters. The results showed that ADHD subscales (Attention Deficit and Hyperactivity / Impulsivity) had good psychometric properties and had also a good fit to the model. However, relevant differences between symptoms were observed at the level of severity, informativeness and reliability for the assessment of ADHD. This finding suggests that it would be useful to identify the symptoms that are more important than the others with regard to diagnosing ADHD.
Evaluation of five guidelines for option development in multiple-choice item-writing.
Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva
2009-05-01
This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Three controversies over item disclosure in medical licensure examinations.
Park, Yoon Soo; Yang, Eunbae B
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including South Korea, with prior discussions among North American countries dating over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences on setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration.
Selective loss of verbal imagery.
Mehta, Z; Newcombe, F
1996-05-01
This single case study of the ability to generate verbal and non-verbal imagery in a woman who sustained a gunshot wound to the brain reports a significant difficulty in generating images of word shapes but not a significant problem in generating object images. Further dissociation, however, was observed in her ability to generate images of living vs non-living material. She made more errors in imagery and factual information tasks for non-living items than for living items. This pattern contrasts with our previous report of the agnosic patient, M.S., who had severe difficulty in generating images of living material, whereas his ability to image the shape of words was comparable to that of normal control subjects. Furthermore, with regard to the generation of images of living compared with non-living material, M.S. shows more errors with living than nonliving items. In contrast, the present patient, S.M., made significantly more errors with non-living relative to living items. There appear to be two types of double dissociation which reinforce the growing evidence of dissociable impairments in the ability to generate images for different types of verbal and non-verbal material. Such dissociations, presumably related to sensory and cognitive processing demands, address the problem of the neural basis of imagery.
Semiparametric Item Response Functions in the Context of Guessing
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2016-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
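For orientation, the three-parameter logistic (3PL) model that this abstract takes as its baseline gives the probability of a correct response as c + (1 - c) / (1 + exp(-a(theta - b))), where c is the lower asymptote (guessing). The following is a minimal sketch of that standard 3PL curve only; it does not implement the authors' monotonic-polynomial extension, and the function name and parameter values are assumptions for illustration.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the standard 3PL model.

    theta: person ability; a: discrimination; b: difficulty; c: lower asymptote.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("the lower asymptote c must lie in [0, 1)")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderately discriminating, with a guessing floor of 0.2.
for theta in (-3.0, 0.5, 3.0):
    print(theta, round(p_3pl(theta, a=1.2, b=0.5, c=0.2), 3))
```

At theta equal to b the probability sits halfway between c and 1, and for very low ability it approaches c rather than 0, which is what the lower asymptote is meant to capture.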
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory.
Pasca, Laura; Aragonés, Juan I; Coello, María T
2017-01-01
The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature.
Short-term memory in autism spectrum disorder.
Poirier, Marie; Martin, Jonathan S; Gaigg, Sebastian B; Bowler, Dermot M
2011-02-01
Three experiments examined verbal short-term memory in comparison and autism spectrum disorder (ASD) participants. Experiment 1 involved forward and backward digit recall. Experiment 2 used a standard immediate serial recall task where, contrary to the digit-span task, items (words) were not repeated from list to list. Hence, this task called more heavily on item memory. Experiment 3 tested short-term order memory with an order recognition test: Each word list was repeated with or without the position of 2 adjacent items swapped. The ASD group showed poorer performance in all 3 experiments. Experiments 1 and 2 showed that group differences were due to memory for the order of the items, not to memory for the items themselves. Confirming these findings, the results of Experiment 3 showed that the ASD group had more difficulty detecting a change in the temporal sequence of the items. (c) 2010 APA, all rights reserved.
Development of the Serenity Scale.
Roberts, K T; Aspy, C B
1993-01-01
Serenity is a sustained inner peace. Nurses can use knowledge about serenity to help clients cope with harsh circumstances. The Serenity Scale is a 40-item self-report, summated scale that evaluates clients' serenity status. Critical attributes, identified by serenity experts, served as the theoretical framework. Sixty-five items were given to 542 male and female subjects age 20 to 95 (73% Caucasians and 27% minority) from varying income and educational levels yielding an alpha of .93. Forty items (SS.V2) were extracted for further analysis. The alpha coefficient was .92 with item-to-total correlations ranging from .25 to .67. Item means ranged from 2.6-3.7 (grand mean = 3.4). A principal components factor analysis with varimax rotation revealed nine factors explaining 58.2% of the variance. Limitations are that SS.V2 has not been tested with an independent sample and subjects with low educational levels had difficulty with some items.
Students’ understanding of forces: Force diagrams on horizontal and inclined plane
NASA Astrophysics Data System (ADS)
Sirait, J.; Hamdani; Mursyid, S.
2018-03-01
This study aims to analyse students’ difficulties in understanding force diagrams on horizontal surfaces and inclined planes. Physics education students (pre-service physics teachers) of Tanjungpura University, who had completed a Basic Physics course, took a Force concept test which has six questions covering three concepts: an object at rest, an object moving at constant speed, and an object moving at constant acceleration both on a horizontal surface and on an inclined plane. The test is in a multiple-choice format. It examines the ability of students to select appropriate force diagrams depending on the context. The results show that 44% of students had difficulties in solving the test (these students could solve only one or two of the six items). About 50% of students faced difficulties finding the correct diagram of an object when it has constant speed and acceleration in both contexts. In general, students could correctly identify only 48% of the force diagrams on the test. The most difficult task for the students was identifying the force diagram representing the forces exerted on an object on an inclined plane.
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test
Moore, Mariah; Gordon, Peter C.
2015-01-01
In the Author Recognition Test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggests a potential two-factor structure of the ART differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
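The scoring rule mentioned in this abstract (names selected minus foils selected) and the idea of a heavier foil penalty can be sketched as follows. The function name, the placeholder item labels, and the penalty value of 2.0 are all hypothetical; the abstract does not state the size of the penalty the authors adopted.

```python
def art_score(selected, authors, foil_penalty=1.0):
    """Hits minus penalty-weighted false alarms.

    `selected` and `authors` are sets of names; anything selected that is
    not a real author counts as a foil. foil_penalty=1.0 gives the standard
    "names selected minus foils selected" score.
    """
    hits = len(selected & authors)
    false_alarms = len(selected - authors)
    return hits - foil_penalty * false_alarms


# Placeholder labels, not items from the actual test.
real_authors = {"author_a", "author_b", "author_c"}
picked = {"author_a", "author_b", "foil_x"}

standard = art_score(picked, real_authors)        # 2 hits - 1 foil = 1.0
stricter = art_score(picked, real_authors, 2.0)   # 2 hits - 2 * 1 foil = 0.0
```

Raising the penalty lowers the score of respondents who guess, while leaving the score of respondents who select no foils unchanged.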
Reading ability and print exposure: item response theory analysis of the author recognition test.
Moore, Mariah; Gordon, Peter C
2015-12-01
In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
Sousa, Renata M; Dewey, Michael E; Acosta, Daisy; Jotheeswaran, AT; Castro-Costa, Erico; Ferri, Cleusa P; Guerra, Mariella; Huang, Yueqin; Jacob, KS; Pichardo, Juana Guillermina Rodriguez; Ramírez, Nayeli Garcia; Rodriguez, Juan Llibre; Rodriguez, Marina Calvo; Salas, Aquiles; Sosa, Ana Luisa; Williams, Joseph; Prince, Martin J
2010-01-01
We evaluated the psychometric properties of the 12-item interviewer-administered screener version of the World Health Organization Disability Assessment Schedule – version II (WHODAS II) among older people living in seven low- and middle-income countries. Principal component analysis (PCA), confirmatory factor analysis (CFA) and Mokken analyses were carried out to test for unidimensionality, hierarchical structure, and measurement invariance across 10/66 Dementia Research Group sites. PCA generated a one-factor solution in most sites. In CFA, the two-factor solution generated in Dominican Republic fitted better for all sites other than rural China. The two factors were not easily interpretable, and may have been an artefact of differing item difficulties. Strong internal consistency and high factor loadings for the one-factor solution supported unidimensionality. Furthermore, the WHODAS II was found to be a ‘strong’ Mokken scale. Measurement invariance was supported by the similarity of factor loadings across sites, and by the high between-site correlations in item difficulties. The Mokken results strongly support that the WHODAS II 12-item screener is a unidimensional and hierarchical scale conforming to item response theory (IRT) principles, at least at the monotone homogeneity model level. More work is needed to assess the generalizability of our findings to different populations. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20104493
Powell, Sarah R.; Fuchs, Lynn S.
2014-01-01
According to national mathematics standards, algebra instruction should begin at kindergarten and continue through elementary school. Most often, teachers address algebra in the elementary grades with problems related to solving equations or understanding functions. With 789 2nd-grade students, we administered (a) measures of calculations and word problems in the fall and (b) an assessment of pre-algebraic reasoning, with items that assessed solving equations and functions, in the spring. Based on the calculation and word-problem measures, we placed 148 students into 1 of 4 difficulty status categories: typically performing, calculation difficulty, word-problem difficulty, or difficulty with calculations and word problems. Analyses of variance were conducted on the 148 students; path analytic mediation analyses were conducted on the larger sample of 789 students. Across analyses, results corroborated the finding that word-problem difficulty is more strongly associated with difficulty with pre-algebraic reasoning. As an indicator of later algebra difficulty, word-problem difficulty may be a more useful predictor than calculation difficulty, and students with word-problem difficulty may require a different level of algebraic reasoning intervention than students with calculation difficulty. PMID:25309044
Non-ignorable missingness item response theory models for choice effects in examinee-selected items.
Liu, Chen-Wei; Wang, Wen-Chung
2017-11-01
Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
An Examination of Two Procedures for Identifying Consequential Item Parameter Drift
ERIC Educational Resources Information Center
Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu
2014-01-01
The purpose of the present study was to develop and evaluate two procedures for flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure…
Semi-Parametric Item Response Functions in the Context of Guessing. CRESST Report 844
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2015-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
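For orientation, the response function described in this abstract can be sketched as a lower asymptote plus a scaled logistic of some predictor. With a linear predictor this is the familiar three-parameter logistic (3PL) form; swapping in a monotonic polynomial gives the kind of extra flexibility the abstract describes. The function names and the example polynomial are assumptions for illustration, and the report's actual parameterization of the polynomial is not given in the abstract.

```python
import math


def irf(theta, predictor, lower_asymptote=0.0):
    """P(correct | theta) = c + (1 - c) * logistic(predictor(theta))."""
    z = predictor(theta)
    return lower_asymptote + (1.0 - lower_asymptote) / (1.0 + math.exp(-z))


def linear_predictor(a, b):
    """3PL-style predictor a * (theta - b): discrimination a, difficulty b."""
    return lambda theta: a * (theta - b)


def cubic_predictor(theta):
    """Hypothetical monotonic polynomial: derivative 1 + theta**2 is always > 0."""
    return theta + theta ** 3 / 3.0


# At theta == b the 3PL probability is halfway between c and 1.
p_3pl = irf(0.5, linear_predictor(a=1.2, b=0.5), lower_asymptote=0.2)
p_poly = irf(0.0, cubic_predictor)
```

Setting the lower asymptote to zero and using the linear predictor recovers the two-parameter logistic model referred to in several of the other records.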
The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory
ERIC Educational Resources Information Center
Anil, Duygu
2008-01-01
In this study, the predictive power of experts' judgments of item characteristics, for conditions in which try-out practices cannot be applied, was examined for item characteristics computed under classical test theory and the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…
ERIC Educational Resources Information Center
Kawahara, Jun-ichiro; Enns, James T.
2009-01-01
When observers try to identify successive targets in a visual stream at a rate of 100 ms per item, accuracy for the 2nd target is impaired for intertarget lags of 100-500 ms. Yet, when the same stream is presented more rapidly (e.g., 50 ms per item), this pattern reverses and a 1st-target deficit is obtained. M. C. Potter, A. Staub, and D. H.…
When students can choose easy, medium, or hard homework problems
NASA Astrophysics Data System (ADS)
Teodorescu, Raluca E.; Seaton, Daniel T.; Cardamone, Caroline N.; Rayyan, Saif; Abbott, Jonathan E.; Barrantes, Analia; Pawl, Andrew; Pritchard, David E.
2012-02-01
We investigate student-chosen, multi-level homework in our Integrated Learning Environment for Mechanics [1] built using the LON-CAPA [2] open-source learning system. Multi-level refers to problems categorized as easy, medium, and hard. Problem levels were determined a priori based on the knowledge needed to solve them [3]. We analyze these problems using three measures: time-per-problem, LON-CAPA difficulty, and item difficulty measured by item response theory. Our analysis of student behavior in this environment suggests that time-per-problem is strongly dependent on problem category, unlike either score-based measure. We also found trends in student choice of problems, overall effort, and efficiency across the student population. Allowing students choice in problem solving seems to improve their motivation; 70% of students worked additional problems for which no credit was given.
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination
Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.
2014-01-01
Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949
An Evaluation of Hierarchical Bayes Estimation for the Two-Parameter Logistic Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho
Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item parameters. Simulated data sets were analyzed using two different Bayes estimation procedures, the two-stage hierarchical Bayes estimation (HB2) and the marginal Bayesian with known hyperparameters (MB), and marginal maximum…
Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes
2017-04-11
The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.
Rodrigues-Bigaton, Delaine; de Castro, Ester M; Pires, Paulo F
Rasch analysis has been used in recent studies to test the psychometric properties of questionnaires. The conditions for use of the Rasch model are unidimensionality (assessed via prior factor analysis) and local independence (the probability of getting a particular item right or wrong should not be conditioned upon success or failure on another). The aim of this study was to evaluate the dimensionality and the psychometric properties of the Fonseca anamnestic index (FAI), such as the fit of the data to the model, the degree of difficulty of the items, and the ability to respond, in patients with myogenous temporomandibular disorder (TMD). The sample consisted of 94 women with myogenous TMD, diagnosed by the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD), who answered the FAI. For the factor analysis, we applied the Kaiser-Meyer-Olkin test, Bartlett's sphericity, Spearman's correlation, and the determinant of the correlation matrix. For extraction of the factors/dimensions, an eigenvalue >1.0 was used, followed by oblique oblimin rotation. The Rasch analysis was conducted on the dimension that showed the highest proportion of variance explained. The sample size was adequate, and the FAI was found to be multidimensional. Dimension 1 (primary) consisted of items 1, 2, 3, 6, and 7. All items of dimension 1 showed adequate fit to the model; ordered from most difficult to easiest, they were items 2, 1, 3, 6, and 7. The FAI presented multidimensionality, with its main dimension consisting of five reliable items with adequate fit to the composition of its structure. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
Sakakibara, Brodie M.; Miller, William C.; Backman, Catherine L.
2012-01-01
Objective: To explore shortened response formats for use with the Activities-specific Balance Confidence scale and then: 1) evaluate the unidimensionality of the scale; 2) evaluate the item difficulty; 3) evaluate the scale for redundancy and content gaps; and 4) evaluate the item standard error of measurement (SEM) and internal consistency reliability among aging individuals (≥50 years) with a lower-limb amputation living in the community. Design: Secondary analysis of cross-sectional survey and chart review data. Setting: Out-patient amputee clinics, Ontario, Canada. Participants: Four hundred forty-eight community-living adults, at least 50 years old (mean = 68 years), who have used a prosthesis for at least 6 months for a major unilateral lower limb amputation. Three hundred twenty-five (72.5%) were men. Intervention: N/A. Main Outcome Measure(s): Activities-specific Balance Confidence Scale. Results: A 5-option response format outperformed 4- and 6-option formats. Factor analyses confirmed a unidimensional scale. The distance between response options is not the same for all items on the scale, as evidenced by the Partial Credit Model (PCM) having a better fit to the data than the Rating Scale Model. Two items, however, did not fit the PCM within statistical reason. Revising the wording of the two items may resolve the misfit, improve the construct validity, and lower the SEM. Overall, the difficulty of the scale’s items is appropriate for use with aging individuals with lower-limb amputation, and the scale is most reliable (Cronbach's α = 0.94) for use with individuals with moderately low balance confidence levels. Conclusions: The ABC-scale with a simplified 5-option response format is a valid and reliable measure of balance confidence for use with individuals aging with a lower limb amputation. PMID:21704978
Cairnduff, Victoria; Dean, Moira; Koidis, Anastasios
2016-09-01
Food preparation and storage behaviors in the home deviating from the "best practice" food safety recommendations may result in foodborne illnesses. Currently, there are limited tools available to fully evaluate the consumer knowledge, perceptions, and behavior in the area of refrigerator safety. The current study aimed to develop a valid and reliable tool in the form of a questionnaire, the Consumer Refrigerator Safety Questionnaire (CRSQ), for assessing systematically all these aspects. Items relating to refrigerator safety knowledge (n =17), perceptions (n =46), and reported behavior (n =30) were developed and pilot tested by an expert reference group and various consumer groups to assess face and content validity (n =20), item difficulty and consistency (n =55), and construct validity (n =23). The findings showed that the CRSQ has acceptable face and content validity with acceptable levels of item difficulty. Item consistency was observed for 12 of 15 in refrigerator safety knowledge. Further, all 5 of the subscales of consumer perceptions of refrigerator safety practices relating to risk of developing foodborne disease showed acceptable internal consistency (Cronbach's α value > 0.8). Construct validity of the CRSQ was shown to be very good (P = 0.022). The CRSQ exhibited acceptable test-retest reliability at 14 days with the majority of knowledge items (93.3%) and reported behavior items (96.4%) having correlation coefficients of greater than 0.70. Overall, the CRSQ was deemed valid and reliable in assessing refrigerator safety knowledge and behavior; therefore, it has the potential for future use in identifying groups of individuals at increased risk of deviating from recommended refrigerator safety practices, as well as the assessment of refrigerator safety knowledge and behavior for use before and after an intervention.
ERIC Educational Resources Information Center
Xu, Xueli; Jia, Yue
2011-01-01
Estimation of item response model parameters and ability distribution parameters has been, and will remain, an important topic in the educational testing field. Much research has been dedicated to addressing this task. Some studies have focused on item parameter estimation when the latent ability was assumed to follow a normal distribution,…
ERIC Educational Resources Information Center
Zhang, Jinming; Lu, Ting
2007-01-01
In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter…
ERIC Educational Resources Information Center
Tsutakawa, Robert K.; Lin, Hsin Ying
Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…
ERIC Educational Resources Information Center
Immekus, Jason C.; Maller, Susan J.
2009-01-01
The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…
Three controversies over item disclosure in medical licensure examinations
Park, Yoon Soo; Yang, Eunbae B.
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration. PMID:26374693
[Perceptions on item disclosure for the Korean medical licensing examination].
Yang, Eunbae B
2015-09-01
This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
Measuring the quality of life in hypertension according to Item Response Theory
Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia
2017-01-01
OBJECTIVE: To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using Item Response Theory. METHODS: This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the Item Response Theory analysis were: evaluation of dimensionality, estimation of item parameters, and construction of the scale. The study of dimensionality was carried out on the polychoric correlation matrix and with confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted using the free software R with the aid of psych and mirt. RESULTS: The analysis allowed the visualization of item parameters and their individual contributions to the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the progressive worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state performed well, as they showed better power to discriminate individuals with worse quality of life. The items related to mental state contributed the least psychometric information in the MINICHAL. CONCLUSIONS: We conclude that the instrument is suitable for identifying the worsening of the quality of life in hypertension. The analysis of the MINICHAL using Item Response Theory has allowed us to identify new aspects of this instrument that have not yet been addressed in previous studies. PMID:28492764
Item Information and Discrimination Functions for Trinary PCM Items.
ERIC Educational Resources Information Center
Akkermans, Wies; Muraki, Eiji
1997-01-01
For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal and bimodal are discussed, and the locations and values of maxima are derived. Practical relevance of the results is discussed. (SLD)
Use of Robust z in Detecting Unstable Items in Item Response Theory Models
ERIC Educational Resources Information Center
Huynh, Huynh; Meyer, Patrick
2010-01-01
The first part of this paper describes the use of the robust z[subscript R] statistic to link test forms using the Rasch (or one-parameter logistic) model. The procedure is then extended to the two-parameter and three-parameter logistic and two-parameter partial credit (2PPC) models. A real set of data was used to illustrate the extension. The…
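As background for this record, a robust z is commonly formed by centering each value on the median and scaling by 0.74 times the interquartile range, since 0.74 × IQR approximates the standard deviation for normally distributed data. The sketch below applies that definition to differences in item difficulty between two forms. Whether it matches the paper's exact formula is an assumption, the quartile convention is simply the default of Python's `statistics.quantiles`, and the input values are hypothetical.

```python
from statistics import median, quantiles


def robust_z(values):
    """(value - median) / (0.74 * IQR) for each value.

    Uses the default quartile convention of statistics.quantiles, which is
    an implementation choice; other conventions give slightly different z.
    """
    med = median(values)
    q1, _, q3 = quantiles(values, n=4)
    scale = 0.74 * (q3 - q1)
    return [(v - med) / scale for v in values]


# Hypothetical differences in estimated difficulty (new form minus old form)
# for five common items.
diffs = [-0.2, -0.1, 0.0, 0.1, 0.2]
z = robust_z(diffs)
```

Items whose robust z is large in absolute value would be candidates for removal from the linking set; the cutoff is left to the analyst here because the truncated abstract does not state one.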
Saudek, Kris; Treat, Robert
2015-01-01
Purpose: At our institution, speculation amongst medical students and faculty exists as to whether team-based learning (TBL) can improve scores on high-stakes examinations over traditional didactic lectures. Faculty with experience using TBL developed and piloted a required TBL blood disorders (BD) module for third-year medical students on their pediatric clerkship. The purpose of this study is to analyze the BD scores from the NBME subject exams before and after the introduction of the module. Methods: We analyzed institutional and national item difficulties for BD items from the NBME pediatrics content area item analysis reports from 2011 to 2014 before (pre) and after (post) the pilot (October 2012). Total scores of 590 NBME subject examination students from examinee performance profiles were analyzed pre/post. t-Tests and Cohen's d effect sizes were used to analyze item difficulties for institutional versus national scores and pre/post comparisons of item difficulties and total scores. Results: BD scores for our institution were 0.65 (±0.19) compared to 0.62 (±0.15) nationally (P=0.346; Cohen's d=0.15). The average of post-consecutive BD scores for our students was 0.70 (±0.21) compared to examinees nationally [0.64 (±0.15)] with a significant mean difference (P=0.031; Cohen's d=0.43). The difference in our institution's pre [0.65 (±0.19)] and post [0.70 (±0.21)] BD scores trended higher (P=0.391; Cohen's d=0.27). Institutional BD scores were higher than national BD scores for both pre and post, with an effect size that tripled from pre to post scores. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms. Conclusions: Institutional BD scores were higher than national BD scores for both pre and post, with an effect size that tripled from pre to post scores. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms.
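The effect sizes in this abstract are Cohen's d values. A minimal sketch of one standard form, the mean difference divided by the pooled standard deviation of two independent groups, is given below. The function name and the toy data are assumptions, and the abstract does not say which variant of d the authors used, so this sketch is not expected to reproduce their reported values.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Toy data (hypothetical): both groups have variance 4, means differ by 1.
d = cohens_d([2, 4, 6], [1, 3, 5])  # 1 / 2 = 0.5
```

By the usual rule of thumb, values near 0.2 are read as small and values near 0.5 as medium.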
ERIC Educational Resources Information Center
Chen, Hanwei; Cui, Zhongmin; Zhu, Rongchun; Gao, Xiaohong
2010-01-01
The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods,…
Monclús Cols, Ester; Nicolás Ocejo, David; Sánchez Sánchez, Miquel; Ortega Romero, Mar
2015-02-01
To detect the problems hospital emergency room staff have when prescribing and administering antibiotics. A 14-item questionnaire was designed to assess staff members' knowledge of the importance of starting antibiotic treatment promptly, assigning appropriate dosing intervals, adjusting for renal function, and switching to oral therapy. Agreement with each item was expressed on a 5-point Likert scale. Items with a rate of appropriate response of less than 75% were targeted for specific attention. Two hundred questionnaires were distributed to the staff and 150 were returned completed (response rate, 75%). The following items were targeted for attention based on rates of appropriate response of less than 75%: clear medical orders (65%), understanding the implication of early empirical antibiotic therapy on prognosis in serious infections (67%), estimation of the prevalence of renal insufficiency (42%), assumption that a creatinine serum level under 1.6 mg/dL is safe (33%), use of glomerular filtration rate to adjust dose according to renal function (47%), and an understanding of switching from intravenous to oral treatment (60%). This study revealed the difficulties medical and nursing staff have in prescribing and administering antibiotics in a hospital emergency department. The results can facilitate improvements in antibiotic therapy by pinpointing areas to target for specific training interventions or the design of electronic prescribing aids.
Parameter Estimation in Rasch Models for Examinee-Selected Items
ERIC Educational Resources Information Center
Liu, Chen-Wei; Wang, Wen-Chung
2017-01-01
The examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set of items (e.g., choose one item to respond from a pair of items), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using…
Perez, Kathryn E.; Hiatt, Anna; Davis, Gregory K.; Trujillo, Caleb; French, Donald P.; Terry, Mark; Price, Rebecca M.
2013-01-01
The American Association for the Advancement of Science 2011 report Vision and Change in Undergraduate Biology Education encourages the teaching of developmental biology as an important part of teaching evolution. Recently, however, we found that biology majors often lack the developmental knowledge needed to understand evolutionary developmental biology, or “evo-devo.” To assist in efforts to improve evo-devo instruction among undergraduate biology majors, we designed a concept inventory (CI) for evolutionary developmental biology, the EvoDevoCI. The CI measures student understanding of six core evo-devo concepts using four scenarios and 11 multiple-choice items, all inspired by authentic scientific examples. Distracters were designed to represent the common conceptual difficulties students have with each evo-devo concept. The tool was validated by experts and administered at four institutions to 1191 students during preliminary (n = 652) and final (n = 539) field trials. We used student responses to evaluate the readability, difficulty, discriminability, validity, and reliability of the EvoDevoCI, which included items ranging in difficulty from 0.22–0.55 and in discriminability from 0.19–0.38. Such measures suggest the EvoDevoCI is an effective tool for assessing student understanding of evo-devo concepts and the prevalence of associated common conceptual difficulties among both novice and advanced undergraduate biology majors. PMID:24297293
Kulich, Károly; Keininger, Dorothy L; Tiplady, Brian; Banerji, Donald
2015-01-01
Symptoms, particularly dyspnea, and activity limitation, have an impact on the health status and the ability to function normally in patients with chronic obstructive pulmonary disease (COPD). To develop an electronic patient diary (eDiary), qualitative patient interviews were conducted from 2009 to 2010 to identify relevant symptoms and degree of bother due to symptoms. The eDiary was completed by a subset of 209 patients with moderate-to-severe COPD in the 26-week QVA149 SHINE study. Two morning assessments (since awakening and since the last assessment) and one evening assessment were made each day. Assessments covered five symptoms ("shortness of breath," "phlegm/mucus," "chest tightness," "wheezing," and "coughing") and two impact items ("bothered by COPD" and "difficulty with activities") and were scored on a 10-point numeric scale. Patient compliance with the eDiary was 90.4% at baseline and 81.3% at week 26. Correlations between shortness of breath and impact items were >0.95. Regression analysis showed that shortness of breath was a highly significant (P<0.0001) predictor of impact items. Exploratory factor analysis gave a single factor comprising all eDiary items, including both symptoms and impact items. Shortness of breath, the total score (including five symptoms and two impact items), and the five-item symptom score from the eDiary performed well, with good consistency and reliability. The eDiary showed good sensitivity to change, with a 0.6 points reduction in the symptoms scores (on a 0-10 point scale) representing a meaningful change. The eDiary was found to be valid, reliable, and responsive. The high correlations obtained between "shortness of breath" and the ratings of "bother" and "difficulty with activities" confirmed the relevance of this symptom in patients with COPD. Future studies will be required to explore further psychometric properties and their ability to differentiate between COPD treatments.
The stroke impairment assessment set: its internal consistency and predictive validity.
Tsuji, T; Liu, M; Sonoda, S; Domen, K; Chino, N
2000-07-01
To study the scale quality and predictive validity of the Stroke Impairment Assessment Set (SIAS) developed for stroke outcome research. Rasch analysis of the SIAS; stepwise multiple regression analysis to predict discharge functional independence measure (FIM) raw scores from demographic data, the SIAS scores, and the admission FIM scores; cross-validation of the prediction rule. Tertiary rehabilitation center in Japan. One hundred ninety stroke inpatients for the study of the scale quality and the predictive validity; a second sample of 116 stroke inpatients for the cross-validation study. Mean square fit statistics to study the degree of fit to the unidimensional model; logits to express item difficulties; discharge FIM scores for the study of predictive validity. The degree of misfit was acceptable except for the shoulder range of motion (ROM), pain, visuospatial function, and speech items; and the SIAS items could be arranged on a common unidimensional scale. The difficulty patterns were identical at admission and at discharge except for the deep tendon reflexes, ROM, and pain items. They were also similar for the right- and left-sided brain lesion groups except for the speech and visuospatial items. For the prediction of the discharge FIM scores, the independent variables selected were age, the SIAS total scores, and the admission FIM scores; and the adjusted R² was .64 (p < .0001). Stability of the predictive equation was confirmed in the cross-validation sample (R² = .68, p < .001). The unidimensionality of the SIAS was confirmed, and the SIAS total scores proved useful for stroke outcome prediction.
Improved Classification of Mammograms Following Idealized Training
Hornsby, Adam N.; Love, Bradley C.
2014-01-01
People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making. PMID:24955325
Improved Classification of Mammograms Following Idealized Training.
Hornsby, Adam N; Love, Bradley C
2014-06-01
People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making.
Informed choice: understanding knowledge in the context of screening uptake.
Michie, Susan; Dormandy, Elizabeth; Marteau, Theresa M
2003-07-01
This study evaluates a scale measuring knowledge about a screening test and investigates the association between knowledge, uptake and attitudes towards screening. One thousand four hundred ninety-nine pregnant women completed the knowledge scale of the multidimensional measure of informed choice (MMIC). Three hundred forty-five of these women and 152 professionals providing antenatal care also rated the importance of the knowledge items. Item characteristic curves show that, with one exception, the knowledge items reflect a spread of difficulty and are able to discriminate between people. All items were seen as essential or helpful by both women and health professionals, with two items seen as particularly important and one as unimportant. There were some differences between health professionals, women with low risk results and women with high risk results. Knowledge was not associated with uptake, attitude, or the extent to which uptake was consistent with women's attitudes towards undergoing the test.
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
ERIC Educational Resources Information Center
He, Yong
2013-01-01
Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
A Note on Item-Restscore Association in Rasch Models
ERIC Educational Resources Information Center
Kreiner, Svend
2011-01-01
To rule out the need for a two-parameter item response theory (IRT) model during item analysis by Rasch models, it is important to check the Rasch model's assumption that all items have the same item discrimination. Biserial and polyserial correlation coefficients measuring the association between items and restscores are often used in an informal…
Item Pool Design for an Operational Variable-Length Computerized Adaptive Test
ERIC Educational Resources Information Center
He, Wei; Reckase, Mark D.
2014-01-01
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Unidimensional Interpretations for Multidimensional Test Items
ERIC Educational Resources Information Center
Kahraman, Nilufer
2013-01-01
This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…
Development and initial evaluation of the SCI-FI/AT
Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-01-01
Objectives: To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design: Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting: Inpatient and community settings. Participants: A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion: With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.
Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-05-01
Objectives: To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design: Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting: Inpatient and community settings. Participants: A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion: With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI.
Optimal Designs for the Rasch Model
ERIC Educational Resources Information Center
Grasshoff, Ulrike; Holling, Heinz; Schwabe, Rainer
2012-01-01
In this paper, optimal designs will be derived for estimating the ability parameters of the Rasch model when difficulty parameters are known. It is well established that a design is locally D-optimal if the ability and difficulty coincide. But locally optimal designs require that the ability parameters to be estimated are known. To attenuate this…
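The statement that a design is locally D-optimal when ability and difficulty coincide can be checked numerically. Under the Rasch model the probability of a correct response is P = 1/(1 + exp(-(θ - b))), and the Fisher information one item carries about θ is P(1 - P), which peaks at 0.25 when θ = b. With a single parameter to estimate, maximizing this information is the D-optimality criterion. The sketch below uses hypothetical function names and an arbitrary grid of candidate difficulties.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_information(theta, b):
    """Fisher information about theta contributed by one item of difficulty b."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


theta = 0.5  # assumed (known) ability of the examinee
difficulties = [i / 10 for i in range(-30, 31)]  # candidate difficulties -3.0 .. 3.0

# Pick the candidate difficulty that maximizes information at this ability.
best_b = max(difficulties, key=lambda b: rasch_information(theta, b))
print(best_b, rasch_information(theta, best_b))
```

The search selects the difficulty equal to the ability, which is why a locally optimal design presupposes knowledge of the very parameter being estimated.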
Wolfe, Edward W; McGill, Michael T
2011-01-01
This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.
Yang, Sook Ja; Chee, Yeon Kyung; An, Jisook; Park, Min Hee; Jung, Sunok
2016-05-01
The purpose of this study was to obtain an independent evaluation of the factor structure of the 12-item Health Literacy Index for Female Marriage Immigrants (HLI-FMI), the first measure for assessing health literacy for FMIs in Korea. Participants were 250 Asian women who migrated from China, Vietnam, and the Philippines to marry. The HLI-FMI was originally developed and administered in Korean, and other questionnaires were translated into participants' native languages. The HLI-FMI consisted of 2 factors: (1) Access-Understand Health Literacy (7 items) and (2) Appraise-Apply Health Literacy (5 items); Cronbach's α = .73. Confirmatory factor analysis indicated adequate fit for the 2-factor model. HLI-FMI scores were positively associated with time since immigration and Korean proficiency. Based on classical test theory and item response theory, strong support was provided for item discrimination and item difficulty. Findings suggested that the HLI-FMI is an easily administered, reliable, and valid scale. © 2016 APJPH.
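The internal-consistency figure quoted in this abstract, Cronbach's α, is computed from the item variances and the variance of the total score: α = k/(k − 1) · (1 − Σ s_i² / s_total²), where k is the number of items. A minimal sketch using only the standard library follows; the function name and the tiny response matrix are hypothetical and are not HLI-FMI data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent, one score per item)."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents, three items that agree perfectly,
# so alpha should come out at (numerically) 1.
toy = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
print(cronbach_alpha(toy))
```

Real questionnaire data would have one row per participant and one column per item; items that share little common variance pull α toward zero.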
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory
Pasca, Laura; Aragonés, Juan I.; Coello, María T.
2017-01-01
The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature. PMID:28824509
Colorado Learning Difficulties Questionnaire: Validation of a parent-report screening measure
Willcutt, Erik G.; Boada, Richard; Riddle, Margaret W.; Chhabildas, Nomita; DeFries, John C.; Pennington, Bruce F.
2011-01-01
This study evaluated the internal structure and convergent and discriminant evidence for the Colorado Learning Difficulties Questionnaire (CLDQ), a 20-item parent-report rating scale that was developed to provide a brief screening measure for learning difficulties. CLDQ ratings were obtained from parents of children in two large community samples and two samples from clinics that specialize in the assessment of learning disabilities and related disorders (total N = 8,004). Exploratory and confirmatory factor analyses revealed five correlated but separable dimensions that were labeled reading, math, social cognition, social anxiety, and spatial difficulties. Results revealed strong convergent and discriminant evidence for the CLDQ Reading scale, suggesting that this scale may provide a useful method to screen for reading difficulties in both research studies and clinical settings. Results are also promising for the other four CLDQ scales, but additional research is needed to refine each of these measures. PMID:21574721
Construction of a Computerized Adaptive Testing Version of the Quebec Adaptive Behavior Scale.
ERIC Educational Resources Information Center
Tasse, Marc J.; And Others
Multilog (Thissen, 1991) was used to estimate parameters of 225 items from the Quebec Adaptive Behavior Scale (QABS). A database containing actual data from 2,439 subjects was used for the parameterization procedures. The two-parameter-logistic model was used in estimating item parameters and in the testing strategy. MicroCAT (Assessment Systems…
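For readers unfamiliar with the two-parameter logistic (2PL) model named in this abstract: the probability of a keyed response is P = 1/(1 + exp(-a(θ - b))), with discrimination a and difficulty b, and the item information is a²P(1 − P). A common adaptive-testing strategy administers the unused item with the greatest information at the current ability estimate. Whether the QABS system used exactly this rule is not stated in the abstract, so the selection step below is an assumption, and the item names and parameters are invented.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a keyed response for ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Item information under the 2PL model."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def pick_next_item(theta, pool, administered):
    """Return the name of the not-yet-administered item with maximum information."""
    candidates = [name for name in pool if name not in administered]
    return max(candidates, key=lambda name: info_2pl(theta, *pool[name]))


# Hypothetical pool: item name -> (discrimination a, difficulty b).
pool = {"item_A": (1.0, -1.0), "item_B": (1.5, 0.0), "item_C": (0.8, 1.2)}
print(pick_next_item(0.0, pool, administered=set()))
```

In an operational test the ability estimate would be updated after each response and the selection repeated until a stopping rule is met.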
Rasch Mixture Models for DIF Detection
Strobl, Carolin; Zeileis, Achim
2014-01-01
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch mixture models is sensitive to the specification of the ability distribution even when the conditional maximum likelihood approach is used. It is demonstrated in a simulation study how differences in ability can influence the latent classes of a Rasch mixture model. If the aim is only DIF detection, it is not of interest to uncover such ability differences as one is only interested in a latent group structure regarding the item difficulties. To avoid any confounding effect of ability differences (or impact), a new score distribution for the Rasch mixture model is introduced here. It ensures the estimation of the Rasch mixture model to be independent of the ability distribution and thus restricts the mixture to be sensitive to latent structure in the item difficulties only. Its usefulness is demonstrated in a simulation study, and its application is illustrated in a study of verbal aggression. PMID:29795819
Teachers' experiences supporting children after traumatic exposure.
Alisic, Eva; Bus, Marissa; Dulack, Wendel; Pennings, Lenneke; Splinter, Jessica
2012-02-01
Teachers can be instrumental in supporting children's recovery after trauma, but some work suggests that elementary school teachers are uncertain about their role and about what to do to assist children effectively after their students have been exposed to traumatic stressors. This study examined the extent to which teachers working with children from ages 8 to 12 years report similar concerns. A random sample of teachers in the Netherlands (N = 765) completed a questionnaire that included 9 items measuring difficulties on a 6-point Likert scale (potential range of total scores: 9-54). The mean total difficulty score was 29.8 (ranging from 10 to 50; SD = 7.37). On individual items, the fraction of teachers scoring 4 or more varied between 25 and 63%. A multiple regression analysis showed that teachers' total scores depended on amount of teaching experience, attendance at trauma-focused training, and the number of traumatized children they had worked with. The model explained 4% of the variance, a small effect. Because traumatic exposure in children is rather common, the findings point to a need to better understand what influences teachers' difficulties and develop trauma-informed practice in elementary schools. Copyright © 2012 International Society for Traumatic Stress Studies.
Schmitter-Edgecombe, Maureen; Parsey, Carolyn; Lamb, Richard
2014-01-01
The Instrumental Activities of Daily Living – Compensation (IADL-C) scale was developed to capture early functional difficulties and to quantify compensatory strategy use that may mitigate functional decline in the aging population. The IADL-C was validated in a sample of cognitively healthy older adults (N=184) and individuals with mild cognitive impairment (MCI; N=92) and dementia (N=24). Factor analysis and Rasch item analysis led to the 27-item IADL-C informant questionnaire with four functional domain subscales (money and self-management, home daily living, travel and event memory, and social skills). The subscales demonstrated good internal consistency (Rasch reliability 0.80 to 0.93) and test-retest reliability (Spearman coefficients 0.70 to 0.91). The IADL-C total score and subscales showed convergent validity with other IADL measures, discriminant validity with psychosocial measures, and the ability to discriminate between diagnostic groups. The money and self-management subscale showed notable difficulties for individuals with MCI, whereas difficulties with home daily living became more prominent for dementia participants. Compensatory strategy use increased in the MCI group and decreased in the dementia group. PMID:25344901
ERIC Educational Resources Information Center
Haberman, Shelby J.
2009-01-01
A regression procedure is developed to link simultaneously a very large number of item response theory (IRT) parameter estimates obtained from a large number of test forms, where each form has been separately calibrated and where forms can be linked on a pairwise basis by means of common items. An application is made to forms in which a…
ERIC Educational Resources Information Center
Li, Yanmei
2012-01-01
In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…
ERIC Educational Resources Information Center
Thurman, Carol
2009-01-01
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
ITEMS Project: An online sequence for teaching mathematics and astronomy
NASA Astrophysics Data System (ADS)
Martínez, Bernat; Pérez, Josep
2010-10-01
This work describes an elearning sequence for teaching geometry and astronomy in lower secondary school created inside the ITEMS (Improving Teacher Education in Mathematics and Science) project. It is based on results from astronomy education research about students' difficulties in understanding elementary astronomical observations and models. The sequence consists of a set of computer animations embedded in an elearning environment aimed at supporting students in learning about astronomy ideas that require the use of geometrical concepts and visual-spatial reasoning.
Methods for Linking Item Parameters.
1981-08-01
within and across data sets; all proportion-correct distributions were quite platykurtic. Biserial item-total correlations had relatively consistent...would produce a distribution of a parameters which had a larger mean and standard deviation, was more positively skewed, and was somewhat more platykurtic
A Quasi-Parametric Method for Fitting Flexible Item Response Functions
ERIC Educational Resources Information Center
Liang, Longjuan; Browne, Michael W.
2015-01-01
If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
Erbacher, Monica K; Schmidt, Karen M; Boker, Steven M; Bergeman, Cindy S
2012-01-01
Positive (PA) and negative affect (NA) are important constructs in health and well-being research. Good longitudinal measurement is crucial to conducting meaningful research on relationships between affect, health, and well-being across the lifespan. One common affect measure, the PANAS, has been evaluated thoroughly with factor analysis, but not with Rasch-based latent trait models (RLTMs) such as the Partial Credit Model (PCM), and not longitudinally. Current longitudinal RLTMs can computationally handle few occasions of data. The present study compares four methods of anchoring PCMs across 56 occasions to longitudinally evaluate the psychometric properties of the PANAS plus additional items. Anchoring item parameters on mean parameter values across occasions produced more desirable results than using no anchor, using first occasion parameters as anchors, or allowing anchor values to vary across occasions. Results indicated problems with NA items, including poor category utilization, gaps in the item distribution, and a lack of easy-to-endorse items. PA items had much more desirable psychometric qualities.
Item Response Theory Using Hierarchical Generalized Linear Models
ERIC Educational Resources Information Center
Ravand, Hamdollah
2015-01-01
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
IRT Item Parameter Scaling for Developing New Item Pools
ERIC Educational Resources Information Center
Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua
2017-01-01
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. The three scaling procedures are considered: (a) concurrent…
An Effect Size Measure for Raju's Differential Functioning for Items and Tests
ERIC Educational Resources Information Center
Wright, Keith D.; Oshima, T. C.
2015-01-01
This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
Ten Issues in Criterion-Referenced Testing: A Response to Commonly Heard Criticisms.
ERIC Educational Resources Information Center
Curlette, William L.; Stallings, William M.
1979-01-01
The 10 criticisms of criterion-referenced tests addressed in this paper are: the domains tested; pedagogical influence; difficulty of items; cumbersome reports; reliability; arbitrary criteria; local objectives; labeling; predictive validity; and repeated testing. (SJL)
Rasch model based analysis of the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Planinic, Maja; Ivanjek, Lana; Susac, Ana
2010-06-01
The Force Concept Inventory (FCI) is an important diagnostic instrument which is widely used in the field of physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One such tool is the stochastic Rasch model, which enables construction of linear measures for persons and items from raw test scores and which can provide important insight into the structure and functioning of the test (how item difficulties are distributed within the test, how well the items fit the model, and how well the items work together to define the underlying construct). The data for the Rasch analysis come from the large-scale research conducted in 2006-07, which investigated Croatian high school students’ conceptual understanding of mechanics on a representative sample of 1676 students (age 17-18 years). The instrument used in research was the FCI. The average FCI score for the whole sample was found to be (27.7±0.4)%, indicating that most of the students were still non-Newtonians at the end of high school, despite the fact that physics is a compulsory subject in Croatian schools. The large set of obtained data was analyzed with the Rasch measurement computer software WINSTEPS 3.66. Since the FCI is routinely used as pretest and post-test on two very different types of population (non-Newtonian and predominantly Newtonian), an additional predominantly Newtonian sample (N=141, average FCI score of 64.5%) of first-year students enrolled in an introductory physics course at the University of Zagreb was also analyzed. The Rasch model based analysis suggests that the FCI has succeeded in defining a sufficiently unidimensional construct for each population. The analysis of fit of data to the model found no grossly misfitting items which would degrade measurement. Some items with larger misfit and items with significantly different difficulties in the two samples of students do require further examination.
The analysis revealed some problems with item distribution in the FCI and suggested that the FCI may function differently in non-Newtonian and predominantly Newtonian population. Some possible improvements of the test are suggested.
2010-01-01
Background: Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT) based on the observed scores and models coming from Item Response Theory (IRT). However, whether IRT or CTT would be the most appropriate method to analyse PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. Methods: Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known, to a certain extent for item parameters, from good to poor precision, or unknown and therefore had to be estimated. The powers obtained with IRT or CTT were compared and parameters having the strongest impact on them were identified. Results: When person parameters were assumed to be unknown and item parameters to be either known or not, the powers achieved using IRT or CTT were similar and always lower than the expected power using the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion: Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take account of the number of items to obtain an accurate formula. PMID:20338031
Muhammad, Noor Azimah; Shamsuddin, Khadijah; Omar, Khairani; Shah, Shamsul Azhar; Mohd Amin, Rahmah
2014-01-01
Parenting behaviour is culturally sensitive. The aims of this study were (1) to translate the Parental Bonding Instrument into Malay (PBI-M) and (2) to determine its factorial structure and validity among the Malaysian population. The PBI-M was generated from a standard translation process and comprehension testing. The validation study of the PBI-M was administered to 248 college students aged 18 to 22 years. Participants in the comprehension testing had difficulty understanding negative items. Five translated double negative items were replaced with five positive items with similar meanings. Exploratory factor analysis showed a three-factor model for the PBI-M with acceptable reliability. Four negative items (items 3, 4, 8, and 16) and item 19 were omitted from the final PBI-M list because of incorrect placement or low factor loading (< 0.32). Out of the final 20 items of the PBI-M, there were 10 items for the care factor, five items for the autonomy factor and five items for the overprotection factor. All the items loaded positively on their respective factors. The Malaysian population favoured positive items in answering questions. The PBI-M confirmed the three-factor model that consisted of care, autonomy and overprotection. The PBI-M is a valid and reliable instrument to assess the Malaysian parenting style. Confirmatory factor analysis may further support this finding. Malaysia, parenting, questionnaire, validity.
Hagman, Brett T; Kuerbis, Alexis N; Morgenstern, Jon; Bux, Donald A; Parsons, Jeffrey T; Heidinger, Bram E
2009-11-01
The Short Inventory of Problems-Alcohol and Drugs (SIP-AD) is a 15-item measure that assesses concurrently negative consequences associated with alcohol and illicit drug use. Current psychometric evaluation has been limited to classical test theory (CTT) statistics, and it has not been validated among non-treatment seeking men-who-have-sex-with-men (MSM). Methods from Item Response Theory (IRT) can improve upon CTT by providing an in-depth analysis of how each item performs across the underlying latent trait that it is purported to measure. The present study examined the psychometric properties of the SIP-AD using methods from both IRT and CTT among a non-treatment seeking MSM sample (N=469). Participants were recruited from the New York City area and were asked to participate in a series of studies examining club drug use. Results indicated that five items on the SIP-AD demonstrated poor item misfit or significant differential item functioning (DIF) across race/ethnicity and HIV status. These five items were dropped and two-parameter IRT analyses were conducted on the remaining 10 items, which indicated a restricted range of item location parameters (-.15 to -.99) plotted at the lower end of the latent negative consequences severity continuum, and reasonably high discrimination parameters (1.30 to 2.22). Additional CTT statistics were compared between the original 15-item SIP-AD and the refined 10-item SIP-AD and suggest that the differences were negligible with the refined 10-item SIP-AD indicating a high degree of reliability and validity. Findings suggest the SIP-AD can be shortened to 10 items and appears to be a non-biased reliable and valid measure among non-treatment seeking MSM.
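The location and discrimination parameters reported above can be read through the two-parameter logistic item response function. The following is a minimal sketch, not the authors' analysis: the function name is hypothetical, and pairing the endpoints of the reported ranges into two "items" is an assumption made purely for illustration.

```python
import math


def prob_endorse_2pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing an item.

    theta: person's position on the latent trait
    a:     item discrimination (slope)
    b:     item location (difficulty / severity)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Endpoints of the ranges reported in the abstract; combining them into
# these two (a, b) pairs is an assumption for this sketch only.
illustrative_items = {
    "lowest_location": (2.22, -0.99),
    "highest_location": (1.30, -0.15),
}

for name, (a, b) in illustrative_items.items():
    curve = [round(prob_endorse_2pl(theta, a, b), 3) for theta in (-2.0, 0.0, 2.0)]
    print(name, curve)
```

Because every location in the reported range is below zero, a respondent at the trait mean (theta = 0) would endorse each such item with probability above 0.5, which is consistent with the abstract's remark that the items sit at the lower end of the severity continuum.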
ERIC Educational Resources Information Center
Store, Davie
2013-01-01
The impact of particular types of context effects on actual scores is less understood although there has been some research carried out regarding certain types of context effects under the nonequivalent anchor test (NEAT) design. In addition, the issue of the impact of item context effects on scores has not been investigated extensively when item…
Generating constrained randomized sequences: item frequency matters.
French, Robert M; Perruchet, Pierre
2009-11-01
All experimental psychologists understand the importance of randomizing lists of items. However, randomization is generally constrained, and these constraints (in particular, not allowing immediately repeated items), which are designed to eliminate particular biases, frequently engender others. We describe a simple Monte Carlo randomization technique that solves a number of these problems. However, in many experimental settings, we are concerned not only with the number and distribution of items but also with the number and distribution of transitions between items. The algorithm mentioned above provides no control over this. We therefore introduce a simple technique that uses transition tables for generating correctly randomized sequences. We present an analytic method of producing item-pair frequency tables and item-pair transitional probability tables when immediate repetitions are not allowed. We illustrate these difficulties and how to overcome them, with reference to a classic article on word segmentation in infants. Finally, we provide free access to an Excel file that allows users to generate transition tables with up to 10 different item types, as well as to generate appropriately distributed randomized sequences of any length without immediately repeated elements. This file is freely available from http://leadserv.u-bourgogne.fr/IMG/xls/TransitionMatrix.xls.
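One plausible reading of a Monte Carlo approach to this constraint is rejection sampling: shuffle, and keep the result only if no item immediately repeats. The sketch below assumes that reading; it is not the authors' algorithm or their Excel tool, and the function names are hypothetical. Tallying the item-to-item transitions afterwards shows the quantity the abstract says such a procedure leaves uncontrolled.

```python
import random
from collections import Counter


def shuffle_no_immediate_repeats(items, rng=None, max_tries=10_000):
    """Return a random ordering of items with no two equal neighbours.

    Rejection sampling: reshuffle until the constraint holds. Raises
    RuntimeError if no valid ordering is found within max_tries.
    """
    rng = rng or random.Random()
    seq = list(items)
    for _ in range(max_tries):
        rng.shuffle(seq)
        if all(x != y for x, y in zip(seq, seq[1:])):
            return list(seq)
    raise RuntimeError("no valid sequence found; the constraint may be infeasible")


def transition_counts(seq):
    """Count how often each ordered pair of adjacent items occurs."""
    return Counter(zip(seq, seq[1:]))


sequence = shuffle_no_immediate_repeats("AABBCC", rng=random.Random(0))
print("".join(sequence))
print(transition_counts(sequence))
```

The item counts are preserved by construction, but nothing in this procedure balances how often each transition (for example A followed by B) appears, which is why a separate transition-table step is needed when transitions matter.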
Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Bornia, Antonio Cezar; de Andrade, Dalton Francisco; Campos, Lucila Maria de Souza
2012-01-01
Growing challenges with respect to preserving the environment have forced changes in company operational structures. Thus, the objective of this article is to measure the evidence of Environmental Management using the Item Response Theory, based on website analysis from Brazilian industrial companies from sectors defined through the scope of the research. This is a qualitative, exploratory, and descriptive study related to an information collection and analysis instrument. The general view of the research problem with respect to the phenomenon under study is based on multi-case studies, with the methodological outline based on the theoretical reference used. Primary data was gathered from 270 company websites from 7 different Brazilian sectors and led to the creation of 26 items approved by environmental specialists. The results were attained with the measuring of Environmental Management evidence via the Item Response Theory, providing a clear order of the items involved based on each item's level of difficulty, quality, and propriety. This permitted the measurement of each item's quality and propriety, as well as that of the respondents, placing them on the same analysis scale. Increasing the number of items and companies involved is suggested for future research in order to permit broader sector analysis.
NASA Technical Reports Server (NTRS)
Berg, S. L.; Sheridan, T. B.
1984-01-01
Four highly experienced Air Force pilots each flew four simulated flight scenarios. Two scenarios required a great deal of aircraft maneuvering. The other two scenarios involved less maneuvering, but required remembering a number of items. All scenarios were designed to be equally challenging. Pilots' subjective ratings for Activity-level, Complexity, Difficulty, Stress, and Workload were higher for the maneuvering scenarios than the memory scenarios. At a moderate workload level, keeping the pilots active resulted in better aircraft control. When required to monitor and remember items, aircraft control tended to decrease. Pilots tended to weigh information about the spatial positioning and performance of their aircraft more heavily than other items.
ERIC Educational Resources Information Center
Michaelides, Michalis P.
2006-01-01
Consistent behavior is a desirable characteristic that common items are expected to have when administered to different groups. Findings from the literature have established that items do not always behave in consistent ways; item indices and IRT item parameter estimates of the same items differ when obtained from different administrations.…
ERIC Educational Resources Information Center
Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.
2011-01-01
The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
Stress Producing Conditions in the Secondary Classroom.
ERIC Educational Resources Information Center
Bruner, Anna L.; And Others
Secondary school teachers in the Houston Independent School District were asked for their perceptions of variables that contributed to difficulty of teaching. A 60-item questionnaire included variables from school environment categories: student characteristics, managerial efficiency, instructional program, material resources, teacher personal…
Accommodations for Multiple Choice Tests
ERIC Educational Resources Information Center
Trammell, Jack
2011-01-01
Students with learning or learning-related disabilities frequently struggle with multiple choice assessments due to difficulty discriminating between items, filtering out distracters, and framing a mental best answer. This Practice Brief suggests accommodations and strategies that disability service providers can utilize in conjunction with…
Ammerlaan, Judy W; van Os-Medendorp, Harmieke; Sont, Jacob K; Elsworth, Gerald R; Osborne, Richard H
2017-01-31
The Health Education Impact Questionnaire (heiQ) evaluates the effectiveness of health education and self-management programs provided to people dealing with a wide range of conditions. The aim of this study was to translate, culturally adapt and validate the Dutch translation of the heiQ and to compare the results with the English, German and French translations. A systematic translation process was undertaken. Psychometric properties were studied among patients with arthritis, atopic dermatitis, food allergy and asthma (n = 286). Factorial validity using confirmatory factor analysis, item difficulty (D), item remainder correlation and composite reliability were assessed. Stability was tested using the intra-class correlation coefficient (ICC). Items were well understood and only minor language adjustments were required. Confirmatory fit indices were >0.95 and item difficulty was D ≥ 0.65 for all items in scales showing acceptable fit indices, except for the reversed Emotional distress scale. Composite reliability ranged between 0.67 and 0.85. Test-retest reliability (n = 93) ICC varied between 0.61 and 0.84. Comparisons with other translations showed comparable fit indices. A lower ICC on the Self-monitoring and insight scale was observed. The Dutch translation of the heiQ was found to be well understood and user friendly by patients with rheumatoid arthritis, atopic dermatitis, food allergy and asthma and to have robust psychometric properties for evaluating the impact of health education and self-management programs. Given the wide applications of the heiQ and the comparability of the Dutch results with the English, German and French versions, the heiQ is a practical and useful questionnaire to evaluate the impact of self-management support programs in different countries and populations with different diseases.
Bernstein, Ira H.; Rush, A. John; Carmody, Thomas J.; Woo, Ada; Trivedi, Madhukar H.
2007-01-01
Objectives: Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. Methods: The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. Results: The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions (a), which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b_i) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. Conclusion: In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients. PMID:16716351
Bernstein, Ira H; Rush, A John; Carmody, Thomas J; Woo, Ada; Trivedi, Madhukar H
2007-01-01
Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR(16)) and clinician-rated (QIDS-C(16)) versions of the 16-item quick inventory of depressive symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. The QIDS-SR(16) and QIDS-C(16) were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR(16) and QIDS-C(16). The nine symptom domains in the QIDS-SR(16) and QIDS-C(16) related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b(i)) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C(16) and QIDS-SR(16). Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. In this less educated, socially disadvantaged sample, differences between the QIDS-C(16) and QIDS-SR(16) were minor. The QIDS-SR(16) is a satisfactory substitute for the more time-consuming QIDS-C(16) in a broad range of adult, nonpsychotic, depressed outpatients.
Translation and validation of the Malay version of the Stroke Knowledge Test.
Sowtali, Siti Noorkhairina; Yusoff, Dariah Mohd; Harith, Sakinah; Mohamed, Monniaty
2016-04-01
To date, there is a lack of published studies on assessment tools to evaluate the effectiveness of stroke education programs. This study developed and validated the Malay language version of the Stroke Knowledge Test research instrument. This study involved translation, validity, and reliability phases. The instrument underwent backward and forward translation of the English version into the Malay language. Nine experts reviewed the content for consistency, clarity, difficulty, and suitability for inclusion. Perceived usefulness and utilization were obtained from experts' opinions. Later, face validity assessment was conducted with 10 stroke patients to determine appropriateness of sentences and grammar used. A pilot study was conducted with 41 stroke patients to determine the item analysis and reliability of the translated instrument using the Kuder Richardson 20 or Cronbach's alpha. The final Malay version Stroke Knowledge Test included 20 items with good content coverage, acceptable item properties, and positive expert review ratings. Psychometric investigations suggest that Malay version Stroke Knowledge Test had moderate reliability with Kuder Richardson 20 or Cronbach's alpha of 0.58. Improvement is required for Stroke Knowledge Test items with unacceptable difficulty indices. Overall, the average rating of perceived usefulness and perceived utility of the instruments were both 72.7%, suggesting that reviewers were likely to use the instruments in their facilities. Malay version Stroke Knowledge Test was a valid and reliable tool to assess educational needs and to evaluate stroke knowledge among participants of group-based stroke education programs in Malaysia.
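For dichotomously scored items such as those in a knowledge test, the Kuder-Richardson 20 coefficient and the item difficulty index (proportion answering correctly) can be computed directly from a person-by-item score matrix. The sketch below uses the standard KR-20 formula with population variance; the function names and the tiny response matrix are hypothetical and are not the study's data.

```python
from statistics import pvariance


def item_difficulty(responses):
    """Proportion of respondents answering each item correctly (0/1 scores)."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(person[j] for person in responses) / n_people for j in range(n_items)]


def kr20(responses):
    """Kuder-Richardson 20 reliability for a person-by-item 0/1 matrix.

    KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / variance of total scores),
    using the population variance of the total scores.
    """
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    pq_sum = sum(p * (1 - p) for p in item_difficulty(responses))
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Hypothetical scores for four respondents on three items (illustration only).
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(example))
print(kr20(example))
```

For 0/1 items KR-20 and Cronbach's alpha give the same value, which may be why the abstract names the two together.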
A Methodological Study of Order Effects in Reporting Relational Aggression Experiences.
Serico, Jennifer M; NeMoyer, Amanda; Goldstein, Naomi E S; Houck, Mark; Leff, Stephen S
2018-03-01
Unlike the overt nature of physical aggression, which lends itself to simpler and more direct methods of investigation, the often-masked nature of relational aggression has led to difficulties and debate regarding the most effective tools of study. Given concerns with the accuracy of third-party relational aggression reports, especially as individuals age, self-report measures may be particularly useful when assessing experiences with relational aggression. However, it is important to recognize validity concerns associated with self-report of relational aggression perpetration and victimization, in particular the potential effects of item order presentation. To investigate this issue, surveys were administered and completed by 179 young adults randomly assigned to one of four survey conditions reflecting manipulation of item order. Survey conditions included presentation of (a) perpetration items only, (b) victimization items only, (c) perpetration items followed by victimization items, and (d) victimization items followed by perpetration items. Results revealed that participants reported perpetrating relational aggression significantly more often when asked only about perpetration or when asked about perpetration before victimization, compared with participants who were asked about victimization before perpetration. Item order manipulation did not result in significant differences in self-reported victimization experiences. Results of this study indicate a need for greater consideration of item order when conducting research using self-report data and the importance of additional investigation into which form of item presentation elicits the most accurate self-report information.
Haverman, Lotte; Grootenhuis, Martha A; Raat, Hein; van Rossum, Marion A J; van Dulmen-den Broeder, Eline; Hoppenbrouwers, Karel; Correia, Helena; Cella, David; Roorda, Leo D; Terwee, Caroline B
2016-03-01
The Patient-Reported Outcomes Measurement Information System (PROMIS®) is a new, state-of-the-art assessment system for measuring patient-reported health and well-being of adults and children. It has the potential to be more valid, reliable, and responsive than existing PROMs. The item banks are designed to be self-reported and completed by children aged 8-18 years. The PROMIS items can be administered in short forms or through computerized adaptive testing. This paper describes the translation and cultural adaptation of nine PROMIS item banks (151 items) for children in Dutch-Flemish. The translation was performed by FACITtrans using standardized PROMIS methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three independent reviews (at least two Dutch, one Flemish), and pretesting in 24 children from the Netherlands and Flanders. For some items, it was necessary to have separate translations for Dutch and Flemish: physical function-mobility (three items), anger (one item), pain interference (two items), and asthma impact (one item). Challenges faced in the translation process included scarcity or overabundance of possible translations, unclear item descriptions, constructs broader/smaller in the target language, difficulties in rank ordering items, differences in unit of measurement, irrelevant items, or differences in performance of activities. By addressing these challenges, acceptable translations were obtained for all items. The Dutch-Flemish PROMIS items are linguistically equivalent to the original USA version. Short forms are now available for use, and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
Hatt, Sarah R; Leske, David A; Wernimont, Suzanne M; Birch, Eileen E; Holmes, Jonathan M
2017-03-01
A rating scale is a critical component of patient-reported outcome instrument design, but the optimal rating scale format for pediatric use has not been investigated. We compared rating scale performance when administering potential questionnaire items to children with eye disorders and their parents. Three commonly used rating scales were evaluated: frequency (never, sometimes, often, always), severity (not at all, a little, some, a lot), and difficulty (not difficult, a little difficult, difficult, very difficult). Ten patient-derived items were formatted for each rating scale, and rating scale testing order was randomized. Both child and parent were asked to comment on any problems with, or a preference for, a particular scale. Any confusion about options or inability to answer was recorded. Twenty-one children, aged 5-17 years, with strabismus, amblyopia, or refractive error were recruited, each with one of their parents. Of the first 10 children, 4 (40%) had problems using the difficulty scale, compared with 1 (10%) using frequency, and none using severity. The difficulty scale was modified, replacing the word "difficult" with "hard." Eleven additional children (plus parents) then completed all 3 questionnaires. No children had problems using any scale. Four (36%) parents had problems using the difficulty ("hard") scale and 1 (9%) with frequency. Regarding preference, 6 (55%) of 11 children and 5 (50%) of 10 parents preferred using the frequency scale. Children and parents found the frequency scale and question format to be the most easily understood. Children and parents also expressed preference for the frequency scale, compared with the difficulty and severity scales. We recommend frequency rating scales for patient-reported outcome measures in pediatric populations.
van de Graaf, Elizabeth S; Borsboom, Gerard J J M; van der Sterre, Geertje W; Felius, Joost; Simonsz, Huibert J; Kelderman, Henk
2017-09-01
The Adult Strabismus Quality of Life Questionnaire (AS-20) and the Amblyopia & Strabismus Questionnaire (A&SQ) both measure health-related quality of life in strabismus patients. We evaluated to what extent these instruments cover similar domains by identifying the underlying quality-of-life factors of the combined questionnaires. Participants were adults from a historic cohort with available orthoptic childhood data documenting strabismus and/or amblyopia. They had previously completed the A&SQ and were now asked to complete the AS-20. Factor analysis was performed on the correlation matrix of the combined AS-20 and A&SQ data to identify common underlying factors. The identified factors were correlated with the clinical variables of angle of strabismus, degree of binocular vision, and visual acuity of the worse eye. One hundred ten patients completed both questionnaires (mean age, 44 years; range, 38-51 years). Six factors were found that together explained 78% of the total variance. The factor structure was dominated by the first four factors. One factor contained psychosocial and social-contact items, and another factor depth-perception items from both questionnaires. A third factor contained seven items (only from the AS-20) on eye strain, stress, and difficulties with reading and with concentrating. A fourth factor contained seven items (only from the A&SQ) on fear of losing the better eye and visual disorientation, specific for amblyopia. Current visual acuity of the worse eye correlated with depth-perception items and vision-related items, whereas current binocular vision correlated with psychosocial and social-contact items, in 93 patients. Factor analysis suggests that the AS-20 and A&SQ measure a similar psychosocial quality-of-life domain. However, functional problems like avoidance of reading, difficulty in concentrating, eye stress, reading problems, inability to enjoy hobbies, and need for frequent breaks when reading are represented only in the AS-20.
During the development of the A&SQ, asthenopia items were considered insufficiently specific for strabismus and were excluded a priori. The patients who generated the items for the AS-20 had, in majority, adulthood-onset strabismus and diplopia and were, hence, more likely to develop such complaints than our adult patients with childhood-onset strabismus and/or amblyopia.
Developing an African youth psychosocial assessment: an application of item response theory.
Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise
2014-06-01
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.
Developing an African youth psychosocial assessment: an application of item response theory
BETANCOURT, THERESA S.; YANG, FRANCES; BOLTON, PAUL; NORMAND, SHARON-LISE
2014-01-01
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. PMID:24478113
[SOMS-2: translation into Portuguese of the screening for Somatoform Disorders].
Fabião, Cristina; Costa E Silva, Carolina; Fleming, Manuela; Barbosa, António
2008-01-01
The diagnosis of Somatization Disorder (SD) requires the presence of medically unexplained somatic symptoms (MUS), which must be assessed so that organic diseases can be excluded. SOMS-2 is a self-report measure for SD that assesses medically unexplained symptoms by requiring participants to answer affirmatively, and so qualify a complaint as MUS, only if their doctor has given the opinion that the complaint is not due to an organic disease. According to its authors, the original SOMS-2 has good internal consistency (Cronbach's α = .87) and a good correlation between self-ratings and interview (r = .75). After the author's permission was obtained, the questionnaire was translated from and into English by experienced translators. The resulting questionnaire was used on a small group of patients. The items that proved difficult to understand during this pretest were then identified, and experienced practitioners were asked for suggestions. The resulting version was answered by 123 primary health care patients (sample I). After some modifications of the SOMS-2, another group of 190 primary health care patients answered the questionnaire (sample II). Most patients in the first sample found it difficult to understand that answering affirmatively required answering three questions: 1) is the symptom present? 2) has your doctor found no clear causes for the symptom? 3) does the symptom affect your well-being? The difficulties in understanding items 21 and 45 (pre-test) were confirmed. Items 11, 28 and 38 were more easily understood when worded differently. In sample I, less than 5% of positive answers were given to items 20, 21, 23, 40, 43, 45, and 51.
Probably because of the low education level of the Portuguese population, which this sample reflects, difficulties in following the instructions given at the beginning made it advisable to modify the SOMS-2 so that the three questions implicit in each SOMS-2 item were divided into two columns (two explicit questions). At the same time, the severity criterion (the third implicit question) must continue to be controlled. After phase I, the items with an answer rate of less than 5% were eliminated. Most of them coincide with the low-answer-rate items found by the authors of the original version. The next step is to study the internal consistency of the resulting version, and the correlation between its self-ratings and interview results, in order to establish the validity of the SOMS-2 in these populations.
Development and community-based validation of eight item banks to assess mental health.
Batterham, Philip J; Sunderland, Matthew; Carragher, Natacha; Calear, Alison L
2016-09-30
There is a need for precise but brief screening of mental health problems in a range of settings. The development of item banks to assess depression and anxiety has resulted in new adaptive and static screeners that accurately assess severity of symptoms. However, expansion to a wider array of mental health problems is required. The current study developed item banks for eight mental health problems: social anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder, adult attention-deficit hyperactivity disorder, drug use, psychosis and suicidality. The item banks were calibrated in a population-based Australian adult sample (N=3175) by administering large item pools (45-75 items) and excluding items on the basis of local dependence or measurement non-invariance. Item Response Theory parameters were estimated for each item bank using a two-parameter graded response model. Each bank consisted of 19-47 items, demonstrating excellent fit and precision across a range of -1 to 3 standard deviations from the mean. No previous study has developed such a broad range of mental health item banks. The calibrated item banks will form the basis of a new system of static and adaptive measures to screen for a broad array of mental health problems in the community. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
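As an illustration of the two-parameter graded response model named in this abstract, the following is a minimal sketch of how category probabilities are usually obtained in Samejima's formulation: cumulative logistic curves P(X >= k | θ) are computed from one discrimination and a set of ordered thresholds, and adjacent curves are differenced. The function name `grm_category_probs` and the example parameter values are hypothetical and are not taken from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta:      latent trait value of the respondent
    a:          item discrimination (assumed positive)
    thresholds: ordered category thresholds b_1 < b_2 < ... < b_{K-1}
    Returns a list of K probabilities, one per response category.
    """
    if any(b1 >= b2 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k), bracketed by P(X >= 0) = 1
    # and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical three-category item evaluated at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 1.0])
```

With symmetric thresholds and θ = 0, the two outer categories receive equal probability and the probabilities sum to one.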
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José
2013-11-01
To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20 ≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011: 0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis.
Therefore, IRT analysis has proved beneficial for the statistical analysis of EBOD. Introduction of negative marking has led to a significant increase in the reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations. © 2013 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
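For readers unfamiliar with the KR-20 coefficient reported in this abstract, the following is a minimal sketch of the standard Kuder-Richardson formula 20, KR-20 = k/(k-1) · (1 − Σ p_j(1 − p_j) / σ²_total). It assumes dichotomous 0/1 item scores and the population variance of total scores; it is not the EBOD's own procedure, which involves negative marking, and the function name `kr20` and the toy data are hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n_persons = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    # Proportion of examinees answering each item correctly.
    p = [sum(row[j] for row in responses) / n_persons for j in range(k)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)
    # Population variance of the examinees' total scores.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Toy data: four examinees, three items (illustrative only).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(toy)
```

For the toy data, the item variances sum to 0.625 and the total-score variance is 1.25, so the coefficient is 1.5 × (1 − 0.5) = 0.75.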
ERIC Educational Resources Information Center
Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.
2015-01-01
Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…