Sample records for assessment item format

  1. Formative Assessment in High School Chemistry Teaching: Investigating the Alignment of Teachers' Goals with Their Items

    ERIC Educational Resources Information Center

    Sandlin, Benjamin; Harshman, Jordan; Yezierski, Ellen

    2015-01-01

    A 2011 report by the Department of Education states that understanding how teachers use results from formative assessments to guide their practice is necessary to improve instruction. Chemistry teachers have goals for items in their formative assessments, but the degree of alignment between what is assessed by these items and the teachers' goals…

  2. Applying Item Response Theory methods to design a learning progression-based science assessment

    NASA Astrophysics Data System (ADS)

    Chen, Jing

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression; (2) how to design a test that gives good information about students' progress through the learning progression of a particular construct; and (3) what characteristics of test items support their use for assessing students' levels. Data for this study were collected from 1,500 elementary and secondary school students during 2009-2010. The written assessment included items in several formats: Constructed Response (CR), Ordered Multiple Choice (OMC), and Multiple True or False (MTF). The main findings are as follows. The OMC, MTF, and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances; however, additional dimensions corresponding to item format explained a certain amount of the variance in student performance, so such dimensions need to be considered when we want to capture differences in students' performances on different types of items targeting the same underlying progression. Items in each format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study also establishes some general steps that can be followed to design other learning progression-based tests. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all
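The boundary-setting step mentioned in the abstract (level boundaries defined as means of item thresholds across a set of good items) can be sketched as follows. All threshold values here are hypothetical illustrations, not the study's data:

```python
# Sketch: defining learning-progression level boundaries on an IRT scale
# as the mean of item thresholds across a set of well-fitting items.
# Threshold values (in logits) are hypothetical, not taken from the study.

# thresholds[boundary] = that boundary's threshold estimates from several good items
thresholds = {
    "level1_2": [-1.9, -2.1, -2.0, -1.8],
    "level2_3": [-0.4, -0.6, -0.5, -0.5],
    "level3_4": [1.1, 0.9, 1.0, 1.2],
}

# Each level boundary is the mean of its item thresholds.
boundaries = {name: sum(vals) / len(vals) for name, vals in thresholds.items()}

def classify(theta, bounds):
    """Assign a student ability estimate (theta, in logits) to a progression level."""
    level = 1
    for cut in sorted(bounds.values()):
        if theta >= cut:
            level += 1
    return level

print(boundaries)
print(classify(0.0, boundaries))  # an ability of 0.0 falls in level 3 here
```

A student is placed at the highest level whose lower boundary their ability estimate exceeds; with the hypothetical cuts above (-1.95, -0.5, 1.05), an ability of 0.0 lands in level 3.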

  3. Sources of difficulty in assessment: example of PISA science items

    NASA Astrophysics Data System (ADS)

    Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie

    2017-03-01

    The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses the degree to which 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research question is to identify the characteristics of PISA science items that could influence an item's proficiency level, based on an a priori item analysis and a statistical analysis. Results show that, of the different characteristics of PISA science items determined in our a priori analysis, only cognitive complexity and format have explanatory power for an item's proficiency level. The proficiency level cannot be explained by the dependence or independence of the information provided in the unit and/or item introduction, nor by the competence. We conclude that in PISA it appears possible to anticipate a high proficiency level (that is, students' low scores) for items displaying high cognitive complexity. For an item of middle or low cognitive complexity, the cognitive complexity level alone is not sufficient to predict item difficulty; other characteristics play a crucial role. We discuss anticipating difficulties in assessment from a broader perspective.

  4. Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

    ERIC Educational Resources Information Center

    Wang, Wei

    2013-01-01

    Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…

  5. The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

    NASA Astrophysics Data System (ADS)

    Liou, Pey-Yan; Bulut, Okan

    2017-12-01

    The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components: item format and cognitive domain. The Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. Item difficulty analysis was first applied to show the proportion of correct responses per item. A regression-based cumulative link mixed modeling (CLMM) approach was then utilized to estimate the impact of item format, cognitive domain, and their interaction on students' science scores. The proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that reasoning-domain items were more difficult than items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and their interaction term were included together, the directions and magnitudes of the predictors' effects on student science performance changed substantially. Plausible explanations for the complex nature of these effects are discussed. The results provide practical, empirically based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
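The descriptive first step of such an analysis, proportion correct per item grouped by format and cognitive domain, can be sketched in a few lines. The scores below are hypothetical; the study's full CLMM analysis would require specialized software (e.g., the R package `ordinal`) and is not reproduced here:

```python
# Sketch of an item-difficulty (proportion-correct) analysis by item format
# and cognitive domain. All response data are hypothetical illustrations.

# responses[(format, domain)] = 0/1 scores across students for one item
responses = {
    ("MC", "knowing"):   [1, 1, 1, 0, 1, 1, 0, 1],
    ("MC", "applying"):  [1, 0, 1, 1, 0, 1, 1, 0],
    ("CR", "reasoning"): [0, 0, 1, 0, 0, 1, 0, 0],
}

# Proportion correct is the classical difficulty index (lower = harder).
p_correct = {item: sum(s) / len(s) for item, s in responses.items()}

for (fmt, domain), p in sorted(p_correct.items(), key=lambda kv: kv[1]):
    print(f"{fmt:>2} / {domain:<9} proportion correct = {p:.2f}")
```

Sorting by proportion correct surfaces the pattern reported above: the hypothetical constructed-response reasoning item is the hardest, the multiple-choice knowing item the easiest.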

  6. Multi-Item Direct Behavior Ratings: Dependability of Two Levels of Assessment Specificity

    ERIC Educational Resources Information Center

    Volpe, Robert J.; Briesch, Amy M.

    2015-01-01

    Direct Behavior Rating-Multi-Item Scales (DBR-MIS) have been developed as formative measures of behavioral assessment for use in school-based problem-solving models. Initial research has examined the dependability of composite scores generated by summing all items comprising the scales. However, it has been argued that DBR-MIS may offer assessment…

  7. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    PubMed

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g., Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method for assessing the fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is shown to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual with the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing item fit for unidimensional IRT models.
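The general residual idea, comparing observed proportions correct in ability groups with the model-implied item characteristic curve, can be sketched as follows. The 2PL parameters, grouping, and counts are hypothetical, and the paper's specific ratio estimator is not reproduced here:

```python
import math

# Sketch of residual-based item-fit checking for a unidimensional IRT model:
# compare observed proportions correct within ability groups against the
# model-implied item characteristic curve. All numbers are hypothetical.

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

a, b = 1.2, 0.3   # hypothetical discrimination and difficulty
groups = [        # (mean ability, n examinees, n correct) per ability group
    (-1.5, 200, 40),
    ( 0.0, 300, 130),
    ( 1.5, 250, 210),
]

for theta, n, correct in groups:
    p_obs = correct / n
    p_mod = icc_2pl(theta, a, b)
    se = math.sqrt(p_mod * (1.0 - p_mod) / n)  # binomial standard error
    z = (p_obs - p_mod) / se                   # standardized residual
    print(f"theta={theta:+.1f}  obs={p_obs:.3f}  model={p_mod:.3f}  z={z:+.2f}")
```

Large absolute z values flag ability regions where the fitted curve misrepresents the observed response behavior, which is the signal a residual-based fit analysis looks for.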

  8. Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

    ERIC Educational Resources Information Center

    Wan, Lei; Henly, George A.

    2012-01-01

    Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…

  9. The Impact of Test Dimensionality, Common-Item Set Format, and Scale Linking Methods on Mixed-Format Test Equating

    ERIC Educational Resources Information Center

    Öztürk-Gübes, Nese; Kelecioglu, Hülya

    2016-01-01

    The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…

  10. Assessment of Computer and Information Literacy in ICILS 2013: Do Different Item Types Measure the Same Construct?

    ERIC Educational Resources Information Center

    Ihme, Jan Marten; Senkbeil, Martin; Goldhammer, Frank; Gerick, Julia

    2017-01-01

    The combination of different item formats is found quite often in large scale assessments, and analyses on the dimensionality often indicate multi-dimensionality of tests regarding the task format. In ICILS 2013, three different item types (information-based response tasks, simulation tasks, and authoring tasks) were used to measure computer and…

  11. A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Thurman, Carol

    2009-01-01

    The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…

  12. The Effects of Item Preview on Video-Based Multiple-Choice Listening Assessments

    ERIC Educational Resources Information Center

    Koyama, Dennis; Sun, Angela; Ockey, Gary J.

    2016-01-01

    Multiple-choice formats remain a popular design for assessing listening comprehension, yet no consensus has been reached on how multiple-choice formats should be employed. Some researchers argue that test takers must be provided with a preview of the items prior to the input (Buck, 1995; Sherman, 1997); others argue that a preview may decrease the…

  13. Item Response Theory Models for Wording Effects in Mixed-Format Scales

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu

    2015-01-01

    Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effects in mixed-format scales and used bi-factor item response theory (IRT) models to…

  14. The assessment of a structured online formative assessment program: a randomised controlled trial

    PubMed Central

    2014-01-01

    Background Online formative assessment continues to be an important area of research, and methods that actively engage the learner and provide useful learning outcomes are of particular interest. This study reports on the outcomes of a two-year study of medical students using formative assessment tools. Method The study was conducted over two consecutive years using two different strategies for engaging students. The Year 1 strategy involved voluntary use of the formative assessment tool by 129 students. In Year 2, a second cohort of 130 students was encouraged to complete the formative assessment by incorporating summative assessment elements into it. Outcomes from pre- and post-testing of students around the formative assessment intervention were used as measures of learning. To compare improvement scores between the two years, a two-way Analysis of Variance (ANOVA) model was fitted to the data. Results The ANOVA model showed that there was a significant difference in improvement scores between students in the two years (mean improvement percentage 19% vs. 38.5%, p < 0.0001). Students were more likely to complete formative assessment items if they had a summative component. In Year 2, the time spent using the formative assessment tool had no impact on student improvement, nor did the number of assessment items completed. Conclusion The online medium is a valuable learning resource, capable of providing timely formative feedback and stimulating student-centered learning. However, the production of quality content is a time-consuming task, and careful consideration must be given to the strategies employed to ensure its efficacy. Course designers should consider the potential positive impact that adding summative components to formative assessments may have on student engagement and outcomes. PMID:24400883

  15. Caries Risk Assessment Item Importance

    PubMed Central

    Chaffee, B.W.; Featherstone, J.D.B.; Gansky, S.A.; Cheng, J.; Zhan, L.

    2016-01-01

    Caries risk assessment (CRA) is widely recommended for dental caries management. Little is known regarding how practitioners use individual CRA items to determine risk and which individual items independently predict clinical outcomes in children younger than 6 y. The objective of this study was to assess the relative importance of pediatric CRA items in dental providers’ decision making regarding patient risk and in association with clinically evident caries, cross-sectionally and longitudinally. CRA information was abstracted retrospectively from electronic patient records of children initially aged 6 to 72 mo at a university pediatric dentistry clinic (n = 3,810 baseline; n = 1,315 with follow-up). The 17-item CRA form included caries risk indicators, caries protective items, and clinical indicators. Conditional random forests classification trees were implemented to identify and assign variable importance to CRA items independently associated with baseline high-risk designation, baseline evident tooth decay, and follow-up evident decay. Thirteen individual CRA items, including all clinical indicators and all but 1 risk indicator, were independently and statistically significantly associated with student/resident providers’ caries risk designation. Provider-assigned baseline risk category was strongly associated with follow-up decay, which increased from low (20.4%) to moderate (30.6%) to high/extreme risk patients (68.7%). Of baseline CRA items, before adjustment, 12 were associated with baseline decay and 7 with decay at follow-up; however, in the conditional random forests models, only the clinical indicators (evident decay, dental plaque, and recent restoration placement) and 1 risk indicator (frequent snacking) were independently and statistically significantly associated with future disease, for which baseline evident decay was the strongest predictor. In this predominantly high-risk population under caries-preventive care, more individual CRA items

  16. Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd

    The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…

  17. MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin

    2010-01-01

    Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…

  18. Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

    ERIC Educational Resources Information Center

    Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

    2013-01-01

    Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…

  19. Item Selection and Ability Estimation Procedures for a Mixed-Format Adaptive Test

    ERIC Educational Resources Information Center

    Ho, Tsung-Han; Dodd, Barbara G.

    2012-01-01

    In this study we compared five item selection procedures using three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures used were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler…

  20. Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

    ERIC Educational Resources Information Center

    Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

    2013-01-01

    We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…

  21. The Impact of Reading Self-Efficacy and Task Value on Reading Comprehension Scores in Different Item Formats

    ERIC Educational Resources Information Center

    Solheim, Oddny Judith

    2011-01-01

    It has been hypothesized that students with low self-efficacy will struggle with complex reading tasks in assessment situations. In this study we examined whether perceived reading self-efficacy and reading task value uniquely predicted reading comprehension scores in two different item formats in a sample of fifth-grade students. Results showed…

  22. Test item linguistic complexity and assessments for deaf students.

    PubMed

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.

  23. Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

    PubMed

    Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

    2018-06-01

    This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.

  24. Development and assessment of floor and ceiling items for the PROMIS physical function item bank

    PubMed Central

    2013-01-01

    Introduction Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low-functioning respondents and ceiling items by high-functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated as 10% to more than 40% more difficult to do, and floor items as about 12% to nearly 90% less difficult to do. Conclusions These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. They will help improve instrument performance in populations with broad functional ranges and those concentrated at

  25. Automatic Item Generation of Probability Word Problems

    ERIC Educational Resources Information Center

    Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

    2009-01-01

    Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…

  26. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

    Empirical information regarding the performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples consisting of both dichotomously and polytomously scored items. The item fit statistics used in this study included PARSCALE's G[squared],…

  27. Primary Science Assessment Item Setters' Misconceptions Concerning Biological Science Concepts

    ERIC Educational Resources Information Center

    Boo, Hong Kwen

    2007-01-01

    Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…

  28. Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problem of a considerable loss of items after the item calibration phase. In this…

  29. Item Response Theory for Peer Assessment

    ERIC Educational Resources Information Center

    Uto, Masaki; Ueno, Maomi

    2016-01-01

    As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, a problem remains in peer assessment: reliability depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…

  30. Audio Adapted Assessment Data: Does the Addition of Audio to Written Items Modify the Item Calibration?

    ERIC Educational Resources Information Center

    Snyder, James

    2010-01-01

    This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…

  31. Evaluation of Automatic Item Generation Utilities in Formative Assessment Application for Korean High School Students

    ERIC Educational Resources Information Center

    Choi, Jaehwa; Kim, HeeKyoung; Pak, Seohong

    2018-01-01

    Research interest in the assessment field has recently been shifting rapidly from decision-maker-centered assessments to learner-centered assessments (i.e., diagnostic and/or formative assessments). In particular, an important research topic in this field is to analyze how these learner-centered assessments are developed more…

  32. Better assessment of physical function: item improvement is neglected but essential.

    PubMed

    Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

    2009-01-01

    Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items using classical test theory and IRT, and used confirmatory factor analyses and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) items and the 10 SF-36 PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were rated as more relevant and important than complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models
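The graded response modeling mentioned above can be sketched as follows: an item with m ordered response options has m-1 boundary curves, and each category's probability is the difference between adjacent boundary curves. All parameter values here are hypothetical:

```python
import math

# Sketch of the graded response model (GRM) for a polytomous item.
# Discrimination and boundary parameters below are hypothetical.

def boundary(theta, a, b):
    """P(response at or above a category boundary) under the GRM."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def category_probs(theta, a, bs):
    """Probabilities of each of len(bs)+1 ordered response categories."""
    stars = [1.0] + [boundary(theta, a, b) for b in bs] + [0.0]
    return [stars[k] - stars[k + 1] for k in range(len(bs) + 1)]

a = 1.5                       # hypothetical discrimination
bs = [-1.0, 0.0, 1.0, 2.0]    # hypothetical boundaries: five response options
probs = category_probs(0.5, a, bs)
print([round(p, 3) for p in probs])
assert abs(sum(probs) - 1.0) < 1e-9  # category probabilities sum to 1
```

Items whose boundary curves are steep (high discrimination) and well spread carry more information per response option, which is why adding well-chosen response options can raise item-information content.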

  33. Better assessment of physical function: item improvement is neglected but essential

    PubMed Central

    2009-01-01

    Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items using classical test theory and IRT, and used confirmatory factor analyses and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) items and the 10 SF-36 PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were rated as more relevant and important than complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two

  14. Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Wilson, Mark

    2005-01-01

    This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…

  15. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariates (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  16. Using Mutual Information for Adaptive Item Comparison and Student Assessment

    ERIC Educational Resources Information Center

    Liu, Chao-Lin

    2005-01-01

    The author analyzes properties of mutual information between dichotomous concepts and test items. The properties generalize some common intuitions about item comparison, and provide principled foundations for designing item-selection heuristics for student assessment in computer-assisted educational systems. The proposed item-selection strategies…
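
The quantity at the heart of the abstract above can be illustrated with a short sketch. This is not the author's code; it is a minimal, self-contained computation of mutual information between a dichotomous concept (mastered or not) and a dichotomous item response (correct or incorrect), with illustrative joint probabilities:

```python
import math

def mutual_information(joint):
    """Mutual information I(C; X) in bits from a 2x2 joint
    probability table: joint[c][x] = P(C=c, X=x)."""
    pc = [sum(row) for row in joint]                              # marginal P(C=c)
    px = [sum(joint[c][x] for c in range(2)) for x in range(2)]   # marginal P(X=x)
    mi = 0.0
    for c in range(2):
        for x in range(2):
            p = joint[c][x]
            if p > 0:
                mi += p * math.log2(p / (pc[c] * px[x]))
    return mi

# An item whose response tracks concept mastery...
informative = [[0.40, 0.10],   # non-masters: mostly incorrect
               [0.10, 0.40]]   # masters: mostly correct
# ...versus an item answered at chance regardless of mastery.
uninformative = [[0.25, 0.25],
                 [0.25, 0.25]]

print(round(mutual_information(informative), 3))    # 0.278
print(round(mutual_information(uninformative), 3))  # 0.0
```

An item-selection heuristic of the kind the abstract describes would administer the candidate item with the highest mutual information with the concept of interest.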

  17. Item generation and design testing of a questionnaire to assess degenerative joint disease-associated pain in cats.

    PubMed

    Zamprogno, Helia; Hansen, Bernie D; Bondell, Howard D; Sumrell, Andrea Thomson; Simpson, Wendy; Robertson, Ian D; Brown, James; Pease, Anthony P; Roe, Simon C; Hardie, Elizabeth M; Wheeler, Simon J; Lascelles, B Duncan X

    2010-12-01

    To determine the items (question topics) for a subjective instrument to assess degenerative joint disease (DJD)-associated chronic pain in cats and determine the instrument design most appropriate for use by cat owners. The sample comprised 100 randomly selected client-owned cats from 6 months to 20 years old. Cats were evaluated to determine degree of radiographic DJD and signs of pain throughout the skeletal system. Two groups were identified: high DJD pain and low DJD pain. Owner-answered questions about activity and signs of pain were compared between the 2 groups to define items relating to chronic DJD pain. Interviews with 45 cat owners were performed to generate items. Fifty-three cat owners who had not been involved in any other part of the study, 19 veterinarians, and 2 statisticians assessed 6 preliminary instrument designs. 22 cats were selected for each group; 19 important items were identified, resulting in 12 potential items for the instrument; and 3 additional items were identified from owner interviews. Owners and veterinarians selected a 5-point descriptive instrument design over 11-point or visual analogue scale formats. Behaviors relating to activity were substantially different between healthy cats and cats with signs of DJD-associated pain. Fifteen items were identified as being potentially useful, and the preferred instrument design was identified. This information could be used to construct an owner-based questionnaire to assess feline DJD-associated pain. Once validated, such a questionnaire would assist in evaluating potential analgesic treatments for these patients.

  18. Effects of Test Format, Self Concept and Anxiety on Item Response Changing Behaviour

    ERIC Educational Resources Information Center

    Afolabi, E. R. I.

    2007-01-01

    The study examined the effects of item format, self-concept and anxiety on response changing behaviour. Four hundred undergraduate students enrolled in a counseling psychology course in a Nigerian university participated in the study. Students' answers in multiple-choice and true-false formats of an achievement test were observed for response…

  19. Writing, Evaluating and Assessing Data Response Items in Economics.

    ERIC Educational Resources Information Center

    Trotman-Dickenson, D. I.

    1989-01-01

    Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…

  20. The Structure of the Narcissistic Personality Inventory With Binary and Rating Scale Items.

    PubMed

    Boldero, Jennifer M; Bell, Richard C; Davies, Richard C

    2015-01-01

    Narcissistic Personality Inventory (NPI) items typically have a forced-choice format, comprising a narcissistic and a nonnarcissistic statement. Recently, some have presented the narcissistic statements and asked individuals to either indicate whether they agree or disagree that the statements are self-descriptive (i.e., a binary response format) or to rate the extent to which they agree or disagree that these statements are self-descriptive on a Likert scale (i.e., a rating response format). The current research demonstrates that when NPI items have a binary or a rating response format, the scale has a bifactor structure (i.e., the items load on a general factor and on 6 specific group factors). Indexes of factor strength suggest that the data are unidimensional enough for the NPI's general factor to be considered a measure of a narcissism latent trait. However, the rating item general factor assessed more narcissism components than the binary item one. The positive correlations of the NPI's general factor, assessed when items have a rating response format, were moderate with self-esteem, strong with a measure of narcissistic grandiosity, and weak with 2 measures of narcissistic vulnerability. Together, the results suggest that using a rating format for items enhances the information provided by the NPI.

  1. Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)

    ERIC Educational Resources Information Center

    Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn

    2018-01-01

    The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…

  2. The Effect of Position and Format on the Difficulty of Assessment Exercises.

    ERIC Educational Resources Information Center

    Burton, Nancy W.; And Others

    Assessment exercises (items) in three different formats--multiple-choice with an "I don't know" (IDK) option, multiple-choice without the IDK, and open-ended--were placed at the beginning, middle and end of 45-minute assessment packages (instruments). A balanced incomplete blocks analysis of variance was computed to determine the biasing…

  3. Enhancing self-report assessment of PTSD: development of an item bank.

    PubMed

    Del Vecchio, Nicole; Elwy, A Rani; Smith, Eric; Bottonari, Kathryn A; Eisen, Susan V

    2011-04-01

    The authors report results of work to enhance self-report posttraumatic stress disorder (PTSD) assessment by developing an item bank for use in a computer-adapted test. Computer-adapted tests have great potential to decrease the burden of PTSD assessment and outcomes monitoring. The authors conducted a systematic literature review of PTSD instruments, created a database of items, performed qualitative review and readability analysis, and conducted cognitive interviews with veterans diagnosed with PTSD. The systematic review yielded 480 studies in which 41 PTSD instruments comprising 993 items met inclusion criteria. The final PTSD item bank includes 104 items representing each of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV; American Psychiatric Association [APA], 1994), PTSD symptom clusters (reexperiencing, avoidance, and hyperarousal), and 3 additional subdomains (depersonalization, guilt, and sexual problems) that expanded the assessment item pool. Copyright © 2011 International Society for Traumatic Stress Studies.

  4. Primary Science Assessment Item Setters' Misconceptions Concerning the State Changes of Water

    ERIC Educational Resources Information Center

    Boo, Hong Kwen

    2006-01-01

    Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the questions setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…

  5. Can Item Keyword Feedback Help Remediate Knowledge Gaps?

    PubMed Central

    Feinberg, Richard A.; Clauser, Amanda L.

    2016-01-01

    Background: In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. Objective: The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Methods: Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Results: Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Conclusions: Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation. PMID:27777664

  6. Can Item Keyword Feedback Help Remediate Knowledge Gaps?

    PubMed

    Feinberg, Richard A; Clauser, Amanda L

    2016-10-01

    In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.

  7. Influence of Context on Item Parameters in Forced-Choice Personality Assessments

    ERIC Educational Resources Information Center

    Lin, Yin; Brown, Anna

    2017-01-01

    A fundamental assumption in computerized adaptive testing is that item parameters are invariant with respect to context--items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the…

  8. Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations

    NASA Astrophysics Data System (ADS)

    Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly

    2014-09-01

    A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.

  9. Developing an African youth psychosocial assessment: an application of item response theory.

    PubMed

    Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise

    2014-06-01

    This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.
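
The graded response model used above has a simple closed form for category probabilities: each boundary between ordered categories gets a logistic curve, and category probabilities are differences of adjacent boundary curves. A minimal sketch with illustrative parameters (not the AYPA calibration):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: probability of each ordered
    response category given trait level theta, discrimination a,
    and increasing category thresholds b_1 < ... < b_{K-1}."""
    def p_star(b):  # boundary curve P(X >= k | theta)
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item; a and thresholds are illustrative.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # [0.182, 0.318, 0.318, 0.182]
```

The probabilities always sum to one, and item discrimination and difficulty estimates of the kind reported in the abstract are the `a` and threshold parameters fitted to observed responses.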

  10. Developing an African youth psychosocial assessment: an application of item response theory

    PubMed Central

    BETANCOURT, THERESA S.; YANG, FRANCES; BOLTON, PAUL; NORMAND, SHARON-LISE

    2014-01-01

    This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. PMID:24478113

  11. A confirmative clinimetric analysis of the 36-item Family Assessment Device.

    PubMed

    Timmerby, Nina; Cosci, Fiammetta; Watson, Maggie; Csillag, Claudio; Schmitt, Florence; Steck, Barbara; Bech, Per; Thastum, Mikael

    2018-02-07

    The Family Assessment Device (FAD) is a 60-item questionnaire widely used to evaluate self-reported family functioning. However, the factor structure as well as the number of items has been questioned. A shorter and more user-friendly version of the original FAD-scale, the 36-item FAD, has therefore previously been proposed, based on findings in a nonclinical population of adults. We aimed in this study to evaluate the brief 36-item version of the FAD in a clinical population. Data from a European multinational study, examining factors associated with levels of family functioning in adult cancer patients' families, were used. Both healthy and ill parents completed the 60-item version FAD. The psychometric analyses conducted were Principal Component Analysis and Mokken-analysis. A total of 564 participants were included. Based on the psychometric analysis we confirmed that the 36-item version of the FAD has robust psychometric properties and can be used in clinical populations. The present analysis confirmed that the 36-item version of the FAD (18 items assessing 'well-being' and 18 items assessing 'dysfunctional' family function) is a brief scale where the summed total score is a valid measure of the dimensions of family functioning. This shorter version of the FAD is, in accordance with the concept of 'measurement-based care', an easy to use scale that could be considered when the aim is to evaluate self-reported family functioning.

  12. Development of the Assessment Items of Debris Flow Using the Delphi Method

    NASA Astrophysics Data System (ADS)

    Byun, Yosep; Seong, Joohyun; Kim, Mingi; Park, Kyunghan; Yoon, Hyungkoo

    2016-04-01

    In recent years in Korea, typhoons and localized extreme rainfall caused by abnormal climate have increased. Accordingly, debris flows are becoming one of the most dangerous natural disasters. This study aimed to develop assessment items that can be used when conducting damage investigations of debris flows. The Delphi method was applied to classify the realms of the assessment items. As a result, 29 assessment items, classified into 6 groups, were determined.

  13. Item response theory analysis of Working Alliance Inventory, revised response format, and new Brief Alliance Inventory.

    PubMed

    Mallinckrodt, Brent; Tekie, Yacob T

    2016-11-01

    The Working Alliance Inventory (WAI) has made great contributions to psychotherapy research. However, studies suggest the 7-point response format and 3-factor structure of the client version may have psychometric problems. This study used Rasch item response theory (IRT) to (a) improve WAI response format, (b) compare two brief 12-item versions (WAI-sr; WAI-s), and (c) develop a new 16-item Brief Alliance Inventory (BAI). Archival data from 1786 counseling center and community clients were analyzed. IRT findings suggested problems with crossed category thresholds. A rescoring scheme that combines neighboring responses to create 5- and 4-point scales sharply reduced these problems. Although subscale variance was reduced by 11-26%, rescoring yielded improved reliability and generally higher correlations with therapy process (session depth and smoothness) and outcome measures (residual gain symptom improvement). The 16-item BAI was designed to maximize "bandwidth" of item difficulty and preserve a broader range of WAI sensitivity than WAI-s or WAI-sr. Comparisons suggest the BAI performed better in several respects than the WAI-s or WAI-sr and equivalent to the full WAI on several performance indicators.
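
The rescoring scheme described above amounts to a fixed mapping from original response categories to collapsed ones. A minimal sketch; the specific 7-to-5 merge below is hypothetical, since the paper's exact merges (chosen from IRT threshold diagnostics) are not reproduced here:

```python
def rescore(responses, merge_map):
    """Collapse neighbouring rating categories into a coarser scale.
    merge_map sends each original category to its rescored category."""
    return [merge_map[r] for r in responses]

# Illustrative 7-point -> 5-point scheme combining neighbouring
# categories whose thresholds were disordered (hypothetical merges).
seven_to_five = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4, 6: 4, 7: 5}

print(rescore([1, 3, 4, 6, 7], seven_to_five))  # [1, 2, 3, 4, 5]
```

Rescored data can then be refit with the Rasch rating-scale or graded model to check that the collapsed categories have ordered thresholds.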

  14. The Effect of the Multiple-Choice Item Format on the Measurement of Knowledge of Language Structure

    ERIC Educational Resources Information Center

    Currie, Michael; Chiramanee, Thanyapa

    2010-01-01

    Noting the widespread use of multiple-choice items in tests in English language education in Thailand, this study compared their effect against that of constructed-response items. One hundred and fifty-two university undergraduates took a test of English structure first in constructed-response format, and later in three, stem-equivalent…

  15. The Impact of Settable Test Item Exposure Control Interface Format on Postsecondary Business Student Test Performance

    ERIC Educational Resources Information Center

    Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.

    2005-01-01

    The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…

  16. Test Item Linguistic Complexity and Assessments for Deaf Students

    ERIC Educational Resources Information Center

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64…

  17. A Multilevel Assessment of Differential Item Functioning.

    ERIC Educational Resources Information Center

    Shen, Linjun

    A multilevel approach was proposed for the assessment of differential item functioning and compared with the traditional logistic regression approach. Data from the Comprehensive Osteopathic Medical Licensing Examination for 2,300 freshman osteopathic medical students were analyzed. The multilevel approach used three-level hierarchical generalized…
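
The "traditional logistic regression approach" to DIF that the multilevel method is compared against can be sketched as follows: regress item correctness on a matching score plus a group indicator, and inspect the group coefficient (uniform DIF). This is a generic illustration on simulated data, not the study's analysis:

```python
import numpy as np

def logistic_irls(X, y, iters=25):
    """Fit logistic regression by iteratively reweighted least squares
    (Newton-Raphson on the logistic log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(0)
n = 2000
total = rng.normal(size=n)          # matching ability score
group = rng.integers(0, 2, size=n)  # reference (0) vs focal (1) group
# Simulate uniform DIF: the item is 0.8 logits harder for the focal group.
logit = 0.2 + 1.0 * total - 0.8 * group
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n), total, group])
b0, b_total, b_group = logistic_irls(X, y)
print(round(b_group, 2))  # recovers a value near the simulated -0.8
```

A nonzero group coefficient after conditioning on the matching score flags uniform DIF; adding a score-by-group interaction term would screen for nonuniform DIF.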

  18. Assessment of Differential Item Functioning in the Experiences of Discrimination Index

    PubMed Central

    Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro

    2011-01-01

    The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104

  19. Data Collection Design for Equivalent Groups Equating: Using a Matrix Stratification Framework for Mixed-Format Assessment

    ERIC Educational Resources Information Center

    Mbella, Kinge Keka

    2012-01-01

    Mixed-format assessments are increasingly being used in large scale standardized assessments to measure a continuum of skills ranging from basic recall to higher order thinking skills. These assessments usually comprise a combination of (a) multiple-choice items which can be efficiently scored, have stable psychometric properties, and…

  20. Item Response Theory and Health Outcomes Measurement in the 21st Century

    PubMed Central

    Hays, Ron D.; Morales, Leo S.; Reise, Steve P.

    2006-01-01

    Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
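
The "standard errors conditional on trait level" noted above follow from the item information function. A minimal sketch for a two-parameter logistic (2PL) item, with illustrative parameters:

```python
import math

def information_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta:
    I(theta) = a^2 * p * (1 - p), where p is the probability of
    endorsing/answering the item given theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Information peaks where theta equals the item difficulty b, so the
# conditional standard error 1/sqrt(I) is smallest for respondents
# located near the item.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(information_2pl(theta, a=1.5, b=0.0), 3))
```

Summing item informations gives test information, which is what computer adaptive testing maximizes when choosing the next item.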

  1. The Effect of Response Format on the Psychometric Properties of the Narcissistic Personality Inventory: Consequences for Item Meaning and Factor Structure.

    PubMed

    Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris

    2016-04-01

    The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions related to entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. © The Author(s) 2015.

  2. Methodology for the development and calibration of the SCI-QOL item banks.

    PubMed

    Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David

    2015-05-01

    To develop a comprehensive, psychometrically sound, and conceptually grounded patient-reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury-Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center℠.
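
Computer adaptive administration of calibrated banks like these typically uses maximum-information item selection. A generic sketch, not the Assessment Center implementation; the 2PL bank parameters below are hypothetical:

```python
import math

def p_2pl(theta, a, b):
    """2PL response probability at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Maximum-information CAT rule: among items not yet given,
    pick the one most informative at the current ability estimate."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates,
               key=lambda i: item_information(theta_hat, *bank[i]))

# Hypothetical calibrated bank: (discrimination a, difficulty b).
bank = [(1.2, -2.0), (1.0, -0.5), (1.6, 0.1), (0.9, 1.5)]
print(next_item(theta_hat=0.0, bank=bank, administered={2}))  # 1
```

In a full CAT loop, the ability estimate is updated after each response and the rule is reapplied until a precision or length stopping criterion is met; fixed short forms skip this loop and administer a preset item subset.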

  3. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary.

    PubMed

    Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.

  4. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    PubMed Central

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2016-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for those of average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568
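
    The conditional model these two records evaluate is not reproduced in the abstracts. As a rough illustration of the general idea only, the sketch below lets a 2PL-style response probability depend on both latent ability and centered log response time; the additive form of the response-time term and all parameter values (`a`, `b`, `gamma`) are hypothetical, not the authors' specification.

```python
import math

def p_correct(theta, a, b, gamma, log_rt, mean_log_rt):
    """2PL-style item response probability with a response-time term.

    theta: latent ability; a, b: discrimination and difficulty;
    gamma: hypothetical weight on (centered) log response time.
    With gamma = 0 this reduces to the ordinary 2PL model.
    """
    z = a * (theta - b) + gamma * (log_rt - mean_log_rt)
    return 1.0 / (1.0 + math.exp(-z))

# A faster-than-average response (3 s vs. a 5 s mean) shifts the
# modeled success probability only when gamma is nonzero.
p_plain = p_correct(theta=0.5, a=1.2, b=0.0, gamma=0.0,
                    log_rt=math.log(3.0), mean_log_rt=math.log(5.0))
p_cond = p_correct(theta=0.5, a=1.2, b=0.0, gamma=0.4,
                   log_rt=math.log(3.0), mean_log_rt=math.log(5.0))
print(round(p_plain, 3), round(p_cond, 3))
```

    Conditioning on response time changes the item-level probabilities, which is what allows such a model to sharpen (or temper) individual score precision relative to the plain 2PL.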

  5. Recommended core items to assess e-cigarette use in population-based surveys.

    PubMed

    Pearson, Jennifer L; Hitchman, Sara C; Brose, Leonie S; Bauld, Linda; Glasser, Allison M; Villanti, Andrea C; McNeill, Ann; Abrams, David B; Cohen, Joanna E

    2018-05-01

    A consistent approach using standardised items to assess e-cigarette use in both youth and adult populations will aid cross-survey and cross-national comparisons of the effect of e-cigarette (and tobacco) policies and improve our understanding of the population health impact of e-cigarette use. Focusing on adult behaviour, we propose a set of e-cigarette use items, discuss their utility and potential adaptation, and highlight e-cigarette constructs that researchers should avoid without further item development. Reliable and valid items will strengthen the emerging science and inform knowledge synthesis for policy-making. Building on informal discussions at a series of international meetings of 65 experts from 15 countries, the authors provide recommendations for assessing e-cigarette use behaviour, relative perceived harm, device type, presence of nicotine, flavours and reasons for use. We recommend items assessing eight core constructs: e-cigarette ever use, frequency of use and former daily use; relative perceived harm; device type; primary flavour preference; presence of nicotine; and primary reason for use. These items should be standardised or minimally adapted for the policy context and target population. Researchers should be prepared to update items as e-cigarette device characteristics change. A minimum set of e-cigarette items is proposed to encourage consensus around items to allow for cross-survey and cross-jurisdictional comparisons of e-cigarette use behaviour. These proposed items are a starting point. We recognise room for continued improvement, and welcome input from e-cigarette users and scientific colleagues. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  6. Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework.

    PubMed

    Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A

    2006-11-01

    To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish samples using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
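
    This record names Samejima's graded response model and the DFIT framework without formulas. The sketch below shows the core ingredients: expected item scores under a graded response model for two sets of hypothetical group-specific parameters, compared with an unweighted mean-squared-difference index loosely in the spirit of DFIT's noncompensatory DIF statistic (DFIT proper weights by the focal group's ability distribution; all parameter values here are invented for illustration).

```python
import math

def grm_expected_score(theta, a, thresholds):
    """Expected item score under Samejima's graded response model.

    Each cumulative probability P(X >= k) follows a 2PL boundary
    curve; the expected score is the sum of those boundary curves.
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)

# Hypothetical calibrations of one polytomous item in two groups:
# (discrimination, [category thresholds]).
english = (1.5, [-1.0, 0.0, 1.2])
spanish = (1.1, [-0.6, 0.3, 1.5])

# DFIT-style index: mean squared difference in expected scores over
# an ability grid (unweighted here, for simplicity).
grid = [i / 10 for i in range(-30, 31)]
ncdif = sum((grm_expected_score(t, *english) -
             grm_expected_score(t, *spanish)) ** 2 for t in grid) / len(grid)
print(round(ncdif, 4))
```

    A near-zero index would indicate that the two calibrations imply essentially the same expected score at every ability level; scale-level (DTF) comparisons aggregate these differences across items, which is how item-level DIF can cancel out at the test level, as this record reports.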

  7. The modified Memorial Symptom Assessment Scale Short Form: a modified response format and rational scoring rules.

    PubMed

    Sharp, J L; Gough, K; Pascoe, M C; Drosdowsky, A; Chang, V T; Schofield, P

    2018-07-01

    The Memorial Symptom Assessment Scale Short Form (MSAS-SF) is a widely used symptom assessment instrument. Patients who self-complete the MSAS-SF have difficulty following the two-part response format, resulting in incorrectly completed responses. We describe modifications to the response format to improve usability, and rational scoring rules for incorrectly completed items. The modified MSAS-SF was completed by 311 women in our Peer and Nurse support Trial to Assist women in Gynaecological Oncology (the PeNTAGOn study). Descriptive statistics were used to summarise completion of the modified MSAS-SF, and provide symptom statistics before and after applying the rational scoring rules. Spearman's correlations with the Functional Assessment for Cancer Therapy-General (FACT-G) and Hospital Anxiety and Depression Scale (HADS) were assessed. Correct completion of the modified MSAS-SF items ranged from 91.5 to 98.7%. The rational scoring rules increased the percentage of usable responses by an average of 4% across all symptoms. MSAS-SF item statistics were similar with and without the scoring rules. The pattern of correlations with FACT-G and HADS was compatible with prior research. The modified MSAS-SF was usable for self-completion and responses demonstrated validity. The rational scoring rules can minimise loss of data from incorrectly completed responses. Further investigation is recommended.

  8. Methodology for the development and calibration of the SCI-QOL item banks

    PubMed Central

    Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David

    2015-01-01

    Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM. PMID:26010963

  9. Item Difficulty Modeling of Paragraph Comprehension Items

    ERIC Educational Resources Information Center

    Gorin, Joanna S.; Embretson, Susan E.

    2006-01-01

    Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…

  10. Helping Poor Readers Demonstrate Their Science Competence: Item Characteristics Supporting Text-Picture Integration

    ERIC Educational Resources Information Center

    Saß, Steffani; Schütte, Kerstin

    2016-01-01

    Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…

  11. Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Wang, Ning; Lane, Suzanne

    This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…

  12. Putting Interoperability to the Test: Building a Large Reusable Assessment Item Bank

    ERIC Educational Resources Information Center

    Sclater, Niall; MacDonald, Mary

    2004-01-01

    The COLA project has been developing a large bank of assessment items for units across the Scottish further education curriculum since May 2003. These will be made available to learners mainly via colleges' virtual learning environments (VLEs). Many people have been involved in the development of the COLA assessment item bank to ensure a high…

  13. The Fantastic Four of Mathematics Assessment Items

    ERIC Educational Resources Information Center

    Greenlees, Jane

    2011-01-01

    In this article, the author makes reference to four comic book characters to make the point that together they are a formidable team, but on their own they are vulnerable. She examines the four components of mathematics assessment items and the need for implicit instruction within the classroom for student success. Just like the "Fantastic Four"…

  14. Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory

    ERIC Educational Resources Information Center

    Lee, Won-Chan

    2010-01-01

    In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…

  15. Calibration of an Item Bank for the Assessment of Basque Language Knowledge

    ERIC Educational Resources Information Center

    Lopez-Cuadrado, Javier; Perez, Tomas A.; Vadillo, Jose A.; Gutierrez, Julian

    2010-01-01

    The main requisite for a functional computerized adaptive testing system is the need of a calibrated item bank. This text presents the tasks carried out during the calibration of an item bank for assessing knowledge of Basque language. It has been done in terms of the 3-parameter logistic model provided by the item response theory. Besides, this…

  16. Development and community-based validation of eight item banks to assess mental health.

    PubMed

    Batterham, Philip J; Sunderland, Matthew; Carragher, Natacha; Calear, Alison L

    2016-09-30

    There is a need for precise but brief screening of mental health problems in a range of settings. The development of item banks to assess depression and anxiety has resulted in new adaptive and static screeners that accurately assess severity of symptoms. However, expansion to a wider array of mental health problems is required. The current study developed item banks for eight mental health problems: social anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder, adult attention-deficit hyperactivity disorder, drug use, psychosis and suicidality. The item banks were calibrated in a population-based Australian adult sample (N=3175) by administering large item pools (45-75 items) and excluding items on the basis of local dependence or measurement non-invariance. Item Response Theory parameters were estimated for each item bank using a two-parameter graded response model. Each bank consisted of 19-47 items, demonstrating excellent fit and precision across a range of -1 to 3 standard deviations from the mean. No previous study has developed such a broad range of mental health item banks. The calibrated item banks will form the basis of a new system of static and adaptive measures to screen for a broad array of mental health problems in the community. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
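
    Several records here describe calibrated item banks that can be administered as computer adaptive tests. As a minimal, generic sketch (not any of these authors' procedures), a CAT typically administers the unused item with maximum Fisher information at the current ability estimate; under a 2PL model, with hypothetical item parameters, that selection step looks like:

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta, bank, administered):
    """Index of the unadministered item with maximum information."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta, *bank[i]))

# Hypothetical (discrimination, difficulty) pairs for a tiny bank.
bank = [(0.8, -1.5), (1.2, 0.0), (1.6, 0.5), (1.0, 2.0)]
print(next_item(0.4, bank, administered={1}))
```

    Information under a 2PL peaks at theta = b with value a^2 / 4, so a CAT favors highly discriminating items whose difficulty sits near the current ability estimate, which is why well-calibrated banks support short, precise adaptive screeners.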

  17. The dialysis orders objective structured clinical examination (OSCE): a formative assessment for nephrology fellows.

    PubMed

    Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A; Yuan, Christina M

    2018-04-01

    Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than 'important'. The content validity index was 0.91. Ninety-five percent of positive-point items were easy-medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46-51], κ = 0.68 (95% CI 0.59-0.77), Cronbach's α = 0.84. We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43-45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress.

  18. Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory.

    PubMed

    Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C

    2017-02-01

    The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.

    PubMed

    Lebedeva, Elena; Huang, Mei; Koski, Lisa

    2016-03-01

    The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time.

  20. Comparison of Alternate and Original Items on the Montreal Cognitive Assessment

    PubMed Central

    Lebedeva, Elena; Huang, Mei; Koski, Lisa

    2016-01-01

    Background The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
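
    The two MoCA records above rest on Rasch-based item difficulty estimates. As a simplified sketch only (it treats person abilities as known, which real Rasch software does not; the data are invented), an item's difficulty can be found by Newton-Raphson on the Rasch likelihood, and an alternate item that fewer examinees pass lands at a higher estimated difficulty:

```python
import math

def rasch_item_difficulty(abilities, responses, iters=25):
    """ML estimate of one Rasch item's difficulty, with person
    abilities treated as known (a simplification for illustration)."""
    b = 0.0
    for _ in range(iters):
        probs = [1.0 / (1.0 + math.exp(-(t - b))) for t in abilities]
        score = sum(p - x for p, x in zip(probs, responses))
        info = sum(p * (1.0 - p) for p in probs)
        b += score / info  # Newton-Raphson step
    return b

# Hypothetical responses of the same examinees to an "original" and
# an "alternate" item; the alternate is passed less often.
abilities = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
original = [0, 0, 1, 1, 1, 1]
alternate = [0, 0, 0, 0, 1, 1]
print(rasch_item_difficulty(abilities, original),
      rasch_item_difficulty(abilities, alternate))
```

    Matching parallel forms on these difficulty estimates, rather than on surface content alone, is the approach the record argues would improve sensitivity to cognitive change over time.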

  1. A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.

    2015-01-01

    Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three…

  2. Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

    ERIC Educational Resources Information Center

    Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

    2012-01-01

    Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…

  3. Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

    PubMed

    LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

    2015-04-01

    Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
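
    The Mantel-Haenszel procedure this record uses has a compact closed form: a common odds ratio pooled across matched total-score strata, often re-expressed on the ETS delta scale. A minimal sketch with hypothetical counts:

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched-score strata.

    Each stratum is (a, b, c, d): reference-group correct/incorrect
    and focal-group correct/incorrect counts at one score level.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical counts at three matched total-score levels.
strata = [(40, 60, 30, 70), (55, 45, 45, 55), (80, 20, 70, 30)]
or_mh = mantel_haenszel_odds_ratio(strata)
delta = -2.35 * math.log(or_mh)  # ETS delta-scale transform
print(round(or_mh, 3), round(delta, 3))
```

    By the usual ETS convention, |delta| below 1 is treated as negligible DIF; the severity classifications reported in this record come from thresholds of this kind together with the item characteristic curve comparisons.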

  4. Development and evaluation of CAHPS survey items assessing how well healthcare providers address health literacy.

    PubMed

    Weidmer, Beverly A; Brach, Cindy; Hays, Ron D

    2012-09-01

    The complexity of health information often exceeds patients' skills to understand and use it. To develop survey items assessing how well healthcare providers communicate health information. Domains and items for the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Item Set for Addressing Health Literacy were identified through an environmental scan and input from stakeholders. The draft item set was translated into Spanish and pretested in both English and Spanish. The revised item set was field tested with a randomly selected sample of adult patients from 2 sites using mail and telephonic data collection. Item-scale correlations, confirmatory factor analyses, and internal consistency reliability estimates were used to assess how well the survey items performed and to identify composite measures. Finally, we regressed the CAHPS global rating of the provider item on the CAHPS core communication composite and the new health literacy composites. A total of 601 completed surveys were obtained (52% response rate). Two composite measures were identified: (1) Communication to Improve Health Literacy (16 items); and (2) How Well Providers Communicate About Medicines (6 items). These 2 composites were significantly uniquely associated with the global rating of the provider (communication to improve health literacy: P<0.001, b=0.28; and communication about medicines composite: P=0.02, b=0.04). The 2 composites and the CAHPS core communication composite accounted for 51% of the variance in the global rating of the provider. A 5-item subset of the Communication to Improve Health Literacy composite accounted for 90% of the variance of the original 16-item composite. This study provides support for reliability and validity of the CAHPS Item Set for Addressing Health Literacy. These items can serve to assess whether healthcare providers have communicated effectively with their patients and as a tool for quality improvement.

  5. Formative Assessment Using Direct Behavior Ratings: Evaluating Intervention Effects of Daily Behavior Report Cards

    ERIC Educational Resources Information Center

    Sims, Wesley A.; Riley-Tillman, Chris; Cohen, Daniel R.

    2017-01-01

    This study examined the treatment sensitivity of "Direct Behavior Rating-Single Item Scales" (DBR-SIS) in response to an evidence-based intervention delivered in a single-case, multiple-baseline design. DBR-SIS was used as a formative assessment in conjunction with a frequently used intervention in schools, a Daily Behavior Report Card…

  6. A HO-IRT Based Diagnostic Assessment System with Constructed Response Items

    ERIC Educational Resources Information Center

    Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei

    2011-01-01

    The aim of the present study was to develop an on-line assessment system with constructed response items in the context of elementary mathematics curriculum. The system recorded the problem solving process of constructed response items and transfered the process to response codes for further analyses. An inference mechanism based on artificial…

  7. Using Kernel Equating to Assess Item Order Effects on Test Scores

    ERIC Educational Resources Information Center

    Moses, Tim; Yang, Wen-Ling; Wilson, Christine

    2007-01-01

    This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test…

  8. Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example

    ERIC Educational Resources Information Center

    Li, Xiaomin; Wang, Wen-Chung

    2015-01-01

    The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…

  9. Combining item response theory with multiple imputation to equate health assessment questionnaires.

    PubMed

    Gu, Chenyang; Gutman, Roee

    2017-09-01

    The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.
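
    Predictive mean matching, the method this record builds on, can be sketched in a few lines. The one-variable least-squares predictor below is a stand-in for the IRT-based predictor the paper actually proposes, and the data are hypothetical: fit on observed cases, then for each missing case draw an observed value from the k donors whose predicted means are closest.

```python
import random

def pmm_impute(x_obs, y_obs, x_mis, k=3, seed=0):
    """Predictive mean matching with a simple linear predictor.

    For each missing case, draws a donor's observed y from the k
    observed cases with the closest predicted means.
    """
    rng = random.Random(seed)
    n = len(x_obs)
    mx = sum(x_obs) / n
    my = sum(y_obs) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(x_obs, y_obs)) /
             sum((x - mx) ** 2 for x in x_obs))
    intercept = my - slope * mx
    pred_obs = [intercept + slope * x for x in x_obs]
    imputed = []
    for x in x_mis:
        mean = intercept + slope * x
        donors = sorted(range(n), key=lambda i: abs(pred_obs[i] - mean))[:k]
        imputed.append(y_obs[rng.choice(donors)])
    return imputed

print(pmm_impute([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [2.5, 4.5]))
```

    Because every imputed value is an actually observed response, PMM preserves the discrete rating scales of instruments like the FIM, MDS, and OASIS; in a multiple-imputation setting this draw is repeated across several imputed data sets.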

  10. The Psychometric Structure of Items Assessing Autogynephilia.

    PubMed

    Hsu, Kevin J; Rosenthal, A M; Bailey, J Michael

    2015-07-01

    Autogynephilia, or paraphilic sexual arousal in a man to the thought or image of himself as a woman, manifests in a variety of different behaviors and fantasies. We examined the psychometric structure of 22 items assessing five known types of autogynephilia by subjecting them to exploratory factor analysis in a sample of 149 autogynephilic men. Results of oblique factor analyses supported the ability to distinguish five group factors with suitable items. Results of hierarchical factor analyses suggest that the five group factors were strongly underlain by a general factor of autogynephilia. Because the general factor accounted for a much greater amount of the total variance of the 22 items than did the group factors, the types of autogynephilia that a man has seem less important than the degree to which he has autogynephilia. However, the five types of autogynephilia remain conceptually useful because meaningful distinctions were found among them, including differential rates of endorsement and differential ability to predict other relevant variables like gender dysphoria. Factor-derived scales and subscales demonstrated good internal consistency reliabilities and validity, with large differences found between autogynephilic men and heterosexual male controls. Future research should attempt to replicate our findings, which were mostly exploratory.

  11. Using Out-of-Level Items in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wei, Hua; Lin, Jie

    2015-01-01

    Out-of-level testing refers to the practice of assessing a student with a test that is intended for students at a higher or lower grade level. Although the appropriateness of out-of-level testing for accountability purposes has been questioned by educators and policymakers, incorporating out-of-level items in formative assessments for accurate…

  12. Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

    ERIC Educational Resources Information Center

    Alsadaawi, Abdullah Saleh

    2017-01-01

    The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…

  13. Factoring handedness data: I. Item analysis.

    PubMed

    Messinger, H B; Messinger, M I

    1995-12-01

    Recently in this journal, Peters and Murphy challenged the validity of factor analyses done on bimodal handedness data, suggesting instead that right- and left-handers be studied separately. But bimodality may be avoidable if attention is paid to Oldfield's questionnaire format and instructions for the subjects. Two characteristics appear crucial: a two-column LEFT-RIGHT format for the body of the instrument, and what we call Oldfield's Admonition: not to indicate strong preference for a handedness item, such as writing, unless "... the preference is so strong that you would never try to use the other hand unless absolutely forced to...". Attaining unimodality of an item distribution would seem to overcome the objections of Peters and Murphy. In a 1984 survey in Boston, we used Oldfield's ten-item questionnaire exactly as published. This produced unimodal item distributions. With reflection of the five-point item scale and a logarithmic transformation, we achieved a degree of normalization for the items. Two surveys elsewhere, based on Oldfield's 20-item list but with changes in the questionnaire format and the instructions, yielded markedly different item distributions, with peaks at each extreme and sometimes in the middle as well.
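    The reflect-then-log normalization mentioned in this abstract can be sketched in a few lines. The exact constants here (a 1-5 preference scale reflected so the modal extreme maps to the low end) are assumptions for illustration, not the authors' published transform:

    ```python
    import math

    def reflect_and_log(score, scale_max=5):
        """Reflect a five-point item score (1 = weakest, 5 = strongest
        preference) and take its logarithm, a common way to pull in the
        long tail of a skewed item distribution. The reflection maps
        5 -> 1 and 1 -> 5, so the transform is 0 at the modal extreme."""
        reflected = scale_max + 1 - score
        return math.log(reflected)

    # Strongly right-preferent responses pile up at 5; after the
    # transform they sit at 0, and the rarer low scores spread out.
    sample = [5, 5, 4, 5, 2]
    transformed = [reflect_and_log(s) for s in sample]
    ```

    Reflection before the logarithm matters: the log compresses the crowded end of the scale only if that end is mapped to the small values.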

  14. Formative and Summative Assessment in Veterinary Pathology and Other Courses at a Mexican Veterinary College.

    PubMed

    Valero, Germán; Cárdenas, Paula

    The Faculty of Veterinary Medicine and Animal Science of the National Autonomous University of Mexico (UNAM) uses the Moodle learning management system for formative and summative computer assessment. The authors of this article (the teacher primarily responsible for Moodle implementation and a researcher who is a recent Moodle adopter) describe and discuss the students' and teachers' attitudes to summative and formative computer assessment in Moodle. Item analysis of quiz results helped us identify and fix poorly performing questions, which greatly reduced student complaints and improved objective assessment. The use of certainty-based marking (CBM) in formative assessment in veterinary pathology was well received by the students and should be extended to more courses. The importance of having proficient computer support personnel should not be underestimated. A properly translated language pack is essential for the use of Moodle in a language other than English.

  15. The dialysis orders objective structured clinical examination (OSCE): a formative assessment for nephrology fellows

    PubMed Central

    Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A

    2018-01-01

    Abstract Background Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. Methods The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than ‘important’. The content validity index was 0.91. Ninety-five percent of positive-point items were easy–medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46–51], κ = 0.68 (95% CI 0.59–0.77), Cronbach’s α = 0.84. Results We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43–45); 36% passed the test. Fellows scored significantly lower than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. Seventy-one percent of fellows surveyed agreed or strongly agreed that the OSCE was useful for self-assessing proficiency. Conclusions The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress. PMID:29644053
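    The agreement statistic reported in this abstract (κ) is computable from paired ratings. A minimal sketch of Cohen's kappa for two raters, using made-up pass/fail calls rather than the study's data:

    ```python
    def cohens_kappa(rater_a, rater_b):
        """Cohen's kappa: chance-corrected agreement between two raters.
        kappa = (p_observed - p_expected) / (1 - p_expected), where
        p_expected is the agreement expected from each rater's marginal
        category frequencies."""
        assert len(rater_a) == len(rater_b)
        n = len(rater_a)
        categories = set(rater_a) | set(rater_b)
        p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        p_exp = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                    for c in categories)
        return (p_obs - p_exp) / (1 - p_exp)

    # Hypothetical pass/fail calls by two examiners on eight candidates.
    examiner_1 = [1, 1, 0, 0, 1, 0, 1, 1]
    examiner_2 = [1, 0, 0, 0, 1, 0, 1, 1]
    kappa = cohens_kappa(examiner_1, examiner_2)
    ```

    A kappa of 0.68, as reported, is conventionally read as substantial agreement; raw percent agreement alone would overstate it because some agreement occurs by chance.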

  16. Assessment of Preference for Edible and Leisure Items in Individuals with Dementia

    ERIC Educational Resources Information Center

    Ortega, Javier Virues; Iwata, Brian A.; Nogales-Gonzalez, Celia; Frades, Belen

    2012-01-01

    We conducted 2 studies on reinforcer preference in patients with dementia. Results of preference assessments yielded differential selections by 14 participants. Unlike prior studies with individuals with intellectual disabilities, all participants showed a noticeable preference for leisure items over edible items. Results of a subsequent analysis…

  17. Formative Assessment Probes: Is It a Rock? Continuous Formative Assessment

    ERIC Educational Resources Information Center

    Keeley, Page

    2013-01-01

    A lesson plan is provided for a formative assessment probe entitled "Is It a Rock?" This probe is designed for teaching elementary school students about rocks through the use of a formative assessment classroom technique (FACT) known as the group Frayer Model. FACT activates students' thinking about a concept and can be used to…

  18. Better Formative Assessment

    ERIC Educational Resources Information Center

    Clinchot, Michael; Ngai, Courtney; Huie, Robert; Talanquer, Vicente; Lambertz, Jennifer; Banks, Gregory; Weinrich, Melissa; Lewis, Rebecca; Pelletier, Pamela; Sevian, Hannah

    2017-01-01

    Formative assessment has been defined as the process "to recognize and respond to student learning to enhance that learning during the learning." Formative assessment helps teachers identify strengths and weaknesses in their students' understanding, focuses students' attention on relevant information and ideas, and provides scaffolds…

  19. Assessment of the Item Selection and Weighting in the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis

    PubMed Central

    MAHR, ALFRED D.; NEOGI, TUHINA; LAVALLEY, MICHAEL P.; DAVIS, JOHN C.; HOFFMAN, GARY S.; MCCUNE, W. JOSEPH; SPECKS, ULRICH; SPIERA, ROBERT F.; ST.CLAIR, E. WILLIAM; STONE, JOHN H.; MERKEL, PETER A.

    2013-01-01

    Objective To assess the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis (BVAS/WG) with respect to its selection and weighting of items. Methods This study used the BVAS/WG data from the Wegener's Granulomatosis Etanercept Trial. The scoring frequencies of the 34 predefined items and any “other” items added by clinicians were calculated. Using linear regression with generalized estimating equations in which the physician global assessment (PGA) of disease activity was the dependent variable, we computed weights for all predefined items. We also created variables for clinical manifestations frequently added as other items, and computed weights for these as well. We searched for the model that included the items and their generated weights yielding an activity score with the highest R2 to predict the PGA. Results We analyzed 2,044 BVAS/WG assessments from 180 patients; 734 assessments were scored during active disease. The highest R2 with the PGA was obtained by scoring WG activity based on the following items: the 25 predefined items rated on ≥5 visits, the 2 newly created fatigue and weight loss variables, the remaining minor other and major other items, and a variable that signified whether new or worse items were present at a specific visit. The weights assigned to the items ranged from 1 to 21. Compared with the original BVAS/WG, this modified score correlated significantly more strongly with the PGA. Conclusion This study suggests possibilities to enhance the item selection and weighting of the BVAS/WG. These changes may increase this instrument's ability to capture the continuum of disease activity in WG. PMID:18512722

  20. Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments

    ERIC Educational Resources Information Center

    El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne

    2017-01-01

    Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…

  1. Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…

  2. Formative Assessment: Simply, No Additives

    ERIC Educational Resources Information Center

    Roskos, Kathleen; Neuman, Susan B.

    2012-01-01

    Among the types of assessment the closest to daily reading instruction is formative assessment. In contrast to summative assessment, which occurs after instruction, formative assessment involves forming judgments frequently in the flow of instruction. Key features of formative assessment include identifying gaps between where students are and…

  3. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

    PubMed Central

    Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.

    2017-01-01

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
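    Among the methodological approaches such a tutorial covers, the Mantel-Haenszel procedure is a standard DIF screen (named here as one common method, not necessarily the one the authors emphasize). A sketch on hypothetical counts:

    ```python
    import math

    def mantel_haenszel_dif(strata):
        """Mantel-Haenszel common odds ratio for a single item.
        Each stratum (typically a band of matched total scores) is a 2x2
        table: (ref_correct, ref_wrong, focal_correct, focal_wrong).
        alpha near 1.0 suggests no DIF; the ETS delta scale is
        -2.35 * ln(alpha), with |delta| >= 1.5 flagging large DIF."""
        num = den = 0.0
        for a, b, c, d in strata:
            n = a + b + c + d
            num += a * d / n   # reference correct, focal wrong
            den += b * c / n   # reference wrong, focal correct
        alpha = num / den
        return alpha, -2.35 * math.log(alpha)

    # Hypothetical strata with equal odds in every score band: no DIF,
    # even though the groups differ in overall performance.
    strata = [(40, 10, 20, 5), (30, 30, 15, 15), (10, 40, 5, 20)]
    alpha, delta = mantel_haenszel_dif(strata)
    ```

    Matching on total score before comparing groups is exactly why, as the abstract notes, comparing unmatched total scores can mislead: a group difference in ability is not item bias.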

  4. Distinctions between Item Format and Objectivity in Scoring.

    ERIC Educational Resources Information Center

    Terwilliger, James S.

    This paper clarifies important distinctions in item writing and item scoring and considers the implications of these distinctions for developing guidelines related to test construction for training teachers. The terminology used to describe and classify paper and pencil test questions frequently confuses two distinct features of questions:…

  5. The value of item response theory in clinical assessment: a review.

    PubMed

    Thomas, Michael L

    2011-09-01

    Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical assessment are reviewed to appraise its current and potential value. Benefits of IRT include comprehensive analyses and reduction of measurement error, creation of computer adaptive tests, meaningful scaling of latent variables, objective calibration and equating, evaluation of test and item bias, greater accuracy in the assessment of change due to therapeutic intervention, and evaluation of model and person fit. The theory may soon reinvent the manner in which tests are selected, developed, and scored. Although challenges remain to the widespread implementation of IRT, its application to clinical assessment holds great promise. Recommendations for research, test development, and clinical practice are provided.

  6. Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

    PubMed Central

    Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

    2014-01-01

    Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753

  7. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    ERIC Educational Resources Information Center

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is…

  8. Assessing Dimensionality of Noncompensatory Multidimensional Item Response Theory with Complex Structures

    ERIC Educational Resources Information Center

    Svetina, Dubravka

    2013-01-01

    The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in noncompensatory multidimensional item response models using dimensionality assessment procedures based on DETECT (dimensionality evaluation to enumerate contributing traits) and NOHARM (normal ogive harmonic analysis robust method). Five…

  9. Identifying items to assess methodological quality in physical therapy trials: a factor analysis.

    PubMed

    Armijo-Olivo, Susan; Cummings, Greta G; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd

    2014-09-01

    Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). A methodological research design was used, and an EFA was performed. Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. Empirical evidence of the association among these items with treatment effects and a confirmatory factor analysis of these results are needed to validate these items.

  10. Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…

  11. Teacher Learning of Technology Enhanced Formative Assessment

    NASA Astrophysics Data System (ADS)

    Feldman, Allan; Capobianco, Brenda M.

    2008-02-01

    This study examined the integration of technology-enhanced formative assessment (FA) into teachers' practice. Participants were high school physics teachers interested in improving their use of a classroom response system (CRS) to promote FA. Data were collected using interviews, direct classroom observations, and collaborative discussions. The physics teachers engaged in collaborative action research (AR) to learn how to use FA and CRS to promote student and teacher learning. Data were analyzed using open coding, cross-case analysis, and content analysis. Results from data analysis allowed researchers to construct a model of the knowledge and skills necessary for the integration of technology-enhanced FA into teachers' practice. The model is a set of four technologies: hardware and software; methods for constructing FA items; pedagogical methods; and curriculum integration. The model is grounded in the idea that teachers must develop these respective technologies as they interact with the CRS (i.e., hardware and software, item construction) and their existing practice (i.e., pedagogical methods, curriculum). Implications are that for teachers to make FA an integral part of their practice using CRS, they must: 1) engage in the four technologies; 2) understand the nature of FA; and 3) collaborate with other interested teachers through AR.

  12. A Comparison of Three Test Formats to Assess Word Difficulty

    ERIC Educational Resources Information Center

    Culligan, Brent

    2015-01-01

    This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…

  13. IRT Item Parameter Scaling for Developing New Item Pools

    ERIC Educational Resources Information Center

    Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua

    2017-01-01

    Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. The three scaling procedures are considered: (a) concurrent…

  14. A Framework for Dimensionality Assessment for Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Svetina, Dubravka; Levy, Roy

    2014-01-01

    A framework is introduced for considering dimensionality assessment procedures for multidimensional item response models. The framework characterizes procedures in terms of their confirmatory or exploratory approach, parametric or nonparametric assumptions, and applicability to dichotomous, polytomous, and missing data. Popular and emerging…

  15. Goodness-of-Fit Assessment of Item Response Theory Models

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto

    2013-01-01

    The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate "p"-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model…

  16. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

    PubMed

    Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

    2014-05-01

    The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which the hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has little qualitative data and wants preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, classical test theory, IRT, or both should be considered to help maximize the content validity of PRO measures.
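    The classical-test-theory screens this abstract lists (floor and ceiling effects, the relationship between items and the total score) are straightforward to compute. A sketch on hypothetical response data; the helper names are illustrative, not from any PRO toolkit:

    ```python
    def pearson(x, y):
        """Pearson correlation of two equal-length sequences."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return cov / (vx * vy) ** 0.5

    def floor_ceiling(totals, lo, hi):
        """Proportions of respondents at the minimum and maximum possible
        scale scores (floor and ceiling effects)."""
        n = len(totals)
        return (sum(t == lo for t in totals) / n,
                sum(t == hi for t in totals) / n)

    def corrected_item_total(items):
        """Correlation of each item with the sum of the *other* items,
        removing the item's own contribution to the total score."""
        result = []
        for i, item in enumerate(items):
            rest = [sum(r) - r[i] for r in zip(*items)]
            result.append(pearson(item, rest))
        return result
    ```

    Large floor or ceiling proportions signal that a scale cannot distinguish respondents at the extremes, which is one reason IRT-based item banks (next stage of development) are attractive.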

  17. Gender-Based Differential Item Performance in Mathematics Achievement Items.

    ERIC Educational Resources Information Center

    Doolittle, Allen E.; Cleary, T. Anne

    1987-01-01

    Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)

  18. Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…

  19. Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

    PubMed

    Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

    2014-01-01

    Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. The Patient-Reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to the Health Assessment Questionnaire and Medical Outcomes Study Short Form-36 legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
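    The calibration-then-selection loop this abstract describes can be sketched with a dichotomous two-parameter logistic (2PL) model. This is an illustrative simplification: PROMIS items are polytomous and use a graded response model, and the item bank here is made up:

    ```python
    import math

    def p_2pl(theta, a, b):
        """Two-parameter logistic model: probability of endorsing an item
        with discrimination a and difficulty (severity) b at trait level
        theta."""
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    def item_information(theta, a, b):
        """Fisher information of a 2PL item: a^2 * P * (1 - P).
        Information peaks where theta equals the item's difficulty b."""
        p = p_2pl(theta, a, b)
        return a * a * p * (1.0 - p)

    def next_cat_item(theta_hat, bank, administered):
        """CAT step: pick the not-yet-administered item that is most
        informative at the current ability estimate."""
        return max((i for i in range(len(bank)) if i not in administered),
                   key=lambda i: item_information(theta_hat, *bank[i]))

    # Hypothetical bank of (discrimination, difficulty) pairs.
    bank = [(1.2, -1.0), (1.2, 0.0), (1.2, 1.5)]
    ```

    Because information is concentrated near each item's difficulty, the CAT keeps choosing items matched to the running ability estimate, which is how adaptive tests reach a given precision with fewer items than static forms.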

  20. Developing and evaluating innovative items for the NCLEX: Part 2, item characteristics and cognitive processing.

    PubMed

    Wendt, Anne; Harmes, J Christine

    2009-01-01

    This article is a continuation of the research on the development and evaluation of innovative item formats for the NCLEX examinations that was published in the March/April 2009 edition of Nurse Educator. The authors discuss the innovative item templates and evaluate the statistical characteristics and level of cognitive processing required to answer the examination items.

  1. The Consumer Assessment of Healthcare Providers and Systems (CAHPS) cultural competence (CC) item set.

    PubMed

    Weech-Maldonado, Robert; Carle, Adam; Weidmer, Beverly; Hurtado, Margarita; Ngo-Metzger, Quyen; Hays, Ron D

    2012-09-01

    There is a need for reliable and valid measures of cultural competence (CC) from the patient's perspective. This paper evaluates the reliability and validity of the Consumer Assessments of Healthcare Providers and Systems (CAHPS) CC item set. Using 2008 survey data, we assessed the internal consistency of the CAHPS CC scales using Cronbach's α and examined the validity of the measures using exploratory and confirmatory factor analysis, multitrait scaling analysis, and regression analysis. A random stratified sample (based on race/ethnicity and language) of 991 enrollees, younger than 65 years, from 2 Medicaid managed care plans in California and New York. CAHPS CC item set after excluding screener items and ratings. Confirmatory factor analysis (Comparative Fit Index=0.98, Tucker Lewis Index=0.98, and Root Mean Square Error of Approximation=0.06) provided support for a 7-factor structure: Doctor Communication--Positive Behaviors, Doctor Communication--Negative Behaviors, Doctor Communication--Health Promotion, Doctor Communication--Alternative Medicine, Shared Decision-Making, Equitable Treatment, and Trust. Item-total correlations (corrected for item overlap) for the 7 scales exceeded 0.40. Exploratory factor analysis showed support for 1 additional factor: Access to Interpreter Services. Internal consistency reliability estimates ranged from 0.58 (Alternative Medicine) to 0.92 (Positive Behaviors) and were 0.70 or higher for 4 of the 8 composites. All composites were positively and significantly associated with the overall doctor rating. The CAHPS CC 26-item set demonstrates adequate measurement properties and can be used as a supplemental item set to the CAHPS Clinician and Group Surveys in assessing culturally competent care from the patient's perspective.
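    The internal-consistency estimates reported above are Cronbach's α values. A minimal sketch of the statistic on hypothetical item responses (not the CAHPS data):

    ```python
    def cronbach_alpha(items):
        """Cronbach's alpha from per-item response lists (all in the same
        respondent order):
        alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))."""
        k = len(items)

        def var(xs):
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / len(xs)

        totals = [sum(resp) for resp in zip(*items)]
        return (k / (k - 1)) * (1 - sum(var(it) for it in items)
                                / var(totals))

    # Three hypothetical 4-point items answered by five respondents.
    items = [[4, 3, 3, 2, 1],
             [4, 4, 3, 2, 1],
             [3, 3, 3, 2, 2]]
    alpha = cronbach_alpha(items)
    ```

    Alpha rises when items covary strongly relative to their individual variances, which is why the 2-item or heterogeneous composites in the paper (such as Alternative Medicine, α = 0.58) fall below the common 0.70 benchmark.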

  2. Identifying Items to Assess Methodological Quality in Physical Therapy Trials: A Factor Analysis

    PubMed Central

    Cummings, Greta G.; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd

    2014-01-01

    Background Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. Objective The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). Design A methodological research design was used, and an EFA was performed. Methods Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Results Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Limitation Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. Conclusions To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. 
Empirical evidence of the association of these items with treatment effects and a confirmatory factor…
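    The analysis pipeline described in this record (principal axis factoring followed by varimax rotation) can be sketched with NumPy alone. This is a minimal illustration, not the authors' code: the function names are mine, and it runs on synthetic two-factor data standing in for the 45 trial-quality items.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, max_iter=200, tol=1e-6):
    """Iterative principal axis factoring on a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # SMC starting communalities
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                 # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        idx = np.argsort(vals)[::-1][:n_factors]
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
        h2_new = (L**2).sum(axis=1)              # updated communalities
        if np.abs(h2_new - h2).max() < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return L, h2

def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix."""
    p, k = L.shape
    R, d = np.eye(k), 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        R, d_new = u @ vt, s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return L @ R

# Synthetic data: 6 items driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
true_L = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
X = rng.normal(size=(500, 2)) @ true_L.T + rng.normal(scale=0.5, size=(500, 6))
R = np.corrcoef(X, rowvar=False)
loadings, communalities = principal_axis_factoring(R, n_factors=2)
rotated = varimax(loadings)
```

    After rotation, each synthetic item loads mainly on one factor, mirroring the "simple structure" the varimax step is meant to produce for the 9-factor solution reported above.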

  3. Assessment of the quality and applicability of an e-portfolio capstone assessment item within a bachelor of midwifery program.

    PubMed

    Baird, Kathleen; Gamble, Jenny; Sidebotham, Mary

    2016-09-01

    Education programs leading to professional licensing need to ensure assessments throughout the program are constructively aligned and mapped to the specific professional expectations. Within the final year of an undergraduate degree, a student is required to transform and prepare for professional practice. Establishing assessment items that are authentic and able to reflect this transformation is a challenge for universities. This paper both describes the considerations around the design of a capstone assessment and evaluates, from an academic's perspective, the quality and applicability of an e-portfolio as a capstone assessment item for undergraduate courses leading to a professional qualification. The e-portfolio was seen to meet nine quality indicators for assessment. Academics evaluated the e-portfolio as an authentic assessment item that would engage the students and provide them with a platform for ongoing professional development and lifelong learning. The processes of reflection on strengths, weaknesses, opportunities and threats, comparison of clinical experiences with national statistics, preparation of a professional philosophy and development of a curriculum vitae, whilst recognised as comprehensive and challenging, were seen as highly valuable to the student transforming into the profession. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Formats for Assessing Students' Self-Assessment Abilities.

    ERIC Educational Resources Information Center

    Miller, Maurice; Turner, Tamrah

    The paper examines some self-assessment techniques used with handicapped students and discusses the advantages and disadvantages of these techniques. The use of self-rating scales is reviewed, and questionable results are cited. Another method, in which students view an item and estimate whether they can perform it before attempting it…

  5. Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Albano, Anthony D.

    2015-01-01

    This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…

  6. Assessing the Validity of a Single-Item HIV Risk Stage-of-Change Measure

    ERIC Educational Resources Information Center

    Napper, Lucy E.; Branson, Catherine M.; Fisher, Dennis G.; Reynolds, Grace L.; Wood, Michelle M.

    2008-01-01

    This study examined the validity of a single-item measure of HIV risk stage of change that HIV prevention contractors were required to collect by the California State Office of AIDS. The single-item measure was compared to the more conventional University of Rhode Island Change Assessment (URICA). Participants were members of Los Angeles…

  7. Gender differences in National Assessment of Educational Progress science items: What does "I don't know" really mean?

    NASA Astrophysics Data System (ADS)

    Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth

    The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose "I don't know" rather than guessing. This paper examines explanations for the gender differences, including (a) differential prior instruction, (b) differential response to uncertainty and use of the "I don't know" response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the "I don't know" response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.

  8. Gender-Related Differential Item Functioning on a Middle-School Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    This study examined gender-related differential item functioning (DIF) using a mathematics performance assessment, the QUASAR Cognitive Assessment Instrument (QCAI), administered to middle school students. The QCAI was developed for the Quantitative Understanding: Amplifying Student Achievement and Reading (QUASAR) project, which focuses on…

  9. 34 CFR 200.8 - Assessment reports.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... assessment is given; (ii) In an understandable and uniform format, including an alternative format (e.g... understand. (b) Itemized score analyses for LEAs and schools. (1) A State's academic assessment system must produce and report to LEAs and schools itemized score analyses, consistent with § 200.2(b)(4), so that...

  10. 34 CFR 200.8 - Assessment reports.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... assessment is given; (ii) In an understandable and uniform format, including an alternative format (e.g... understand. (b) Itemized score analyses for LEAs and schools. (1) A State's academic assessment system must produce and report to LEAs and schools itemized score analyses, consistent with § 200.2(b)(4), so that...

  11. 34 CFR 200.8 - Assessment reports.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... assessment is given; (ii) In an understandable and uniform format, including an alternative format (e.g... understand. (b) Itemized score analyses for LEAs and schools. (1) A State's academic assessment system must produce and report to LEAs and schools itemized score analyses, consistent with § 200.2(b)(4), so that...

  12. 34 CFR 200.8 - Assessment reports.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... assessment is given; (ii) In an understandable and uniform format, including an alternative format (e.g... understand. (b) Itemized score analyses for LEAs and schools. (1) A State's academic assessment system must produce and report to LEAs and schools itemized score analyses, consistent with § 200.2(b)(4), so that...

  13. 34 CFR 200.8 - Assessment reports.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... assessment is given; (ii) In an understandable and uniform format, including an alternative format (e.g... understand. (b) Itemized score analyses for LEAs and schools. (1) A State's academic assessment system must produce and report to LEAs and schools itemized score analyses, consistent with § 200.2(b)(4), so that...

  14. Modeling Composite Assessment Data Using Item Response Theory

    PubMed Central

    Ueckert, Sebastian

    2018-01-01

    Composite assessments aim to combine different aspects of a disease in a single score and are utilized in a variety of therapeutic areas. The data arising from these evaluations are inherently discrete with distinct statistical properties. This tutorial presents the framework of the item response theory (IRT) for the analysis of this data type in a pharmacometric context. The article considers both conceptual (terms and assumptions) and practical questions (modeling software, data requirements, and model building). PMID:29493119
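    The IRT framework introduced in this tutorial treats each discrete item response as a probabilistic function of a latent variable. As a minimal sketch of the simplest (dichotomous, two-parameter logistic) case, with hypothetical item parameters and grid-search maximum-likelihood scoring; the tutorial itself covers richer polytomous models and pharmacometric software:

```python
import numpy as np

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_theta(responses, a, b, grid=None):
    """Maximum-likelihood latent-trait estimate over a theta grid."""
    if grid is None:
        grid = np.linspace(-4, 4, 801)
    responses = np.asarray(responses, dtype=bool)
    p = p_correct(grid[:, None], a, b)             # shape: (grid points, items)
    loglik = np.where(responses, np.log(p), np.log(1.0 - p)).sum(axis=1)
    return float(grid[np.argmax(loglik)])

# Hypothetical discriminations (a) and difficulties (b) for five items
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
theta_hat = estimate_theta([1, 1, 1, 0, 0], a, b)  # score one response pattern
```

    The same machinery (an item-level probability model plus a per-subject latent variable) underlies the composite-score analyses the tutorial describes.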

  15. Fundamentals of Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains multiple choice test items and assessment techniques for Missouri's fundamentals of marketing core curriculum. The core curriculum is divided into these nine occupational duties: (1) communications in marketing; (2) economics and marketing; (3) employment and advancement; (4) human relations in marketing; (5) marketing…

  16. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    PubMed

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (understanding and communicating, getting around, self-care, getting along with others, life activities, and participation in society). The main purpose of this paper is the evaluation of the psychometric properties for each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology. Copyright © 2014 John Wiley & Sons, Ltd.

  17. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

    ERIC Educational Resources Information Center

    Martinková, Patricia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.

    2017-01-01

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because…

  18. Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

    ERIC Educational Resources Information Center

    Parish, Jane A.; Karisch, Brandi B.

    2013-01-01

    Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…

  19. Development and validation of a ten-item questionnaire with explanatory illustrations to assess upper extremity disorders: favorable effect of illustrations in the item reduction process.

    PubMed

    Kurimoto, Shigeru; Suzuki, Mikako; Yamamoto, Michiro; Okui, Nobuyuki; Imaeda, Toshihiko; Hirata, Hitoshi

    2011-11-01

    The purpose of this study is to develop a short and valid measure for upper extremity disorders and to assess the effect of attached illustrations in item reduction of a self-administered disability questionnaire while retaining psychometric properties. A validated questionnaire used to assess upper extremity disorders, the Hand20, was reduced to ten items using two item-reduction techniques. The psychometric properties of the abbreviated form, the Hand10, were evaluated on a sample independent of the one used for the shortening process. Validity, reliability, and responsiveness of the Hand10 were retained in the item reduction process. It is possible that the use of explanatory illustrations attached to the Hand10 helped with its reproducibility. The illustrations for the Hand10 promoted text comprehension and motivation to answer the items. These changes resulted in high acceptability; more than 99.3% of patients, including 98.5% of elderly patients, could complete the Hand10 properly. The illustrations had favorable effects on the item reduction process and made it possible to retain the precision of the instrument. The Hand10 is a reliable and valid instrument for individual-level applications with the advantage of being compact and broadly applicable, even in elderly individuals.

  20. Weighted Association Rule Mining for Item Groups with Different Properties and Risk Assessment for Networked Systems

    NASA Astrophysics Data System (ADS)

    Kim, Jungja; Ceong, Heetaek; Won, Yonggwan

    In market-basket analysis, weighted association rule (WAR) discovery can mine rules that include more beneficial information by reflecting item importance for special products. In the point-of-sale database, each transaction is composed of items with similar properties, and item weights are pre-defined and fixed by a factor such as the profit. However, when items are divided into more than one group and the item importance must be measured independently for each group, traditional weighted association rule discovery cannot be used. To solve this problem, we propose a new weighted association rule mining methodology. The items are first divided into subgroups according to their properties, and the item importance, i.e. the item weight, is defined or calculated only with the items included in the subgroup. Then, the transaction weight is measured by appropriately summing the item weights from each subgroup, and the weighted support is computed as the total weight of the transactions that contain the candidate items, divided by the weight of all transactions. As an example, our proposed methodology is applied to assess the vulnerability to threats of computer systems that provide networked services. Our algorithm provides both quantitative risk-level values and qualitative risk rules for the security assessment of networked computer systems using WAR discovery. Also, it can be widely used for new applications with many data sets in which the data items are distinctly separated.
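    The weighted-support computation this abstract describes can be sketched directly. The item names, weights, and subgroup split below are hypothetical; each item's weight is assumed to have been defined within its own subgroup, so a transaction's weight is simply the sum of its items' weights across subgroups.

```python
def weighted_support(itemset, transactions, item_weights):
    """Weighted support: total weight of transactions containing `itemset`,
    divided by the total weight of all transactions."""
    def t_weight(t):
        # Each item's weight was defined within its own subgroup, so the
        # transaction weight is the sum of item weights across subgroups.
        return sum(item_weights.get(i, 0.0) for i in t)
    total = sum(t_weight(t) for t in transactions)
    if total == 0.0:
        return 0.0
    hit = sum(t_weight(t) for t in transactions if itemset <= t)
    return hit / total

# Hypothetical example: two subgroups of items, weights defined
# independently within each subgroup (e.g. service items vs. port items)
item_weights = {"http": 0.6, "ssh": 0.4,   # subgroup A
                "p80": 0.7, "p22": 0.3}    # subgroup B
transactions = [{"http", "p80"}, {"ssh", "p22"},
                {"http", "p22"}, {"http", "p80"}]
support = weighted_support({"http", "p80"}, transactions, item_weights)
```

    Candidate itemsets whose weighted support clears a threshold would then feed rule generation, exactly as in ordinary WAR discovery.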

  1. Item and Testlet Position Effects in Computer-Based Alternate Assessments for Students with Disabilities

    ERIC Educational Resources Information Center

    Bulut, Okan; Lei, Ming; Guo, Qi

    2018-01-01

    Item positions in educational assessments are often randomized across students to prevent cheating. However, if altering item positions results in any significant impact on students' performance, it may threaten the validity of test scores. Two widely used approaches for detecting position effects -- logistic regression and hierarchical…

  2. Evaluating the Validity of Technology-Enhanced Educational Assessment Items and Tasks: An Empirical Approach to Studying Item Features and Scoring Rubrics

    ERIC Educational Resources Information Center

    Thomas, Ally

    2016-01-01

    With the advent of the newly developed Common Core State Standards and the Next Generation Science Standards, innovative assessments, including technology-enhanced items and tasks, will be needed to meet the challenges of developing valid and reliable assessments in a world of computer-based testing. In a recent critique of the next generation…

  3. Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

    PubMed

    Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

    2015-08-19

    Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of oral health-related quality of life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and the local dependency (LD) statistic. A graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by the Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30% of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items, suggesting little contribution of information to the scale, and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to the oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. Items related to oral symptoms were not informative to OHRQoL and deletion of these…
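    The graded response model fitted in this study expresses the probability of each ordered response category through cumulative logistic curves. A minimal sketch for a single polytomous item, with hypothetical discrimination and threshold parameters rather than estimates from the CPQ11-14 data:

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: probability of each ordered category
    for one item, built from cumulative P(X >= k) logistic curves."""
    b = np.asarray(thresholds, dtype=float)       # ordered category thresholds
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad the cumulative curve: P(X >= lowest) = 1, P(X >= beyond top) = 0
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]                     # adjacent differences

# Hypothetical item: discrimination 1.5, thresholds -1, 0, 1 (4 categories)
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

    Summing each category's information contribution over items gives the IIF and TIF comparisons reported above; flat, low curves mark the uninformative oral-symptom items.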

  4. Methodologies for Investigating Item- and Test-Level Measurement Equivalence in International Large-Scale Assessments

    ERIC Educational Resources Information Center

    Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D.

    2012-01-01

    In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…

  5. The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments.

    PubMed

    Tarrant, Marie; Knierim, Aimee; Hayes, Sasha K; Ware, James

    2006-12-01

    Multiple-choice questions (MCQs) are a common assessment method in nursing examinations. Few nurse educators, however, have formal preparation in constructing multiple-choice questions. Consequently, questions used in baccalaureate nursing assessments often contain item-writing flaws, or violations of accepted item-writing guidelines. In one nursing department, 2770 MCQs were collected from tests and examinations administered over a five-year period from 2001 to 2005. Questions were evaluated for 19 frequently occurring item-writing flaws, for cognitive level, for question source, and for the distribution of correct answers. Results show that almost half (46.2%) of the questions contained violations of item-writing guidelines and over 90% were written at low cognitive levels. Only a small proportion of questions were teacher generated (14.1%), while 36.2% were taken from testbanks and almost half (49.4%) had no source identified. MCQs written at a lower cognitive level were significantly more likely to contain item-writing flaws. While there was no relationship between the source of the question and item-writing flaws, teacher-generated questions were more likely to be written at higher cognitive levels (p<0.001). Correct answers were evenly distributed across all four options and no bias was noted in the placement of correct options. Further training in item-writing is recommended for all faculty members who are responsible for developing tests. Pre-test review and quality assessment are also recommended to reduce the occurrence of item-writing flaws and to improve the quality of test questions.

  6. Informed and Uninformed Naïve Assessment Constructors' Strategies for Item Selection

    ERIC Educational Resources Information Center

    Fives, Helenrose; Barnes, Nicole

    2017-01-01

    We present a descriptive analysis of 53 naïve assessment constructors' explanations for selecting test items to include on a summative assessment. We randomly assigned participants to an informed and uninformed condition (i.e., informed participants read an article describing a Table of Specifications). Through recursive thematic analyses of…

  7. Item response theory in personality assessment: a demonstration using the MMPI-2 depression scale.

    PubMed

    Childs, R A; Dahlstrom, W G; Kemp, S M; Panter, A T

    2000-03-01

    Item response theory (IRT) analyses have, over the past 3 decades, added much to our understanding of the relationships among and characteristics of test items, as revealed in examinees' response patterns. Assessment instruments used outside the educational context have only infrequently been analyzed using IRT, however. This study demonstrates the relevance of IRT to personality data through analyses of Scale 2 (the Depression Scale) on the revised Minnesota Multiphasic Personality Inventory (MMPI-2). A rich set of hypotheses regarding the items on this scale, including contrasts among the Harris-Lingoes and Wiener-Harmon subscales and differences in the items' measurement characteristics for men and women, are investigated through the IRT analyses.

  8. Measuring everyday functional competence using the Rasch assessment of everyday activity limitations (REAL) item bank.

    PubMed

    Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J

    2017-11-01

    Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
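    A CAT algorithm like the one simulated for the REAL bank typically administers, at each step, the unanswered item carrying maximum Fisher information at the current ability estimate. A minimal dichotomous (2PL) sketch with hypothetical item parameters; the REAL bank itself uses the polytomous partial credit model, for which the selection logic is the same but the information formula differs:

```python
import numpy as np

def fisher_info(theta, a, b):
    """Fisher information of 2PL items at ability theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Maximum-information item selection: the core CAT step."""
    info = fisher_info(theta_hat, a, b).astype(float)
    for i in administered:          # never re-administer an item
        info[i] = -np.inf
    return int(np.argmax(info))

# Hypothetical Rasch-like bank (equal discrimination, varying difficulty)
a = np.array([1.0, 1.0, 1.0, 1.0])
b = np.array([-2.0, -0.5, 0.1, 2.0])
next_item = select_next_item(0.0, a, b, administered={1})
```

    Because information peaks where difficulty matches ability, a 5- or 10-item CAT concentrates measurement precision near each respondent's level, which is why the simulated CATs outperform the fixed PF-10 across the continuum.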

  9. Are reflective models appropriate for very short scales? Proofs of concept of formative models using the Ten-Item Personality Inventory.

    PubMed

    Myszkowski, Nils; Storme, Martin; Tavani, Jean-Louis

    2018-04-27

    Because of their length and objective of broad content coverage, very short scales can show limited internal consistency and structural validity. We argue that this is because their objectives may be better aligned with formative investigations than with reflective measurement methods that capitalize on content overlap. As proofs of concept of formative investigations of short scales, we investigate the Ten Item Personality Inventory (TIPI). In Study 1, we administered the TIPI and the Big Five Inventory (BFI) to 938 adults, and fitted a formative Multiple Indicator Multiple Causes model, which consisted of the TIPI items forming 5 latent variables, which in turn predicted the 5 BFI scores. These results were replicated in Study 2, on a sample of 759 adults, with, this time, the Revised NEO Personality Inventory (NEO-PI-R) as the external criterion. The models fit the data adequately, and moderate to strong significant effects (.37<|β|<.69, all p<.001) of all 5 latent formative variables on their corresponding BFI and NEO-PI-R scores were observed. This study presents a formative approach that we propose to be more consistent with the aims of scales with broad content and short length like the TIPI. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.

  10. Exploring Crossing Differential Item Functioning by Gender in Mathematics Assessment

    ERIC Educational Resources Information Center

    Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas

    2015-01-01

    The purpose of this article is to explore crossing differential item functioning (DIF) in a test drawn from a national examination of mathematics for 11-year-old pupils in England. An empirical dataset was analyzed to explore DIF by gender in a mathematics assessment. A two-step process involving the logistic regression (LR) procedure for…

  11. Formative assessment: a student perspective.

    PubMed

    Hill, D A; Guinea, A I; McCarthy, W H

    1994-09-01

    An educator's view would be that formative assessment has an important role in the learning process. This study was carried out to obtain a student perspective of the place of formative assessment in the curriculum. Final-year medical students at Royal Prince Alfred Hospital took part in four teaching sessions, each structured to integrate teaching with assessment. Three assessment methods were used: the group objective structured clinical examination (G-OSCE), structured short answer (SSA) questions and a pre/post-test multiple choice questionnaire (MCQ). Teaching sessions were conducted on the subject areas of traumatology, the 'acute abdomen', arterial disorders and cancer. Fifty-five students, representing 83% of those who took part in the programme, responded to a questionnaire where they were asked to rate (on a 5-point Likert scale) their response to general questions about formative assessment and 13 specific questions concerning the comparative value of the three assessment modalities. Eighty-nine per cent of respondents felt that formative assessment should be incorporated into the teaching process. The SSA assessment was regarded as the preferred modality to reinforce previous teaching and test problem-solving skills. The MCQ was the least favoured assessment method. The effect size between the total scores for the SSA and MCQ was 0.64; between the G-OSCE and the SSA and MCQ it was 0.26 and 0.33 respectively. Formative assessment is a potentially powerful method to direct learning behaviour. Students should have input into the methods used.

  12. Successful Student Writing through Formative Assessment

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover

    2010-01-01

    Use formative assessment to dramatically improve your students' writing. In "Successful Student Writing Through Formative Assessment", educator and international speaker Harry G. Tuttle shows you how to guide middle and high school students through the prewriting, writing, and revision processes using formative assessment techniques that work.…

  13. TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

    ERIC Educational Resources Information Center

    Brese, Falk, Ed.

    2012-01-01

    The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…

  14. Differential Item Functioning by Gender on a Large-Scale Science Performance Assessment: A Comparison across Grade Levels.

    ERIC Educational Resources Information Center

    Holweger, Nancy; Taylor, Grace

    The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…

  15. Differential Item Functioning Analysis Using Rasch Item Information Functions

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Mapuranga, Raymond

    2009-01-01

    Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…

  16. An Investigation of Item Fit Statistics for Mixed IRT Models

    ERIC Educational Resources Information Center

    Chon, Kyong Hee

    2009-01-01

    The purpose of this study was to investigate procedures for assessing model fit of IRT models for mixed format data. In this study, various IRT model combinations were fitted to data containing both dichotomous and polytomous item responses, and the suitability of the chosen model mixtures was evaluated based on a number of model fit procedures.…

  17. A Comparison of Item Fit Statistics for Mixed IRT Models

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B.

    2010-01-01

    In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…

  18. Formative Assessment: Responding to Your Students

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover

    2009-01-01

    This "how-to" book on formative assessment is filled with practical suggestions for teachers who want to use formative assessment in their classrooms. With practical strategies, tools, and examples for teachers of all subjects and grade levels, this book shows you how to use formative assessment to promote successful student learning. Topics…

  19. Development and Calibration of an Item Bank for PE Metrics Assessments: Standard 1

    ERIC Educational Resources Information Center

    Zhu, Weimo; Fox, Connie; Park, Youngsik; Fisette, Jennifer L.; Dyson, Ben; Graber, Kim C.; Avery, Marybell; Franck, Marian; Placek, Judith H.; Rink, Judy; Raynes, De

    2011-01-01

    The purpose of this study was to develop and calibrate an assessment system, or bank, using the latest measurement theories and methods to promote valid and reliable student assessment in physical education. Using an anchor-test equating design, a total of 30 items or assessments were administered to 5,021 (2,568 boys and 2,453 girls) students in…

  20. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    NASA Astrophysics Data System (ADS)

    Greenberg, Ariela Caren

Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach, the Mantel-Haenszel log-odds ratio (MH-LOR), to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
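The Mantel-Haenszel log-odds ratio at the core of Study I's item analyses can be sketched briefly. This is a minimal illustration, not the study's code; the strata counts below are hypothetical. For each ability stratum (examinees matched on total score), a 2x2 table of group membership by correctness feeds a common odds ratio, and its log is the MH-LOR DIF index (Holland & Thayer, 1988):

```python
import math

def mh_log_odds_ratio(strata):
    """Mantel-Haenszel common log-odds ratio across ability strata.

    Each stratum is a 2x2 table (a, b, c, d):
      a = reference-group correct, b = reference-group incorrect,
      c = focal-group correct,     d = focal-group incorrect.
    A value near 0 suggests no DIF; large absolute values flag the item.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return math.log(num / den)

# Hypothetical counts for one item, stratified by total-score band
strata = [(30, 10, 25, 15), (40, 5, 35, 10), (20, 2, 18, 4)]
print(round(mh_log_odds_ratio(strata), 3))
```

Penfield's DDF analysis applies an analogous log-odds comparison to the individual distractors rather than to the overall correct/incorrect split.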

  1. Science Teachers' Use of a Concept Map Marking Guide as a Formative Assessment Tool for the Concept of Energy

    ERIC Educational Resources Information Center

    Won, Mihye; Krabbe, Heiko; Ley, Siv Ling; Treagust, David F.; Fischer, Hans E.

    2017-01-01

    In this study, we investigated the value of a concept map marking guide as an alternative formative assessment tool for science teachers to adopt for the topic of energy. Eight high school science teachers marked students' concept maps using an itemized holistic marking guide. Their marking was compared with the researchers' marking and the scores…

  2. Exploring Formative Assessment as a Tool for Learning: Students' Experiences of Different Methods of Formative Assessment

    ERIC Educational Resources Information Center

    Weurlander, Maria; Soderberg, Magnus; Scheja, Max; Hult, Hakan; Wernerson, Annika

    2012-01-01

    This study aims to provide a greater insight into how formative assessments are experienced and understood by students. Two different formative assessment methods, an individual, written assessment and an oral group assessment, were components of a pathology course within a medical curriculum. In a cohort of 70 students, written accounts were…

  3. Development of Rasch-based item banks for the assessment of work performance in patients with musculoskeletal diseases.

    PubMed

    Mueller, Evelyn A; Bengel, Juergen; Wirtz, Markus A

    2013-12-01

This study aimed to develop a self-description assessment instrument to measure work performance in patients with musculoskeletal diseases. In terms of the International Classification of Functioning, Disability and Health (ICF), work performance is defined as the degree of meeting the work demands (activities) at the actual workplace (environment). To account for the fact that work performance depends on the work demands of the job, we strove to develop item banks that allow a flexible use of item subgroups depending on the specific work demands of the patients' jobs. Item development included the collection of work tasks from the literature and content validation through expert surveys and patient interviews. The resulting 122 items were answered by 621 patients with musculoskeletal diseases. Exploratory factor analysis to ascertain dimensionality and Rasch analysis (partial credit model) for each of the resulting dimensions were performed. Exploratory factor analysis resulted in four dimensions, and subsequent Rasch analysis led to the following item banks: 'impaired productivity' (15 items), 'impaired cognitive performance' (18), 'impaired coping with stress' (13) and 'impaired physical performance' (low physical workload 20 items, high physical workload 10 items). The item banks exhibited person separation indices (reliability) between 0.89 and 0.96. The assessment of work performance adds the activities component to the more commonly employed participation component of the ICF model. The four item banks can be adapted to specific jobs where necessary without losing comparability of person measures, as the item banks are based on Rasch analysis.

  4. The Value of Item Response Theory in Clinical Assessment: A Review

    ERIC Educational Resources Information Center

    Thomas, Michael L.

    2011-01-01

    Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical…

  5. [Diagnostic and formative assessment of competencies at the beginning of undergraduate medical internship].

    PubMed

    Martínez-González, Adrián; Lifshitz-Guinzberg, Alberto; Trejo-Mejía, Juan Andrés; Torruco-García, Uri; Fortoul-van der Goes, Teresa I; Flores-Hernández, Fernando; Peña-Balderas, Jorge; Martínez-Franco, Adrián Israel; Hernández-Nava, Alejandro; Elena-González, Diana; Sánchez-Mendiola, Melchor

    2017-01-01

Research on diagnostic and formative assessment of competencies during undergraduate medical training is scarce in Latin America. The aim was to assess the level of clinical competence of students at the beginning of their medical internship in a new curriculum. This was an observational cross-sectional study of UNAM Faculty of Medicine students in Mexico City: a formative assessment of the second class of Curriculum 2010 students as part of the integral evaluation of the program. The assessment had two components: theoretical and practical. We assessed 577 students (65.5%) of the total population of 880 that finished the 9th semester of Curriculum 2010. The written exam consisted of 232 items, with a mean of 61.0 ± 19.6, a difficulty index of 0.61, and a Cronbach's alpha of 0.89. The mean of the objective structured clinical examination (OSCE) was 62.2 ± 16.8, with a mean Cronbach's alpha of 0.51. Results were analyzed by knowledge area and exam station. The overall results provide evidence that, at the beginning of the internship, students sufficiently achieve the competencies established in the curriculum, that they have the necessary foundation for learning new and more complex information, and that they can integrate it with existing knowledge to achieve significant learning and continue their training.
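The internal-consistency figures reported above (Cronbach's alpha of 0.89 for the written exam, 0.51 for the OSCE) follow from the standard alpha formula: the number of items scaled by one minus the ratio of summed item variances to total-score variance. A minimal sketch; the 0/1 response matrix below is hypothetical, not study data:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha from an item-score matrix (rows = examinees)."""
    n_items = len(scores[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of five examinees to four dichotomous items
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))
```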

  6. Formative Assessment: Assessment Is for Self-Regulated Learning

    ERIC Educational Resources Information Center

    Clark, Ian

    2012-01-01

    The article draws from 199 sources on assessment, learning, and motivation to present a detailed decomposition of the values, theories, and goals of formative assessment. This article will discuss the extent to which formative feedback actualizes and reinforces self-regulated learning (SRL) strategies among students. Theoreticians agree that SRL…

  7. What Form of Mathematics Are Assessments Assessing? The Case of Multiplication and Division in Fourth Grade NAEP Items

    ERIC Educational Resources Information Center

Kosko, Karl W.; Singh, Rashmi

    2018-01-01

Multiplicative reasoning is a key concept in elementary school mathematics. Item statistics reported by the National Assessment of Educational Progress (NAEP) provide the best current indicator of how well elementary students across the U.S. understand this and other concepts. However, beyond expert reviews and statistical analysis,…

  8. Assessing Construct Validity Using Multidimensional Item Response Theory.

    ERIC Educational Resources Information Center

    Ackerman, Terry A.

    The concept of a user-specified validity sector is discussed. The idea of the validity sector combines the work of M. D. Reckase (1986) and R. Shealy and W. Stout (1991). Reckase developed a methodology to represent an item in a multidimensional latent space as a vector. Item vectors are computed using multidimensional item response theory item…

  9. An Approach to Scoring and Equating Tests with Binary Items: Piloting With Large-Scale Assessments

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    2016-01-01

    This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…

  10. Assessing Unidimensionality and Differential Item Functioning in Qualifying Examination for Senior Secondary School Students, Osun State, Nigeria

    ERIC Educational Resources Information Center

    Ajeigbe, Taiwo Oluwafemi; Afolabi, Eyitayo Rufus Ifedayo

    2017-01-01

This study assessed unidimensionality and occurrence of Differential Item Functioning (DIF) in Mathematics and English Language items of Osun State Qualifying Examination. The study made use of secondary data. The results showed that OSQ Mathematics (-0.094 ≤ r ≤ 0.236) and English Language items (-0.095 ≤ r ≤ 0.228) were unidimensional. Also,…

  11. Evolution of a Test Item

    ERIC Educational Resources Information Center

    Spaan, Mary

    2007-01-01

    This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…

  12. Formative Assessment: Policy, Perspectives and Practice

    ERIC Educational Resources Information Center

    Clark, Ian

    2011-01-01

    Proponents of formative assessment (FA) assert that students develop a deeper understanding of their learning when the essential components of formative feedback and cultural responsiveness are effectively incorporated as central features of the formative assessment process. Even with growing international agreement among the research community…

  13. [Assessment of criminal responsibility in paraphilic disorder. Can the severity of the disorder be assessed with items of standardized prognostic instruments?].

    PubMed

    Briken, P; Müller, J L

    2014-03-01

    Assessment of the severity of paraphilic disorders is an important aspect of psychiatric court reports for assessing criminal responsibility and placement in a forensic psychiatric hospital according to the German penal code (§§ 20, 21, 63 StGB). The minimum requirements for appraisal of criminal responsibility published by an interdisciplinary working group under the guidance of the German Federal Court of Justice define the standards for this procedure. This paper presents a research concept that aims to assess the severity of paraphilic disorders by using items of standardized prognostic instruments. In addition to a formal diagnosis according to the international classification of diseases (ICD) and the diagnostic and statistical manual of mental diseases (DSM) criteria, the items "deviant sexual interests" and "sexual preoccupations" from the prognosis instrument Stable 2007 are used to assess the severity of paraphilic disorders. Other criteria, such as "relationship deficits" are used to support the appraisal of the severity of the disorder. The items "sexual preoccupation", "emotional collapse" and "collapse of social support" from the prognosis instrument Acute 2007 are used to assess the capacity for self-control. In a next step the validity and reliability of this concept will be tested.

  14. The Effects of Formative Assessment with Detailed Feedback on Students' Science Learning Achievement and Attitudes Regarding Formative Assessment.

    ERIC Educational Resources Information Center

    Choi, Kyunghee; Nam, Jeong-Hee; Lee, Hyunju

    2001-01-01

    Examines the effects of a formative assessment with detailed feedback on students' science learning achievement and attitudes regarding formative assessment. Involves (n=133) ninth grade students from Seoul and administers pre- and post-tests for learning achievement and attitude regarding formative assessment. (Contains 16 references.)…

  15. Using Item Response Theory and Adaptive Testing in Online Career Assessment

    ERIC Educational Resources Information Center

    Betz, Nancy E.; Turner, Brandon M.

    2011-01-01

    The present article describes the potential utility of item response theory (IRT) and adaptive testing for scale evaluation and for web-based career assessment. The article describes the principles of both IRT and adaptive testing and then illustrates these with reference to data analyses and simulation studies of the Career Confidence Inventory…

  16. Evaluation of item candidates for a diabetic retinopathy quality of life item bank.

    PubMed

    Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L

    2013-09-01

    We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.

  17. Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2013-01-01

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…

  18. An Examination of Differential Item Functioning on the Vanderbilt Assessment of Leadership in Education

    ERIC Educational Resources Information Center

    Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph

    2009-01-01

    The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…

  19. A Multidimensional Scaling Approach to Dimensionality Assessment for Measurement Instruments Modeled by Multidimensional Item Response Theory

    ERIC Educational Resources Information Center

    Toro, Maritsa

    2011-01-01

    The statistical assessment of dimensionality provides evidence of the underlying constructs measured by a survey or test instrument. This study focuses on educational measurement, specifically tests comprised of items described as multidimensional. That is, items that require examinee proficiency in multiple content areas and/or multiple cognitive…

  20. PISA Test Items and School-Based Examinations in Greece: Exploring the Relationship between Global and Local Assessment Discourses

    ERIC Educational Resources Information Center

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-01-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…

  1. A Socio-Cultural Theorisation of Formative Assessment

    ERIC Educational Resources Information Center

    Pryor, John; Crossouard, Barbara

    2008-01-01

    Formative assessment has attracted increasing attention from both practitioners and scholars over the last decade. This paper draws on the authors' empirical research conducted over eleven years in educational situations ranging from infant schools to postgraduate education to propose a theorisation of formative assessment. Formative assessment is…

  2. Using Conditional Percentages During Free-Operant Stimulus Preference Assessments to Predict the Effects of Preferred Items on Stereotypy: Preliminary Findings.

    PubMed

    Frewing, Tyla M; Rapp, John T; Pastrana, Sarah J

    2015-09-01

To date, researchers have not identified an efficient methodology for selecting items that will compete with automatically reinforced behavior. In the present study, we identified high preference, high stereotypy (HP-HS), high preference, low stereotypy (HP-LS), low preference, high stereotypy (LP-HS), and low preference, low stereotypy (LP-LS) items based on response allocation to items and engagement in stereotypy during one to three 30-min free-operant competing stimulus assessments (CSAs). The results showed that access to HP-LS items decreased stereotypy for all four participants; however, the results for other items were predictive for only one participant. Reanalysis of the CSA results revealed that the HP-LS item was typically identified by (a) the combined results of the first 10 min of the three 30-min assessments or (b) the results of one 30-min assessment. The clinical implications for the use of this method, as well as future directions for research, are briefly discussed. © The Author(s) 2015.

  3. Written formative assessment and silence in the classroom

    NASA Astrophysics Data System (ADS)

    Lee Hang, Desmond Mene; Bell, Beverley

    2015-09-01

In this commentary, we build on Xinying Yin and Gayle Buck's discussion by exploring the cultural practices which are integral to formative assessment when it is viewed as a sociocultural practice. First, we discuss the role of assessment, and in particular oral and written formative assessments, in both Western and Samoan cultures, building on the account of assessment practices in Chinese culture given by Yin and Buck. Secondly, we document the cultural practice of silence in Samoan classrooms, which has led to the use of written formative assessment as in the Yin and Buck article. We also discuss the use of written formative assessment as a scaffold for teacher development in formative assessment. Finally, we briefly discuss both studies on formative assessment as a sociocultural practice.

  4. Designing K-2 Formative Assessment Tasks

    ERIC Educational Resources Information Center

    Reed, Kristen E.; Goldenberg, E. Paul

    2016-01-01

    Formative assessment is a process used by teachers and students during instruction that provides feedback to adjust ongoing teaching and learning to improve students' achievements of intended instructional outcomes. Formative assessment means assessment embedded in instruction. That definition was adopted in 2006 by the Council of Chief State…

  5. Assessing birth experience in fathers as an important aspect of clinical obstetrics: how applicable is Salmon's Item List for men?

    PubMed

    Gawlik, Stephanie; Müller, Mitho; Hoffmann, Lutz; Dienes, Aimée; Reck, Corinna

    2015-01-01

Validated questionnaire assessment of fathers' experiences during childbirth is lacking in routine clinical practice. Salmon's Item List is a short, validated method used for the assessment of birth experience in mothers in both English- and German-speaking communities. With little to no validated data available for fathers, this pilot study aimed to assess the applicability of the German version of Salmon's Item List, including a multidimensional birth experience concept, in fathers. In this longitudinal study, data were collected by questionnaires at a university hospital in Germany. The birth experiences of 102 fathers were assessed four to six weeks post partum using the German version of Salmon's Item List. Construct validity testing with exploratory factor analysis, using principal component analysis with varimax rotation, was performed to identify the dimensions of childbirth experiences. Internal consistency was also analysed. Factor analysis yielded a four-factor solution comprising 17 items that accounted for 54.5% of the variance. The main domain was 'fulfilment', and the secondary domains were 'emotional distress', 'physical discomfort' and 'emotional adaption'. For fulfilment, Cronbach's α met conventional reliability standards (0.87). Salmon's Item List is an appropriate instrument to assess birth experience in fathers in terms of fulfilment. Larger samples need to be examined in order to prove the stability of the factor structure before this can be extended to routine clinical assessment. A reduced version of Salmon's Item List may be useful as a screening tool for general assessment. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. An item response theory evaluation of three depression assessment instruments in a clinical sample.

    PubMed

    Adler, Mats; Hetta, Jerker; Isacsson, Göran; Brodin, Ulf

    2012-06-21

This study investigates whether an analysis based on Item Response Theory (IRT) can be used for initial evaluations of depression assessment instruments in a limited patient sample from an affective disorder outpatient clinic, with the aim of finding major advantages and deficiencies of the instruments. Three depression assessment instruments, the depression module of the Patient Health Questionnaire (PHQ9), the depression subscale of the Affective Self Rating Scale (AS-18-D) and the Montgomery-Åsberg Depression Rating Scale (MADRS), were evaluated in a sample of 61 patients with affective disorder diagnoses, mainly bipolar disorder. A '3-step IRT strategy' was used. In a first step, the Mokken non-parametric analysis showed that PHQ9 and AS-18-D had strong overall scalabilities of 0.510 [C.I. 0.42, 0.61] and 0.513 [C.I. 0.41, 0.63] respectively, while MADRS had a weak scalability of 0.339 [C.I. 0.25, 0.43]. In a second step, a Rasch model analysis indicated large differences in item discriminating capacity and was therefore considered unsuitable for the data. In a third step, applying a more flexible two-parameter model, all three instruments showed large differences in item information, and items had a low capacity to reliably measure respondents at low levels of depression severity. We conclude that a stepwise IRT approach, as performed in this study, is a suitable tool for studying assessment instruments at early stages of development. Such an analysis can give useful information, even in small samples, in order to construct more precise measurements or to evaluate existing assessment instruments. The study suggests that the PHQ9 and AS-18-D can be useful for measuring depression severity in an outpatient clinic for affective disorder, while the MADRS shows weak measurement properties for this type of patient.
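The "low information at low severity" finding is easy to see under the two-parameter logistic model applied in the third step: item information is a²P(1-P), which peaks where the trait level θ equals the item location b, so an item located at high severity is nearly uninformative about mild cases. A sketch with a single hypothetical item (the parameters a = 1.5 and b = 1.0 are invented, not estimates from the study):

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic: probability of endorsing the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item with discrimination a = 1.5 located at severity b = 1.0:
# informative near theta = b, but it tells us little about mild cases.
for theta in (-2.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  info={info_2pl(theta, 1.5, 1.0):.3f}")
```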

  7. Item-Writing Guidelines for Physics

    ERIC Educational Resources Information Center

    Regan, Tom

    2015-01-01

    A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…

  8. Development of a simple 12-item theory-based instrument to assess the impact of continuing professional development on clinical behavioral intentions.

    PubMed

    Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

    2014-01-01

Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% were discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral intentions showed adequate validity and reliability.

  9. Item Response Data Analysis Using Stata Item Response Theory Package

    ERIC Educational Resources Information Center

    Yang, Ji Seung; Zheng, Xiaying

    2018-01-01

    The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…

  10. Development of an Item Bank for the Assessment of Knowledge on Biology in Argentine University Students.

    PubMed

    Cupani, Marcos; Zamparella, Tatiana Castro; Piumatti, Gisella; Vinculado, Grupo

The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. This study aims to develop a bank of items to measure the level of Knowledge on Biology using the Rasch model. The sample consisted of 1219 participants who studied in different faculties of the National University of Cordoba (mean age = 21.85 years, SD = 4.66; 66.9% women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 questions of knowledge on biology. Evaluation of Rasch model fit (Zstd > |2.0|), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.
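The calibration underlying such a bank rests on the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between person ability and item difficulty, both expressed in logits; common items across subtests allow all forms to be placed on one scale. A minimal sketch (the five item difficulties below are invented for illustration, not values from the study):

```python
import math

def rasch_p(theta, delta):
    """Dichotomous Rasch model: P(correct | ability theta, item difficulty delta)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def expected_score(theta, deltas):
    """Expected number-correct for a person on a set of calibrated items."""
    return sum(rasch_p(theta, d) for d in deltas)

# Hypothetical difficulties (logits) for a five-item linking block
anchor = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(round(expected_score(0.0, anchor), 2))  # → 2.5
```

Because the difficulties are symmetric around the person's ability of 0.0, the expected score is exactly half the items; shifting theta up or down moves the expected score accordingly, which is what adaptive item selection exploits.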

  11. Elementary Teacher Use of Formative Assessment

    ERIC Educational Resources Information Center

    Cotton, Donna McLamb

    2013-01-01

    This dissertation was designed to examine elementary teacher use of formative assessment and the impact formative assessment may have on student achievement as measured by benchmark assessments. The study was conducted in a school district in northwestern North Carolina. The teachers in this study have had NCFALCON training in the use of formative…

  12. The Dimensional Assessment of Personality Psychopathology Basic Questionnaire: shortened versions item analysis.

    PubMed

    Aluja, Anton; Blanch, Àngel; Blanco, Eduardo; Martí-Guiu, Maite; Balada, Ferran

    2015-01-13

This study was designed to evaluate and replicate the psychometric properties of the Dimensional Assessment of Personality Psychopathology-Basic Questionnaire (DAPP-BQ) and the DAPP-BQ short form (DAPP-SF) in a large Spanish general population sample. Additionally, we generated a reduced form called the DAPP-90, using a strategy based on structural equation modeling (SEM) in two independent samples, a calibration and a validation sample. The DAPP-90 scales obtained a more satisfactory fit on SEM adjustment values (average: TLI > .97 and RMSEA < .04) with respect to the full DAPP-BQ and the 136-item version. According to the factorial congruency coefficients, the DAPP-90 obtains a structure similar to the DAPP-BQ and the DAPP-SF. The internal consistency of the DAPP-90 is acceptable, with a mean Cronbach's alpha of .75. We did not find any differences in the pattern of relations between the two DAPP-BQ shortened versions and the SCL-90-R factors. The new 90-item version is especially useful when the long version is difficult to administer, such as in the assessment of patients in hospital consultation or in brief psychological assessments.

  13. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data.

    PubMed

    Sharafi, Zahra; Mousavi, Amin; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

    2017-01-01

    The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention, but few studies have evaluated the effect of the hierarchical structure of the data on DIF detection for polytomously scored items. Ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were used to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) were crossed in a full factorial simulation design. Furthermore, data from the Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Overall, the results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. The study showed a negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations for analyzing real data are also discussed.
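    The logistic-regression approach to DIF tests whether group membership still predicts an item response after conditioning on a matching variable such as the total score. A simplified dichotomous sketch (the study's items are polytomous and would use cumulative-logit models instead; helper names are illustrative):

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Binary logistic regression via Newton-Raphson; returns (beta, log-lik)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hessian + 1e-9 * np.eye(len(beta)), X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    eps = 1e-12
    return beta, np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def lr_dif_statistic(item, total, group):
    """Likelihood-ratio test: does group membership add to the total score
    in predicting the item response?  The statistic is approximately
    chi-square(1); values above 3.84 flag uniform DIF at the .05 level."""
    ones = np.ones(len(item))
    base = np.column_stack([ones, total - total.mean()])  # matching only
    full = np.column_stack([base, group])                 # matching + group
    _, ll_base = fit_logistic(base, item)
    _, ll_full = fit_logistic(full, item)
    return 2.0 * (ll_full - ll_base)
```

    On simulated data with a one-logit group shift in a single item, the flagged item's statistic far exceeds the 3.84 cutoff while DIF-free items stay near it.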

  14. Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

    ERIC Educational Resources Information Center

    Sachse, Karoline A.; Haag, Nicole

    2017-01-01

    Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…

  15. Making Moves: Formative Assessment in Mathematics

    ERIC Educational Resources Information Center

    Duckor, Brent; Holmberg, Carrie; Becker, Joanne Rossi

    2017-01-01

    Research on teacher professional learning has shown that formative assessment can improve student learning more than most instructional practices. Empirical evidence indicates that thoughtfully implemented formative assessment practices improve students' learning, increase students' scores, and narrow achievement gaps between low-achieving…

  16. Guide to an Assessment of Consumer Skills.

    ERIC Educational Resources Information Center

    Education Commission of the States, Denver, CO.

    This guide is intended to assist those interested in developing and/or assessing consumer skills. It is an accompaniment to a separate collection of survey items (mostly in multiple-choice format) designed to assess seventeen-year-olds' consumer skills. It is suggested that the items can be used as part of an item pool, as an instructional tool,…

  17. PSSA Released Reading Items, 2000-2001. The Pennsylvania System of School Assessment.

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Education, Harrisburg. Bureau of Curriculum and Academic Services.

    This document contains materials directly related to the actual reading test of the Pennsylvania System of School Assessment (PSSA), including the reading rubric, released passages, selected-response questions with answer keys, performance tasks, and scored samples of students' responses to the tasks. All of these items may be duplicated to…

  18. Development of a Simple 12-Item Theory-Based Instrument to Assess the Impact of Continuing Professional Development on Clinical Behavioral Intentions

    PubMed Central

    Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

    2014-01-01

    Background Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Methods and Findings Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. Conclusion A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral
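    The test-retest reliability (weighted kappa) assessed in phase 4 penalizes disagreements by how far apart the two ordinal ratings are. A linear-weights sketch (illustrative helper, not the authors' code):

```python
import numpy as np

def weighted_kappa(ratings1, ratings2, n_categories):
    """Cohen's kappa with linear disagreement weights, as used for
    test-retest reliability of ordinal questionnaire items (a sketch).
    Ratings are integer category indices 0..n_categories-1."""
    obs = np.zeros((n_categories, n_categories))
    for a, b in zip(ratings1, ratings2):
        obs[a, b] += 1
    obs /= obs.sum()
    # expected cell proportions from the marginals, as if independent
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    i, j = np.indices((n_categories, n_categories))
    weights = np.abs(i - j) / (n_categories - 1)   # linear disagreement weights
    return 1 - (weights * obs).sum() / (weights * exp).sum()
```

    Perfect agreement gives kappa = 1, and ratings that match the chance-expected table give kappa = 0.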

  19. The 4-Item Negative Symptom Assessment (NSA-4) Instrument: A Simple Tool for Evaluating Negative Symptoms in Schizophrenia Following Brief Training.

    PubMed

    Alphs, Larry; Morlock, Robert; Coon, Cheryl; van Willigenburg, Arjen; Panagides, John

    2010-07-01

    Objective. To assess the ability of mental health professionals to use the 4-item Negative Symptom Assessment instrument, derived from the Negative Symptom Assessment-16, to rapidly determine the severity of negative symptoms of schizophrenia. Design. Open participation. Setting. Medical education conferences. Participants. Attendees at two international psychiatry conferences. Measurements. Participants read a brief set of instructions for the 4-item Negative Symptom Assessment and viewed a videotape of a patient with schizophrenia. Using the instrument's 1-to-6 severity rating scale, they rated the four negative symptom items and overall global negative symptoms. These ratings were compared with a consensus expert rating using frequency distributions and chi-square tests for the proportion of participant ratings within one point of the expert rating. Results. More than 400 medical professionals (293 physicians; 50% with a European practice; 55% reporting past use of schizophrenia rating scales) participated. Between 82.1% and 91.1% of the participants' ratings of the four items and the global rating were within one point of the consensus expert ratings. The difference between the percentage of participant ratings within one point of the consensus expert ratings and the percentage differing by more than one point was significant (p < 0.0001). Participants' ratings of negative symptoms using the 4-item Negative Symptom Assessment did not generally differ by geographic region of practice, professional credentialing, or familiarity with schizophrenia symptom rating instruments. Conclusion. These findings suggest that clinicians from a variety of geographic practices can, after brief training, use the 4-item Negative Symptom Assessment effectively to rapidly assess negative symptoms in patients with schizophrenia.

  20. Psychometric properties and podiatric medical student perceptions of USMLE-style items in a general anatomy course.

    PubMed

    D'Antoni, Anthony V; DiLandro, Anthony C; Chusid, Eileen D; Trepal, Michael J

    2012-01-01

    In 2010, the New York College of Podiatric Medicine general anatomy course was redesigned to emphasize clinical anatomy. Over a 2-year period, United States Medical Licensing Examination (USMLE)-style items were used in lecture assessments with two cohorts of students (N = 200). Items were in single-best-answer and extended-matching formats. Psychometric properties of items and assessments were evaluated, and anonymous student post-course surveys were administered. Mean grades for each assessment were recorded over time and compared between cohorts using analysis of variance. Correlational analyses were used to investigate the relationship between final course grades and lecture examinations. Post-course survey response rates for the cohorts were 71 of 97 (73%) and 81 of 103 (79%). The USMLE-style items had strong psychometric properties. Point biserial correlations were 0.20 and greater, and the range of students answering the items correctly was 25% to 75%. Examinations were highly reliable, with Kuder-Richardson 20 coefficients of 0.71 to 0.76. Students (>80%) reported that single-best-answer items were easier than extended-matching items. Students (>76%) believed that the items on the quizzes/examinations were similar to those found on USMLE Step 1. Most students (>84%) believed that they would do well on the anatomy section of their boards (American Podiatric Medical Licensing Examination [APMLE] Part I). Students valued USMLE-style items. These data, coupled with the psychometric data, suggest that USMLE-style items can be successfully incorporated into a basic science course in podiatric medical education. Outcomes from students who recently took the APMLE Part I suggest that incorporation of USMLE-style items into the general anatomy course was a successful measure and prepared them well.
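    The point-biserial correlations (0.20 and greater) and Kuder-Richardson 20 coefficients (0.71 to 0.76) reported here are standard classical test theory statistics; a sketch (illustrative helper name, correlating each item with the rest score):

```python
import numpy as np

def item_analysis(responses):
    """Classical item statistics for a 0/1 response matrix of shape
    (n_examinees, n_items): per-item p-values (proportion correct),
    corrected point-biserial correlations, and KR-20 reliability."""
    X = np.asarray(responses, dtype=float)
    n, k = X.shape
    p = X.mean(axis=0)                   # item difficulty (proportion correct)
    total = X.sum(axis=1)
    # corrected point-biserial: correlate each item with the rest score
    pbis = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]
                     for j in range(k)])
    kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total.var(ddof=0))
    return p, pbis, kr20
```

    With perfectly correlated items (identical columns) KR-20 and every point-biserial equal 1.0 exactly, a useful check on the implementation.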

  1. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

    PubMed Central

    Sharafi, Zahra

    2017-01-01

    Background The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed. PMID:29312463

  2. An Extended Validity Argument for Assessing Feedback Culture.

    PubMed

    Rougas, Steven; Clyne, Brian; Cianciolo, Anna T; Chan, Teresa M; Sherbino, Jonathan; Yarris, Lalena M

    2015-01-01

    NEGEA 2015 CONFERENCE ABSTRACT (EDITED): Measuring an Organization's Culture of Feedback: Can It Be Done? Steven Rougas and Brian Clyne. CONSTRUCT: This study sought to develop a construct for measuring formative feedback culture in an academic emergency medicine department. Four archetypes (Market, Adhocracy, Clan, Hierarchy) reflecting an organization's values with respect to focus (internal vs. external) and process (flexibility vs. stability and control) were used to characterize one department's receptiveness to formative feedback. The prevalence of residents' identification with certain archetypes served as an indicator of the department's organizational feedback culture. New regulations have forced academic institutions to implement wide-ranging changes to accommodate competency-based milestones and their assessment. These changes challenge residencies that use formative feedback from faculty as a major source of data for determining training advancement. Though various approaches have been taken to improve formative feedback to residents, there currently exists no tool to objectively measure the organizational culture that surrounds this process. Assessing organizational culture, commonly used in the business sector to represent organizational health, may help residency directors gauge their program's success in fostering formative feedback. The Organizational Culture Assessment Instrument (OCAI) is widely used, extensively validated, applicable to survey research, and theoretically based and may be modifiable to assess formative feedback culture in the emergency department. Using a modified Delphi technique and several iterations of focus groups amongst educators at one institution, four of the original six OCAI domains (which each contain 4 possible responses) were modified to create a 16-item Formative Feedback Culture Tool (FFCT) that was administered to 26 residents (response rate = 55%) at a single academic emergency medicine department. The mean

  3. Online formative assessments: exploring their educational value

    PubMed Central

    NAGANDLA, KAVITHA; SULAIHA, SHARIFAH; NALLIAH, SIVALINGAM

    2018-01-01

    Introduction: Online formative assessments (OFA’s) have been increasingly recognised in medical education as resources that promote self-directed learning. Formative assessments are used to support the self-directed learning of students. Online formative assessments have been identified to be less time consuming with automated feedback. This pilot study aimed to determine whether participation and performance in online formative assessments (OFA’s) had measurable effects on learning and evaluate the students’ experience of using the OFA’s in the department of Obstetrics and Gynaecology. Methods: This is a cross-sectional study conducted among fourth year medical students (n=92) during their seven week postings in Obstetrics and Gynaecology. Five sets of online formative assessments in the format of one best answers (OBA), Objective structured practical examination (OSPE) and Short answer question (SAQ) with feedback were delivered over five weeks through the online portal. The mean scores of the end of posting summative exam (EOP) of those who participated in the assessments (OFA users) and of those who did not (non-OFA users) were compared, using Students t test. The frequency of tool usage was analysed and satisfaction surveys were utilized at the end of the course by survey questionnaire using the five point Likert scale. Results: The mean scores of the students in end of posting summative examination marks for students who had participated in the online formative assessment (OFA users) and for those who had not (non OFA users) showed no significant difference in all the three components OBA, SAQ and OSPE (p=0.902, 0.633, 0.248). Majority of the students perceived that OFAs fulfilled the stated aims and objectives and so they would persuade their peers to participate in the OFAs. Conclusions: Online formative assessments are perceived as tools that promote self-directed learning, improved knowledge and tailor learning for individual learning needs and

  4. Development of a self-report physical function instrument for disability assessment: item pool construction and factor analysis.

    PubMed

    McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K

    2013-09-01

    To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index = .93 and .98; Tucker-Lewis Index = .92 and .98; and root mean square error of approximation = .05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
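    The CFI, TLI, and RMSEA values reported for the 4-factor models are all functions of the model and baseline (null) chi-square statistics; a sketch of the standard formulas (illustrative helper name, not the authors' code):

```python
import math

def fit_indices(chisq, df, chisq_null, df_null, n):
    """CFI, TLI, and RMSEA from a fitted model's chi-square/df, the
    baseline (independence) model's chi-square/df, and sample size n."""
    noncentrality = max(chisq - df, 0.0)
    noncentrality_null = max(chisq_null - df_null, 0.0)
    cfi = 1.0 - noncentrality / max(noncentrality, noncentrality_null, 1e-12)
    tli = ((chisq_null / df_null) - (chisq / df)) / ((chisq_null / df_null) - 1.0)
    rmsea = math.sqrt(noncentrality / (df * (n - 1)))
    return cfi, tli, rmsea
```

    For example, a model with chi-square 100 on 50 df against a baseline of 1000 on 60 df with n = 500 gives CFI ≈ .95, TLI ≈ .94, RMSEA ≈ .045, values in the "very good fit" range the abstract describes.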

  5. Development of a Self-Report Physical Function Instrument for Disability Assessment: Item Pool Construction and Factor Analysis

    PubMed Central

    McDonough, Christine M.; Jette, Alan M.; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M.; Rasch, Elizabeth K.

    2014-01-01

    Objectives To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Design Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. Setting In-person and semi-structured interviews; internet and telephone surveys. Participants A sample of 1,017 SSA claimants, and a normative sample of 999 adults from the US general population. Interventions Not Applicable. Main Outcome Measure Model fit statistics Results The final item pool consisted of 139 items. Within the claimant sample 58.7% were white; 31.8% were black; 46.6% were female; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution which included more items and allowed separate characterization of: 1) Changing and Maintaining Body Position, 2) Whole Body Mobility, 3) Upper Body Function and 4) Upper Extremity Fine Motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples respectively were: Comparative Fit Index = 0.93 and 0.98; Tucker-Lewis Index = 0.92 and 0.98; Root Mean Square Error Approximation = 0.05 and 0.04. Conclusions The factor structure of the Physical Function item pool closely resembled the hypothesized content model. The four scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. PMID:23542402

  6. Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

    PubMed

    Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

    2017-05-25

    The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  7. Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

    PubMed

    Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

    2015-07-01

    The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L., & Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
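    The discrimination parameters (1.07 or greater) come from a two-parameter logistic (2PL) IRT model, in which more discriminating items carry more Fisher information and so measure the trait more precisely; a minimal sketch:

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of endorsing an item
    with discrimination a and location (difficulty) b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P).  Items with
    higher discrimination contribute more measurement precision."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1 - p)
```

    At theta = b the endorsement probability is exactly .5 and information peaks at a²/4, which is why low-discrimination somatic items contribute less precision than the cognitive or affective items.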

  8. Handbook of Formative Assessment

    ERIC Educational Resources Information Center

    Andrade, Heidi, Ed.; Cizek, Gregory J., Ed.

    2010-01-01

    Formative assessment has recently become a focus of renewed research as state and federal policy-makers realize that summative assessments have reached a point of diminishing returns as a tool for increasing student achievement. Consequently, supporters of large-scale testing programs are now beginning to consider the potential of formative…

  9. A Faculty Toolkit for Formative Assessment in Pharmacy Education.

    PubMed

    DiVall, Margarita V; Alston, Greg L; Bird, Eleanora; Buring, Shauna M; Kelley, Katherine A; Murphy, Nanci L; Schlesselman, Lauren S; Stowe, Cindy D; Szilagyi, Julianna E

    2014-11-15

    This paper aims to increase understanding and appreciation of formative assessment and its role in improving student outcomes and the instructional process, while educating faculty on formative techniques readily adaptable to various educational settings. Included are a definition of formative assessment and the distinction between formative and summative assessment. Various formative assessment strategies to evaluate student learning in classroom, laboratory, experiential, and interprofessional education settings are discussed. The role of reflective writing and portfolios, as well as the role of technology in formative assessment, are described. The paper also offers advice for formative assessment of faculty teaching. In conclusion, the authors emphasize the importance of creating a culture of assessment that embraces the concept of 360-degree assessment in both the development of a student's ability to demonstrate achievement of educational outcomes and a faculty member's ability to become an effective educator.

  10. Assessment of item-writing flaws in multiple-choice questions.

    PubMed

    Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John

    2013-01-01

    This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.

  11. Motivating student learning using a formative assessment journey

    PubMed Central

    Evans, Darrell J R; Zeun, Paul; Stanier, Robert A

    2014-01-01

    Providing formative assessment opportunities has been recognised as a significant benefit to student learning. The outcome of any formative assessment should be one that ultimately helps improve student learning through familiarising students with the levels of learning required, informing them about gaps in their learning and providing feedback to guide the direction of learning. This article provides an example of how formative assessments can be developed into a formative assessment journey where a number of different assessments can be offered to students during the course of a module of teaching, thus utilising a spaced-education approach. As well as incorporating the specific drivers of formative assessment, we demonstrate how approaches deemed to be stimulating, interactive and entertaining with the aim of maximising enthusiasm and engagement can be incorporated. We provide an example of a mixed approach to evaluating elements of the assessment journey that focuses on student reaction, appraisal of qualitative and quantitative feedback from student questionnaires, focus group analysis and teacher observations. Whilst it is not possible to determine a quantifiable effect of the assessment journey on student learning, usage data and student feedback show that formative assessment can achieve high engagement and a positive response to different assessments. Those assessments incorporating an active learning element and a quiz-based approach appear to be particularly popular. A spaced-education format encourages a building block approach to learning that is continuous in nature rather than focussed on an intense period of study prior to summative examinations. PMID:24111930

  12. An Item-Level Psychometric Analysis of the Personality Assessment Inventory: Clinical Scales in a Psychiatric Inpatient Unit

    ERIC Educational Resources Information Center

    Siefert, Caleb J.; Sinclair, Samuel J.; Kehl-Fie, Kendra A.; Blais, Mark A.

    2009-01-01

    Multi-item multiscale self-report measures are increasingly used in inpatient assessments. When considering a measure for this setting, it is important to evaluate the psychometric properties of the clinical scales and items to ensure that they are functioning as intended in a highly distressed clinical population. The present study examines scale…

  13. A Faculty Toolkit for Formative Assessment in Pharmacy Education

    PubMed Central

    Alston, Greg L.; Bird, Eleanora; Buring, Shauna M.; Kelley, Katherine A.; Murphy, Nanci L.; Schlesselman, Lauren S.; Stowe, Cindy D.; Szilagyi, Julianna E.

    2014-01-01

    This paper aims to increase understanding and appreciation of formative assessment and its role in improving student outcomes and the instructional process, while educating faculty on formative techniques readily adaptable to various educational settings. Included are a definition of formative assessment and the distinction between formative and summative assessment. Various formative assessment strategies to evaluate student learning in classroom, laboratory, experiential, and interprofessional education settings are discussed. The role of reflective writing and portfolios, as well as the role of technology in formative assessment, are described. The paper also offers advice for formative assessment of faculty teaching. In conclusion, the authors emphasize the importance of creating a culture of assessment that embraces the concept of 360-degree assessment in both the development of a student’s ability to demonstrate achievement of educational outcomes and a faculty member’s ability to become an effective educator. PMID:26056399

  14. Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the ‘Claim Evaluation Tools’ database using Rasch modelling

    PubMed Central

    Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

    2017-01-01

    Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019

  15. Assessing the equivalence of Web-based and paper-and-pencil questionnaires using differential item and test functioning (DIF and DTF) analysis: a case of the Four-Dimensional Symptom Questionnaire (4DSQ).

    PubMed

    Terluin, Berend; Brouwers, Evelien P M; Marchand, Miquelle A G; de Vet, Henrica C W

    2018-05-01

    Many paper-and-pencil (P&P) questionnaires have been migrated to electronic platforms. Differential item and test functioning (DIF and DTF) analysis constitutes a superior research design to assess measurement equivalence across modes of administration. The purpose of this study was to demonstrate an item response theory (IRT)-based DIF and DTF analysis to assess the measurement equivalence of a Web-based version and the original P&P format of the Four-Dimensional Symptom Questionnaire (4DSQ), measuring distress, depression, anxiety, and somatization. The P&P group (n = 2031) and the Web group (n = 958) consisted of primary care psychology clients. Unidimensionality and local independence of the 4DSQ scales were examined using IRT and Yen's Q3. Bifactor modeling was used to assess the scales' essential unidimensionality. Measurement equivalence was assessed using IRT-based DIF analysis using a 3-stage approach: linking on the latent mean and variance, selection of anchor items, and DIF testing using the Wald test. DTF was evaluated by comparing expected scale scores as a function of the latent trait. The 4DSQ scales proved to be essentially unidimensional in both modalities. Five items, belonging to the distress and somatization scales, displayed small amounts of DIF. DTF analysis revealed that the impact of DIF on the scale level was negligible. IRT-based DIF and DTF analysis is demonstrated as a way to assess the equivalence of Web-based and P&P questionnaire modalities. Data obtained with the Web-based 4DSQ are equivalent to data obtained with the P&P version.
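As an illustration only (the study itself used a three-stage, Wald-test-based IRT procedure), the DTF step of comparing expected scale scores across administration modes can be sketched as below. The item parameters are hypothetical and are not the 4DSQ calibration:

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under a 2PL IRT model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_scale_score(theta, params):
    """Expected summed scale score at latent-trait level theta for (a, b) item pairs."""
    return sum(p_2pl(theta, a, b) for a, b in params)

# Hypothetical parameters for the same scale calibrated in each mode;
# the second item's difficulty shifts slightly (uniform DIF).
pp_params = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.8)]    # paper-and-pencil
web_params = [(1.2, -0.5), (0.9, 0.2), (1.5, 0.8)]   # Web-based

# DTF check: compare expected scale scores across the latent-trait range.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    diff = expected_scale_score(theta, pp_params) - expected_scale_score(theta, web_params)
    print(f"theta={theta:+.1f}  expected-score difference={diff:+.4f}")
```

If the expected-score difference stays negligible across the trait range, item-level DIF has little impact at the scale level, which is the conclusion the study reports for the 4DSQ.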

  16. Psychometrical assessment and item analysis of the General Health Questionnaire in victims of terrorism.

    PubMed

    Delgado-Gomez, David; Lopez-Castroman, Jorge; de Leon-Martinez, Victoria; Baca-Garcia, Enrique; Cabanas-Arrate, Maria Luisa; Sanchez-Gonzalez, Antonio; Aguado, David

    2013-03-01

    There is a need to assess the psychiatric morbidity that appears as a consequence of terrorist attacks. The General Health Questionnaire (GHQ) has been used to this end, but its psychometric properties have never been evaluated in a population affected by terrorism. A sample of 891 participants included 162 direct victims of terrorist attacks and 729 relatives of the victims. All participants were evaluated using the 28-item version of the GHQ (GHQ-28). We examined the reliability and external validity of scores on the scale using Cronbach's alpha and Pearson correlation with the State-Trait Anxiety Inventory (STAI), respectively. The factor structure of the scale was analyzed with varimax rotation. Samejima's (1969) graded response model was used to explore the item properties. The GHQ-28 scores showed good reliability and item-scale correlations. The factor analysis identified 3 factors: anxious-somatic symptoms, social dysfunction, and depression symptoms. All factors showed good correlation with the STAI. Before rotation, the first, second, and third factor explained 44.0%, 6.4%, and 5.0% of the variance, respectively. Varimax rotation redistributed the percentages of variance accounted for to 28.4%, 13.8%, and 13.2%, respectively. Items with the highest loadings in the first factor measured anxiety symptoms, whereas items with the highest loadings in the third factor measured suicide ideation. Samejima's model found that high scores in suicide-related items were associated with severe depression. The factor structure of the GHQ-28 found in this study underscores the preeminence of anxiety symptoms among victims of terrorism and their relatives. Item response analysis identified the most difficult and significant items for each factor. PsycINFO Database Record (c) 2013 APA, all rights reserved.
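Samejima's graded response model mentioned above models, in standard notation (not quoted from the article), the probability of responding in category k or higher to item i with a cumulative 2PL curve, and obtains category probabilities as differences of adjacent curves:

```latex
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_{ik})]}, \qquad
P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta)
```

with the conventions \(P^{*}_{i0}(\theta) = 1\) and \(P^{*}_{i,m_i+1}(\theta) = 0\) for an item with \(m_i + 1\) response categories.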

  17. A 14-item Mediterranean diet assessment tool and obesity indexes among high-risk subjects: the PREDIMED trial.

    PubMed

    Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón

    2012-01-01

    Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Cross-sectional assessment of all participants in the "PREvención con DIeta MEDiterránea" (PREDIMED) trial. 7,447 participants (55-80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥ 3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were -0.0066 (95% confidence interval, -0.0088 to -0.0049) for women and -0.0059 (-0.0079 to -0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥ 10 points versus ≤ 7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. A brief 14-item tool was able to capture a strong monotonic inverse association between adherence to a good quality dietary pattern (Mediterranean diet

  18. Gender, Assessment and Students' Literacy Learning: Implications for Formative Assessment

    ERIC Educational Resources Information Center

    Murphy, Patricia; Ivinson, Gabrielle

    2005-01-01

    Formative assessment is intended to develop students' capacity to learn and increase the effectiveness of teaching. However, the extent to which formative assessment can meet these aims depends on the relationship between its conception and current conceptions of learning. In recent years concern about sex group differences in achievement has led…

  19. Item Analyses of Memory Differences

    PubMed Central

    Salthouse, Timothy A.

    2017-01-01

    Objective Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285

  20. Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.

    PubMed

    Eichenbaum, Alexander E; Marcus, David K; French, Brian F

    2017-06-01

    This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.

  1. Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis.

    PubMed

    Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus

    2009-05-01

    The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental illnesses and persons with somatic diseases. The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD = 14.0; 44.7% women) to allow for validation of the item bank in both groups. Participants answered a pool of 182 depression items on a 5-point Likert scale. Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
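A minimal sketch of the infit (information-weighted mean-square) criterion used as a retention threshold above. For simplicity this sketch assumes dichotomous Rasch responses with known person and item parameters, whereas the study's items were polytomous; the data are illustrative, not the study's:

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def infit_msq(responses, thetas, b):
    """Information-weighted mean-square fit for one item:
    summed squared residuals divided by summed binomial variances.
    Values near 1.0 indicate good fit; the study retained items with infit < 1.3."""
    num = den = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        num += (x - p) ** 2       # squared residual
        den += p * (1.0 - p)      # model-implied variance
    return num / den

# Illustrative data: four persons responding to one item of difficulty 0.0
print(infit_msq([1, 1, 0, 0], [1.0, 0.5, -0.5, -1.0], 0.0))
```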

  2. Identifying Promising Items: The Use of Crowdsourcing in the Development of Assessment Instruments

    ERIC Educational Resources Information Center

    Sadler, Philip M.; Sonnert, Gerhard; Coyle, Harold P.; Miller, Kelly A.

    2016-01-01

    The psychometrically sound development of assessment instruments requires pilot testing of candidate items as a first step in gauging their quality, typically a time-consuming and costly effort. Crowdsourcing offers the opportunity for gathering data much more quickly and inexpensively than from most targeted populations. In a simulation of a…

  3. To Sum or Not to Sum: Taxometric Analysis with Ordered Categorical Assessment Items

    ERIC Educational Resources Information Center

    Walters, Glenn D.; Ruscio, John

    2009-01-01

    Meehl's taxometric method has been shown to differentiate between categorical and dimensional data, but there are many ways to implement taxometric procedures. When analyzing the ordered categorical data typically provided by assessment instruments, summing items to form input indicators has been a popular practice for more than 20 years. A Monte…

  4. Promoting proximal formative assessment with relational discourse

    NASA Astrophysics Data System (ADS)

    Scherr, Rachel E.; Close, Hunter G.; McKagan, Sarah B.

    2012-02-01

    The practice of proximal formative assessment - the continual, responsive attention to students' developing understanding as it is expressed in real time - depends on students' sharing their ideas with instructors and on teachers' attending to them. Rogerian psychology presents an account of the conditions under which proximal formative assessment may be promoted or inhibited: (1) Normal classroom conditions, characterized by evaluation and attention to learning targets, may present threats to students' sense of their own competence and value, causing them to conceal their ideas and reducing the potential for proximal formative assessment. (2) In contrast, discourse patterns characterized by positive anticipation and attention to learner ideas increase the potential for proximal formative assessment and promote self-directed learning. We present an analysis methodology based on these principles and demonstrate its utility for understanding episodes of university physics instruction.

  5. Investigation of Science Inquiry Items for Use on an Alternate Assessment Based on Modified Achievement Standards Using Cognitive Lab Methodology

    ERIC Educational Resources Information Center

    Dickenson, Tammiee S.; Gilmore, Joanna A.; Price, Karen J.; Bennett, Heather L.

    2013-01-01

    This study evaluated the benefits of item enhancements applied to science-inquiry items for incorporation into an alternate assessment based on modified achievement standards for high school students. Six items were included in the cognitive lab sessions involving both students with and without disabilities. The enhancements (e.g., use of visuals,…

  6. Developing an item bank and short forms that assess the impact of asthma on quality of life.

    PubMed

    Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena

    2014-02-01

    The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) that avoids confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Formative Assessment Probes: Big and Small Seeds. Linking Formative Assessment Probes to the Scientific Practices

    ERIC Educational Resources Information Center

    Keeley, Page

    2016-01-01

    This column focuses on promoting learning through assessment. Formative assessment probes are designed to uncover students' ideas about objects, events, and processes in the natural world. This assessment information is then used throughout instruction to move students toward an understanding of the scientific ideas behind the probes. During the…

  8. Development of the PROMIS nicotine dependence item banks.

    PubMed

    Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

    2014-09-01

    Nicotine dependence is a core construct important for understanding cigarette smoking and smoking cessation behavior. This article describes analyses conducted to develop and evaluate item banks for assessing nicotine dependence among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of nicotine dependence items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess dependence. A total of 32 items were included in the Nicotine Dependence item banks; 22 items are common across daily and nondaily smokers, 5 are unique to daily smokers, and 5 are unique to nondaily smokers. For both daily and nondaily smokers, the Nicotine Dependence item banks are strongly unidimensional, highly reliable (reliability = 0.97 and 0.97, respectively), and perform similarly across gender, age, and race/ethnicity groups. SFs common to daily and nondaily smokers consist of 8 and 4 items (reliability = 0.91 and 0.81, respectively). Results from simulated CATs showed that dependence can be assessed with very good precision for most respondents using fewer than 6 items adaptively selected from the item banks. Nicotine dependence on cigarettes can be assessed on the basis of these item banks via one of the SFs, by using CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
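A CAT of the kind simulated above typically selects each next item to maximize Fisher information at the current ability estimate. A minimal sketch of that selection rule under the 2PL model, with hypothetical item parameters (the actual PROMIS bank calibrations are not reproduced here):

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsing an item with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, bank, administered):
    """Pick the unadministered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))

# Hypothetical bank of (a, b) parameters; item 1 already administered
bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.2), (2.0, 0.1)]
print(select_next_item(0.0, bank, administered={1}))
```

In a full CAT, the trait estimate would be updated after each response and this selection step repeated until a precision or length criterion is met, which is how a bank of 32 items can yield precise scores with fewer than 6 administered items.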

  9. Formative Assessment: A Cybernetic Viewpoint

    ERIC Educational Resources Information Center

    Roos, Bertil; Hamilton, David

    2005-01-01

    This paper considers alternative assessment, feedback and cybernetics. For more than 30 years, debates about the bi-polarity of formative and summative assessment have served as surrogates for discussions about the workings of the mind, the social implications of assessment and, as important, the role of instruction in the advancement of learning.…

  10. Item Feature Effects in Evolution Assessment

    ERIC Educational Resources Information Center

    Nehm, Ross H.; Ha, Minsu

    2011-01-01

    Despite concerted efforts by science educators to understand patterns of evolutionary reasoning in science students and teachers, the vast majority of evolution education studies have failed to carefully consider or control for item feature effects in knowledge measurement. Our study explores whether robust contextualization patterns emerge within…

  11. Calibration of Automatically Generated Items Using Bayesian Hierarchical Modeling.

    ERIC Educational Resources Information Center

    Johnson, Matthew S.; Sinharay, Sandip

    For complex educational assessments, there is an increasing use of "item families," which are groups of related items. However, calibration or scoring for such an assessment requires fitting models that take into account the dependence structure inherent among the items that belong to the same item family. C. Glas and W. van der Linden…

  12. Applying Item Response Theory Methods to Design a Learning Progression-Based Science Assessment

    ERIC Educational Resources Information Center

    Chen, Jing

    2012-01-01

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1)…

  13. Development of the AGREE II, part 2: assessment of validity of items and tools to support application

    PubMed Central

    Brouwers, Melissa C.; Kho, Michelle E.; Browman, George P.; Burgers, Jako S.; Cluzeau, Françoise; Feder, Gene; Fervers, Béatrice; Graham, Ian D.; Hanna, Steven E.; Makarski, Julie

    2010-01-01

    Background We established a program of research to improve the development, reporting and evaluation of practice guidelines. We assessed the construct validity of the items and user’s manual in the β version of the AGREE II. Methods We designed guideline excerpts reflecting high- and low-quality guideline content for 21 of the 23 items in the tool. We designed two study packages so that one low-quality and one high-quality version of each item were randomly assigned to each package. We randomly assigned 30 participants to one of the two packages. Participants reviewed and rated the guideline content according to the instructions of the user’s manual and completed a survey assessing the manual. Results In all cases, content designed to be of high quality was rated higher than low-quality content; in 18 of 21 cases, the differences were significant (p < 0.05). The manual was rated by participants as appropriate, easy to use, and helpful in differentiating guidelines of varying quality, with all scores above the mid-point of the seven-point scale. Considerable feedback was offered on how the items and manual of the β-AGREE II could be improved. Interpretation The validity of the items was established and the user’s manual was rated as highly useful by users. We used these results and those of our study presented in part 1 to modify the items and user’s manual. We recommend AGREE II (available at www.agreetrust.org) as the revised standard for guideline development, reporting and evaluation. PMID:20513779

  14. Computerized Numerical Control Test Item Bank.

    ERIC Educational Resources Information Center

    Reneau, Fred; And Others

    This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…

  15. Harnessing Collaborative Annotations on Online Formative Assessments

    ERIC Educational Resources Information Center

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…

  16. Guideline appraisal with AGREE II: online survey of the potential influence of AGREE II items on overall assessment of guideline quality and recommendation for use.

    PubMed

    Hoffmann-Eßer, Wiebke; Siering, Ulrich; Neugebauer, Edmund A M; Brockhaus, Anne Catharina; McGauran, Natalie; Eikermann, Michaela

    2018-02-27

    The AGREE II instrument is the most commonly used guideline appraisal tool. It includes 23 appraisal criteria (items) organized within six domains. AGREE II also includes two overall assessments (overall guideline quality, recommendation for use). Our aim was to investigate how strongly the 23 AGREE II items influence the two overall assessments. An online survey of authors of publications on guideline appraisals with AGREE II and guideline users from a German scientific network was conducted between 10th February 2015 and 30th March 2015. Participants were asked to rate the influence of the AGREE II items on a Likert scale (0 = no influence to 5 = very strong influence). The frequencies of responses and their dispersion were presented descriptively. Fifty-eight of the 376 persons contacted (15.4%) participated in the survey and the data of the 51 respondents with prior knowledge of AGREE II were analysed. Items 7-12 of Domain 3 (rigour of development) and both items of Domain 6 (editorial independence) had the strongest influence on the two overall assessments. In addition, Items 15-17 (clarity of presentation) had a strong influence on the recommendation for use. Great variations were shown for the other items. The main limitation of the survey is the low response rate. In guideline appraisals using AGREE II, items representing rigour of guideline development and editorial independence seem to have the strongest influence on the two overall assessments. In order to ensure a transparent approach to reaching the overall assessments, we suggest the inclusion of a recommendation in the AGREE II user manual on how to consider item and domain scores. For instance, the manual could include an a priori weighting of those items and domains that should have the strongest influence on the two overall assessments. The relevance of these assessments within AGREE II could thereby be further specified.

  17. A 14-Item Mediterranean Diet Assessment Tool and Obesity Indexes among High-Risk Subjects: The PREDIMED Trial

    PubMed Central

    Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón

    2012-01-01

    Objective Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Design Cross-sectional assessment of all participants in the “PREvención con DIeta MEDiterránea” (PREDIMED) trial. Subjects 7,447 participants (55–80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Results Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were −0.0066 (95% confidence interval, −0.0088 to −0.0049) for women and −0.0059 (−0.0079 to −0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥10 points versus ≤7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. Conclusions A brief 14-item tool was able to capture a strong monotonic inverse association between

  18. Measuring Teaching Best Practice in the Induction Years: Development and Validation of an Item-Level Assessment

    ERIC Educational Resources Information Center

    Kingsley, Laurie; Romine, William

    2014-01-01

    Schools and teacher induction programs around the world routinely assess teaching best practice to inform accreditation, tenure/promotion, and professional development decisions. Routine assessment is also necessary to ensure that teachers entering the profession get the assistance they need to develop and succeed. We introduce the Item-Level…

  19. Formative Assessment Probes: Is It Melting? Formative Assessment for Teacher Learning

    ERIC Educational Resources Information Center

    Keeley, Page

    2013-01-01

    Formative assessment probes are effective tools for uncovering students' ideas about the various concepts they encounter when learning science. They are used to build a bridge from where the student is in his or her thinking to where he or she needs to be in order to construct and understand the scientific explanation for observed phenomena.…

  20. IRT-Estimated Reliability for Tests Containing Mixed Item Formats

    ERIC Educational Resources Information Center

    Shu, Lianghua; Schwarz, Richard D.

    2014-01-01

    As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…
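Of the four coefficients, marginal reliability is the one defined directly on the IRT scale; in standard notation (not quoted from the abstract) it compares latent-trait variance with the error variance averaged over the trait distribution:

```latex
\bar{\rho} = \frac{\sigma^{2}_{\theta} - \overline{\sigma^{2}_{E}}}{\sigma^{2}_{\theta}},
\qquad
\overline{\sigma^{2}_{E}} = \int \sigma^{2}_{E}(\theta)\, g(\theta)\, d\theta
```

where \(g(\theta)\) is the population density of the latent trait and \(\sigma^{2}_{E}(\theta)\) is the conditional error variance at \(\theta\).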

  1. The Matching Criterion Purification for Differential Item Functioning Analyses in a Large-Scale Assessment

    ERIC Educational Resources Information Center

    Lee, HyeSun; Geisinger, Kurt F.

    2016-01-01

    The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…

  2. Conjoint Community Resiliency Assessment Measure-28/10 items (CCRAM28 and CCRAM10): A self-report tool for assessing community resilience.

    PubMed

    Leykin, Dmitry; Lahad, Mooli; Cohen, Odeya; Goldberg, Avishay; Aharonson-Daniel, Limor

    2013-12-01

    Community resilience is used to describe a community's ability to deal with crises or disruptions. The Conjoint Community Resiliency Assessment Measure (CCRAM) was developed in order to attain an integrated, multidimensional instrument for the measurement of community resiliency. The tool was developed using an inductive, exploratory, sequential mixed methods design. The objective of the present study was to portray and evaluate the CCRAM's psychometric features. A large community sample (N = 1,052) was assessed with the CCRAM tool, and the data were subjected to exploratory and confirmatory factor analysis. A five-factor model (21 items) was obtained, explaining 67.67% of the variance. This scale was later reduced to a brief 10-item instrument. Both scales showed good internal consistency coefficients (α = .92 and α = .85, respectively) and acceptable fit indices to the data. Seven additional items correspond to information requested by leaders, forming the CCRAM28. The CCRAM has been shown to be an acceptable practical tool for assessing community resilience. Both internal and external validity have been demonstrated, as all factors obtained in the factor analytical process were tightly linked to previous literature on community resilience. The CCRAM facilitates the estimation of an overall community resiliency score, but furthermore, it detects the strength of five important constructs of community function following disaster: Leadership, Collective Efficacy, Preparedness, Place Attachment and Social Trust. Consequently, the CCRAM can serve as an aid for community leaders to assess, monitor, and focus actions to enhance and restore community resilience for crisis situations.

  3. Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks

    PubMed Central

    Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando

    2014-01-01

    Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843
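    The CAT simulations this record describes typically administer, at each step, the unadministered item with maximum Fisher information at the current ability estimate. A minimal sketch of that selection rule under a 2PL model (the item parameters in the test below are illustrative, not PROMIS calibrations):

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct/keyed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1 - p)

def next_item(theta_hat, bank, administered):
    """Return the index of the unadministered item with maximum information.

    bank: list of (a, b) parameter pairs; administered: set of used indices.
    """
    best_val, best_idx = None, None
    for i, (a, b) in enumerate(bank):
        if i in administered:
            continue
        v = item_info(theta_hat, a, b)
        if best_val is None or v > best_val:
            best_val, best_idx = v, i
    return best_idx
```

With a provisional estimate of theta = 0, this rule picks the item whose difficulty lies closest to 0 (holding discrimination equal), which is why CATs can match short-form precision with fewer items.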

  4. Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock.

    These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…

  5. Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock.

    These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…

  6. Teachers' Use of Test-Item Banks for Student Assessment in North Carolina Secondary Agricultural Education Programs

    ERIC Educational Resources Information Center

    Marshall, Joy Morgan

    2014-01-01

    Higher expectations are placed on all parties to ensure students successfully perform on standardized tests. Specifically in North Carolina agriculture classes, students are given a CTE Post Assessment to measure knowledge gained and proficiency. Prior to students taking the CTE Post Assessment, teachers have access to a test item bank system that…

  7. An Anthropologist among the Psychometricians: Assessment Events, Ethnography, and Differential Item Functioning in the Mongolian Gobi

    ERIC Educational Resources Information Center

    Maddox, Bryan; Zumbo, Bruno D.; Tay-Lim, Brenda; Qu, Demin

    2015-01-01

    This article explores the potential for ethnographic observations to inform the analysis of test item performance. In 2010, a standardized, large-scale adult literacy assessment took place in Mongolia as part of the United Nations Educational, Scientific and Cultural Organization Literacy Assessment and Monitoring Programme (LAMP). In a novel form…

  8. An approach for estimating item sensitivity to within-person change over time: An illustration using the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog).

    PubMed

    Dowling, N Maritza; Bolt, Daniel M; Deng, Sien

    2016-12-01

    When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer's Disease Assessment Scale-Cognitive subscale, including participants representing the full spectrum of disease status who were enrolled in the multisite Alzheimer's Disease Neuroimaging Initiative. Five of the 13 Alzheimer's Disease Assessment Scale-Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline more consistent with theoretical expectations in comparison to the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  9. Screening for depression in arthritis populations: an assessment of differential item functioning in three self-reported questionnaires.

    PubMed

    Hu, Jinxiang; Ward, Michael M

    2017-09-01

    To determine whether persons with arthritis differ systematically from persons without arthritis in how they respond to questions on three depression questionnaires, which include somatic items such as fatigue and sleep disturbance. We extracted data on the Center for Epidemiologic Studies Depression (CES-D) scale, the Patient Health Questionnaire-9 (PHQ-9), and the Kessler-6 (K-6) scale from three large population-based national surveys. We assessed items on these questionnaires for differential item functioning (DIF) between persons with and without self-reported physician-diagnosed arthritis using multiple indicator multiple cause models, which controlled for the underlying level of depression and important confounders. We also examined whether DIF by arthritis status was similar between women and men. Although five items of the CES-D, one item of the PHQ-9, and five items of the K-6 scale had evidence of DIF based on statistical comparisons, the magnitude of each difference was less than the threshold of a small effect; the statistical differences were a function of the very large sample sizes in the surveys. Effect sizes for DIF were similar between women and men except for two items on the PHQ-9. For each questionnaire, DIF accounted for 8% or less of the arthritis-depression association, and excluding items with DIF did not reduce the difference in depression scores between those with and without arthritis. Persons with arthritis respond to items on the CES-D, PHQ-9, and K-6 depression scales similarly to persons without arthritis, despite the inclusion of somatic items in these scales.

  10. Evaluation of adding item-response theory analysis for evaluation of the European Board of Ophthalmology Diploma examination.

    PubMed

    Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José

    2013-11-01

    To investigate whether the introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20 ≥ 0.90) to be kept as the examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and of its individual test items in particular. IRT analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good, though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011: 0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis. Therefore, IRT analysis has
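    The KR-20 statistic reported in this record is Cronbach's alpha specialized to dichotomously scored items. A minimal sketch of the computation (the response matrix in the test below is illustrative, not EBOD data):

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    responses: list of per-examinee score lists, all of equal length.
    KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / var(total)),
    where p_i is the proportion answering item i correctly and q_i = 1 - p_i.
    """
    n_items = len(responses[0])
    n_persons = len(responses)
    p = [sum(r[i] for r in responses) / n_persons for i in range(n_items)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n_persons
    var = sum((t - mean) ** 2 for t in totals) / n_persons  # population variance
    return (n_items / (n_items - 1)) * (1 - pq_sum / var)
```

Values of 0.90 or higher, the threshold the study uses, are conventionally taken as sufficient reliability for high-stakes examinations.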

  11. Formative assessment in mathematics for engineering students

    NASA Astrophysics Data System (ADS)

    Ní Fhloinn, Eabhnat; Carr, Michael

    2017-07-01

    In this paper, we present a range of formative assessment types for engineering mathematics, including in-class exercises, homework, mock examination questions, table quizzes, presentations, critical analyses of statistical papers, peer-to-peer teaching, online assessments and electronic voting systems. We provide practical tips for the implementation of such assessments, with a particular focus on time or resource constraints and large class sizes, as well as effective methods of feedback. In addition, we consider the benefits of such formative assessments for students and staff.

  12. Formative Assessment in the High School IMC

    ERIC Educational Resources Information Center

    Edwards, Valerie A.

    2007-01-01

    In this article, the author discusses how she uses formative assessments of information literacy skills in the high school IMC. As a result of informal observation and conversations with individual students--a form of formative assessment itself--the author learned that students were not using indexes to locate relevant information in nonfiction…

  13. Examination of the PROMIS upper extremity item bank.

    PubMed

    Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R

    Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.

  14. Valuing a More Rigorous Review of Formative Assessment's Effectiveness

    ERIC Educational Resources Information Center

    Apthorp, Helen; Klute, Mary; Petrites, Tony; Harlacher, Jason; Real, Marianne

    2016-01-01

    Prior reviews of evidence for the impact of formative assessment on student achievement suggest widely different estimates of formative assessment's effectiveness, ranging from 0.40 to 0.70 standard deviations in one review. The purpose of this study is to describe variability in the effectiveness of formative assessment for promoting student…

  15. Student Perceptions of Formative Assessment in the Chemistry Classroom

    ERIC Educational Resources Information Center

    Haroldson, Rachelle Ann

    2012-01-01

    Research on formative assessment has focused on the ways teachers implement and use formative assessment to check student understanding in order to guide their instruction. This study shifted emphasis away from teachers to look at how students use and perceive formative assessment in the science classroom. Four key strategies of formative…

  16. An examination of reactivity to craving assessment: craving to smoke does not change over the course of a multi-item craving questionnaire.

    PubMed

    Germeroth, Lisa J; Tiffany, Stephen T

    2015-06-01

    Self-report measures are typically used to assess drug craving, but researchers have questioned whether completing these assessments can elicit or enhance craving. Previous studies have examined cigarette craving reactivity and found null craving reactivity effects. Several methodological limitations of those studies, however, preclude definitive conclusions. The current study addresses limitations of previous studies and extends this area of research by using a large sample size to examine: (1) item-by-item changes in craving level during questionnaire completion, (2) craving reactivity as a function of craving intensity reflected in item content, (3) craving reactivity differences between nicotine dependent and nondependent smokers, and (4) potential reactivity across multiple sessions. This study also used a more comprehensive craving assessment (the 32-item Questionnaire on Smoking Urges; QSU) than employed in previous studies. Nicotine dependent and nondependent smokers (n=270; nicotine dependence determined by the Nicotine Addiction Taxon Scale) completed the QSU on six separate occasions across 12 weeks. Craving level was observed at the item level and across various subsets of items. Analyses indicated that there was no significant effect of item/subset position on craving ratings, nor were there any significant interactions between item/subset position and session or level of nicotine dependence. These findings indicate that, even with relatively sensitive procedures for detecting potential reactivity, there was no evidence that completing a craving questionnaire induces craving. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Assessing Model Data Fit of Unidimensional Item Response Theory Models in Simulated Data

    ERIC Educational Resources Information Center

    Kose, Ibrahim Alper

    2014-01-01

    The purpose of this paper is to give an example of how to assess the model-data fit of unidimensional IRT models in simulated data. Also, the present research aims to explain the importance of fit and the consequences of misfit by using simulated data sets. Responses of 1000 examinees to a dichotomously scoring 20 item test were simulated with 25…

  18. Improving the Memory Sections of the Standardized Assessment of Concussion Using Item Analysis

    ERIC Educational Resources Information Center

    McElhiney, Danielle; Kang, Minsoo; Starkey, Chad; Ragan, Brian

    2014-01-01

    The purpose of the study was to improve the immediate and delayed memory sections of the Standardized Assessment of Concussion (SAC) by identifying a list of more psychometrically sound items (words). A total of 200 participants with no history of concussion in the previous six months (aged 19.60 ± 2.20 years; N = 93 men, N = 107 women)…

  19. On the displacement of leisure items by food during multiple-stimulus preference assessments.

    PubMed Central

    Bojak, S L; Carr, J E

    1999-01-01

    Previous studies have demonstrated that when food and leisure stimuli are combined in multiple-stimulus preference assessments, individuals typically select food more often, although the leisure stimuli also have known reinforcing properties. The purpose of the current study was to replicate this effect and determine its durability by examining the effect after mealtimes. Four adults who had been diagnosed with severe mental retardation were given three initial multiple-stimulus (without replacement) preference assessments (i.e., food, leisure stimuli, and combined). All participants selected food items as the most preferred stimuli in the combined assessments. Combined assessments were then administered immediately before and after the evening meal for each participant for 1 week. The results showed similar data both before and after mealtimes. PMID:10641304

  20. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    PubMed

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
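    The IRT model for polytomous, ordered response categories used in this record is typically Samejima's graded response model, in which the discrimination and severity (threshold) parameters mentioned above define cumulative response curves. A minimal sketch (the parameter values in the test below are illustrative, not the LOT-R calibration):

```python
import math

def graded_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    a: item discrimination; thresholds: ordered severity parameters
    b_1 < ... < b_{m-1} for an item with m ordered categories.
    P*(X >= k) = 1 / (1 + exp(-a * (theta - b_k))); the probability of
    responding exactly in category k is P*(X >= k) - P*(X >= k + 1).
    """
    def p_star(b):
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]
```

Higher discrimination sharpens the curves (items better separate optimism levels), while the spread of the thresholds determines how much of the latent trait the item covers.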

  1. Relationship between Item Responses of Negative Affect Items and the Distribution of the Sum of the Item Scores in the General Population

    PubMed Central

    Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

    2016-01-01

    Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
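    The exponential pattern described above can be checked by regressing log frequency on total score over the upper part of the distribution: an exponential tail yields an approximately constant negative slope, and the lower end diverges from that line. A minimal sketch (the frequency table in the test below is synthetic, not the survey data):

```python
import math

def log_linear_slope(counts, start):
    """Least-squares slope of log(frequency) vs. score for scores >= start.

    counts: dict mapping total score -> frequency. A roughly constant
    negative slope over the upper range indicates an exponential pattern.
    """
    pts = [(s, math.log(c)) for s, c in counts.items() if s >= start and c > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx
```

Fitting the same line to the full score range and comparing predicted with observed frequencies at the low end would reproduce the divergence the authors attribute to the "rarely" response level.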

  2. Improving Foreign Language Speaking through Formative Assessment

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover; Tuttle, Alan Robert

    2012-01-01

    Want a quick way to get your students happily conversing more in the target language? This practical book shows you how to use formative assessments to gain immediate and lasting improvement in your students' fluency. You'll learn how to: (1) Imbed the 3-minute formative assessment into every lesson with ease; (2) Engage students in peer formative…

  3. Negative Affect Impairs Associative Memory but Not Item Memory

    ERIC Educational Resources Information Center

    Bisby, James A.; Burgess, Neil

    2014-01-01

    The formation of associations between items and their context has been proposed to rely on mechanisms distinct from those supporting memory for a single item. Although emotional experiences can profoundly affect memory, our understanding of how it interacts with different aspects of memory remains unclear. We performed three experiments to examine…

  4. Formative Assessment in Dance Education

    ERIC Educational Resources Information Center

    Andrade, Heidi; Lui, Angela; Palma, Maria; Hefferen, Joanna

    2015-01-01

    Feedback is crucial to students' growth as dancers. When used within the framework of formative assessment, or assessment for learning, feedback results in actionable next steps that dancers can use to improve their performances. This article showcases the work of two dance specialists, one elementary and one middle school teacher, who have…

  5. Assessing the Straightforwardly-Worded Brief Fear of Negative Evaluation Scale for Differential Item Functioning Across Gender and Ethnicity.

    PubMed

    Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael

    2015-06-01

    The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and a straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and to provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and with Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.

  6. Effects of spacing of item repetitions in continuous recognition memory: does item retrieval difficulty promote item retention in older adults?

    PubMed

    Kılıç, Aslı; Hoyer, William J; Howard, Marc W

    2013-01-01

    BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.

  7. A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

    PubMed

    Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

    2018-04-10

    To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n = 521 patients at a psychosomatic outpatient clinic. Exploratory and confirmatory factor analysis (CFA), as well as other state-of-the-art item analyses and IRT, were used for item evaluation and calibration of the final item bank. Out of the initial pool of 131 items, we excluded 64 (54 with factor loadings < .5, 4 with residual correlations > .3, 2 with non-discriminative item response curves, and 4 with differential item functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE ≤ .32 across a theta range of -1.8 to +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (the Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready for use within a framework of computer-adaptive tests for a comprehensive assessment of the stress construct. Copyright © 2018. Published by Elsevier Inc.
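    The measurement-precision figure quoted in this record (SE ≤ .32 over a theta range) follows from test information: in IRT, the standard error of the ability estimate is the reciprocal square root of the summed item information. A minimal sketch under a 2PL model (the item parameters below are illustrative, not the SR calibration):

```python
import math

def theta_se(theta, items):
    """Standard error of measurement at ability theta under a 2PL model.

    items: list of (a, b) discrimination/difficulty pairs.
    SE(theta) = 1 / sqrt(I(theta)), where test information I(theta) is the
    sum of item informations a^2 * P * (1 - P).
    """
    info = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        info += a * a * p * (1 - p)
    return 1.0 / math.sqrt(info)
```

Evaluating this function across a grid of theta values shows where a short form measures precisely and where (as in the tails) additional items would be needed.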

  8. [Impact of passing items above the ceiling on the assessment results of Peabody developmental motor scales].

    PubMed

    Zhao, Gai; Bian, Yang; Li, Ming

    2013-12-18

    To analyze the impact of passing items above the ceiling in the gross motor subtest of the Peabody Developmental Motor Scales (PDMS-2) on its assessment results. The subtests of PDMS-2 were administered to 124 children aged 1.2 to 71 months. In addition to the original scoring method, a new scoring method that includes passing items above the ceiling was developed. The standard scores and quotients of the two scoring methods were compared using the independent-samples t test. Only one child passed items above the ceiling in the stationary subtest, 19 children in the locomotion subtest, and 17 children in the visual-motor integration subtest. When the scores of these passing items were included in the raw scores, the total raw scores gained 1-12 points, the standard scores 0-1 points, and the motor quotients 0-3 points. The diagnostic classification changed in only two children. There was no significant difference between the two methods in motor quotients or standard scores for any specific subtest (P>0.05). Passing items above the ceiling of the PDMS-2 is not a rare situation; it usually occurs in the locomotion and visual-motor integration subtests. Including these passing items in the scoring system does not make a significant difference in the standard scores of the subtests or the developmental motor quotients (DMQ), which supports the original ceiling rule of not passing 3 items in a row. However, putting the passing items above the ceiling into the raw score will improve tracking of children's developmental trajectories and intervention effects.

  9. Methodology for developing and evaluating the PROMIS smoking item banks.

    PubMed

    Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando

    2014-09-01

    This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. The Development of Multiple-Choice Items Consistent with the AP Chemistry Curriculum Framework to More Accurately Assess Deeper Understanding

    ERIC Educational Resources Information Center

    Domyancich, John M.

    2014-01-01

    Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…

  11. Formative Assessment Probes: With a Purpose

    ERIC Educational Resources Information Center

    Keeley, Page

    2011-01-01

    The first thing that comes to mind for many teachers when they think of assessment is testing, quizzes, performance tasks, and other summative forms used for grading purposes. Such assessment practices represent only a fraction of the kinds of assessment that occur on an ongoing basis in an effective science classroom. Formative assessment is a…

  12. Concurrent validity of single-item measures of emotional exhaustion and depersonalization in burnout assessment.

    PubMed

    West, Colin P; Dyrbye, Liselotte N; Satele, Daniel V; Sloan, Jeff A; Shanafelt, Tait D

    2012-11-01

    Burnout is a common problem among physicians and physicians-in-training. The Maslach Burnout Inventory (MBI) is the gold standard for burnout assessment, but the length of this well-validated 22-item instrument can limit its feasibility for survey research. This study evaluated the concurrent validity of two questions relative to the full MBI for measuring the association of burnout with published outcomes. The single questions "I feel burned out from my work" and "I have become more callous toward people since I took this job," representing the emotional exhaustion and depersonalization domains of burnout, respectively, were evaluated in published studies of medical students, internal medicine residents, and practicing surgeons. We compared predictive models for the association of each question, versus the full MBI, using longitudinal data on burnout and suicidality from 2006 and 2007 for 858 medical students at five United States medical schools; cross-sectional data on burnout and serious thoughts of dropping out of medical school from 2007 for 2222 medical students at seven United States medical schools; and cross-sectional data on burnout and unprofessional attitudes and behaviors from 2009 for 2566 medical students at seven United States medical schools. We also assessed results for longitudinal data on burnout and perceived major medical errors from 2003 to 2009 for 321 Mayo Clinic Rochester internal medicine residents and cross-sectional data on burnout and both perceived major medical errors and suicidality from 2008 for 7,905 respondents to a national survey of members of the American College of Surgeons. Point estimates of effect for models based on the single-item measures were uniformly consistent with those reported for models based on the full MBI. The single-item measures of emotional exhaustion and depersonalization exhibited strong associations with each published outcome (all p ≤ 0.008). No conclusion regarding…

  13. Written Formative Assessment and Silence in the Classroom

    ERIC Educational Resources Information Center

    Lee Hang, Desmond Mene; Bell, Beverley

    2015-01-01

    In this commentary, we build on Xinying Yin and Gayle Buck's discussion by exploring the cultural practices which are integral to formative assessment, when it is viewed as a sociocultural practice. First we discuss the role of assessment and in particular oral and written formative assessments in both western and Samoan cultures, building on the…

  14. Formative Assessment Probes: Constructing Cl-Ev-R Explanations to Formative Assessment Probes

    ERIC Educational Resources Information Center

    Keeley, Page

    2015-01-01

    A distinguishing feature of all the formative assessment probes in the "Uncovering Student Ideas" series is that each probe has two parts: (1) a selected answer choice that usually mirrors the research on commonly held ideas students have about concepts or phenomena; and (2) an explanation that supports their answer choice. It is this…

  15. An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).

    PubMed

    Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul

    2018-03-01

    Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: to introduce basic concepts of item response theory and to demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure of oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
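
    Since the worked example is a Rasch (1PL) analysis, a toy computation may help readers see what the model's item difficulties represent. The sketch below uses the classic PROX (normal-approximation) starting estimates, which express each item's difficulty as a centered log-odds of non-endorsement; the 10-item design and the simulated respondents are invented and unrelated to the actual EAT-10 data.

```python
import math
import random

random.seed(0)

# Simulated dichotomous responses: 500 persons x 10 items of rising difficulty
true_b = [-1.8 + 0.4 * j for j in range(10)]
persons = [random.gauss(0.0, 1.0) for _ in range(500)]
X = [[1 if random.random() < 1.0 / (1.0 + math.exp(-(t - b))) else 0
      for b in true_b] for t in persons]

def prox_difficulties(X):
    """PROX (log-odds) item difficulty estimates, centered per Rasch convention."""
    n = len(X)
    logits = []
    for j in range(len(X[0])):
        s = sum(row[j] for row in X)     # number endorsing/passing item j
        s = min(max(s, 1), n - 1)        # guard against 0% or 100% items
        logits.append(math.log((n - s) / s))
    mean = sum(logits) / len(logits)
    return [d - mean for d in logits]    # sum of difficulties constrained to 0

b_hat = prox_difficulties(X)
```

    A full Rasch analysis would iterate these starting values to joint or conditional maximum likelihood estimates; the PROX step already recovers the ordering of item difficulties.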

  16. Quality of surgical randomized controlled trials for acute cholecystitis: assessment based on CONSORT and additional check items.

    PubMed

    Shikata, Satoru; Nakayama, Takeo; Yamagishi, Hisakazu

    2008-01-01

    In this study, we conducted a limited survey of reports of surgical randomized controlled trials, using the Consolidated Standards of Reporting Trials (CONSORT) statement and additional check items, to clarify problems in the evaluation of surgical reports. A total of 13 randomized trials were selected from the two most recent review articles on biliary surgery. Each randomized trial was evaluated according to 28 quality measures comprising items from the CONSORT statement plus additional items. Analysis focused on relationships between the quality of each study and the estimated effect gap ("pooled estimate in meta-analysis" minus "estimated effect of each study"). No definite relationships were found between individual study quality and the estimated effect gap. The following items could have been described but were not in almost all of the surgical RCT reports: "clearly defined outcomes"; "details of randomization"; "participant flow charts"; "intention-to-treat analysis"; "ancillary analyses"; and "financial conflicts of interest". The item "participation of a trial methodologist in the study" was not found in any of the reports. Although the quality of trial reporting is not always related to a biased estimate of treatment effect, the items used as quality measures must be described to enable readers to evaluate the quality and applicability of the report. Further development of an assessment tool is needed for items specific to surgical randomized controlled trials.

  17. Developing Parallel Career and Occupational Development Objectives and Exercise (Test) Items in Spanish for Assessment and Evaluation.

    ERIC Educational Resources Information Center

    Muratti, Jose E.; And Others

    A parallel Spanish edition was developed of released objectives and objective-referenced items used in the National Assessment of Educational Progress (NAEP) in the field of Career and Occupational Development (COD). The Spanish edition was designed to assess the identical skills, attitudes, concepts, and knowledge of Spanish-dominant students…

  18. Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  19. Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  20. Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  1. Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.

    2003-01-01

    Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…

  2. What Do They Understand? Using Technology to Facilitate Formative Assessment

    ERIC Educational Resources Information Center

    Mitten, Carolyn; Jacobbe, Tim; Jacobbe, Elizabeth

    2017-01-01

    Formative assessment is essential for informing teachers' planning. A discussion of the benefits of using technology to facilitate formative assessment explains how four primary school teachers adopted three different apps to make their formative assessment more meaningful and useful.

  3. Item Response Modeling of Presence-Severity Items: Application to Measurement of Patient-Reported Outcomes

    ERIC Educational Resources Information Center

    Liu, Ying; Verkuilen, Jay

    2013-01-01

    The Presence-Severity (P-S) format refers to a compound item structure in which a question is first asked to check the presence of the particular event in question. If the respondent provides an affirmative answer, a follow-up is administered, often about the frequency, density, severity, or impact of the event. Despite the popularity of the P-S…

  4. Test Industry Split over "Formative" Assessment

    ERIC Educational Resources Information Center

    Cech, Scott J.

    2008-01-01

    There's a war of sorts going on within the normally staid assessment industry, and it's a war over the definition of a type of assessment that many educators understand in only a sketchy fashion. Formative assessments, also known as "classroom assessments," are in some ways easier to define by what they are not. They're not like the long,…

  5. Agriculture Library of Test Items.

    ERIC Educational Resources Information Center

    Sutherland, Duncan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests or term examinations, or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  6. Examining Increased Flexibility in Assessment Formats

    ERIC Educational Resources Information Center

    Irwin, Brian; Hepplestone, Stuart

    2012-01-01

    There have been calls in the literature for changes to assessment practices in higher education, to increase flexibility and give learners more control over the assessment process. This article explores the possibilities of allowing student choice in the format used to present their work, as a starting point for changing assessment, based on…

  7. Psychological distress in cancer survivors: the further development of an item bank.

    PubMed

    Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P

    2013-02-01

    Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function.
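
    The reliability figure quoted above (α = 0.85) is a Cronbach's alpha. For readers who want the computation spelled out, here is a minimal version on synthetic 0-3 ratings; the data below are invented, not the study's.

```python
import random

random.seed(2)

# Synthetic 0-3 ratings: 50 respondents x 8 items sharing a severity component
data = []
for _ in range(50):
    sev = random.choice([0, 1, 2])                       # shared severity level
    data.append([min(3, max(0, sev + random.choice([-1, 0, 1])))
                 for _ in range(8)])

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_vars = sum(variance([r[j] for r in rows]) for j in range(k))
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)

alpha = cronbach_alpha(data)
```
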

  8. Identifying predictors of physics item difficulty: A linear regression approach

    NASA Astrophysics Data System (ADS)

    Mesic, Vanes; Muratovic, Hasnija

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge.
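
    The final modeling step (regressing Rasch item difficulties on features of the items) reduces, in the single-predictor case, to ordinary least squares plus an R² readout. The sketch below invents a "complexity" rating as the lone predictor; the coefficients and the resulting R² are illustrative only and do not reproduce the study's 61.2% figure.

```python
import random

random.seed(3)

# Invented (complexity rating, Rasch difficulty) pairs for 40 physics items
items = []
for _ in range(40):
    complexity = random.uniform(1.0, 5.0)
    difficulty = -2.0 + 0.9 * complexity + random.gauss(0.0, 0.5)
    items.append((complexity, difficulty))

def ols_fit(pairs):
    """Single-predictor OLS: returns slope, intercept, and R-squared."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - my) ** 2 for _, y in pairs)
    return slope, intercept, 1.0 - ss_res / ss_tot

slope, intercept, r2 = ols_fit(items)
```
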

  9. Item-focussed Trees for the Identification of Items in Differential Item Functioning.

    PubMed

    Tutz, Gerhard; Berger, Moritz

    2016-09-01

    A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
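
    A drastically simplified, single-covariate version of the permutation-test idea behind these trees is sketched below: a candidate split on a binary covariate (such as gender) is accepted only if the observed group difference in item performance survives a label-shuffling test. A real item-focussed tree would additionally match on ability, search over many covariates and cut points, and recurse; all numbers here are invented.

```python
import math
import random

random.seed(4)

# Toy data for one item: (group, ability, response); group 1 finds it harder
subjects = []
for _ in range(600):
    g = random.choice([0, 1])
    theta = random.gauss(0.0, 1.0)
    b = 1.2 if g == 1 else 0.0               # 1.2-logit DIF effect for group 1
    u = random.random() < 1.0 / (1.0 + math.exp(-(theta - b)))
    subjects.append((g, theta, u))

def group_gap(data):
    """Difference in success rate between covariate groups 0 and 1."""
    def rate(g):
        hits = [u for gg, _, u in data if gg == g]
        return sum(hits) / len(hits)
    return rate(0) - rate(1)

def permutation_p(data, n_perm=500):
    """Share of label permutations whose gap is at least the observed gap."""
    observed = abs(group_gap(data))
    labels = [g for g, _, _ in data]
    hits = 0
    for _ in range(n_perm):
        random.shuffle(labels)
        shuffled = [(g, t, u) for g, (_, t, u) in zip(labels, data)]
        if abs(group_gap(shuffled)) >= observed:
            hits += 1
    return hits / n_perm

p_value = permutation_p(subjects)
split_on_covariate = p_value < 0.05          # the tree would split this node
```
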

  10. The science achievement of various subgroups on alternative assessment formats

    NASA Astrophysics Data System (ADS)

    Lawrenz, Frances; Huffman, Douglas; Welch, Wayne

    2001-05-01

    The purpose of this study was to examine the science achievement outcomes for different subgroups of students using different assessment formats. A nationally representative sample of approximately 3,500 ninth grade science students from 13 high schools throughout the United States completed a series of science assessments designed to measure their level of achievement on the national science education standards. All of the schools were using a curriculum designed to meet the standards. The assessments included a multiple-choice test, a written open-ended test, a hands-on lab skills test, and a hands-on full investigation. The results show that the student outcomes on the different assessment formats are more highly correlated for higher achieving students than for lower achieving students. Patterns for different cultural groups also vary by assessment format. There were no differences found for sex. The results support the notion that different assessment formats assess different competencies and that the achievement of students from different subgroups varies by assessment format.

  11. Vegetable parenting practices scale: Item response modeling analyses

    USDA-ARS?s Scientific Manuscript database

    Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...

  12. Formative Assessment: Making It Happen in the Classroom

    ERIC Educational Resources Information Center

    Heritage, Margaret

    2010-01-01

    Formative assessment allows teachers to identify and close gaps in student understanding and move learning forward. Now this research-based book helps educators develop the knowledge and skills necessary to successfully implement formative assessment in the classroom. Margaret Heritage walks readers through every step of the process and offers…

  13. Hitting the Reset Button: Using Formative Assessment to Guide Instruction

    ERIC Educational Resources Information Center

    Dirksen, Debra J.

    2011-01-01

    Using formative assessment gives students a second chance to learn material they didn't master the first time around. It lets failure become a learning experience rather than something to fear. Several types of formative assessment are discussed, including how to use summative assessments formatively. (Contains 2 figures.)

  14. Practical Consequences of Item Response Theory Model Misfit in the Context of Test Equating with Mixed-Format Test Data

    PubMed Central

    Zhao, Yue; Hambleton, Ronald K.

    2017-01-01

    In item response theory (IRT) models, assessing model-data fit is an essential step in IRT calibration. While no general agreement has been reached on the best methods for detecting misfit, a more important observation from the research literature is that studies rarely evaluate IRT misfit in terms of its practical consequences. This study investigated the practical consequences of IRT model misfit for equating performance and for the classification of examinees into performance categories, in a simulation study mimicking a typical large-scale statewide assessment program with mixed-format test data. The simulation varied three factors: choice of IRT model, amount of growth/change in examinees' abilities between two adjacent administration years, and choice of IRT scaling method. Findings indicated that the extent of significant consequences of model misfit varied with the choice of model and IRT scaling method. In comparison with the mean/sigma (MS) and Stocking and Lord characteristic curve (SL) methods, separate calibration with linking and the fixed common item parameter (FCIP) procedure was more sensitive to model misfit and more robust against various amounts of ability shift between two adjacent administrations regardless of model fit. SL was generally the least sensitive to model misfit in recovering the equating conversion, and MS was the least robust against ability shifts in recovering the equating conversion when a substantial degree of misfit was present. The key messages are that practical ways are available to study model fit and that model fit or misfit can have consequences that should be considered when choosing an IRT model. Beyond addressing the consequences of IRT model misfit, the study aims to help researchers and practitioners find practical ways to study model fit and to investigate the validity of particular…

  15. Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks.

    PubMed

    Zhao, Yue

    2017-03-01

    In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation. Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items. The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks, implying that the impact of item misfit was insignificant. Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.
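
    The impact check described (agreement of score estimates and severity classifications with and without flagged items) can be mimicked on synthetic data. Everything below is hypothetical: a 12-item 2PL bank, two arbitrarily "flagged" items that are simply dropped, and a T ≥ 60 severity cutoff; the study itself flagged items with Orlando and Thissen's summed-score fit statistics rather than by assumption.

```python
import math
import random

random.seed(5)

# Hypothetical 2PL bank of 12 items; items 10 and 11 are "flagged" as misfitting
bank = [(random.uniform(0.9, 1.8), random.uniform(-2.0, 2.0)) for _ in range(12)]
flagged = {10, 11}

def prob(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_tscore(resp, item_ids):
    """EAP ability from the given items, rescaled to a T-score (mean 50, SD 10)."""
    grid = [-4.0 + 0.1 * k for k in range(81)]
    weights = []
    for t in grid:
        lik = math.exp(-t * t / 2.0)         # standard-normal prior
        for j in item_ids:
            a, b = bank[j]
            q = prob(t, a, b)
            lik *= q if resp[j] else (1.0 - q)
        weights.append(lik)
    theta = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
    return 50.0 + 10.0 * theta

diffs, agree = [], 0
for _ in range(100):
    theta = random.gauss(0.0, 1.0)
    resp = [random.random() < prob(theta, a, b) for a, b in bank]
    t_full = eap_tscore(resp, range(12))
    t_drop = eap_tscore(resp, [j for j in range(12) if j not in flagged])
    diffs.append(abs(t_full - t_drop))
    agree += (t_full >= 60.0) == (t_drop >= 60.0)   # same severity class?

mean_abs_diff = sum(diffs) / len(diffs)
classification_agreement = agree / 100.0
```
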

  16. Development of the PROMIS health expectancies of smoking item banks.

    PubMed

    Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cerully, Jennifer; Li, Zhen; Hansen, Mark; Cai, Li

    2014-09-01

    Smokers' health-related outcome expectancies are associated with a number of important constructs in smoking research, yet there are no measures currently available that focus exclusively on this domain. This paper describes the development and evaluation of item banks for assessing the health expectancies of smoking. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of health expectancies items for daily and nondaily smokers. We also evaluated the performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess health expectancies. A total of 24 items were included in the Health Expectancies item banks; 13 items are common across daily and nondaily smokers, 6 are unique to daily, and 5 are unique to nondaily. For both daily and nondaily smokers, the Health Expectancies item banks are unidimensional, reliable (reliability = 0.95 and 0.96, respectively), and perform similarly across gender, age, and race/ethnicity groups. An SF common to daily and nondaily smokers consists of 6 items (reliability = 0.87). Results from simulated CATs showed that health expectancies can be assessed with good precision with an average of 5-6 items adaptively selected from the item banks. Health expectancies of smoking can be assessed on the basis of these item banks via SFs, CATs, or through a tailored set of items selected for a specific research purpose.

  17. Formative Assessment of Writing in English as a Foreign Language

    ERIC Educational Resources Information Center

    Burner, Tony

    2016-01-01

    Recognizing the importance of formative assessment, this mixed-methods study investigates how four teachers and 100 students respond to the new emphasis on formative assessment in English as a foreign language (EFL) writing classes in Norway. While previous studies have examined formative assessment in oral classroom interactions and focused on…

  18. Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  19. Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  20. More relevant, precise, and efficient items for assessment of physical function and disability: moving beyond the classic instruments

    PubMed Central

    Fries, J F; Bruce, B; Bjorner, J; Rose, M

    2006-01-01

    Objectives Patient reported outcomes (PROs) have become standard study endpoints. However, little attention has been given to using item improvement to advance PRO performance, which could improve the precision, clarity, patient relevance, and information content of “physical function/disability” items and thus the performance of the resulting instruments. Methods The present study included 1860 physical function/disability items from 165 instruments. Item formulations were assessed by frequency of use, modified Delphi consensus, respondent judgement of clarity and importance, and item response theory (IRT). Data from 1100 rheumatoid arthritis, osteoarthritis, and normal ageing subjects, collected through qualitative item review, focus groups, cognitive interviews, and a patient survey, were used to achieve a unique item pool that was clear, reliable, sensitive to change, readily translatable, devoid of floor and ceiling limitations, composed of unidimensional subdomains, and maximal in information content. Results A “present tense” time frame was used most frequently, better understood, more readily translated, and more directly estimated the latent trait of disability. Items in the “past tense” had 80–90% false negatives (p<0.001). The best items were brief, clear, and contained a single construct. Responses with four to five options were preferred by both experts and respondents. The term physical function may be preferable to the term disability because of fewer floor effects. IRT analyses of “disability” suggest four independent subdomains (mobility, dexterity, axial, and compound) with factor loadings of 0.81–0.99. Conclusions Major improvement in the performance of items and instruments is possible, and may have the effect of substantially reducing sample size requirements for clinical trials. PMID:17038464

  1. Comparing Two Forms of Dynamic Assessment and Traditional Assessment of Preschool Phonological Awareness

    ERIC Educational Resources Information Center

    Kantor, Patricia Thatcher; Wagner, Richard K.; Torgesen, Joseph K.; Rashotte, Carol A.

    2011-01-01

    The goal of the current study was to compare two forms of dynamic assessment and standard assessment of preschool children's phonological awareness. The first form of dynamic assessment was a form of scaffolding in which item formats were modified in response to an error so as to make the task easier or more explicit. The second form of dynamic…

  2. Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

    PubMed

    Sinharay, Sandip

    2017-09-01

    Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for the detection of item preknowledge and showed its performance to be better, on average, than that of seven other statistics when the set of compromised items is known. Sinharay suggested a statistic based on the likelihood ratio test for the detection of item preknowledge; the advantage of this statistic is that its null distribution is known. Results from simulated and real data, for both adaptive and nonadaptive tests, are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising for detecting item preknowledge when the set of compromised items is known.
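
    A bare-bones version of a likelihood-ratio statistic for a known compromised subset fits one ability on the compromised items, another on the remaining items, and compares against a single joint ability. The 40-item bank, the common a = 1.2 discrimination, the grid-search MLE, and the +2-logit "preknowledge" boost below are all invented for illustration and do not reproduce Sinharay's exact statistic.

```python
import math
import random

random.seed(6)

bank = [(1.2, random.uniform(-1.5, 1.5)) for _ in range(40)]
compromised = set(range(10))                 # the known leaked subset
honest = [j for j in range(len(bank)) if j not in compromised]

def prob(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def loglik(resp, item_ids, theta):
    ll = 0.0
    for j in item_ids:
        a, b = bank[j]
        q = prob(theta, a, b)
        ll += math.log(q if resp[j] else 1.0 - q)
    return ll

def grid_mle(resp, item_ids):
    """Crude maximum-likelihood ability estimate over a fixed grid."""
    grid = [-4.0 + 0.05 * k for k in range(161)]
    return max(grid, key=lambda t: loglik(resp, item_ids, t))

def preknowledge_lrt(resp):
    """2 * (sum of separately maximized log-likelihoods - joint maximum)."""
    t_joint = grid_mle(resp, range(len(bank)))
    t_comp = grid_mle(resp, compromised)
    t_honest = grid_mle(resp, honest)
    return 2.0 * (loglik(resp, compromised, t_comp)
                  + loglik(resp, honest, t_honest)
                  - loglik(resp, range(len(bank)), t_joint))

# An examinee of ability 0 who answers compromised items as if ability were +2
resp = [random.random() < prob(2.0 if j in compromised else 0.0, *bank[j])
        for j in range(len(bank))]
stat = preknowledge_lrt(resp)
flag = stat > 3.84                           # chi-square(1) cutoff at alpha=.05
```
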

  3. Screencasts: Formative Assessment for Mathematical Thinking

    ERIC Educational Resources Information Center

    Soto, Melissa; Ambrose, Rebecca

    2016-01-01

    Increased attention to reasoning and justification in mathematics classrooms requires the use of more authentic assessment methods. Particularly important are tools that allow teachers and students opportunities to engage in formative assessment practices such as gathering data, interpreting understanding, and revising thinking or instruction.…

  4. The Applicability of Interactive Item Templates in Varied Knowledge Types

    ERIC Educational Resources Information Center

    Koong, Chorng-Shiuh; Wu, Chi-Ying

    2011-01-01

    A well-edited assessment can enhance students' motivation to learn. The applicability of items, which covers both item content and template, plays a crucial role in authoring a good assessment. The templates discussed include not only the conventional true/false, multiple-choice, completion, and short-answer formats but also interactive ones. Methods…

  5. Language-related differential item functioning between English and German PROMIS Depression items is negligible.

    PubMed

    Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias

    2017-12-01

    To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R 2 -change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation =10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.
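    The regression-based DIF procedure described above can be sketched computationally. The following is a minimal illustration, not the study's analysis: it uses plain binary logistic regression on synthetic data (the study used ordinal logistic regression on polytomous items), compares a model predicting an item response from the trait estimate alone against one that adds a group term, and applies the 0.02 pseudo-R²-change criterion. All data, the group labels, and the size of the built-in DIF effect are invented for the example.

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    """Newton-Raphson maximum-likelihood logistic fit; returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])  # small ridge for stability
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ beta))
    return beta, float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def mcfadden_r2(loglik, y):
    """McFadden pseudo-R2 against the intercept-only model."""
    p0 = y.mean()
    ll_null = len(y) * (p0 * np.log(p0) + (1 - p0) * np.log(1 - p0))
    return 1 - loglik / ll_null

rng = np.random.default_rng(3)
n = 1000
theta = rng.normal(size=n)                    # trait estimate (e.g., depression score)
group = rng.integers(0, 2, n).astype(float)   # 0 = reference, 1 = focal language group
shift = 1.2 * group                           # deliberately large uniform DIF
y = (rng.random(n) < 1 / (1 + np.exp(-(theta + shift)))).astype(float)

ones = np.ones(n)
_, ll_base = fit_logistic(np.column_stack([ones, theta]), y)        # trait only
_, ll_dif = fit_logistic(np.column_stack([ones, theta, group]), y)  # trait + group
delta_r2 = mcfadden_r2(ll_dif, y) - mcfadden_r2(ll_base, y)
print(f"pseudo-R2 change = {delta_r2:.3f}; exceeds 0.02 criterion: {delta_r2 > 0.02}")
```

    Because the group term is nested in the larger model, the pseudo-R² change is never negative; items whose change exceeds the pre-set criterion are flagged for DIF.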

  6. Vegetable parenting practices scale. Item response modeling analyses

    PubMed Central

    Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

    2015-01-01

    Objective To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the way items function (differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method Parents of 3–5-year-old children completed a self-reported vegetable parenting practices scale online. The scale consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting

  7. Leading Formative Assessment Change: A 3-Phase Approach

    ERIC Educational Resources Information Center

    Northwest Evaluation Association, 2016

    2016-01-01

    If you are seeking greater student engagement and growth, you need to integrate high-impact formative assessment practices into daily instruction. Read the final article in our five-part series to find advice aimed at leaders determined to bring classroom formative assessment practices district wide. Learn: (1) what you MUST consider when…

  8. Evaluating Design-Based Formative Assessment Practices in Outdoor Science Teaching

    ERIC Educational Resources Information Center

    Hartmeyer, Rikke; Stevenson, Matt P.; Bentsen, Peter

    2016-01-01

    Background and purpose: Research in formative assessment often pays close attention to the strategies which can be used by teachers. However, less emphasis in the literature seems to have been paid to study the application of formative assessment designs in practice. In this paper, we argue that a formative assessment design that we call…

  9. Formative Assessment Probes: Representing Microscopic Life

    ERIC Educational Resources Information Center

    Keeley, Page

    2011-01-01

    This column focuses on promoting learning through assessment. The author discusses the formative assessment probe "Pond Water," which reveals how elementary children will often apply what they know about animal structures to newly discovered microscopic organisms, connecting their knowledge of the familiar to the unfamiliar through…

  10. Integrating Test-Form Formatting into Automated Test Assembly

    ERIC Educational Resources Information Center

    Diao, Qi; van der Linden, Wim J.

    2013-01-01

    Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…

  11. Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study

    ERIC Educational Resources Information Center

    Sydorenko, Tetyana

    2011-01-01

    This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…

  12. Feedback in formative OSCEs: comparison between direct observation and video-based formats

    PubMed Central

    Junod Perron, Noëlle; Louis-Simonet, Martine; Cerutti, Bernard; Pfarrwaller, Eva; Sommer, Johanna; Nendaz, Mathieu

    2016-01-01

    Introduction Medical students at the Faculty of Medicine, University of Geneva, Switzerland, have the opportunity to practice clinical skills with simulated patients during formative sessions in preparation for clerkships. These sessions are given in two formats: 1) direct observation of an encounter followed by verbal feedback (direct feedback) and 2) subsequent review of the videotaped encounter by both student and supervisor (video-based feedback). The aim of the study was to evaluate whether content and process of feedback differed between both formats. Methods In 2013, all second- and third-year medical students and clinical supervisors involved in formative sessions were asked to take part in the study. A sample of audiotaped feedback sessions involving supervisors who gave feedback in both formats was analyzed (content and process of the feedback) using a 21-item feedback scale. Results Forty-eight audiotaped feedback sessions involving 12 supervisors were analyzed (2 direct and 2 video-based sessions per supervisor). When adjusted for the length of feedback, there were significant differences in terms of content and process between both formats; the number of communication skills and clinical reasoning items addressed was higher in the video-based format (11.29 vs. 7.71, p=0.002 and 3.71 vs. 2.04, p=0.010, respectively). Supervisors engaged students more actively during the video-based sessions than during direct feedback sessions (self-assessment: 4.00 vs. 3.17, p=0.007; active problem-solving: 3.92 vs. 3.42, p=0.009). Students made similar observations and tended to consider that the video feedback was more useful for improving some clinical skills. Conclusion Video-based feedback facilitates discussion of clinical reasoning, communication, and professionalism issues while at the same time actively engaging students. Different time and conceptual frameworks may explain observed differences. The choice of feedback format should depend on the educational goal.

  13. Feedback in formative OSCEs: comparison between direct observation and video-based formats.

    PubMed

    Junod Perron, Noëlle; Louis-Simonet, Martine; Cerutti, Bernard; Pfarrwaller, Eva; Sommer, Johanna; Nendaz, Mathieu

    2016-01-01

    Medical students at the Faculty of Medicine, University of Geneva, Switzerland, have the opportunity to practice clinical skills with simulated patients during formative sessions in preparation for clerkships. These sessions are given in two formats: 1) direct observation of an encounter followed by verbal feedback (direct feedback) and 2) subsequent review of the videotaped encounter by both student and supervisor (video-based feedback). The aim of the study was to evaluate whether content and process of feedback differed between both formats. In 2013, all second- and third-year medical students and clinical supervisors involved in formative sessions were asked to take part in the study. A sample of audiotaped feedback sessions involving supervisors who gave feedback in both formats was analyzed (content and process of the feedback) using a 21-item feedback scale. Forty-eight audiotaped feedback sessions involving 12 supervisors were analyzed (2 direct and 2 video-based sessions per supervisor). When adjusted for the length of feedback, there were significant differences in terms of content and process between both formats; the number of communication skills and clinical reasoning items addressed was higher in the video-based format (11.29 vs. 7.71, p = 0.002 and 3.71 vs. 2.04, p = 0.010, respectively). Supervisors engaged students more actively during the video-based sessions than during direct feedback sessions (self-assessment: 4.00 vs. 3.17, p = 0.007; active problem-solving: 3.92 vs. 3.42, p = 0.009). Students made similar observations and tended to consider that the video feedback was more useful for improving some clinical skills. Video-based feedback facilitates discussion of clinical reasoning, communication, and professionalism issues while at the same time actively engaging students. Different time and conceptual frameworks may explain observed differences. The choice of feedback format should depend on the educational goal.

  14. For Which Boys and Which Girls Are Reading Assessment Items Biased Against? Detection of Differential Item Functioning in Heterogeneous Gender Populations

    ERIC Educational Resources Information Center

    Grover, Raman K.; Ercikan, Kadriye

    2017-01-01

    In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However DIF items do not necessarily disadvantage every member of a gender group to the same degree,…

  15. Development of the PROMIS negative psychosocial expectancies of smoking item banks.

    PubMed

    Stucky, Brian D; Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Cerully, Jennifer; Kuhfeld, Megan; Hansen, Mark; Cai, Li

    2014-09-01

    Negative psychosocial expectancies of smoking include aspects of social disapproval and disappointment in oneself. This paper describes analyses conducted to develop and evaluate item banks for assessing psychosocial expectancies among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of psychosocial expectancies items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess psychosocial expectancies. A total of 21 items were included in the Psychosocial Expectancies item banks: 14 items are common across daily and nondaily smokers, 6 are unique to daily, and 1 is unique to nondaily. For both daily and nondaily smokers, the Psychosocial Expectancies item banks are strongly unidimensional, highly reliable (reliability = 0.95 and 0.93, respectively), and perform similarly across gender, age, and race/ethnicity groups. An SF common to daily and nondaily smokers consists of 6 items (reliability = 0.85). Results from simulated CATs showed that, on average, fewer than 8 items are needed to assess psychosocial expectancies with adequate precision when using the item banks. Psychosocial expectancies of smoking can be assessed on the basis of these item banks via the SF, by using CAT, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
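    The simulated-CAT result above (fewer than 8 items for adequate precision) rests on a standard maximum-information adaptive algorithm, which can be sketched as follows. The item bank here is hypothetical, with invented 2PL parameters rather than the PROMIS calibrations: at each step the simulation administers the unused item with the greatest Fisher information at the current trait estimate, re-estimates the trait by grid-search maximum likelihood, and stops when the standard error falls below a target.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2PL item bank of 21 items (invented parameters, not the PROMIS ones)
a = rng.uniform(1.0, 2.5, 21)    # discriminations
b = rng.normal(0.0, 1.0, 21)     # difficulties

def prob(theta, i):
    """2PL probability of endorsing item i at trait level theta."""
    return 1 / (1 + np.exp(-a[i] * (theta - b[i])))

def info(theta, i):
    """Fisher information of item i at theta."""
    p = prob(theta, i)
    return a[i] ** 2 * p * (1 - p)

def mle_theta(responses):
    """Grid-search maximum-likelihood trait estimate from (item, response) pairs."""
    grid = np.linspace(-4, 4, 161)
    loglik = np.zeros_like(grid)
    for i, u in responses:
        p = 1 / (1 + np.exp(-a[i] * (grid - b[i])))
        loglik += np.log(p if u else 1 - p)
    return grid[np.argmax(loglik)]

def run_cat(true_theta, se_target=0.40, max_items=21):
    used, responses, theta = [], [], 0.0
    while len(used) < max_items:
        # administer the unused item with maximum information at the current estimate
        i = max((j for j in range(21) if j not in used), key=lambda j: info(theta, j))
        used.append(i)
        responses.append((i, bool(rng.random() < prob(true_theta, i))))
        theta = mle_theta(responses)
        se = 1 / np.sqrt(sum(info(theta, j) for j in used))
        if se < se_target:   # stop once precision is adequate
            break
    return theta, len(used)

theta_hat, n_used = run_cat(true_theta=0.5)
print(f"theta estimate {theta_hat:.2f} after {n_used} items")
```

    How many items such a CAT needs before stopping depends on the bank's discriminations and the SE target, which is why highly reliable banks like the ones above can stop after only a handful of items.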

  16. Formative Assessment Probes: Talk Moves. A Formative Assessment Strategy for Fostering Productive Probe Discussions

    ERIC Educational Resources Information Center

    Keeley, Page

    2016-01-01

    Formative assessment probes can be used to foster productive science discussions in which students make their thinking visible to themselves, their peers, and the teacher. During these discussions, there is an exchange between the teacher and students that encourages exploratory thinking, supports careful listening to others' ideas, asks for…

  17. Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  18. An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

    ERIC Educational Resources Information Center

    Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

    2006-01-01

    In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…

  19. The Role of Context in Young Learners' Processes for Responding to Self-Assessment Items

    ERIC Educational Resources Information Center

    Butler, Yuko Goto

    2018-01-01

    With use of self-assessment (SA) of young language learners on the rise, educators of young learners often want to know what SA captures and how best to use it in order to assist their students' learning. This study focuses on understanding how young learners' processes for responding to SA items differ by age and by context of implementation…

  20. An NCME Instructional Module on Polytomous Item Response Theory Models

    ERIC Educational Resources Information Center

    Penfield, Randall David

    2014-01-01

    A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…

  1. Using Distractor-Driven Standards-Based Multiple-Choice Assessments and Rasch Modeling to Investigate Hierarchies of Chemistry Misconceptions and Detect Structural Problems with Individual Items

    ERIC Educational Resources Information Center

    Herrmann-Abell, Cari F.; DeBoer, George E.

    2011-01-01

    Distractor-driven multiple-choice assessment items and Rasch modeling were used as diagnostic tools to investigate students' understanding of middle school chemistry ideas. Ninety-one items were developed according to a procedure that ensured content alignment to the targeted standards and construct validity. The items were administered to 13360…

  2. Anatomy of a physics test: Validation of the physics items on the Texas Assessment of Knowledge and Skills

    NASA Astrophysics Data System (ADS)

    Marshall, Jill A.; Hagedorn, Eric A.; O'Connor, Jerry

    2009-06-01

    We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS) designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
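    One of the diagnostics mentioned above, an item's response curve, can be approximated empirically without fitting an IRT model: bin examinees by ability (or total score) and compute the proportion answering the item correctly in each bin. The sketch below uses simulated examinees and an invented 3PL-style item, not TAKS data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
theta = rng.normal(size=n)   # simulated examinee abilities

# Hypothetical 3PL-style item (invented parameters, not a TAKS item)
a, b, c = 1.2, 0.3, 0.2      # discrimination, difficulty, lower asymptote (guessing)
p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
u = (rng.random(n) < p).astype(int)

# Empirical item response curve: proportion correct within ability deciles
deciles = np.quantile(theta, np.linspace(0, 1, 11))
for lo, hi in zip(deciles[:-1], deciles[1:]):
    mask = (theta >= lo) & (theta < hi)
    print(f"theta in [{lo:5.2f}, {hi:5.2f}): P(correct) = {u[mask].mean():.2f}")
```

    A well-functioning item shows a curve rising smoothly with ability; a flat or non-monotonic empirical curve is the kind of structural problem such analyses are designed to surface.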

  3. Instruction and Learning through Formative Assessments

    ERIC Educational Resources Information Center

    Bossé, Michael J.; Lynch-Davis, Kathleen; Adu-Gyamfi, Kwaku; Chandler, Kayla

    2016-01-01

    Assessment and instruction are interwoven in mathematically rich formative assessment tasks, so employing these tasks in the classrooms is an exciting and time-efficient opportunity. To provide a window into how these tasks work in the classroom, this article analyzes summaries of student work on such a task and considers several students'…

  4. State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts

    ERIC Educational Resources Information Center

    Swanson, Leonard C.

    2010-01-01

    This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…

  5. Evaluation of Item Candidates: The PROMIS Qualitative Item Review

    PubMed Central

    DeWalt, Darren A.; Rothrock, Nan; Yount, Susan; Stone, Arthur A.

    2009-01-01

    One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures. PMID:17443114

  6. The five item Barthel index

    PubMed Central

    Hobart, J; Thompson, A

    2001-01-01

    OBJECTIVES—Routine data collection is now considered mandatory. Therefore, staff rated clinical scales that consist of multiple items should have the minimum number of items necessary for rigorous measurement. This study explores the possibility of developing a short form Barthel index, suitable for use in clinical trials, epidemiological studies, and audit, that satisfies criteria for rigorous measurement and is psychometrically equivalent to the 10 item instrument.
METHODS—Data were analysed from 844 consecutive admissions to a neurological rehabilitation unit in London. Random half samples were generated. Short forms were developed in one sample (n=419), by selecting items with the best measurement properties, and tested in the other (n=418). For each of the 10 items of the BI, item total correlations and effect sizes were computed and rank ordered. The best items were defined as those with the lowest cross product of these rank orderings. The acceptability, reliability, validity, and responsiveness of three short form BIs (five, four, and three item) were determined and compared with the 10 item BI. Agreement between scores generated by short forms and 10 item BI was determined using intraclass correlation coefficients and the method of Bland and Altman.
RESULTS—The five best items in this sample were transfers, bathing, toilet use, stairs, and mobility. Of the three short forms examined, the five item BI had the best measurement properties and was psychometrically equivalent to the 10 item BI. Agreement between scores generated by the two measures for individual patients was excellent (ICC=0.90) but not identical (limits of agreement=1.84±3.84).
CONCLUSIONS—The five item short form BI may be a suitable outcome measure for group comparison studies in comparable samples. Further evaluations are needed. Results demonstrate a fundamental difference between assessment and measurement and the importance of incorporating psychometric methods in the
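    The item-selection rule in the Methods above (choose the items with the lowest cross product of the rank orderings of item-total correlation and effect size) can be sketched as follows. The admission and discharge scores are synthetic, so the five items selected by this toy run will not match the study's five; only the procedure is illustrated.

```python
import numpy as np

rng = np.random.default_rng(4)
n_patients, n_items = 400, 10
items = ["feeding", "bathing", "grooming", "dressing", "bowels", "bladder",
         "toilet_use", "transfers", "mobility", "stairs"]

# Hypothetical 0-3 item scores at admission and discharge (not the study's data)
admit = rng.integers(0, 3, (n_patients, n_items)).astype(float)
discharge = np.clip(admit + rng.integers(0, 2, (n_patients, n_items)), 0, 3)

# Criterion 1: corrected item-total correlation at admission
totals = admit.sum(axis=1)
itc = np.array([np.corrcoef(admit[:, j], totals - admit[:, j])[0, 1]
                for j in range(n_items)])

# Criterion 2: effect size of admission-to-discharge change (mean change / admission SD)
change = discharge - admit
es = change.mean(axis=0) / admit.std(axis=0)

# Rank each criterion (1 = best), then keep the items with the lowest cross product
rank_itc = (-itc).argsort().argsort() + 1
rank_es = (-es).argsort().argsort() + 1
cross = rank_itc * rank_es
best5 = [items[j] for j in np.argsort(cross)[:5]]
print("five best items:", best5)
```

    Multiplying the two rank orderings rewards items that score well on both criteria at once, rather than items that excel on one criterion but fail the other.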

  7. An Argument for Formative Assessment with Science Learning Progressions

    ERIC Educational Resources Information Center

    Alonzo, Alicia C.

    2018-01-01

    Learning progressions--particularly as defined and operationalized in science education--have significant potential to inform teachers' formative assessment practices. In this overview article, I lay out an argument for this potential, starting from definitions for "formative assessment practices" and "learning progressions"…

  8. Item validity vs. item discrimination index: a redundancy?

    NASA Astrophysics Data System (ADS)

    Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

    2018-03-01

    In the literature on evaluation and test analysis, it is common to find calculations of item validity as well as of an item discrimination index (D), each with its own formula. Meanwhile, other resources state that the item discrimination index can be obtained by calculating the correlation between a testee's score on a particular item and that testee's score on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap, since both reflect how well the test measures the examinees' ability. In this paper, examples of data processing results for item validity and the item discrimination index are compared, and we discuss whether one of the two statistics can stand in for the other or whether it is better to present both calculations in simple test analyses, especially in undergraduate theses that include test analyses.
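    The suspected redundancy can be made concrete with a small sketch on simulated dichotomous responses (all data below are illustrative, not from the study): it computes both the corrected item-total correlation, often reported as item validity, and the classical upper-minus-lower discrimination index D for the same items, so the two statistics can be compared directly.

```python
import numpy as np

def item_total_correlation(scores, item):
    """Correlation between one item and the rest-of-test total (corrected item-total r)."""
    rest = scores.sum(axis=1) - scores[:, item]   # exclude the item itself
    return np.corrcoef(scores[:, item], rest)[0, 1]

def discrimination_index(scores, item, frac=0.27):
    """D = P(correct | top 27% on total) - P(correct | bottom 27% on total)."""
    n = max(1, int(round(frac * scores.shape[0])))
    order = np.argsort(scores.sum(axis=1))
    low, high = order[:n], order[-n:]
    return scores[high, item].mean() - scores[low, item].mean()

rng = np.random.default_rng(0)
ability = rng.normal(size=200)
# 10 hypothetical dichotomous items of varying difficulty (Rasch-style generation)
difficulty = np.linspace(-1.5, 1.5, 10)
p = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
scores = (rng.random((200, 10)) < p).astype(int)

for i in (0, 5):
    r = item_total_correlation(scores, i)
    d = discrimination_index(scores, i)
    print(f"item {i}: item-total r = {r:.2f}, D = {d:.2f}")
```

    On data like these, the two statistics rank items very similarly, which is exactly the overlap the paper questions.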

  9. Forty-two systematic reviews generated 23 items for assessing the risk of bias in values and preferences' studies.

    PubMed

    Yepes-Nuñez, Juan Jose; Zhang, Yuan; Xie, Feng; Alonso-Coello, Pablo; Selva, Anna; Schünemann, Holger; Guyatt, Gordon

    2017-05-01

    In systematic reviews of studies of patients' values and preferences, the objective of the study was to summarize items and domains authors have identified when considering the risk of bias (RoB) associated with primary studies. We conducted a systematic survey of systematic reviews of patients' values and preference studies. Our search included three databases (MEDLINE, EMBASE, and PsycINFO) from their inception to August 2015. We conducted duplicate data extraction, focusing on items that authors used to address RoB in the primary studies included in their reviews and the associated underlying domains, and summarized criteria in descriptive tables. We identified 42 eligible systematic reviews that addressed 23 items relevant to RoB and grouped the items into 7 domains: appropriate administration of instrument; instrument choice; instrument-described health state presentation; choice of participants group; description, analysis, and presentation of methods and results; patient understanding; and subgroup analysis. The items and domains identified provide insight into issues of RoB in patients' values and preference studies and establish the basis for an instrument to assess RoB in such studies. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Formative evaluation of the dietary assessment component of Children's and Adolescents' Nutrition Assessment and Advice on the Web (CANAA-W).

    PubMed

    Vereecken, C; Covents, M; Maes, L; Moyson, T

    2014-01-01

    The increased availability of computers and the efficiency and user-acceptability of computer-assisted questioning have increased the attractiveness of computer-administered querying for large-scale population nutrition research during the last decade. The Young Adolescents' Nutrition Assessment on Computer (YANA-C), a computer-based 24-h dietary recall, was originally developed to collect dietary data among Belgian-Flemish adolescents. A web-based version was created to collect parentally reported dietary data of preschoolers, called Young Children's Nutrition Assessment on the Web (YCNA-W), which has been improved and adapted for use in young adolescents: Children and Adolescents' Nutrition Assessment and Advice on the Web (CANAA-W). The present study describes recent developments and the formative evaluation of the dietary assessment component. A feasibility questionnaire was completed by 131 children [mean (SD) age: 11.3 (0.7) years] and 53 parents. Eight focus groups were held with children (n = 65) and three with parents (n = 17). Children (C) and parents (P) found the instrument clear (C: 97%; P: 94%), comprehensible (C: 92%; P: 100%), attractive (C: 84%; P: 85%), fun (C: 93%; P: 83%) and easy to complete (C: 91%; P: 83%). There was ample explanation (C: 95%; P: 94%); the pictures were clear (C: 97%; P: 96%); and most respondents found the food items easy to find (C: 71%, P: 85%). The results helped to refine the layout and structure of the instrument and the list of food items included. Children and parents were enthusiastic. The major challenge will be to convince parents who are less interested in dietary intake and less computer literate to participate in this type of study. Children in this age group (11-12 years) should complete the instrument with assistance from an adult. © 2013 The Authors Journal of Human Nutrition and Dietetics © 2013 The British Dietetic Association Ltd.

  11. 41 CFR 302-7.21 - If my HHG shipment includes an item for which a weight additive is assessed by the HHG carrier (e...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... includes an item for which a weight additive is assessed by the HHG carrier (e.g., boat, trailer...) General Rules § 302-7.21 If my HHG shipment includes an item for which a weight additive is assessed by... will not be responsible for the shipping charges that result from a weight additive so long as the...

  12. 41 CFR 302-7.21 - If my HHG shipment includes an item for which a weight additive is assessed by the HHG carrier (e...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... includes an item for which a weight additive is assessed by the HHG carrier (e.g., boat, trailer...) General Rules § 302-7.21 If my HHG shipment includes an item for which a weight additive is assessed by... will not be responsible for the shipping charges that result from a weight additive so long as the...

  13. Improving the Factor Structure of Psychological Scales: The Expanded Format as an Alternative to the Likert Scale Format

    ERIC Educational Resources Information Center

    Zhang, Xijuan; Savalei, Victoria

    2016-01-01

    Many psychological scales written in the Likert format include reverse worded (RW) items in order to control acquiescence bias. However, studies have shown that RW items often contaminate the factor structure of the scale by creating one or more method factors. The present study examines an alternative scale format, called the Expanded format,…

  14. Cross-informant and cross-national equivalence using item-response theory (IRT) linking: A case study using the behavioral assessment for children of African heritage in the United States and Jamaica.

    PubMed

    Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T

    2016-03-01

    Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant cross-national assessment of youth. (c) 2016 APA, all rights reserved.

  15. Assessment of single-item literacy questions, age, and education level in the prediction of low health numeracy.

    PubMed

    Johnson, Tim V; Abbasi, Ammara; Kleris, Renee S; Ehrlich, Samantha S; Barthwaite, Echo; DeLong, Jennifer; Master, Viraj A

    2013-08-01

    Determining a patient's health literacy is important to optimum patient care. Single-item questions exist for screening written health literacy. We sought to assess the predictive potential of three common screening questions, along with patient age and education level, in the prediction of low health numerical literacy (numeracy). After demographic and educational information was obtained, 441 patients were administered three health literacy screening questions. The three-item Schwartz-Woloshin Numeracy Scale was then administered to assess for low health numeracy (score of 0 out of 3). This score served as the reference standard for receiver operating characteristic (ROC) curve analysis. ROC curves were constructed and used to determine the area under the curve (AUC); a higher AUC indicates better discrimination between patients with and without low numeracy. None of the three screening questions were significant predictors of low health numeracy. However, education level was a significant predictor of low health numeracy, with an AUC (95% CI) of 0.811 (0.720-0.902). This measure had a specificity of 95.3% at the cutoff of 12 years of education (<12 vs. ≥12 years of education) but was not sensitive. Common single-item questions used to screen for written health literacy are ineffective screening tools for health numeracy. However, low education level is a specific predictor of low health numeracy.
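
    The AUC statistic used above has a handy rank-based interpretation: it is the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one (ties counting one half). A minimal sketch in Python; the data, variable names, and scoring rule below are invented for illustration, not taken from the study:

    ```python
    def auc(scores, labels):
        """Area under the ROC curve via the Mann-Whitney U statistic:
        the fraction of positive/negative pairs in which the positive
        case receives the higher score (ties count 1/2)."""
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    # Hypothetical screen: years of education predicting low numeracy
    # (label 1 = low numeracy). Lower education should predict label 1,
    # so we score by negative years of education.
    educ    = [8, 10, 11, 12, 14, 16, 16, 18]
    low_num = [1, 1,  1,  0,  0,  0,  1,  0]
    print(round(auc([-e for e in educ], low_num), 3))
    ```

    An AUC of 0.5 would mean the score carries no information; 1.0 would mean perfect separation.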

  16. Analysis of Item-Level Bias in the Bayley-III Language Subscales: The Validity and Utility of Standardized Language Assessment in a Multilingual Setting.

    PubMed

    Goh, Shaun K Y; Tham, Elaine K H; Magiati, Iliana; Sim, Litwee; Sanmugam, Shamini; Qiu, Anqi; Daniel, Mary L; Broekman, Birit F P; Rifkin-Graboi, Anne

    2017-09-18

    The purpose of this study was to improve standardized language assessments among bilingual toddlers by investigating and removing the effects of bias due to unfamiliarity with cultural norms or a distributed language system. The Expressive and Receptive Bayley-III language scales were adapted for use in a multilingual country (Singapore). Differential item functioning (DIF) was applied to data from 459 two-year-olds without atypical language development. This involved investigating if the probability of success on each item varied according to language exposure while holding latent language ability, gender, and socioeconomic status constant. Associations with language, behavioral, and emotional problems were also examined. Five of 16 items showed DIF, 1 of which may be attributed to cultural bias and another to a distributed language system. The remaining 3 items favored toddlers with higher bilingual exposure. Removal of DIF items reduced associations between language scales and emotional and language problems, but improved the validity of the expressive scale from poor to good. Our findings indicate the importance of considering cultural and distributed language bias in standardized language assessments. We discuss possible mechanisms influencing performance on items favoring bilingual exposure, including the potential role of inhibitory processing.

  17. Incremental and Predictive Utility of Formative Assessment Methods of Reading Comprehension

    ERIC Educational Resources Information Center

    Marcotte, Amanda M.; Hintze, John M.

    2009-01-01

    Formative assessment measures are commonly used in schools to assess reading and to design instruction accordingly. The purpose of this research was to investigate the incremental and concurrent validity of formative assessment measures of reading comprehension. It was hypothesized that formative measures of reading comprehension would contribute…

  18. Discriminant content validity: a quantitative methodology for assessing content of theory-based measures, with illustrative applications.

    PubMed

    Johnston, Marie; Dixon, Diane; Hart, Jo; Glidewell, Liz; Schröder, Carin; Pollard, Beth

    2014-05-01

    In studies involving theoretical constructs, it is important that measures have good content validity and that there is not contamination of measures by content from other constructs. While reliability and construct validity are routinely reported, to date, there has not been a satisfactory, transparent, and systematic method of assessing and reporting content validity. In this paper, we describe a methodology of discriminant content validity (DCV) and illustrate its application in three studies. Discriminant content validity involves six steps: construct definition, item selection, judge identification, judgement format, single-sample test of content validity, and assessment of discriminant items. In three studies, these steps were applied to a measure of illness perceptions (IPQ-R) and control cognitions. The IPQ-R performed well with most items being purely related to their target construct, although timeline and consequences had small problems. By contrast, the study of control cognitions identified problems in measuring constructs independently. In the final study, direct estimation response formats for theory of planned behaviour constructs were found to have as good DCV as Likert format. The DCV method allowed quantitative assessment of each item and can therefore inform the content validity of the measures assessed. The methods can be applied to assess content validity before or after collecting data to select the appropriate items to measure theoretical constructs. Further, the data reported for each item in Appendix S1 can be used in item or measure selection. Statement of contribution What is already known on this subject? There are agreed methods of assessing and reporting construct validity of measures of theoretical constructs, but not their content validity. Content validity is rarely reported in a systematic and transparent manner. What does this study add? 
The paper proposes discriminant content validity (DCV), a systematic and transparent method…

  19. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    NASA Astrophysics Data System (ADS)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The latent ability underlying the TUV was estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed using item response curves (IRC), a simplified form of IRT analysis. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test, since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by classical test analysis methods. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
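
    The three-parameter logistic model referenced above gives the probability of a correct response as c + (1 − c)/(1 + e^(−a(θ − b))), where a is discrimination, b is difficulty, and c is the guessing floor. A minimal sketch with illustrative parameter values, not the actual TUV estimates:

    ```python
    import math

    def p_3pl(theta, a, b, c):
        """Three-parameter logistic IRT model: probability of a correct
        response given ability theta, discrimination a, difficulty b,
        and guessing (lower asymptote) c."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    # Illustrative parameters (assumed, not estimated): a moderately
    # discriminating item of average difficulty with a one-in-five
    # guessing floor, as for a 5-option multiple-choice item.
    a, b, c = 1.2, 0.0, 0.2
    for theta in (-3, 0, 3):
        print(theta, round(p_3pl(theta, a, b, c), 3))
    ```

    Note the asymptotes: even very low-ability examinees succeed with probability c (guessing), while the curve approaches 1 for high ability.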

  20. Innovative Application of a Multidimensional Item Response Model in Assessing the Influence of Social Desirability on the Pseudo-Relationship between Self-Efficacy and Behavior

    ERIC Educational Resources Information Center

    Watson, Kathy; Baranowski, Tom; Thompson, Debbe; Jago, Russell; Baranowski, Janice; Klesges, Lisa M.

    2006-01-01

    This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10-14 years). SocD (nine items), PASE (19 items) and FVSE (21…

  1. Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.

    PubMed

    Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J

    2018-02-01

    Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items; these were removed in stages, creating an 8-item and a 3-item Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.

  2. Concurrent Validity of Selected Movement Skill Items in the New Zealand Ministry of Education's Health and Physical Education Assessment

    ERIC Educational Resources Information Center

    Miyahara, Motohide; Clarkson, Jenny

    2005-01-01

    The concurrent validity of the New Zealand Ministry of Education's Health and Physical Education Assessment (HPEA) (Crooks & Flockton, 1999) was examined with the respective items from the Movement Assessment Battery for Children (Henderson & Sugden, 2000) and the Bruininks-Oseretsky Test of Motor Proficiency (Bruininks, 1978) on manual…

  3. Item Structural Properties as Predictors of Item Difficulty and Item Association.

    ERIC Educational Resources Information Center

    Solano-Flores, Guillermo

    1993-01-01

    Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)

  4. [Examination of calibrated item banks for the assessment of work capacity in an outpatient sample of cardiological patients].

    PubMed

    Haschke, A; Abberger, B; Schröder, K; Wirtz, M; Bengel, J; Baumeister, H

    2013-12-01

    Work capacity is a major outcome variable in cardiological rehabilitation. However, there is a lack of comprehensive and economical instruments for assessing work capacity. Developing item response theory-based item banks is a first step toward closing this gap. The present study aims to validate the work capacity item banks for cardiovascular rehabilitation inpatients (WCIB-Cardio) in a sample of cardiovascular rehabilitation outpatients. Additionally, we examined differences between in- and outpatients with regard to their work capacity. Data from 283 cardiovascular rehabilitation inpatients and 77 cardiovascular rehabilitation outpatients were collected in 15 rehabilitation centres. The WCIB-Cardio contains the 2 domains of "cognitive work capacity" (20 items) and "physical work capacity" (18 items). Validation of the item bank for cardiological outpatients was conducted with a separate Rasch analysis for each domain. For the domain of cognitive work capacity, 10 items showed satisfactory quality criteria (Rasch reliability=0.71; overall model fit=0.07). For the domain of physical work capacity, good values for Rasch reliability (0.83) and overall model fit (0.65) were demonstrated after exclusion of 3 items. Unidimensionality and a broad ability spectrum were established for both domains. With regard to content, outpatients rated themselves as less burdened than inpatients for the domain of cognitive work capacity (x̄ outpatient=-2.06 vs. x̄ inpatient=-2.49; p<0.07), and similarly for the domain of physical work capacity (x̄ outpatient=-3.68 vs. x̄ inpatient=-2.88; p<0.01). With the WCIB-Cardio II there is a precondition to develop self-report instruments of work capacity in cardiological in- and outpatients. © Georg Thieme Verlag KG Stuttgart · New York.
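
    The Rasch analyses referenced here rest on a simple response model: the log-odds of endorsing an item equal the difference between person ability and item difficulty, both on the logit scale (the metric of the reported means). A minimal sketch with illustrative values, not the study's estimates:

    ```python
    import math

    def rasch_p(theta, delta):
        """Dichotomous Rasch model: probability of endorsing an item of
        difficulty delta for a person of ability theta (both in logits)."""
        return 1.0 / (1.0 + math.exp(-(theta - delta)))

    # Illustrative (assumed) values: a person at -2.0 logits facing items
    # of difficulty -3.0, -2.0, and 0.0 logits.
    for delta in (-3.0, -2.0, 0.0):
        print(delta, round(rasch_p(-2.0, delta), 3))
    ```

    When ability equals difficulty the endorsement probability is exactly 0.5, which is why well-targeted items cluster around the sample's mean logit.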

  5. Impact of teaching and assessment format on electrocardiogram interpretation skills.

    PubMed

    Raupach, Tobias; Hanneforth, Nathalie; Anders, Sven; Pukrop, Tobias; ten Cate, Olle Th J; Harendza, Sigrid

    2010-07-01

    Interpretation of the electrocardiogram (ECG) is a core clinical skill that should be developed in undergraduate medical education. This study assessed whether small-group peer teaching is more effective than lectures in enhancing medical students' ECG interpretation skills. In addition, the impact of assessment format on study outcome was analysed. Two consecutive cohorts of Year 4 medical students (n=335) were randomised to receive either traditional ECG lectures or the same amount of small-group, near-peer teaching during a 6-week cardiorespiratory course. Before and after the course, written assessments of ECG interpretation skills were undertaken. Whereas this final assessment yielded a considerable number of credit points for students in the first cohort, it was merely formative in nature for the second cohort. An unannounced retention test was administered 8 weeks after the end of the cardiovascular course. A significant advantage of near-peer teaching over lectures (effect size 0.33) was noted only in the second cohort, whereas, in the setting of a summative assessment, both teaching formats appeared to be equally effective. A summative instead of a formative assessment doubled the performance increase (Cohen's d 4.9 versus 2.4), mitigating any difference between teaching formats. Within the second cohort, the significant difference between the two teaching formats was maintained in the retention test (p=0.017). However, in both cohorts, a significant decrease in student performance was detected during the 8 weeks following the cardiovascular course. Assessment format appeared to be more powerful than choice of instructional method in enhancing student learning. The effect observed in the second cohort was masked by an overriding incentive generated by the summative assessment in the first cohort. This masking effect should be considered in studies assessing the effectiveness of different teaching methods.

  6. Alzheimer's Disease Assessment: A Review and Illustrations Focusing on Item Response Theory Techniques.

    PubMed

    Balsis, Steve; Choudhury, Tabina K; Geraci, Lisa; Benge, Jared F; Patrick, Christopher J

    2018-04-01

    Alzheimer's disease (AD) affects neurological, cognitive, and behavioral processes. Thus, to accurately assess this disease, researchers and clinicians need to combine and incorporate data across these domains. This presents not only distinct methodological and statistical challenges but also unique opportunities for the development and advancement of psychometric techniques. In this article, we describe relatively recent research using item response theory (IRT) that has been used to make progress in assessing the disease across its various symptomatic and pathological manifestations. We focus on applications of IRT to improve scoring, test development (including cross-validation and adaptation), and linking and calibration. We conclude by describing potential future multidimensional applications of IRT techniques that may improve the precision with which AD is measured.

  7. Development of the PROMIS coping expectancies of smoking item banks.

    PubMed

    Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

    2014-09-01

    Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
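
    The computer adaptive tests evaluated above typically work by repeatedly administering the not-yet-used item that is most informative at the current ability estimate, which is how a CAT can match a long form's reliability with only a handful of items. A greedy sketch under a two-parameter logistic (2PL) model; the item parameters and response pattern are invented for illustration and are not the PROMIS banks or their scoring algorithm:

    ```python
    import math

    def p2(theta, a, b):
        """2PL item response function."""
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    def info(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = p2(theta, a, b)
        return a * a * p * (1.0 - p)

    def cat_select(theta, items, used):
        """Greedy CAT rule: administer the unused item with maximum
        Fisher information at the current ability estimate."""
        return max((i for i in range(len(items)) if i not in used),
                   key=lambda i: info(theta, *items[i]))

    # Hypothetical (a, b) parameters for a tiny bank and a fixed toy
    # response pattern (1 = endorse).
    bank = [(1.8, -1.0), (1.2, 0.0), (2.0, 0.5), (0.9, 1.5)]
    theta, used = 0.0, set()
    for resp in (1, 0, 1):
        j = cat_select(theta, bank, used)
        used.add(j)
        a, b = bank[j]
        p = p2(theta, a, b)
        # one Newton-Raphson step on the log-likelihood
        theta += a * (resp - p) / max(info(theta, a, b), 1e-6)
    print(sorted(used), round(theta, 2))
    ```

    Production CATs add refinements (priors on theta, exposure control, stopping rules based on the standard error), but the select-administer-update loop is the core.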

  8. Negative affect impairs associative memory but not item memory.

    PubMed

    Bisby, James A; Burgess, Neil

    2013-12-17

    The formation of associations between items and their context has been proposed to rely on mechanisms distinct from those supporting memory for a single item. Although emotional experiences can profoundly affect memory, our understanding of how emotion interacts with different aspects of memory remains unclear. We performed three experiments to examine the effects of emotion on memory for items and their associations. By presenting neutral and negative items with background contexts, Experiment 1 demonstrated that item memory was facilitated by emotional affect, whereas memory for an associated context was reduced. In Experiment 2, arousal was manipulated independently of the memoranda, by a threat of shock, whereby encoding trials occurred under conditions of threat or safety. Memory for context was equally impaired by the presence of negative affect, whether induced by threat of shock or a negative item, relative to retrieval of the context of a neutral item in safety. In Experiment 3, participants were presented with neutral and negative items as paired associates, including all combinations of neutral and negative items. The results showed both of the above effects: compared with a neutral item, memory for the associate of a negative item (a second item here, context in Experiments 1 and 2) is impaired, whereas retrieval of the item itself is enhanced. Our findings suggest that negative affect impairs associative memory while recognition of a negative item is enhanced. They support dual-processing models in which negative affect or stress impairs hippocampal-dependent associative memory while the storage of negative sensory/perceptual representations is spared or even strengthened.

  9. Integrating Data-Based Decision Making, Assessment for Learning and Diagnostic Testing in Formative Assessment

    ERIC Educational Resources Information Center

    Van der Kleij, Fabienne M.; Vermeulen, Jorine A.; Schildkamp, Kim; Eggen, Theo J. H. M.

    2015-01-01

    Recent research has highlighted the lack of a uniform definition of formative assessment, although its effectiveness is widely acknowledged. This paper addresses the theoretical differences and similarities amongst three approaches to formative assessment that are currently most frequently discussed in educational research literature: data-based…

  10. Testing Three-Item Versions for Seven of Young's Maladaptive Schema

    ERIC Educational Resources Information Center

    Blau, Gary; DiMino, John; Sheridan, Natalie; Pred, Robert S.; Beverly, Clyde; Chessler, Marcy

    2015-01-01

    The Young Schema Questionnaire (YSQ) in either long-form (205-item) or short-form (75-item or 90-item) versions has demonstrated its clinical usefulness for assessing early maladaptive schemas. However, even a 75- or 90-item "short form", particularly when combined with other measures, can represent a lengthy…

  11. Item Response Models for Examinee-Selected Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

    2012-01-01

    In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…

  12. DIFAS: Differential Item Functioning Analysis System. Computer Program Exchange

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2005-01-01

    Differential item functioning (DIF) is an important consideration in assessing the validity of test scores (Camilli & Shepard, 1994). A variety of statistical procedures have been developed to assess DIF in tests of dichotomous (Hills, 1989; Millsap & Everson, 1993) and polytomous (Penfield & Lam, 2000; Potenza & Dorans, 1995) items. Some of these…

  13. Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model

    ERIC Educational Resources Information Center

    Baghaei, Purya; Aryadoust, Vahid

    2015-01-01

    Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…

  14. The Influence of Item Response Indecision on the Self-Directed Search

    ERIC Educational Resources Information Center

    Sampson, James P., Jr.; Shy, Jonathan D.; Hartley, Sarah Lucas; Reardon, Robert C.; Peterson, Gary W.

    2009-01-01

    Students (N = 247) responded to Self-Directed Search (SDS) per the standard response format and were also instructed to record a question mark (?) for items about which they were uncertain (item response indecision [IRI]). The initial responses of the 114 participants with a (?) were then reversed and a second SDS summary code was obtained and…

  15. Formative Assessment in Primary Science

    ERIC Educational Resources Information Center

    Loughland, Tony; Kilpatrick, Laetitia

    2015-01-01

    This action learning study in a year three classroom explored the implementation of five formative assessment principles to assist students' understandings of the scientific topic of liquids and solids. These principles were employed to give students a greater opportunity to express their understanding of the concepts. The study found that the…

  16. Connected Classroom Technology Facilitates Multiple Components of Formative Assessment Practice

    NASA Astrophysics Data System (ADS)

    Shirley, Melissa L.; Irving, Karen E.

    2015-02-01

    Formative assessment has been demonstrated to result in increased student achievement across a variety of educational contexts. When using formative assessment strategies, teachers engage students in instructional tasks that allow the teacher to uncover levels of student understanding so that the teacher may change instruction accordingly. Tools that support the implementation of formative assessment strategies are therefore likely to enhance student achievement. Connected classroom technologies (CCTs) include a family of devices that show promise in facilitating formative assessment. By promoting the use of interactive student tasks and providing both teachers and students with rapid and accurate data on student learning, CCT can provide teachers with necessary evidence for making instructional decisions about subsequent lessons. In this study, the experiences of four middle and high school science teachers in their first year of implementing the TI-Navigator™ system, a specific type of CCT, are used to characterize the ways in which CCT supports the goals of effective formative assessment. We present excerpts of participant interviews to demonstrate the alignment of CCT with several main phases of the formative assessment process. CCT was found to support implementation of a variety of instructional tasks that generate evidence of student learning for the teacher. The rapid aggregation and display of student learning evidence provided teachers with robust data on which to base subsequent instructional decisions.

  17. Formative Assessment Jump-Starts a Middle Grades Differentiation Initiative

    ERIC Educational Resources Information Center

    Doubet, Kristina J.

    2012-01-01

    A rural middle level school had stalled in its third year of a district-wide differentiation initiative. This article describes the way teachers and the leadership team engaged in collaborative practices to put a spotlight on formative assessment. Teachers learned to systematically gather formative assessment data from their students and to use…

  18. Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

    PubMed

    Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

    2014-09-01

    The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N =1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, less than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  19. Exploratory Item Classification Via Spectral Graph Clustering

    PubMed Central

    Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2017-01-01

    Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as hierarchical clustering, the K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
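
    The graph-based idea in this abstract can be illustrated for the simplest two-cluster case: build a similarity graph over items, form the normalized Laplacian, and split items by the sign of the Fiedler vector. This is a generic spectral-partitioning sketch, not the authors' exact algorithm (theirs handles K clusters and missing responses), and the similarity matrix below is invented:

    ```python
    import numpy as np

    def spectral_two_clusters(R):
        """Two-cluster spectral partition of items from a similarity
        matrix R (e.g., absolute inter-item correlations): build the
        normalized graph Laplacian L = I - D^(-1/2) R D^(-1/2) and split
        items by the sign of the Fiedler vector (the eigenvector of the
        second-smallest eigenvalue)."""
        R = np.array(R, dtype=float)          # work on a copy
        np.fill_diagonal(R, 0.0)              # no self-loops
        d = R.sum(axis=1)
        d_isqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        L = np.eye(len(R)) - d_isqrt[:, None] * R * d_isqrt[None, :]
        _, vecs = np.linalg.eigh(L)           # eigenvalues ascending
        return (vecs[:, 1] > 0).astype(int)

    # Invented similarity matrix for 6 items: items 0-2 and 3-5 behave as
    # two scales, with strong within-block and weak between-block links.
    R = np.full((6, 6), 0.1)
    R[:3, :3] = 0.8
    R[3:, 3:] = 0.8
    labels = spectral_two_clusters(R)
    print(labels)
    ```

    For K > 2 scales, the standard extension keeps the first K eigenvectors and runs K-means on the resulting item embeddings.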

  20. Investigating Computer-Based Formative Assessments in a Medical Terminology Course

    ERIC Educational Resources Information Center

    Wilbanks, Jammie T.

    2012-01-01

    Research has been conducted on the effectiveness of formative assessments and on effectively teaching medical terminology; however, research had not been conducted on the use of formative assessments in a medical terminology course. A quantitative study was performed which captured data from a pretest, self-assessment, four module exams, and a…

  1. Formative Assessment in Practice: A Process of Inquiry and Action

    ERIC Educational Resources Information Center

    Heritage, Margaret

    2013-01-01

    Margaret Heritage presents a practical guide to formative assessment as a process of "inquiry and action" essential to twenty-first century learning. In the wake of the development of the Common Core standards and the effort to develop the appropriate assessments to accompany them, formative assessment has attracted increasing attention…

  2. The comparability of English, French and Dutch scores on the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F): an assessment of differential item functioning in patients with systemic sclerosis.

    PubMed

    Kwakkenbos, Linda; Willems, Linda M; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H M; Thombs, Brett D

    2014-01-01

    The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics.

  3. ABILOCO-Kids: a Rasch-built 10-item questionnaire for assessing locomotion ability in children with cerebral palsy.

    PubMed

    Caty, Gilles D; Arnould, Carlyne; Thonnard, Jean-Louis; Lejeune, Thierry M

    2008-11-01

    To develop a questionnaire (ABILOCO-Kids) based on the Rasch measurement model that assesses locomotion ability in children with cerebral palsy. Prospective study and questionnaire development. A total of 113 children with cerebral palsy (mean age 10 years, standard deviation 2.5). A 41-item questionnaire was developed based on existing scales and on the clinical experience of professionals in the field of rehabilitation. This questionnaire was tested separately on the 113 children with cerebral palsy and their parents. Their responses were analysed using the Rasch model (RUMM-2020) to select items that had an ordered rating scale and that fit a unidimensional model. The final ABILOCO-Kids scale consisted of 10 locomotion activities, whose difficulty was rated by the parents. The parents gave a more precise assessment of their children's ability than the children themselves, leading to a wider range of measurement that was well-targeted on the sample population and that had good reliability (r=0.97) and reproducibility (intraclass correlation coefficient=0.96). Item calibration did not vary with age, sex or clinical presentation (hemiplegia, diplegia, quadriplegia). The concurrent validity of the ABILOCO-Kids questionnaire was also shown by its correlation with the Gross Motor Function Classification System. The ABILOCO-Kids questionnaire has good psychometric qualities for measuring a wide range of locomotion abilities in children with cerebral palsy.
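The Rasch model used to calibrate scales like ABILOCO-Kids gives the probability of success on an item as a logistic function of the gap between person ability and item difficulty, both on a common logit scale. A minimal sketch (the abilities and difficulty below are illustrative values, not the ABILOCO-Kids calibration):

```python
import math

def rasch_probability(theta, b):
    """Rasch model: P(success) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is person ability and b is item difficulty, in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Illustrative values only: three hypothetical children facing one item.
item_difficulty = 0.5
for theta in (-1.0, 0.5, 2.0):
    print(f"ability {theta:+.1f} -> P(success) = "
          f"{rasch_probability(theta, item_difficulty):.3f}")
```

When ability equals difficulty the success probability is exactly 0.5, which is how item difficulties and person abilities end up on the same scale.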

  4. Developing the Communicative Participation Item Bank: Rasch Analysis Results From a Spasmodic Dysphonia Sample

    PubMed Central

    Baylor, Carolyn R.; Yorkston, Kathryn M.; Eadie, Tanya L.; Miller, Robert M.; Amtmann, Dagmar

    2011-01-01

    Purpose The purpose of this study was to conduct the initial psychometric analyses of the Communicative Participation Item Bank—a new self-report instrument designed to measure the extent to which communication disorders interfere with communicative participation. This item bank is intended for community-dwelling adults across a range of communication disorders. Method A set of 141 candidate items was administered to 208 adults with spasmodic dysphonia. Participants rated the extent to which their condition interfered with participation in various speaking communication situations. Questionnaires were administered online or in a paper version per participant preference. Participants also completed the Voice Handicap Index (B. H. Jacobson et al., 1997) and a demographic questionnaire. Rasch analyses were conducted using Winsteps software (J. M. Linacre, 1991). Results The results show that items functioned better when the 5-category response format was recoded to a 4-category format. After removing 8 items that did not fit the Rasch model, the remaining 133 items demonstrated strong evidence of sufficient unidimensionality, with the model accounting for 89.3% of variance. Item location values ranged from −2.73 to 2.20 logits. Conclusions Preliminary Rasch analyses of the Communicative Participation Item Bank show strong psychometric properties. Further testing in populations with other communication disorders is needed. PMID:19717652

  5. Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

    PubMed

    Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

    2017-09-16

    This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
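The internal consistency figures reported above (Cronbach's α: 0.79-0.91) come from the standard CTT formula: α = k/(k-1) · (1 − Σ item variances / variance of total scores). A minimal sketch with hypothetical response data (not the study's data):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])  # number of items

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point self-efficacy responses: 4 respondents x 3 items.
data = [[4, 5, 4], [3, 3, 2], [5, 4, 5], [2, 2, 3]]
print(f"alpha = {cronbach_alpha(data):.2f}")
```

Higher values indicate that items covary strongly relative to their individual variances, i.e. that they plausibly measure a common construct.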

  6. Using Formative Assessments to Improve Student Learning Outcomes: A Study of the Different Types of Formative Assessments Teachers Use to Drive Instruction and Their Effects on Student Learning

    ERIC Educational Resources Information Center

    Alzina, Amy

    2016-01-01

    Understanding the difference between summative and formative assessments is still unclear for many teachers and principals as well as the effects formative assessments have on student learning outcomes. This quantitative study was conducted to explicitly explore formative assessments as a means to improve student learning outcomes, while examining…

  7. Formative Assessment in Mathematics for Engineering Students

    ERIC Educational Resources Information Center

    Ní Fhloinn, Eabhnat; Carr, Michael

    2017-01-01

    In this paper, we present a range of formative assessment types for engineering mathematics, including in-class exercises, homework, mock examination questions, table quizzes, presentations, critical analyses of statistical papers, peer-to-peer teaching, online assessments and electronic voting systems. We provide practical tips for the…

  8. Determining if Active Learning through a Formative Assessment Process Translates to Better Performance in Summative Assessment

    ERIC Educational Resources Information Center

    Grosas, Aidan Bradley; Raju, Shiwani Rani; Schuett, Burkhardt Siegfried; Chuck, Jo-Anne; Millar, Thomas James

    2016-01-01

    Formative assessment used in a level 2 unit, Immunology, gave outcomes that were both surprising and applicable across disciplines. Four formative tests were given and reviewed during class time. The students' attitudes to formative assessment were evaluated using questionnaires and its effectiveness in closing the gap was measured by the…

  9. NAEP Validity Studies: Improving the Information Value of Performance Items in Large Scale Assessments. Working Paper No. 2003-08

    ERIC Educational Resources Information Center

    Pearson, P. David; Garavaglia, Diane R.

    2003-01-01

    The purpose of this essay is to explore both what is known and what needs to be learned about the information value of performance items "when they are used in large scale assessments." Within the context of the National Assessment of Educational Progress (NAEP), there is substantial motivation for answering these questions. Over the…

  10. Transaction-Level Learning Analytics in Online Authentic Assessments

    ERIC Educational Resources Information Center

    Nyland, Rob; Davies, Randall S.; Chapman, John; Allen, Gove

    2017-01-01

    This paper presents a case for the use of transaction-level data when analyzing automated online assessment results to identify knowledge gaps and misconceptions for individual students. Transaction-level data, which records all of the steps a student uses to complete an assessment item, are preferred over traditional assessment formats that…

  11. Validation of Physics Standardized Test Items

    NASA Astrophysics Data System (ADS)

    Marshall, Jill

    2008-10-01

    The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.

  12. Scoring best-worst data in unbalanced many-item designs, with applications to crowdsourcing semantic judgments.

    PubMed

    Hollis, Geoff

    2018-04-01

    Best-worst scaling is a judgment format in which participants are presented with a set of items and have to choose the superior and inferior items in the set. Best-worst scaling generates a large quantity of information per judgment because each judgment allows for inferences about the rank value of all unjudged items. This property of best-worst scaling makes it a promising judgment format for research in psychology and natural language processing concerned with estimating the semantic properties of tens of thousands of words. A variety of different scoring algorithms have been devised in the previous literature on best-worst scaling. However, due to problems of computational efficiency, these scoring algorithms cannot be applied efficiently to cases in which thousands of items need to be scored. New algorithms are presented here for converting responses from best-worst scaling into item scores for thousands of items (many-item scoring problems). These scoring algorithms are validated through simulation and empirical experiments, and considerations related to noise, the underlying distribution of true values, and trial design are identified that can affect the relative quality of the derived item scores. The newly introduced scoring algorithms consistently outperformed scoring algorithms used in the previous literature on scoring many-item best-worst data.
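The simplest scorer from the earlier best-worst literature that the abstract refers to is count-based "best minus worst" scoring: each item's score is the number of times it was chosen best, minus the number of times chosen worst, normalized by how often it was presented. A minimal sketch of this baseline (not the new algorithms the paper introduces), with hypothetical semantic-judgment trials:

```python
from collections import defaultdict

def best_minus_worst(trials):
    """Baseline best-worst scorer: for each item,
    (# times chosen best - # times chosen worst) / (# times presented).
    Each trial is (items_shown, chosen_best, chosen_worst)."""
    best, worst, seen = defaultdict(int), defaultdict(int), defaultdict(int)
    for items, b, w in trials:
        for item in items:
            seen[item] += 1
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

# Hypothetical "how positive is this word?" judgments.
trials = [
    (("calm", "panic", "chair", "joy"), "joy", "panic"),
    (("calm", "panic", "table", "fear"), "calm", "fear"),
    (("joy", "fear", "chair", "table"), "joy", "fear"),
]
scores = best_minus_worst(trials)
```

Scores fall in [-1, 1]; the paper's point is that with thousands of items, more sophisticated scorers can recover better rank estimates than counts like these, at acceptable computational cost.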

  13. Playing Games with Formative Assessment

    ERIC Educational Resources Information Center

    Cassie, Jonathan

    2018-01-01

    Games can be great tools to engage reluctant learners and provide ongoing feedback to educators about how their lessons are "sticking." Cassie discusses how to use gamified formative assessments to measure different kinds of skills and looks at the different ways teachers can use games in the classroom--from out-of-the-box board games to…

  14. Implementing Curriculum-Embedded Formative Assessment in Primary School Science Classrooms

    ERIC Educational Resources Information Center

    Hondrich, Annika Lena; Hertel, Silke; Adl-Amini, Katja; Klieme, Eckhard

    2016-01-01

    The implementation of formative assessment strategies is challenging for teachers. We evaluated teachers' implementation fidelity of a curriculum-embedded formative assessment programme for primary school science education, investigating both material-supported, direct application and subsequent transfer. Furthermore, the relationship between…

  15. Mathematics Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Fraser, Graham, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from previous tests are made available to teachers for the construction of pretests or posttests, reference tests for inter-class comparisons and general assignments. The collection was reviewed for content…

  16. Missouri Assessment Program (MAP), Spring 2000: Intermediate Communication Arts, Released Items, Grade 7.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document deals with testing in intermediate communication arts for seventh graders in Missouri public schools. The document contains the following items from the Session 1 Test Booklet: "Swimming in Snow" (Diana C. Conway) (Items 1, 2, and 5); "Discovery" (Marion Dane Bauer) (Item 13); writing prompt; and a writer's…

  17. Analyzing force concept inventory with item response theory

    NASA Astrophysics Data System (ADS)

    Wang, Jing; Bao, Lei

    2010-10-01

    Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
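The difficulty, discrimination, and guessing parameters mentioned above correspond to the three-parameter logistic (3PL) model commonly fit to multiple-choice items. A minimal sketch evaluating the item characteristic curve (the parameter values are illustrative, not actual FCI estimates):

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL model: P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b))).
    a: discrimination, b: difficulty, c: pseudo-guessing lower asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative parameters for a 5-option multiple-choice item (not FCI
# values): moderate discrimination, average difficulty, 1-in-5 guessing.
a, b, c = 1.2, 0.0, 0.2
for theta in (-3.0, 0.0, 3.0):
    print(f"theta = {theta:+.0f}: P(correct) = "
          f"{p_correct_3pl(theta, a, b, c):.3f}")
```

The lower asymptote c is what lets the model separate genuine ability from lucky guessing, addressing the guessing issue the abstract notes for quantitative assessment.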

  18. Modeling Unproductive Behavior in Online Homework in Terms of Latent Student Traits: An Approach Based on Item Response Theory

    NASA Astrophysics Data System (ADS)

    Gönülateş, Emre; Kortemeyer, Gerd

    2017-04-01

    Homework is an important component of most physics courses. One of the functions it serves is to provide meaningful formative assessment in preparation for examinations. However, correlations between homework and examination scores tend to be low, likely due to unproductive student behavior such as copying and random guessing of answers. In this study, we attempt to model these two counterproductive learner behaviors within the framework of Item Response Theory in order to provide an ability measurement that strongly correlates with examination scores. We find that introducing additional item parameters leads to worse predictions of examination grades, while introducing additional learner traits is a more promising approach.

  19. The Missing Disciplinary Substance of Formative Assessment

    ERIC Educational Resources Information Center

    Coffey, Janet E.; Hammer, David; Levin, Daniel M.; Grant, Terrance

    2011-01-01

    We raise concerns about the current state of research and development in formative assessment, specifically to argue that in its concentration on "strategies for the teacher", the literature overlooks the "disciplinary substance" of what teachers and students assess. Our argument requires analysis of specific instances in the literature, and so we…

  20. Assessment formats in dental medicine: An overview

    PubMed Central

    Gerhard-Szep, Susanne; Güntsch, Arndt; Pospiech, Peter; Söhnel, Andreas; Scheutzel, Petra; Wassmann, Torsten; Zahn, Tugba

    2016-01-01

    Aim: At the annual meeting of German dentists in Frankfurt am Main in 2013, the Working Group for the Advancement of Dental Education (AKWLZ) initiated an interdisciplinary working group to address assessments in dental education. This paper presents an overview of the current work being done by this working group, some of whose members are also actively involved in the German Association for Medical Education's (GMA) working group for dental education. The aim is to present a summary of the current state of research on this topic for all those who participate in the design, administration and evaluation of university-specific assessments in dentistry. Method: Based on systematic literature research, the testing scenarios listed in the National Competency-based Catalogue of Learning Objectives (NKLZ) have been compiled and presented in tables according to assessment value. Results: Different assessment scenarios are described briefly in table form addressing validity (V), reliability (R), acceptance (A), cost (C), feasibility (F), and educational impact (EI), i.e. the influence on teaching and learning, as presented in the current literature. Infoboxes were deliberately chosen to allow readers quick access to the information and to facilitate comparisons between the various assessment formats. Following each description is a list summarizing the uses in dental and medical education. Conclusion: This overview provides a summary of competency-based testing formats. It is meant to have a formative effect on dental and medical schools and provide support for developing workplace-based strategies in dental education for learning, teaching and testing in the future. PMID:27579365

  1. Geography Library of Test Items. Volume Four.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  2. Geography Library of Test Items. Volume Three.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  3. Commerce Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Meeve, Brian, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  4. Geography Library of Test Items. Volume Five.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  5. Commerce Library of Test Items. Volume Two.

    ERIC Educational Resources Information Center

    Meeve, Brian, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  6. Geography Library of Test Items. Volume Six.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  7. Geography: Library of Test Items. Volume II.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  8. Geography Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  9. The impact of item order on ratings of cancer risk perception.

    PubMed

    Taylor, Kathryn L; Shelby, Rebecca A; Schwartz, Marc D; Ackerman, Josh; LaSalle, V Holland; Gelmann, Edward P; McGuire, Colleen

    2002-07-01

    Although perceived risk is central to most theories of health behavior, there is little consensus on its measurement with regard to item wording, response set, or the number of items to include. In a methodological assessment of perceived risk, we assessed the impact of changing the order of three commonly used perceived risk items: quantitative personal risk, quantitative population risk, and comparative risk. Participants were 432 men and women enrolled in an ancillary study of the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Three groups of consecutively enrolled participants responded to the three items in one of three question orders. Results indicated that item order was related to the perceived risk ratings of both ovarian (P < 0.05) and colorectal (P < 0.05) cancers. Perceptions of risk were significantly lower when the comparative rating was made first. The findings suggest that compelling participants to consider their own risk relative to the risk of others results in lower ratings of perceived risk. Although the use of multiple items may provide more information than a single item, different conclusions may be reached depending on the context in which an item is assessed.

  10. Influence of Fallible Item Parameters on Test Information During Adaptive Testing.

    ERIC Educational Resources Information Center

    Wetzel, C. Douglas; McBride, James R.

    Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…

  11. The Comparability of English, French and Dutch Scores on the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F): An Assessment of Differential Item Functioning in Patients with Systemic Sclerosis

    PubMed Central

    Kwakkenbos, Linda; Willems, Linda M.; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H. M.; Thombs, Brett D.

    2014-01-01

    Objective The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. Methods The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was used to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. Results A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. Conclusions There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics. PMID:24638101

  12. Effects of age on negative subsequent memory effects associated with the encoding of item and item-context information.

    PubMed

    Mattson, Julia T; Wang, Tracy H; de Chastelaine, Marianne; Rugg, Michael D

    2014-12-01

    It has consistently been reported that "negative" subsequent memory effects--lower study activity for later remembered than later forgotten items--are attenuated in older individuals. The present functional magnetic resonance imaging study investigated whether these findings extend to subsequent memory effects associated with successful encoding of item-context information. Older (n = 25) and young (n = 17) subjects were scanned while making 1 of 2 encoding judgments on a series of pictures. Memory was assessed for the study item and, for items judged old, the item's encoding task. Both memory judgments were made using confidence ratings, permitting item and source memory strength to be unconfounded and source confidence to be equated across age groups. Replicating prior findings, negative item effects in regions of the default mode network in young subjects were reversed in older subjects. Negative source effects, however, were invariant with respect to age and, in both age groups, the magnitude of the effects correlated with source memory performance. It is concluded that negative item effects do not reflect processes necessary for the successful encoding of item-context associations in older subjects. Negative source effects, in contrast, appear to reflect the engagement of processes that are equally important for successful episodic encoding in older and younger individuals.

  13. Lower-fat menu items in restaurants satisfy customers.

    PubMed

    Fitzpatrick, M P; Chapman, G E; Barr, S I

    1997-05-01

    To evaluate a restaurant-based nutrition program by measuring customer satisfaction with lower-fat menu items and assessing patrons' reactions to the program. Questionnaires to assess satisfaction with menu items were administered to patrons in eight of the nine restaurants that volunteered to participate in the nutrition program. One patron from each participating restaurant was randomly selected for a semistructured interview about nutrition programming in restaurants. Persons dining in eight participating restaurants over a 1-week period (n = 686). Independent samples t tests were used to compare respondents' satisfaction with lower-fat and regular menu items. Two-way analysis of variance tests were completed using overall satisfaction as the dependent variable and menu-item classification (ie, lower fat or regular) and one of eight other menu item and respondent characteristics as independent variables. Qualitative methods were used to analyze interview transcripts. Of 1,127 menu items rated for satisfaction, 205 were lower fat, 878 were regular, and 44 were of unknown classification. Customers were significantly more satisfied with lower-fat than with regular menu items (P < .001). Overall satisfaction did not vary by any of the other independent variables. Interview results indicate the importance of restaurant dining as an indulgent experience. High satisfaction with lower-fat menu items suggests that customers will support restaurants providing such choices. Dietitians can use these findings to encourage restaurateurs to include lower-fat choices on their menus, and to assure clients that their expectations of being indulged are not incompatible with these choices.
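The group comparison described above rests on the independent-samples t statistic: the difference in group means divided by the pooled standard error. A minimal sketch with hypothetical satisfaction ratings (not the study's data):

```python
import math
from statistics import mean, variance

def independent_t(x, y):
    """Student's independent-samples t statistic with pooled variance,
    as used to compare mean satisfaction between two groups of ratings."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))

# Hypothetical 7-point satisfaction ratings for two menu-item groups.
lower_fat = [6, 7, 6, 5, 7, 6]
regular = [5, 5, 6, 4, 5, 6]
t = independent_t(lower_fat, regular)
print(f"t = {t:.2f} on {len(lower_fat) + len(regular) - 2} df")
```

The statistic would then be compared against the t distribution with nx + ny − 2 degrees of freedom to obtain the p value reported in the study.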

  14. Validity and reliability of the TED-QOL: a new three-item questionnaire to assess quality of life in thyroid eye disease.

    PubMed

    Fayers, Tessa; Dolman, Peter J

    2011-12-01

    To develop and test a user-friendly questionnaire for rapidly assessing quality of life (QOL) in thyroid eye disease (TED). A three-item questionnaire, the TED-QOL, was designed and compared to the 16-item Graves Ophthalmopathy (GO)-QOL and the nine-item GO-Quality of Life Scale (QLS). 100 patients with TED were administered all three questionnaires on two occasions. Results were compared to clinical severity scores (Vision, Inflammation, Strabismus, Appearance (VISA) classification). Main outcomes were construct and criterion validity, test-retest reliability, duration, comprehension and completion rates. TED-QOL correlated strongly with the other questionnaires for corresponding items (Pearson correlation: appearance 0.71, 0.62; functioning 0.69, 0.66; overall QOL 0.53). Test-retest analysis demonstrated good reliability for all three questionnaires (intraclass correlations: TED-QOL 0.81, 0.74, 0.87; GO-QOL 0.81, 0.82; GO-QLS 0.74, 0.86, 0.67). TED-QOL was significantly faster to complete (1.6 min vs GO-QOL 3.1 min, GO-QLS 2.7 min, p<0.0001) and had a higher completion rate (100% vs GO-QOL 78%, GO-QLS 94%). There was only moderate correlation between items on all three questionnaires and VISA scores. The TED-QOL is rapid and easy to complete and analyse and has similar validity and reliability to longer questionnaires. All questionnaires showed only moderate correlation with disease severity, emphasising the discrepancy between objective and subjective assessments and the importance of measuring both.

  15. Formative Assessment Probes: Mountaintop Fossil: A Puzzling Phenomenon

    ERIC Educational Resources Information Center

    Keeley, Page

    2015-01-01

    This column focuses on promoting learning through assessment. This month's issue describes using formative assessment probes to uncover several ways of thinking about the puzzling discovery of a marine fossil on top of a mountain.

  16. Formative Assessment Probes: Is It Erosion or Weathering?

    ERIC Educational Resources Information Center

    Keeley, Page

    2016-01-01

    This column focuses on promoting learning through assessment. The formative assessment probe in this month's issue can be used as an initial elicitation before students are introduced to the formal concepts of weathering and erosion.

  17. Using Data Mining to Predict K-12 Students' Performance on Large-Scale Assessment Items Related to Energy

    ERIC Educational Resources Information Center

    Liu, Xiufeng; Ruiz, Miguel E.

    2008-01-01

    This article reports a study on using data mining to predict K-12 students' competence levels on test items related to energy. Data sources are the 1995 Third International Mathematics and Science Study (TIMSS), 1999 TIMSS-Repeat, 2003 Trend in International Mathematics and Science Study (TIMSS), and the National Assessment of Educational…

  18. Formative Assessment Probes: How Far Did It Go?

    ERIC Educational Resources Information Center

    Keeley, Page

    2011-01-01

    Assessment serves many purposes in the elementary classroom. Formative assessment, often called assessment for learning, is characterized by its primary purpose--promoting learning. It takes place both formally and informally, is embedded in various stages of an instructional cycle, informs the teacher about appropriate next steps for instruction,…

  19. Item response theory - A first approach

    NASA Astrophysics Data System (ADS)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years, due to increasing computer power and to a demand for accuracy and more meaningful inferences grounded in complex data. Developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also prompted numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated above, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models, and to present the main estimation procedures.
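The one-, two- and three-parameter logistic models introduced in this abstract share a single item response function, with the simpler models as special cases of the 3PL. A minimal sketch (the function name and parameter values are illustrative, not from the paper):

```python
import math

def irf(theta, a=1.0, b=0.0, c=0.0):
    """Three-parameter logistic (3PL) item response function:
    P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b))).
    a: discrimination, b: difficulty, c: pseudo-guessing lower asymptote.
    Setting c = 0 gives the 2PL; additionally fixing a = 1 gives the 1PL (Rasch) model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the logistic term equals 0.5, so P = c + (1 - c) / 2.
p_1pl = irf(0.0)                        # 1PL at theta = b: P = 0.5
p_3pl = irf(0.0, a=1.5, b=0.0, c=0.2)   # 3PL at theta = b: P = 0.2 + 0.8 * 0.5 = 0.6
```

The pseudo-guessing parameter c raises the curve's lower asymptote, which is why a 3PL item never drops to zero probability even for very low ability.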

  20. Assessing nicotine dependence in adolescent E-cigarette users: The 4-item Patient-Reported Outcomes Measurement Information System (PROMIS) Nicotine Dependence Item Bank for electronic cigarettes.

    PubMed

    Morean, Meghan E; Krishnan-Sarin, Suchitra; O'Malley, Stephanie S

    2018-04-26

    Adolescent e-cigarette use (i.e., "vaping") likely confers risk for developing nicotine dependence. However, there have been no studies assessing e-cigarette nicotine dependence in youth. We evaluated the psychometric properties of the 4-item Patient-Reported Outcomes Measurement Information System Nicotine Dependence Item Bank for E-cigarettes (PROMIS-E) for assessing youth e-cigarette nicotine dependence and examined risk factors for experiencing stronger dependence symptoms. In 2017, 520 adolescent past-month e-cigarette users completed the PROMIS-E during a school-based survey (50.5% female, 84.8% White, 16.22[1.19] years old). Adolescents also reported on sex, grade, race, age at e-cigarette use onset, vaping frequency, nicotine e-liquid use, and past-month cigarette smoking. Analyses included conducting confirmatory factor analysis and examining the internal consistency of the PROMIS-E. Bivariate correlations and independent-samples t-tests were used to examine unadjusted relationships between e-cigarette nicotine dependence and the proposed risk factors. Regression models were run in which all potential risk factors were entered as simultaneous predictors of PROMIS-E scores. The single-factor structure of the PROMIS-E was confirmed and evidenced good internal consistency. Across models, larger PROMIS-E scores were associated with being in a higher grade, initiating e-cigarette use at an earlier age, vaping more frequently, using nicotine e-liquid (and higher nicotine concentrations), and smoking cigarettes. Adolescent e-cigarette users reported experiencing nicotine dependence, which was assessed using the psychometrically sound PROMIS-E. Experiencing stronger nicotine dependence symptoms was associated with characteristics that previously have been shown to confer risk for frequent vaping and tobacco cigarette dependence. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Item response theory analysis of the Pain Self-Efficacy Questionnaire.

    PubMed

    Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

    2017-01-01

    The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response thresholds (the values of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 patients with mixed chronic pain attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain.
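The graded response model described above can be sketched directly from its definition: the cumulative probabilities of responding in category k or higher are 2PL curves sharing one discrimination parameter, and category probabilities are differences of adjacent cumulative curves. A minimal illustration (names and parameter values are hypothetical, not estimates from the PSEQ study):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model for one item.
    P(X >= k | theta) are 2PL curves with shared discrimination a and ordered
    thresholds b_1 < ... < b_{K-1} (the latent-trait values where adjacent
    cumulative categories are equally likely). Each category probability is
    the difference of adjacent cumulative curves."""
    def p_star(b):
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# A 4-category item: probabilities across categories always sum to 1.
probs = grm_category_probs(theta=0.5, a=1.8, thresholds=[-1.0, 0.0, 1.2])
```

Plotting these category probabilities against theta reproduces the category probability curves the authors inspected: the lowest and highest categories are monotone, while the middle categories peak between adjacent thresholds.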

  2. Development and testing of item response theory-based item banks and short forms for eye, skin and lung problems in sarcoidosis.

    PubMed

    Victorson, David E; Choi, Seung; Judson, Marc A; Cella, David

    2014-05-01

    Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with few disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms of lung, skin and eye problems, which are a part of a new patient-reported outcome (PRO) instrument called the sarcoidosis assessment tool. After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches. From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms of the most common constitutional symptoms and negative effects of corticosteroids. Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.

  3. [Item function analysis on the Quality of Life-Alzheimer's Disease (QOL-AD) Chinese version, based on the Item Response Theory (IRT)].

    PubMed

    Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei

    2013-07-01

    To introduce the Item Function Analysis (IFA) of the Quality of Life-Alzheimer's Disease (QOL-AD) Chinese version and to explore the feasibility of its application to Chinese patients with AD. Two hundred AD patients, selected through stratified cluster sampling, were interviewed and assessed with the QOL-AD. Multilog 7.03 was used for the Item Function Analysis, providing the discrimination parameter (a), difficulty parameter (b) and Item Characteristic Curve (ICC) of each QOL-AD item. The discrimination parameters of items 1 and 7 were below 0.6, while those of all other items were above 0.6. As for the ICCs, for all items except items 1 and 7, the curves of the first and last response categories were monotonic, while the two in between were inverted V-shaped, with very steep slopes. Results from the IFA showed that the QOL-AD is applicable to Chinese patients with AD.

  4. Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

    ERIC Educational Resources Information Center

    Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

    2016-01-01

    The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…

  5. Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

    PubMed

    Andriessen, Teuntje M J C; de Jong, Ben; Jacobs, Bram; van der Werf, Sieberen P; Vos, Pieter E

    2009-04-01

    To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). Daily testing was performed in 64 consecutively admitted traumatic brain injured patients, 22 orthopedically injured patients and 26 healthy controls until criteria for resolution of PTA were reached. Subjects were randomly assigned to a test with visual or verbal stimuli. Short delay reproduction was tested after an interval of 3-5 minutes, long delay reproduction was tested after 24 hours. Sensitivity and specificity were calculated over the first 4 test days. The 3-word test showed higher sensitivity than the 3-picture test, while specificity of the two tests was equally high. Free recall was a more effortful task than recognition for both patients and controls. In patients, a longer delay between registration and recall resulted in a significant decrease in the number of items reproduced. Presence of PTA is best assessed with a memory test that incorporates the free recall of words after a long delay.

  6. Item response theory analysis of Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis.

    PubMed

    Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C

    2016-03-12

    Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis, and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses, and the CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis, but the existing CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM) as they currently stand, or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.

  7. Technology-Enhanced Formative Assessment of Plant Identification

    NASA Astrophysics Data System (ADS)

    Conejo, Ricardo; Garcia-Viñas, Juan Ignacio; Gastón, Aitor; Barros, Beatriz

    2016-04-01

    Developing plant identification skills is an important part of the curriculum of any botany course in higher education. Frequent practice with dried and fresh plants is necessary to recognize the diversity of forms, states, and details that a species can present. We have developed a web-based assessment system for mobile devices that is able to pose appropriate questions according to the location of the student. A student's location can be obtained using the device position or by scanning a QR code attached to a dried plant sheet in a herbarium or to a fresh plant in an arboretum. The assessment questions are complemented with elaborated feedback that, according to the students' responses, provides indications of possible mistakes and correct answers. Three experiments were designed to measure the effectiveness of the formative assessment using dried and fresh plants. Three questionnaires were used to evaluate the system performance from the students' perspective. The results clearly indicate that formative assessment is objectively effective compared to traditional methods and that the students' attitudes towards the system were very positive.

  8. Does Computer-Aided Formative Assessment Improve Learning Outcomes?

    ERIC Educational Resources Information Center

    Hannah, John; James, Alex; Williams, Phillipa

    2014-01-01

    Two first-year engineering mathematics courses used computer-aided assessment (CAA) to provide students with opportunities for formative assessment via a series of weekly quizzes. Most students used the assessment until they achieved very high (>90%) quiz scores. Although there is a positive correlation between these quiz marks and the final…

  9. Gender Invariance of the Gambling Behavior Scale for Adolescents (GBS-A): An Analysis of Differential Item Functioning Using Item Response Theory.

    PubMed

    Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina

    2017-01-01

    As there is a lack of evidence attesting to equivalent item functioning across genders for the most widely used instruments measuring pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested to the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measure of the severity of GD in male and female adolescents, and that the scale is unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.

  10. Assessing Hopelessness in Terminally Ill Cancer Patients: Development of the Hopelessness Assessment in Illness Questionnaire

    PubMed Central

    Rosenfeld, Barry; Pessin, Hayley; Lewis, Charles; Abbey, Jennifer; Olden, Megan; Sachs, Emily; Amakawa, Lia; Kolva, Elissa; Brescia, Robert; Breitbart, William

    2013-01-01

    Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses demonstrated strong support for the Hopelessness Assessment in Illness Questionnaire, providing greater explanatory power than existing measures of hopelessness, and found little evidence that this assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts. PMID:21443366

  11. The Role of Item Models in Automatic Item Generation

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2012-01-01

    Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…

  12. Meta-analytic guidelines for evaluating single-item reliabilities of personality instruments.

    PubMed

    Spörrle, Matthias; Bekk, Magdalena

    2014-06-01

    Personality is an important predictor of various outcomes in many social science disciplines. However, when personality traits are not the principal focus of research, for example, in global comparative surveys, it is often not possible to assess them extensively. In this article, we first provide an overview of the advantages and challenges of single-item measures of personality, a rationale for their construction, and a summary of alternative ways of assessing their reliability. Second, using seven diverse samples (Ntotal = 4,263) we develop the SIMP-G, the German adaptation of the Single-Item Measures of Personality, an instrument assessing the Big Five with one item per trait, and evaluate its validity and reliability. Third, we integrate previous research and our data into a first meta-analysis of single-item reliabilities of personality measures, and provide researchers with guidelines and recommendations for the evaluation of single-item reliabilities. © The Author(s) 2013.

  13. The Development of Practical Item Analysis Program for Indonesian Teachers

    ERIC Educational Resources Information Center

    Muhson, Ali; Lestari, Barkah; Supriyanto; Baroroh, Kiromim

    2017-01-01

    Item analysis has essential roles in the learning assessment. The item analysis program is designed to measure student achievement and instructional effectiveness. This study was aimed to develop item-analysis program and verify its feasibility. This study uses a Research and Development (R & D) model. The procedure includes designing and…

  14. Web-Based Quiz-Game-Like Formative Assessment: Development and Evaluation

    ERIC Educational Resources Information Center

    Wang, Tzu-Hua

    2008-01-01

    This research aims to develop a multiple-choice Web-based quiz-game-like formative assessment system, named GAM-WATA. The unique design of "Ask-Hint Strategy" turns the Web-based formative assessment into an online quiz game. "Ask-Hint Strategy" is composed of "Prune Strategy" and "Call-in Strategy".…

  15. On Formative Assessment in Math: How Diagnostic Questions Can Help

    ERIC Educational Resources Information Center

    Barton, Craig

    2018-01-01

    In this article, the author asserts that asking and responding to diagnostic questions is the single most important part of teaching secondary school mathematics. He notes the importance of formative assessment and recommends a formative assessment strategy that requires students to be public about their answers to questions, displaying their…

  16. How Does Student Performance on Formative Assessments Relate to Learning Assessed by Exams?

    ERIC Educational Resources Information Center

    Smith, Gary

    2007-01-01

    A retrospective analysis examines the relationships between formative assessments and exam grades in two undergraduate geoscience courses. Pair and group-work grades correlate weakly with individual exam grades. Exam performance correlates to individual, weekly online assessments. Student attendance and use of assessment feedback are also…

  17. Item Information and Discrimination Functions for Trinary PCM Items.

    ERIC Educational Resources Information Center

    Akkermans, Wies; Muraki, Eiji

    1997-01-01

    For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal and bimodal are discussed, and the locations and values of maxima are derived. Practical relevance of the results is discussed. (SLD)

  18. Using case method to explicitly teach formative assessment in preservice teacher science education

    NASA Astrophysics Data System (ADS)

    Bentz, Amy Elizabeth

    The process of formative assessment improves student understanding; however, the topic of formative assessment in preservice education has been severely neglected. Since a major goal of teacher education is to create reflective teaching professionals, preservice teachers should be provided an opportunity to critically reflect on the use of formative assessment in the classroom. Case method is an instructional methodology that allows learners to engage in and reflect on real-world situations. Case based pedagogy can play an important role in enhancing preservice teachers' ability to reflect on teaching and learning by encouraging alternative ways of thinking about assessment. Although the literature on formative assessment and case methodology are extensive, using case method to explore the formative assessment process is, at best, sparse. The purpose of this study is to answer the following research questions: To what extent does the implementation of formative assessment cases in methods instruction influence preservice elementary science teachers' knowledge of formative assessment? What descriptive characteristics change between the preservice teachers' pre-case and post-case written reflection that would demonstrate learning had occurred? To investigate these questions, preservice teachers in an elementary methods course were asked to reflect on and discuss five cases. Pre/post-case data was analyzed. Results indicate that the preservice teachers modified their ideas to reflect the themes that were represented within the cases and modified their reflections to include specific ideas or examples taken directly from the case discussions. Comparing pre- and post-case reflections, the data supports a noted change in how the preservice teachers interpreted the case content. The preservice teachers began to evaluate the case content, question the lack of formative assessment concepts and strategies within the case, and apply formative assessment concepts and

  19. Universal Design and Multimethod Approaches to Item Review

    ERIC Educational Resources Information Center

    Johnstone, Christopher J.; Thompson, Sandra J.; Bottsford-Miller, Nicole A.; Thurlow, Martha L.

    2008-01-01

    Test items undergo multiple iterations of review before states and vendors deem them acceptable to be placed in a live statewide assessment. This article reviews three approaches that can add validity evidence to states' item review processes. The first process is a structured sensitivity review process that focuses on universal design…

  20. Multivariate Generalizability Analysis of Automated Scoring for Short Answer Items of Social Studies in Large-Scale Assessment

    ERIC Educational Resources Information Center

    Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee

    2017-01-01

    With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…

  1. Preliminary Investigation of a Video-Based Stimulus Preference Assessment

    ERIC Educational Resources Information Center

    Snyder, Katie; Higbee, Thomas S.; Dayton, Elizabeth

    2012-01-01

    Video clips may be an effective format for presenting complex stimuli in preference assessments. In this preliminary study, we evaluated the correspondence between preference hierarchies generated from preference assessments that included either toys or videos of the toys. The top-ranked item corresponded in both assessments for 5 of the 6…

  2. Revisiting the Impact of Formative Assessment Opportunities on Student Learning

    ERIC Educational Resources Information Center

    Peat, Mary; Franklin, Sue; Devlin, Marcia; Charles, Margaret

    2005-01-01

    This project developed as a result of some inconclusive data from an investigation of whether a relationship existed between the use of formative assessment opportunities and performance, as measured by final grade. We were expecting to show our colleagues and students that use of formative assessment resources had the potential to improve…

  3. Psychosocial consequences of cancer cachexia: the development of an item bank.

    PubMed

    Häne, Hanspeter; Oberholzer, Rolf; Walker, Jochen; Hopkinson, Jane B; de Wolf-Linder, Susanne; Strasser, Florian

    2013-12-01

    Cancer cachexia syndrome (CCS) is often accompanied by psychosocial consequences (PSC). To alleviate PSC, a systematic assessment method is required. Currently, few assessment tools are available (e.g., Functional Assessment of Anorexia/Cachexia Therapy). There is no systematic assessment tool that captures the PSC of CCS. To develop a pilot item bank to assess the PSC of CCS. A total of 132 questions, generated from patient answers in a previous study, were reduced to 121 items by content analysis and evaluation by multidisciplinary experts (doctor, nutritionists, and nurses). In our two-step, cross-sectional study, patients, judged by staff to have PSC of CCS, were included, and the questions were randomly allocated to the patients. Questions were evaluated for understandability and triggering emotions, and patients were asked to provide a response using a four-point Likert scale. Subsequently, problematic questions were revised, reformulated, and retested. A total of 20 patients with a variety of tumor types participated. Of the 121 questions, 31 had to be reformulated after Step 1 and were retested in Step 2, after which seven were again evaluated as not being perfectly comprehensible. In Step 1, 22 questions were found to trigger emotions, but no item required remodeling. Item performance using the Likert scale revealed no consistent floor or ceiling effects. Our final pilot question bank comprised 117 questions. The final item bank contains questions that are understood and accepted by the patients. This item bank now needs to be developed into a measurement tool that groups items into domains and can be used in future research studies. Copyright © 2013 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.

  4. Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities

    PubMed Central

    Hong, Ickpyo; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L.; Shulman, Lisa M.

    2017-01-01

    Purpose The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. Methods The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35 item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R2 less than 10 %). Results The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59–0.85) and acceptable internal consistency (Cronbach’s alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. Conclusion The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms. PMID:27048495

  5. Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.

    PubMed

    Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M

    2016-09-01

    The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35 item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R (2) less than 10 %). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
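The internal-consistency figure reported in both records above (Cronbach's alpha = 0.97) follows a standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A self-contained sketch (function name hypothetical, data illustrative, not from the PROMIS field test):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.
    `items` is a list of item-score lists, one per item, all of equal length:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(items)
    n = len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1.0 - sum(var(item) for item in items) / var(totals))

# Two perfectly parallel items yield alpha = 1.0.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```

Alpha rises toward 1 as items covary more strongly, which is why a 35-item pool with item-total correlations of 0.59-0.85 can reach 0.97.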

  6. Diagnostic Utility of Craving in Predicting Nicotine Dependence: Impact of Craving Content and Item Stability

    PubMed Central

    2013-01-01

    Introduction: Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Methods: Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Results: Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Conclusions: Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed. PMID:23817585

  7. Diagnostic utility of craving in predicting nicotine dependence: impact of craving content and item stability.

    PubMed

    Germeroth, Lisa J; Wray, Jennifer M; Gass, Julie C; Tiffany, Stephen T

    2013-12-01

    Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed.
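    For a single item or subscale score, the receiver-operating characteristic analysis described in this record reduces to the Mann-Whitney identity: AUC is the probability that a randomly chosen dependent smoker scores higher than a randomly chosen nondependent smoker. A minimal sketch on hypothetical craving scores (not the study data; group means and sizes are invented):

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Count pairwise wins for the positive group; ties count half.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical craving-item averages: nondependent (0) vs. dependent (1) smokers.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 100)
scores = np.concatenate([rng.normal(2.0, 1.0, 100), rng.normal(3.5, 1.0, 100)])
discrimination = auc(scores, labels)
print(round(discrimination, 2))
```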

  8. Item selection via Bayesian IRT models.

    PubMed

    Arima, Serena

    2015-02-10

    With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
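    The graded response model used in this record gives, for each ordered response category, a probability determined by the item's discrimination and its ordered difficulty thresholds. A minimal sketch with hypothetical item parameters (the values below are illustrative, not estimates from the dysarthria questionnaire):

```python
import numpy as np

def grm_probs(theta, a, b):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait; a: discrimination; b: ordered thresholds
    (length K-1 for K response categories).
    """
    b = np.asarray(b, dtype=float)
    # P(X >= k) for k = 1..K-1, bracketed by P(X >= 0) = 1 and P(X >= K) = 0.
    p_ge = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    cum = np.concatenate(([1.0], p_ge, [0.0]))
    # Category probabilities are differences of adjacent cumulative probabilities.
    return cum[:-1] - cum[1:]

probs = grm_probs(theta=0.5, a=1.7, b=[-1.0, 0.0, 1.0])
print(np.round(probs, 3))
```

    Items with nearly identical (a, b) parameters carry nearly identical information, which is what motivates grouping equivalent items and dropping redundant ones, as the mixture approach above formalizes.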

  9. Targeting Instruction with Formative Assessment Probes

    ERIC Educational Resources Information Center

    Fagan, Emily R.; Tobey, Cheryl Rose; Brodesky, Amy R.

    2016-01-01

    This article introduces the formative assessment probe--a powerful tool for collecting focused, actionable information about student thinking and potential misconceptions--along with a process for targeting instruction in response to probe results. Drawing on research about common student mathematical misconceptions as well as the former work of…

  10. Methods for Assessing Item, Step, and Threshold Invariance in Polytomous Items Following the Partial Credit Model

    ERIC Educational Resources Information Center

    Penfield, Randall D.; Myers, Nicholas D.; Wolfe, Edward W.

    2008-01-01

    Measurement invariance in the partial credit model (PCM) can be conceptualized in several different but compatible ways. In this article the authors distinguish between three forms of measurement invariance in the PCM: step invariance, item invariance, and threshold invariance. Approaches for modeling these three forms of invariance are proposed,…

  11. Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items

    ERIC Educational Resources Information Center

    Chen, Cheng-Te; Wang, Wen-Chung

    2007-01-01

    This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q[subscript 3] and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring…

  12. Instrument Formatting with Computer Data Entry in Mind.

    ERIC Educational Resources Information Center

    Boser, Judith A.; And Others

    Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response items; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…

  13. Teacher Effectiveness in the Formative Use of a Mathematical Assessment

    ERIC Educational Resources Information Center

    Collette, Lisa Audrey

    2012-01-01

    In current research literature, formative assessment has been identified as having the potential for tracking student progress to ensure high-stakes test preparedness. Formative assessment has several shades of meaning. This study defines it as an ongoing process that utilizes all of the moment-by-moment day-by-day pieces of data that can be…

  14. Some Issues in Item Response Theory: Dimensionality Assessment and Models for Guessing

    ERIC Educational Resources Information Center

    Smith, Jessalyn

    2009-01-01

    Currently, standardized tests are widely used as a method to measure how well schools and students meet academic standards. As a result, measurement issues have become an increasingly popular topic of study. Unidimensional item response models are used to model latent abilities and specific item characteristics. This class of models makes…

  15. Assessing reprogramming by chimera formation and tetraploid complementation.

    PubMed

    Li, Xin; Xia, Bao-long; Li, Wei; Zhou, Qi

    2015-01-01

    Pluripotent stem cells can be evaluated by expression of pluripotency markers, embryoid body aggregation, teratoma formation, chimera contribution, and, most stringently, tetraploid complementation. Whether iPS cells in general are functionally equivalent to normal ESCs is difficult to establish. Here, we present the detailed procedure for chimera formation and tetraploid complementation, the most stringent criterion for assessing pluripotency.

  16. Encouraging formative assessments of leadership for foundation doctors.

    PubMed

    Hadley, Lindsay; Black, David; Welch, Jan; Reynolds, Peter; Penlington, Clare

    2015-08-01

    Clinical leadership is considered essential for maintaining and improving patient care and safety in the UK, and is incorporated in the curriculum for all trainee doctors. Despite the growing focus on the importance of leadership, and the introduction of the Medical Leadership Competency Framework (MLCF) in the UK, leadership education for doctors in training is still in its infancy. Assessment is focused on clinical skills, and trainee doctors receive very little formal feedback on their leadership competencies. In this article we describe the approach taken by Health Education Kent, Sussex and Surrey (HEKSS) to raise the profile of leadership amongst doctors in training in the South Thames Foundation School (STFS). An annual structured formative assessment in leadership for each trainee has been introduced, supported by leadership education for both trainees and their supervisors in HEKSS trusts. We analysed over 500 of these assessments from the academic year 2012/13 for foundation doctors in HEKSS trusts, in order to assess the quality of the feedback. From the analysis, potential indicators of more effective formative assessments were identified. These may be helpful in improving the leadership education programme for future years. There is a wealth of evidence to highlight the importance and value of formative assessments; however, particularly for foundation doctors, these have typically been focused on assessing clinical capabilities. This HEKSS initiative encourages doctors to recognise leadership opportunities at the beginning of their careers, seeks to help them understand the importance of acquiring leadership skills and provides structured feedback to help them improve. Leadership education for doctors in training is still in its infancy. © 2015 John Wiley & Sons Ltd.

  17. Perceptions and attitudes of formative assessments in middle-school science classes

    NASA Astrophysics Data System (ADS)

    Chauncey, Penny Denyse

    No Child Left Behind mandates utilizing summative assessment to measure schools' effectiveness. The problem is that summative assessment measures students' knowledge without depth of understanding. The goal of public education, however, is to prepare students to think critically at higher levels. The purpose of this study was to examine any difference between formative assessment incorporated in instruction as opposed to the usual, more summative methods in terms of attitudes and academic achievement of middle-school science students. Maslow's theory emphasizes that individuals must have basic needs met before they can advance to higher levels. Formative assessment enables students to master one level at a time. The research questions focused on whether statistically significant differences existed between classrooms using these two types of assessments on academic tests and an attitude survey. Using a quantitative quasi-experimental control-group design, data were obtained from a sample of 430 middle-school science students in 6 classes. One control and 2 experimental classes were assigned to each teacher. Results of the independent t tests revealed academic achievement was significantly greater for groups that utilized formative assessment. No significant difference in attitudes was noted. Recommendations include incorporating formative assessment results with the summative results. Findings from this study could contribute to positive social change by prompting educational stakeholders to examine local and state policies on curriculum as well as funding based on summative scores alone. Use of formative assessment can lead to improved academic success.

  18. Tale of Two Science Teachers' Formative Assessment Practices in a Similar School Environment

    ERIC Educational Resources Information Center

    Sathasivam, Renuka V.; Daniel, Esther G. S.

    2016-01-01

    There is an accumulating research base that supports the effectiveness of formative assessment practices in enhancing the quality of educational outcomes, yet research findings seem to indicate sluggish implementation of these formative assessment strategies in the classrooms. Many factors influence teachers' formative assessment practices…

  19. Consumer Skills Items. A Collection of Consumer Skills Items for State and Local Education Agencies to Draw upon in Custom-Building Their Own Consumer Skills Instruments.

    ERIC Educational Resources Information Center

    Education Commission of the States, Denver, CO.

    This is a collection of consumer skills items for state and local education agencies to draw upon in composing consumer skills instruments. It provides items to assess seventeen-year-olds' consumer skills. The booklet contains items classified under eight major topics: behavior, contracts, economics, energy, finances, mathematics, projection, and…

  20. The Impact of Presentation Format on Younger and Older Adults' Self-Regulated Learning.

    PubMed

    Price, Jodi

    2017-01-01

    Background/Study Context: Self-regulated learning involves deciding what to study and for how long. Debate surrounds whether individuals' selections are influenced more by item complexity or point values, or whether people instead select in a left-to-right reading order, ignoring item complexity and value. The present study manipulated whether point values and presentation format favored selection of simple or complex Chinese-English pairs to assess the impact on younger and older adults' selection behaviors. One hundred and five younger (mean age = 20.26, SD = 2.38) and 102 older adults (mean age = 70.28, SD = 6.37) participated in the experiment. Participants studied four different 3 × 3 grids (two per trial), each containing three simple, three medium, and three complex Chinese-English vocabulary pairs presented in either a simple-first or complex-first order, depending on condition. Point values were assigned in either a 2-4-8 or 8-4-2 order so that either simple or complex items were favored. Points did not influence the order in which either age group selected items, whereas presentation format did. Younger and older adults selected more simple or complex items when they appeared in the first column. However, older adults selected and allocated more time to simpler items but recalled less overall than did younger adults. Memory beliefs and working memory capacity predicted study time allocation, but not item selection, behaviors. Presentation format must be considered when evaluating which theory of self-regulated learning best accounts for younger and older adults' study behaviors and whether there are age-related differences in self-regulated learning. The results of the present study combine with others to support the importance of also considering the role of external factors (e.g., working memory capacity and memory beliefs) in each age group's self-regulated learning decisions.

  1. Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks

    PubMed Central

    Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.

    2013-01-01

    Objective To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results 2 pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904

  2. SUPPORTING TEACHERS IN IMPLEMENTING FORMATIVE ASSESSMENT PRACTICES IN EARTH SYSTEMS SCIENCE

    NASA Astrophysics Data System (ADS)

    Harris, C. J.; Penuel, W. R.; Haydel Debarger, A.; Blank, J. G.

    2009-12-01

    An important purpose of formative assessment is to elicit student thinking to use in instruction to help all students learn and inform next steps in teaching. However, formative assessment practices are difficult to implement and thus present a formidable challenge for many science teachers. A critical need in geoscience education is a framework for providing teachers with real-time assessment tools as well as professional development to learn how to use formative assessment to improve instruction. Here, we describe a comprehensive support system, developed for our NSF-funded Contingent Pedagogies project, for addressing the challenge of helping teachers to use formative assessment to enhance student learning in middle school Earth Systems science. Our support system is designed to improve student understanding about the geosphere by integrating classroom network technology, interactive formative assessments, and contingent curricular activities to guide teachers from formative assessment to instructional decision-making and improved student learning. To accomplish this, we are using a new classroom network technology, Group Scribbles, in the context of an innovative middle-grades Earth Science curriculum called Investigating Earth Systems (IES). Group Scribbles, developed at SRI International, is a collaborative software tool that allows individual students to compose “scribbles” (i.e., drawings and notes), on “post-it” notes in a private workspace (a notebook computer) in response to a public task. They can post these notes anonymously to a shared, public workspace (a teacher-controlled large screen monitor) that becomes the centerpiece of group and class discussion. To help teachers implement formative assessment practices, we have introduced a key resource, called a teaching routine, to help teachers take advantage of Group Scribbles for more interactive assessments. Routine refers to a sequence of repeatable interactions that, over time, become

  3. Revisiting the role of recollection in item versus forced-choice recognition memory.

    PubMed

    Cook, Gabriel I; Marsh, Richard L; Hicks, Jason L

    2005-08-01

    Many memory theorists have assumed that forced-choice recognition tests can rely more on familiarity, whereas item (yes-no) tests must rely more on recollection. In actuality, several studies have found no differences in the contributions of recollection and familiarity underlying the two different test formats. Using word frequency to manipulate stimulus characteristics, the present study demonstrated that the contributions of recollection to item versus forced-choice tests is variable. Low word frequency resulted in significantly more recollection in an item test than did a forced-choice procedure, but high word frequency produced the opposite result. These results clearly constrain any uniform claim about the degree to which recollection supports responding in item versus forced-choice tests.

  4. Item Response Theory at Subject- and Group-Level. Research Report 90-1.

    ERIC Educational Resources Information Center

    Tobi, Hilde

    This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California Assessment Program. In the Netherlands, these…

  5. Assessing the Equivalence of Paper, Mobile Phone, and Tablet Survey Responses at a Community Mental Health Center Using Equivalent Halves of a 'Gold-Standard' Depression Item Bank.

    PubMed

    Brodey, Benjamin B; Gonzalez, Nicole L; Elkin, Kathryn Ann; Sasiela, W Jordan; Brodey, Inger S

    2017-09-06

    The computerized administration of self-report psychiatric diagnostic and outcomes assessments has risen in popularity. If results are similar enough across different administration modalities, then new administration technologies can be used interchangeably and the choice of technology can be based on other factors, such as convenience in the study design. An assessment based on item response theory (IRT), such as the Patient-Reported Outcomes Measurement Information System (PROMIS) depression item bank, offers new possibilities for assessing the effect of technology choice upon results. To create equivalent halves of the PROMIS depression item bank and to use these halves to compare survey responses and user satisfaction among administration modalities-paper, mobile phone, or tablet-with a community mental health care population. The 28 PROMIS depression items were divided into 2 halves based on content and simulations with an established PROMIS response data set. A total of 129 participants were recruited from an outpatient public sector mental health clinic based in Memphis. All participants took both nonoverlapping halves of the PROMIS IRT-based depression items (Part A and Part B): once using paper and pencil, and once using either a mobile phone or tablet. An 8-cell randomization was done on technology used, order of technologies used, and order of PROMIS Parts A and B. Both Parts A and B were administered as fixed-length assessments and both were scored using published PROMIS IRT parameters and algorithms. All 129 participants received either Part A or B via paper assessment. Participants were also administered the opposite assessment, 63 using a mobile phone and 66 using a tablet. There was no significant difference in item response scores for Part A versus B. All 3 of the technologies yielded essentially identical assessment results and equivalent satisfaction levels. 
Our findings show that the PROMIS depression assessment can be divided into 2 equivalent halves.

  6. Teacher Candidates Exposure to Formative Assessment in Educational Psychology Textbooks: A Content Analysis

    ERIC Educational Resources Information Center

    Wininger, Steven R.; Norman, Antony D.

    2005-01-01

    The purpose of this article is to define formative assessment, outline what is known about the prevalence of formative assessment implementation in the classroom, establish the importance of formative assessment with regards to student motivation and achievement, and present the results of a content analysis of current educational psychology…

  7. The Many Faces of Formative Assessment

    ERIC Educational Resources Information Center

    Stull, Judith; Varnum, Susan Janse; Ducette, Joseph; Schiller, John

    2011-01-01

    In this research paper we consider formative assessment (FA) and discuss ways in which it has been implemented in four different university courses. We illustrate the different aspects of FA by deconstructing it and then demonstrating effectiveness in improving both teaching and student achievement. It appears that specifically "what is done" was…

  8. Developing and investigating the use of single-item measures in organizational research.

    PubMed

    Fisher, Gwenith G; Matthews, Russell A; Gibbons, Alyssa Mitchell

    2016-01-01

    The validity of organizational research relies on strong research methods, which include effective measurement of psychological constructs. The general consensus is that multiple-item measures have better psychometric properties than single-item measures. However, due to practical constraints (e.g., survey length, respondent burden) there are situations in which certain single items may be useful for capturing information about constructs that might otherwise go unmeasured. We evaluated 37 items, including 18 newly developed items as well as 19 single items selected from existing multiple-item scales based on psychometric characteristics, to assess 18 constructs frequently measured in organizational and occupational health psychology research. We examined evidence of reliability; convergent, discriminant, and content validity assessments; and test-retest reliabilities at 1- and 3-month time lags for single-item measures using a multistage and multisource validation strategy across 3 studies, including data from N = 17 occupational health subject matter experts and N = 1,634 survey respondents across 2 samples. Items selected from existing scales generally demonstrated better internal consistency reliability and convergent validity, whereas these particular new items generally had higher levels of content validity. We offer recommendations regarding when use of single items may be more or less appropriate, as well as 11 items that seem acceptable, 14 items that might be used with caution due to mixed results, and 12 items we do not recommend using as single-item measures. Although multiple-item measures are preferable from a psychometric standpoint, in some circumstances single-item measures can provide useful information. (c) 2016 APA, all rights reserved.

  9. Home Science Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Smith, Jan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  10. How item banks and their application can influence measurement practice in rehabilitation medicine: a PROMIS fatigue item bank example.

    PubMed

    Lai, Jin-Shei; Cella, David; Choi, Seung; Junghaenel, Doerte U; Christodoulou, Christopher; Gershon, Richard; Stone, Arthur

    2011-10-01

    To illustrate how measurement practices can be advanced by using as an example the fatigue item bank (FIB) and its applications (short forms and computerized adaptive testing [CAT]) that were developed through the National Institutes of Health Patient Reported Outcomes Measurement Information System (PROMIS) Cooperative Group. Psychometric analysis of data collected by an Internet survey company using item response theory-related techniques. A U.S. general population representative sample collected through the Internet. Respondents used for dimensionality evaluation of the PROMIS FIB (N=603) and item calibrations (N=14,931). Not applicable. Fatigue items (112) developed by the PROMIS fatigue domain working group, 13-item Functional Assessment of Chronic Illness Therapy-Fatigue, and 4-item Medical Outcomes Study 36-Item Short Form Health Survey Vitality scale. The PROMIS FIB version 1, which consists of 95 items, showed acceptable psychometric properties. CAT showed consistently better precision than short forms. However, all 3 short forms showed good precision for most participants in that more than 95% of the sample could be measured precisely with reliability greater than 0.9. Measurement practice can be advanced by using a psychometrically sound measurement tool and its applications. This example shows that CAT and short forms derived from the PROMIS FIB can reliably estimate fatigue reported by the U.S. general population. Evaluation in clinical populations is warranted before the item bank can be used for clinical trials. Copyright © 2011 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
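    The precision criterion in this record (reliability above 0.90 across a range of theta) follows from test information: conditional reliability at a given theta is I/(I + 1) when the latent trait has unit variance, since the standard error of measurement is 1/sqrt(I). A sketch for a hypothetical 2PL item bank; the discriminations and difficulties below are invented, not the PROMIS fatigue item parameters.

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def local_reliability(theta, a, b):
    """Conditional reliability at theta: I/(I + 1), i.e. 1 - SE^2(theta),
    assuming the latent trait is scaled to unit variance."""
    info = sum(item_information(theta, ai, bi) for ai, bi in zip(a, b))
    return info / (info + 1.0)

# A hypothetical 12-item bank with difficulties spread across the trait range.
a = [2.0] * 12
b = np.linspace(-2.5, 2.5, 12)
rels = [local_reliability(t, a, b) for t in np.linspace(-2, 2, 9)]
print([round(r, 2) for r in rels])
```

    Spreading item difficulties across the trait range, as in this sketch, is what lets a bank (or a CAT drawing from it) hold its precision over a wide theta interval rather than only near the middle.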

  11. Integrating formative assessment and participatory research: Building healthier communities in the CHILE Project

    PubMed Central

    Sussman, Andrew L.; Davis, Sally

    2013-01-01

    Background The need to conduct formative assessment to inform the development of interventional studies has been increasingly recognized in community-based health research. While this purpose alone may provide sufficient justification to conduct formative assessment, researchers are also recognizing the importance of such efforts with regard to partnership building. Purpose This article reports a formative assessment process in a large scale randomized controlled trial in New Mexico aimed at preventing obesity in rural American Indian and Hispanic children in Head Start programs. Methods We interviewed Head Start staff and conducted observations to understand the context of food service and physical activity in these sites. We also collected data from other community partners, including grocery store managers and primary care providers, to assess appropriate strategies regarding their engagement in the study. Results Formative assessment findings helped modify the planned intervention while allowing for variation relevant to cultural and Head Start organizational conditions in each community. Rather than view formative assessment only as a planning phase of the research, our experience illustrates the need to conceptualize these activities more broadly. Discussion Integrating formative assessment and participatory research raises the need to address the challenge of ensuring standardization and consistency across varied community settings, the evolving nature of initial formative relationships and the need to build trust in academic/community partnerships. Translation to Health Education Practice In our work with American Indian and Hispanic communities in New Mexico, formative assessment represents a partnership building opportunity. PMID:23745177

  12. Development of an item bank and computer adaptive test for role functioning.

    PubMed

    Anatchkova, Milena D; Rose, Matthias; Ware, John E; Bjorner, Jakob B

    2012-11-01

    Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning. A set of different instruments including 75 newly developed items asking about the impact of health on role functioning was completed by 2,500 participants. Established item response theory methods were used to develop an item bank based on the generalized partial credit model. Comparison of group mean bank scores of participants with different self-reported general health status and chronic conditions was used to test the external validity of the bank. After excluding items that did not meet established requirements, the final item bank consisted of a total of 64 items covering three areas of role functioning (family, social, and occupational). Slopes in the bank ranged between .93 and 4.37; the mean threshold range was -1.09 to -2.25. Item bank-based scores were significantly different for participants with and without chronic conditions and with different levels of self-reported general health. An item bank assessing health impact on RF across three content areas has been successfully developed. The bank can be used for development of short forms or computerized adaptive tests to be applied in the assessment of role functioning as one of the common denominators across applications of generic health assessment.

  13. Non-ignorable missingness item response theory models for choice effects in examinee-selected items.

    PubMed

    Liu, Chen-Wei; Wang, Wen-Chung

    2017-11-01

    Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
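    The core issue — selection that depends on unobserved quantities makes the missingness non-ignorable — can be illustrated with a toy simulation. This is not the authors' two-dimensional model; the Rasch items, sample size, and selection rule below are all invented for illustration:

```python
import math
import random

def p_correct(theta, b):
    # Rasch (1PL) probability of a correct response.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

random.seed(0)
difficulties = [-0.5, 0.5]   # two illustrative items
full, selected = [], []
for _ in range(5000):
    theta = random.gauss(0, 1)
    resp = [1 if random.random() < p_correct(theta, b) else 0
            for b in difficulties]
    full.extend(resp)
    # ESI-style selection that depends on the (unobserved) outcome:
    # examinees report an item they answered correctly when they can.
    right = [i for i, r in enumerate(resp) if r == 1]
    pick = random.choice(right) if right else random.choice([0, 1])
    selected.append(resp[pick])

full_rate = sum(full) / len(full)          # proportion correct, full data
sel_rate = sum(selected) / len(selected)   # proportion correct, ESI data
```

    Treating the selected responses as if the missingness were ignorable inflates the apparent proportion correct, which is the kind of bias the proposed two-dimensional model is designed to remove.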

  14. Rats Remember Items in Context Using Episodic Memory.

    PubMed

    Panoz-Brown, Danielle; Corbin, Hannah E; Dalecki, Stefan J; Gentry, Meredith; Brotheridge, Sydney; Sluka, Christina M; Wu, Jie-En; Crystal, Jonathon D

    2016-10-24

    Vivid episodic memories in people have been characterized as the replay of unique events in sequential order [1-3]. Animal models of episodic memory have successfully documented episodic memory of a single event (e.g., [4-8]). However, a fundamental feature of episodic memory in people is that it involves multiple events, and notably, episodic memory impairments in human diseases are not limited to a single event. Critically, it is not known whether animals remember many unique events using episodic memory. Here, we show that rats remember many unique events and the contexts in which the events occurred using episodic memory. We used an olfactory memory assessment in which new (but not old) odors were rewarded using 32 items. Rats were presented with 16 odors in one context and the same odors in a second context. To attain high accuracy, the rats needed to remember item in context because each odor was rewarded as a new item in each context. The demands on item-in-context memory were varied by assessing memory with 2, 3, 5, or 15 unpredictable transitions between contexts, and item-in-context memory survived a 45 min retention interval challenge. When the memory of item in context was put in conflict with non-episodic familiarity cues, rats relied on item in context using episodic memory. Our findings suggest that rats remember multiple unique events and the contexts in which these events occurred using episodic memory and support the view that rats may be used to model fundamental aspects of human cognition. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. 41 CFR 302-7.21 - If my HHG shipment includes an item for which a weight additive is assessed by the HHG carrier (e...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... includes an item for which a weight additive is assessed by the HHG carrier (e.g., boat, trailer... BAGGAGE ALLOWANCE General Rules § 302-7.21 If my HHG shipment includes an item for which a weight additive... payment? (a) No, you will not be responsible for the shipping charges that result from a weight additive...

  16. Integrated online formative assessments in the biomedical sciences for medical students: benefits for learning.

    PubMed

    Velan, Gary M; Jones, Philip; McNeil, H Patrick; Kumar, Rakesh K

    2008-11-25

    Online formative assessments have a sound theoretical basis, and are prevalent and popular in higher education settings, but data to establish their educational benefits are lacking. This study attempts to determine whether participation and performance in integrated online formative assessments in the biomedical sciences have measurable effects on learning by junior medical students. Students enrolled in Phase 1 (Years 1 and 2) of an undergraduate Medicine program were studied over two consecutive years, 2006 and 2007. In seven consecutive courses, end-of-course (EOC) summative examination marks were analysed with respect to the effect of participation and performance in voluntary online formative assessments. Online evaluation surveys were utilized to gather students' perceptions regarding online formative assessments. Students rated online assessments highly on all measures. Participation in formative assessments had a statistically significant positive relationship with EOC marks in all courses. The mean difference in EOC marks for those who participated in formative assessments ranged from 6.3% (95% confidence interval 1.6 to 11.0; p = 0.009) in Course 5 to 3.2% (0.2 to 6.2; p = 0.037) in Course 2. For all courses, performance in formative assessments correlated significantly with EOC marks (p < 0.001 for each course). The variance in EOC marks that could be explained by performance in the formative assessments ranged from 21.8% in Course 6 to 4.1% in Course 7. The results support the contention that well-designed formative assessments can have significant positive effects on learning. There is untapped potential for use of formative assessments to assist learning by medical students and postgraduate medical trainees.

  17. Dealing with Item Nonresponse in Large-Scale Cognitive Assessments: The Impact of Missing Data Methods on Estimated Explanatory Relationships

    ERIC Educational Resources Information Center

    Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H.

    2017-01-01

    Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…

  18. "Formative Good, Summative Bad?"--A Review of the Dichotomy in Assessment Literature

    ERIC Educational Resources Information Center

    Lau, Alice Man Sze

    2016-01-01

    The debate between summative and formative assessment is creating a situation that increasingly calls to mind the famous slogan in George Orwell's (1945) "Animal Farm"--"Four legs good, two legs bad". Formative assessment is increasingly being portrayed in the literature as "good" assessment, which tutors should…

  19. Dimensionality Assessment for Dichotomously Scored Items Using Multidimensional Scaling.

    ERIC Educational Resources Information Center

    Jones, Patricia B.; And Others

    In order to determine the effectiveness of multidimensional scaling (MDS) in recovering the dimensionality of a set of dichotomously-scored items, data were simulated in one, two, and three dimensions for a variety of correlations with the underlying latent trait. Similarity matrices were constructed from these data using three margin-sensitive…

  20. Does Delivery Format Make a Difference in Learning about Global and Cultural Understanding?

    ERIC Educational Resources Information Center

    Rawls, Janita; Hammons, Stacy A.

    2016-01-01

    This study assessed a learning outcome for nontraditional seniors who were in accelerated degree programs in both online and on-site formats. Using items from the National Survey of Student Engagement, researchers explored engagement with global understanding and cultural awareness. A quantitative, single-case analysis method was used to determine…

  1. Item Order, Response Format, and Examinee Sex and Handedness and Performance on a Multiple-Choice Test.

    ERIC Educational Resources Information Center

    Kleinke, David J.

    Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…

  2. Rasch analysis of the Italian Lower Extremity Functional Scale: insights on dimensionality and suggestions for an improved 15-item version.

    PubMed

    Bravini, Elisabetta; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano

    2017-04-01

    To investigate dimensionality and the measurement properties of the Italian Lower Extremity Functional Scale using both classical test theory and Rasch analysis methods, and to provide insights for an improved version of the questionnaire. Rasch analysis of individual patient data. Rehabilitation centre. A total of 135 patients with musculoskeletal diseases of the lower limb. Patients were assessed with the Lower Extremity Functional Scale before and after rehabilitation. Rasch analysis showed some problems related to rating scale category functioning, item fit, and item redundancy. After an iterative process, which resulted in the reduction of rating scale categories from 5 to 4, and in the deletion of 5 items, the psychometric properties of the Italian Lower Extremity Functional Scale improved. The retained 15 items with a 4-level response format fitted the Rasch model (internal construct validity), and demonstrated unidimensionality and good reliability indices (person-separation reliability 0.92; Cronbach's alpha 0.94). The analysis also showed differential item functioning for six of the retained items. The sensitivity to change of the Italian 15-item Lower Extremity Functional Scale was nearly equal to that of the original version (effect size: 0.93 and 0.98; standardized response mean: 1.20 and 1.28, respectively, for the 15-item and 20-item versions). The Italian Lower Extremity Functional Scale had unsatisfactory measurement properties. However, removing five items and simplifying the scoring from 5 to 4 levels resulted in a more valid measure with good reliability and sensitivity to change.
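    Cronbach's alpha, reported as 0.94 for the 15-item version, summarizes internal consistency from the item variances and the total-score variance. A generic sketch with made-up 4-level ratings (not the study's data):

```python
def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`, a list of respondents, each a
    list of item scores: (k/(k-1)) * (1 - sum(item vars)/total var)."""
    k = len(rows[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([r[i] for r in rows]) for i in range(k)]
    total_var = var([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Five hypothetical respondents rating four items on a 0-3 scale.
data = [
    [3, 2, 3, 3],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [0, 1, 1, 0],
    [3, 3, 3, 2],
]
alpha = cronbach_alpha(data)
```

    Alpha rises when item scores covary strongly relative to their individual variances, which is why deleting redundant or misfitting items can leave reliability high even with fewer items.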

  3. Comparing narrative and multiple-choice formats in online communication skill assessment.

    PubMed

    Kim, Sara; Spielberg, Freya; Mauksch, Larry; Farber, Stu; Duong, Cuong; Fitch, Wes; Greer, Tom

    2009-06-01

    We compared multiple-choice and open-ended responses collected from a web-based tool designated 'Case for Change', which had been developed for assessing and teaching medical students in the skills involved in integrating sexual risk assessment and behaviour change discussions into patient-centred primary care visits. A total of 111 Year 3 students completed the web-based tool. A series of videos from one patient encounter illustrated how a clinician uses patient-centred communication and health behaviour change skills while caring for a patient presenting with a urinary tract infection. Each video clip was followed by a request for students to respond in two ways to the question: 'What would you do next?' Firstly, students typed their statements of what they would say to the patient. Secondly, students selected from a multiple-choice list the statements that most closely resembled their free text entries. These two modes of students' answers were analysed and compared. When articulating what they would say to the patient in a narrative format, students frequently used doctor-centred approaches that focused on premature diagnostic questioning or neglected to elicit patient perspectives. Despite the instruction to select a matching statement from the multiple-choice list, students tended to choose the most exemplary patient-centred statement, which was contrary to the doctor-centred approaches reflected in their narrative responses. Open-ended questions facilitate in-depth understanding of students' educational needs, although the scoring of narrative responses is time-consuming. Multiple-choice questions allow efficient scoring and individualised feedback associated with question items but do not fully elicit students' thought processes.

  4. A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.

    PubMed

    Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri

    2017-03-01

    Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students, of whom 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, suggesting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
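    Difficulty and discrimination parameters such as those estimated for the funHLS-YA come from a two-parameter logistic (2PL) item response model. A minimal sketch with invented parameter values:

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function: probability of a correct response
    given ability `theta`, discrimination `a`, and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: at theta == b the probability is exactly 0.5,
# regardless of the discrimination parameter.
p_at_difficulty = irf_2pl(theta=0.3, a=1.4, b=0.3)
```

    The difficulty b locates the ability at which the response probability is 0.5, and the discrimination a controls how sharply the curve rises around that point; gender-specific difficulty estimates, as found here, are a form of differential item functioning.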

  5. Examination of the Assumptions and Properties of the Graded Item Response Model: An Example Using a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    1995-01-01

    Over 5,000 students participated in a study of the dimensionality and stability of the item parameter estimates of a mathematics performance assessment developed for the Quantitative Understanding: Amplifying Student Achievement and Reasoning (QUASAR) Project. Results demonstrate the test's dimensionality and illustrate ways to examine use of the…

  6. The influence of formative assessments on student motivation, achievement, and conceptual change

    NASA Astrophysics Data System (ADS)

    Yin, Yue

    2005-07-01

    This study connected research on formative assessment, motivation, and conceptual change. In particular, it examined three research questions: (1) Can formative assessment improve students' motivational beliefs? (2) Can formative assessment improve students' achievement in science and bring about conceptual change? and (3) Are students' science achievement and conceptual change correlated with their motivational beliefs? Formative assessment in this study refers to assessments embedded in an inquiry-based curriculum. To answer those questions, a randomized experiment was conducted. One thousand and two 6th or 7th graders of 12 teachers in 12 different schools in six states participated in the study. The 12 teachers were matched in pairs and randomly assigned to the experimental and control group. The experimental group employed embedded formative assessments while teaching a science curriculum unit and the control group taught the same unit without formative assessments. All the students were given a motivation survey and one or more achievement tests at pre- and posttest. By comparing the experimental and control students' motivation and achievement scores at pretest and posttest, I examined whether the formative assessment treatment affected students' motivation, learning, and conceptual change. By correlating students' posttest motivation, achievement as well as conceptual change scores, I examined whether students' motivation was related to their achievement and conceptual change. Analyses indicated that, the embedded assessments used by the experimental group did not significantly influence students' motivation, achievement, or conceptual change compared to students in the control group. Most motivation beliefs were correlated with students' achievement in a way similar to what has been reported in the literature. They were not correlated with students' conceptual change scores as hypothesized. Teachers, as well as some contextual factors associated with

  7. Individuals with knee impairments identify items in need of clarification in the Patient Reported Outcomes Measurement Information System (PROMIS®) pain interference and physical function item banks - a qualitative study.

    PubMed

    Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J

    2016-05-11

    The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.

  8. Investigating the Dynamics of Formative Assessment: Relationships between Teacher Knowledge, Assessment Practice and Learning

    ERIC Educational Resources Information Center

    Herman, Joan; Osmundson, Ellen; Dai, Yunyun; Ringstaff, Cathy; Timms, Michael

    2015-01-01

    This exploratory study of elementary school science examines questions central to policy, practice and research on formative assessment: What is the quality of teachers' content-pedagogical and assessment knowledge? What is the relationship between teacher knowledge and assessment practice? What is the relationship between teacher knowledge,…

  9. A Third-Order Item Response Theory Model for Modeling the Effects of Domains and Subdomains in Large-Scale Educational Assessment Surveys

    ERIC Educational Resources Information Center

    Rijmen, Frank; Jeon, Minjeong; von Davier, Matthias; Rabe-Hesketh, Sophia

    2014-01-01

    Second-order item response theory models have been used for assessments consisting of several domains, such as content areas. We extend the second-order model to a third-order model for assessments that include subdomains nested in domains. Using a graphical model framework, it is shown how the model does not suffer from the curse of…

  10. Using the Cumulative Common Log-Odds Ratio to Identify Differential Item Functioning of Rating Scale Items in the Exercise and Sport Sciences

    ERIC Educational Resources Information Center

    Penfield, Randall D.; Giacobbi, Peter R., Jr.; Myers, Nicholas D.

    2007-01-01

    One aspect of construct validity is the extent to which the measurement properties of a rating scale are invariant across the groups being compared. An increasingly used method for assessing between-group differences in the measurement properties of items of a scale is the framework of differential item functioning (DIF). In this paper we…

  11. Development of the Oxford Participation and Activities Questionnaire: constructing an item pool

    PubMed Central

    Kelly, Laura; Jenkinson, Crispin; Dummett, Sarah; Dawson, Jill; Fitzpatrick, Ray; Morley, David

    2015-01-01

    Purpose The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF). The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Methods Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson’s disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13) were used to assess items for face and content validity. Results ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Conclusion Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing. A large-scale postal survey will follow in order to refine the instrument further and to assess its psychometric properties. The final instrument is intended for use in clinical trials and interventions targeted at maintaining or improving activity and participation. PMID:26056503

  12. Development of the Oxford Participation and Activities Questionnaire: constructing an item pool.

    PubMed

    Kelly, Laura; Jenkinson, Crispin; Dummett, Sarah; Dawson, Jill; Fitzpatrick, Ray; Morley, David

    2015-01-01

    The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF). The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson's disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13) were used to assess items for face and content validity. ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing. A large-scale postal survey will follow in order to refine the instrument further and to assess its psychometric properties. The final instrument is intended for use in clinical trials and interventions targeted at maintaining or improving activity and participation.

  13. Automated Item Generation with Recurrent Neural Networks.

    PubMed

    von Davier, Matthias

    2018-03-12

    Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability, in the hope of achieving predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
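    The probabilistic language models described here can be sketched, in drastically simplified form, with a bigram frequency model; the recurrent network replaces the lookup table below with a learned hidden state. The training "items" are invented:

```python
import random
from collections import defaultdict

# Toy training corpus of item stems (invented for illustration).
stems = ["what is the capital of france",
         "what is the sum of two and three",
         "what is the opposite of cold"]

# Bigram model: for each word, the list of words observed to follow it.
model = defaultdict(list)
for stem in stems:
    words = ["<s>"] + stem.split() + ["</s>"]
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)

def generate(seed=0):
    """Sample a new item stem word by word from the bigram model."""
    random.seed(seed)
    out, word = [], "<s>"
    while True:
        word = random.choice(model[word])
        if word == "</s>":
            return " ".join(out)
        out.append(word)
```

    Sampling can recombine fragments of the training stems into new stems; a deep language model does the same at a vastly larger scale, with generalization beyond observed n-grams.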

  14. Does Formative Assessment Improve Student Learning and Performance in Soil Science?

    ERIC Educational Resources Information Center

    Kopittke, Peter M.; Wehr, J. Bernhard; Menzies, Neal W.

    2012-01-01

    Soil science students are required to apply knowledge from a range of disciplines to unfamiliar scenarios to solve complex problems. To encourage deep learning (with student performance an indicator of learning), a formative assessment exercise was introduced to a second-year soil science subject. For the formative assessment exercise, students…

  15. Exploring Formative Assessment Using Cultural Historical Activity Theory

    ERIC Educational Resources Information Center

    Asghar, Mandy

    2013-01-01

    Formative assessment is a pedagogic practice that has been the subject of much research and debate, as to how it can be used most effectively to deliver enhanced student learning in the higher education setting. Often described as a complex concept it embraces activities that range from facilitating students understanding of assessment standards,…

  16. Assessment of Person Fit for Mixed-Format Tests

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2015-01-01

    Person-fit assessment may help the researcher to obtain additional information regarding the answering behavior of persons. Although several researchers examined person fit, there is a lack of research on person-fit assessment for mixed-format tests. In this article, the lz statistic and the χ2 statistic, both of which have been used for tests…

  17. Developing Formative Assessments for Postgraduate Students in Engineering

    ERIC Educational Resources Information Center

    Burrow, Michael; Evdorides, Harry; Hallam, Barbara; Freer-Hewish, Richard

    2005-01-01

    This paper outlines an approach taken to produce computer-based formative assessments for two modules in a one-year taught MSc programme in Road Management and Engineering. It presents the aims of the assessments, the taxonomy adopted to ensure that the formulation of the questions addressed learning outcomes related to the development of higher…

  18. Development and content validation of performance assessments for endoscopic third ventriculostomy.

    PubMed

    Breimer, Gerben E; Haji, Faizal A; Hoving, Eelco W; Drake, James M

    2015-08-01

    This study aims to develop and establish the content validity of multiple expert rating instruments to assess performance in endoscopic third ventriculostomy (ETV), collectively called the Neuro-Endoscopic Ventriculostomy Assessment Tool (NEVAT). The important aspects of ETV were identified through a review of current literature, ETV videos, and discussion with neurosurgeons, fellows, and residents. Three assessment measures were subsequently developed: a procedure-specific checklist (CL), a CL of surgical errors, and a global rating scale (GRS). Neurosurgeons from various countries, all identified as experts in ETV, were then invited to participate in a modified Delphi survey to establish the content validity of these instruments. In each Delphi round, experts rated their agreement with including each procedural step, error, and GRS item in the respective instruments on a 5-point Likert scale. Seventeen experts agreed to participate in the study and completed all Delphi rounds. After item generation, a total of 27 procedural CL items, 26 error CL items, and 9 GRS items were posed to Delphi panelists for rating. An additional 17 procedural CL items, 12 error CL items, and 1 GRS item were added by panelists. After three rounds, strong consensus (>80% agreement) was achieved on 35 procedural CL items, 29 error CL items, and 10 GRS items. Moderate consensus (50-80% agreement) was achieved on an additional 7 procedural CL items and 1 error CL item. The final procedural and error checklist contained 42 and 30 items, respectively (divided into setup, exposure, navigation, ventriculostomy, and closure). The final GRS contained 10 items. We have established the content validity of three ETV assessment measures by iterative consensus of an international expert panel. Each measure provides unique assessment information and thus can be used individually or in combination, depending on the characteristics of the learner and the purpose of the assessment. These instruments must now…
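    The Delphi consensus rule used above (>80% strong, 50-80% moderate) is a simple classification of per-item agreement rates. A sketch with hypothetical items and agreement values:

```python
def consensus_level(agreement):
    """Classify a Delphi item by the proportion of panelists agreeing."""
    if agreement > 0.80:
        return "strong"
    if agreement >= 0.50:
        return "moderate"
    return "none"

# Hypothetical checklist items with illustrative agreement rates.
ratings = {"setup: position patient": 0.94,
           "navigation: identify landmarks": 0.65,
           "closure: optional drain": 0.41}
levels = {item: consensus_level(p) for item, p in ratings.items()}
```

    Items reaching strong or moderate consensus are retained across rounds, which is how the final 42-item procedural and 30-item error checklists were assembled.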

  19. Teachers and Testing: An Investigation into Teachers' Perceptions of Formative Assessment

    ERIC Educational Resources Information Center

    Sach, Elizabeth

    2012-01-01

    Research conducted within the past decade contributes much to an understanding of the role and potential value of formative assessment in learning. As an Advisory Teacher within a local authority, the researcher was interested to find out how teachers actually perceive formative assessment. This study therefore set out to investigate the range and…

  20. A Critical Analysis of the Design and Implementation of Formative Assessment

    ERIC Educational Resources Information Center

    Green, Rhiannon

    2017-01-01

    The objectives of this article are to critically analyse the impact of formative and summative assessment in an informal secondary school environment. The informality reflects the work of The Anne Frank Trust UK and their practices in evaluating student progress through a two-week workshop programme. The preference for formative assessment is…

  1. Semiparametric Item Response Functions in the Context of Guessing

    ERIC Educational Resources Information Center

    Falk, Carl F.; Cai, Li

    2016-01-01

    We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
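The model in this abstract extends the three-parameter logistic (3PL) item response function by passing a monotonic polynomial, rather than a linear term, through the logistic link. A minimal sketch of that functional form follows; the coefficients are illustrative, and monotonicity is ensured here simply by choosing odd powers with positive weights rather than by the paper's estimation machinery.

```python
import math

# Sketch of an item response function of the kind the abstract describes:
# P(correct | theta) = c + (1 - c) * logistic(m(theta)), where c is the
# lower asymptote (guessing floor) and m is a monotonic polynomial.
# With m linear this reduces to the familiar 3PL model.

def irf(theta, c, coeffs):
    """Probability of a correct response at ability theta.

    coeffs are polynomial coefficients (constant term first); the caller
    must ensure m is increasing (here: odd powers, positive weights).
    """
    m = sum(b * theta**k for k, b in enumerate(coeffs))
    return c + (1.0 - c) / (1.0 + math.exp(-m))

# m(theta) = 0.2 + 1.1*theta + 0.3*theta^3 is strictly increasing, so the
# curve rises from the guessing floor c = 0.2 toward 1.
print(round(irf(-6.0, 0.2, [0.2, 1.1, 0.0, 0.3]), 3))  # -> 0.2, near the asymptote
```

The lower asymptote keeps low-ability examinees at a nonzero success probability, which is the "guessing" behavior the title refers to.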

  2. Preliminary development of an ultrabrief two-item bedside test for delirium.

    PubMed

    Fick, Donna M; Inouye, Sharon K; Guess, Jamey; Ngo, Long H; Jones, Richard N; Saczynski, Jane S; Marcantonio, Edward R

    2015-10-01

    Delirium is common, morbid, and costly, yet is greatly under-recognized among hospitalized older adults. To identify the best single and pair of mental status test items that predict the presence of delirium. Diagnostic test evaluation study that enrolled medicine inpatients aged 75 years or older at an academic medical center. Patients underwent a clinical reference standard assessment involving a patient interview, medical record review, and interviews with family members and nurses to determine the presence or absence of Diagnostic and Statistical Manual of Mental Disorders, 4th Edition defined delirium. Participants also underwent the three-dimensional Confusion Assessment Method (3D-CAM), a brief, validated assessment for delirium. Individual items and pairs of items from the 3D-CAM were evaluated to determine sensitivity and specificity relative to the reference standard delirium diagnosis. Of the 201 participants (mean age 84 years, 62% female), 42 (21%) had delirium based on the clinical reference standard. The single item with the best test characteristics was "months of the year backwards" with a sensitivity of 83% (95% confidence interval [CI]: 69%-93%) and specificity of 69% (95% CI: 61%-76%). The best 2-item screen was the combination of "months of the year backwards" and "what is the day of the week?" with a sensitivity of 93% (95% CI: 81%-99%) and specificity of 64% (95% CI: 56%-70%). We identified a single item with >80% and a pair of items with >90% sensitivity for delirium. If validated prospectively, these items will serve as an initial innovative screening step for delirium identification in hospitalized older adults. © 2015 Society of Hospital Medicine.
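The screening statistics reported above come from a simple 2×2 table of screen result against the reference-standard diagnosis. The sketch below shows the computation; the exact cell counts are reconstructed to be consistent with the abstract's totals (42 of 201 with delirium; 93% sensitivity and 64% specificity for the 2-item screen) rather than taken from the study's data tables.

```python
# Sensitivity and specificity from 2x2 screening counts:
#   sensitivity = TP / (TP + FN)  -- flagged cases among all true cases
#   specificity = TN / (TN + FP)  -- cleared controls among all controls

def sens_spec(tp, fn, tn, fp):
    """Return (sensitivity, specificity) for a diagnostic screen."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Counts consistent with the abstract: 39 of 42 delirium cases flagged,
# 102 of 159 non-delirium controls correctly cleared.
sens, spec = sens_spec(tp=39, fn=3, tn=102, fp=57)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 93%, specificity 64%
```

The trade-off visible here is typical of ultrabrief screens: pairing two items raises sensitivity (83% to 93%) at the cost of some specificity.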

  3. Providing Formative Feedback From a Summative Computer-aided Assessment

    PubMed Central

    Sewell, Robert D. E.

    2007-01-01

    Objectives To examine the effectiveness of providing formative feedback for summative computer-aided assessment. Design Two groups of first-year undergraduate life science students in pharmacy and neuroscience who were studying an e-learning package in a common pharmacology module were presented with a computer-based summative assessment. A sheet with individualized feedback derived from each of the 5 results sections of the assessment was provided to each student. Students were asked via a questionnaire to evaluate the form and method of feedback. Assessment The students were able to reflect on their performance and use the feedback provided to guide their future study or revision. There was no significant difference between the responses from pharmacy and neuroscience students. Students' responses on the questionnaire indicated a generally positive reaction to this form of feedback. Conclusions Findings suggest that additional formative assessment conveyed by this style and method would be appreciated and valued by students. PMID:17533442

  4. A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

    ERIC Educational Resources Information Center

    Fukuhara, Hirotaka; Kamata, Akihito

    2011-01-01

    A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…

  5. Students' Assessment Preferences and Approaches to Learning: Can Formative Assessment Make a Difference?

    ERIC Educational Resources Information Center

    Gijbels, David; Dochy, Filip

    2006-01-01

    The purpose of this paper is to gain insight into the relationships between hands-on experiences with formative assessment, students' assessment preferences and their approaches to learning. The sample consisted of 108 university first-year Bachelor's students studying criminology. Data were obtained using the "Revised two-factor study…

  6. Formative Assessment as an Effective Leadership Learning Tool.

    PubMed

    Garrett, J Matthew; Camper, Jill M

    2015-01-01

    Formative assessment can be a critical and creative practice in leadership education and significantly enhance student learning, leader development, and leadership development. This chapter seeks to frame the use of assessment as both a best practice in leadership education and as an integral component to effective leadership learning pedagogy. © 2015 Wiley Periodicals, Inc., A Wiley Company.

  7. Formative Assessment in the Classroom: Getting It Right

    ERIC Educational Resources Information Center

    Doffermyre, Janet Jackson

    2016-01-01

    Formative assessment, assessment for learning, involves checking in with students during the learning process to see if they understand a concept or standard, before holding them accountable for mastery and moving on to the next concept or standard. This process can be used in the classroom during the lesson or across a subject area as teachers of…

  8. Development of an Instrument to Measure Behavioral Health Function for Work Disability: Item Pool Construction and Factor Analysis

    PubMed Central

    Marfeo, Elizabeth E.; Ni, Pengsheng; Haley, Stephen M.; Jette, Alan M.; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Brandt, Diane E.; Rasch, Elizabeth K.

    2014-01-01

    Objectives To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Design Cross-sectional. Setting Community. Participants Item pools of behavioral health functioning were developed, refined, and field-tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working due to mental or both mental and physical conditions. Interventions None. Main Outcome Measure Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Results Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, and social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the four scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these four distinct scales of function. Conclusion This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. PMID:23548542

  9. Development of an instrument to measure behavioral health function for work disability: item pool construction and factor analysis.

    PubMed

    Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Jette, Alan M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Brandt, Diane E; Rasch, Elizabeth K

    2013-09-01

    To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Cross-sectional. Community. Item pools of behavioral health functioning were developed, refined, and field tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working because of mental or both mental and physical conditions. None. Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the 4 scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these 4 distinct scales of function. This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  10. Real and Artificial Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Andrich, David; Hagquist, Curt

    2015-01-01

    Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
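The Mantel-Haenszel (MH) approach this abstract refers to stratifies examinees by total score and pools a 2×2 table (group × item response) from each stratum into a common odds ratio; a value near 1 suggests no DIF between the reference and focal groups. The sketch below is a minimal illustration of that pooled estimator with invented stratum counts, not the authors' analysis.

```python
# Mantel-Haenszel common odds ratio across score strata. Each stratum
# contributes a 2x2 table (a, b, c, d):
#   a, b = reference-group correct / incorrect
#   c, d = focal-group correct / incorrect
# alpha_MH = sum_k(a_k * d_k / N_k) / sum_k(b_k * c_k / N_k)

def mh_common_odds_ratio(strata):
    """Pooled odds ratio over a list of (a, b, c, d) stratum counts."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two score strata in which both groups have identical odds of success,
# i.e. no DIF -- the pooled odds ratio is exactly 1.
print(mh_common_odds_ratio([(20, 10, 20, 10), (30, 5, 30, 5)]))  # -> 1.0
```

The "artificial DIF" issue the authors raise arises because conditioning on an observed total score that itself contains the studied item can distort this statistic even when no real DIF is present.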

  11. Think Pair Share with Formative Assessment for Junior High School Student

    NASA Astrophysics Data System (ADS)

    Pradana, O. R. Y.; Sujadi, I.; Pramudya, I.

    2017-09-01

    Geometry requires abstract thinking, so not many students are able to understand this material well. In this case, the learning model plays a crucial role in improving student achievement; a less precise learning model will cause difficulties for students. Therefore, this study provides a quantitative evaluation of the Think Pair Share learning model combined with formative assessment. This study aims to test Think Pair Share with formative assessment on junior high school students. The research uses a quantitative pretest-posttest design with a control group and an experiment group, and ANOVA and Scheffe tests were used to analyse the effectiveness of this learning. The main finding of this study is that student achievement in geometry with Think Pair Share using formative assessment increased significantly, probably because this learning model makes students more active during learning. The hope is that in the future Think Pair Share with formative assessment will be a useful approach for teachers and will be applied by teachers around the world, especially for geometry.
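The ANOVA step described above compares group means of gain scores (posttest minus pretest). A minimal one-way ANOVA F statistic can be computed directly, as sketched below; the gain scores are invented for illustration, and the Scheffe post hoc comparison is omitted.

```python
# One-way ANOVA F statistic: ratio of between-group to within-group
# mean squares. Large F means the group means differ more than the
# within-group scatter would explain.

def one_way_anova_f(*groups):
    """Return the F statistic for a one-way ANOVA across the groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Invented gain scores for a control group and a Think Pair Share +
# formative assessment group.
control = [12, 15, 14, 10, 13, 11]
experiment = [18, 21, 19, 22, 20, 17]
print(round(one_way_anova_f(control, experiment), 2))  # -> 42.0
```

With two groups, the F statistic is the square of the two-sample t statistic, so this generalizes the usual pretest-posttest comparison to more than two conditions.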

  12. Internal and External Factors Affecting Teachers' Adoption of Formative Assessment to Support Learning

    ERIC Educational Resources Information Center

    Izci, Kemal

    2016-01-01

    Assessment forms an important part of instruction. Assessment that aims to support learning is known as formative assessment, and it contributes to students' learning gains and motivation. However, teachers rarely use assessment formatively to aid their students' learning. Thus, reviewing the factors that limit or support teachers' practices of…

  14. Component Identification and Item Difficulty of Raven's Matrices Items.

    ERIC Educational Resources Information Center

    Green, Kathy E.; Kluever, Raymond C.

    Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…

  15. A model of formative assessment practice in secondary science classrooms using an audience response system

    NASA Astrophysics Data System (ADS)

    Shirley, Melissa L.

    Formative assessment involves the probing of students' ideas to determine their level of understanding during the instructional sequence. Often conceptualized as a cycle, formative assessment consists of the teacher posing an instructional task to students, collecting data about student understanding, and engaging in follow-up strategies such as clarifying student understanding and adjusting instruction to meet learning needs. Despite having been shown to increase student achievement in a variety of classroom settings, formative assessment remains a relatively weak area of teacher practice. Methods that enhance formative assessment strategies may therefore have a positive effect on student achievement. Audience response systems comprise a broad category of technologies that support richer classroom interaction and have the potential to facilitate formative assessment. Results from a large national research study, Classroom Connectivity in Promoting Mathematics and Science Achievement (CCMS), show that students in algebra classrooms where the teacher has implemented a type of audience response system experience significantly higher achievement gains compared to a control group. This suggests a role for audience response systems in promoting rich formative assessment. The importance of incorporating formative assessment strategies into regular classroom practice is widely recognized. However, it remains challenging to identify whether rich formative assessment is occurring during a particular class session. This dissertation uses teacher interviews and classroom observations to develop a fine-grained model of formative assessment in secondary science classrooms employing a type of audience response system. This model can be used by researchers and practitioners to characterize components of formative assessment practice in classrooms. A major component of formative assessment practice is the collection and aggregation of evidence of student learning. This dissertation…

  16. Use of Formative Classroom Assessment Techniques in a Project Management Course

    ERIC Educational Resources Information Center

    Purcell, Bernice M.

    2014-01-01

    Formative assessment is considered to be an evaluation technique that informs the instructor of the level of student learning, giving evidence when it may be necessary for the instructor to make a change in delivery based upon the results. Several theories of formative assessment exist, all which propound the importance of feedback to the student.…

  17. Effect of online formative assessment on summative performance in integrated musculoskeletal system module.

    PubMed

    Mitra, Nilesh Kumar; Barua, Ankur

    2015-03-03

    The impact of web-based formative assessment practices on performance of undergraduate medical students in summative assessments is not widely studied. This study was conducted among third-year undergraduate medical students of a designated university in Malaysia to compare the effect, on performance in summative assessment, of repeated computer-based formative assessment with automated feedback with that of single paper-based formative assessment with face-to-face feedback. This quasi-randomized trial was conducted among two groups of undergraduate medical students who were selected by stratified random technique from a cohort undertaking the Musculoskeletal module. The control group C (n = 102) was subjected to a paper-based formative MCQ test. The experimental group E (n = 65) was provided three online formative MCQ tests with automated feedback. The summative MCQ test scores for both these groups were collected after the completion of the module. In this study, no significant difference was observed between the mean summative scores of the two groups. However, Band 1 students from group E with higher entry qualification showed a higher mean score in the summative assessment. A trivial but significant positive correlation (r² = +0.328) was observed between the online formative test scores and summative assessment scores of group E. The proportionate increase of performance in group E was found to be almost double that of group C. The use of computer-based formative tests with automated feedback improved the performance of the students with better academic background in the summative assessment. Computer-based formative tests can be explored as an optional addition to the curriculum of pre-clinical integrated medical program to improve the performance of the students with higher academic ability.
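The correlation reported above between formative (online MCQ) and summative scores is a standard Pearson product-moment calculation. A minimal sketch follows; the score lists are invented for illustration and are not the study's data.

```python
# Pearson correlation r between two paired score lists; r**2 is the
# squared correlation of the kind reported in the abstract (r^2 = +0.328).

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Invented formative and summative scores for eight students.
formative = [55, 60, 72, 48, 80, 66, 58, 75]
summative = [50, 64, 70, 52, 78, 60, 62, 74]
r = pearson_r(formative, summative)
print(round(r ** 2, 3))  # squared correlation, comparable to the abstract's r^2
```

Note that even a statistically significant r² of about 0.33 means formative scores explain only a third of the variance in summative performance, which is consistent with the authors' cautious conclusion.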

  18. The influence of item order on intentional response distortion in the assessment of high potentials: assessing pilot applicants.

    PubMed

    Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander

    2014-04-01

    An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items, including items from the NEO Five-Factor Inventory (NEO FFI), the NEO Personality Inventory-Revised (NEO PI-R), and the Business-focused Inventory of Personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.

  19. Teacher Learning of Technology Enhanced Formative Assessment

    ERIC Educational Resources Information Center

    Feldman, Allan; Capobianco, Brenda M.

    2008-01-01

    This study examined the integration of technology enhanced formative assessment (FA) into teachers' practice. Participants were high school physics teachers interested in improving their use of a classroom response system (CRS) to promote FA. Data were collected using interviews, direct classroom observations, and collaborative discussions. The…

  20. A Multi-Faceted Formative Assessment Approach: Better Recognising the Learning Needs of Students

    ERIC Educational Resources Information Center

    Jenkins, James O.

    2010-01-01

    Students are increasingly subject to a series of learning pressures that prevent effective engagement in assessment. Thus, the aim of this study was to create a multi-faceted formative assessment approach that better enabled students to engage in the assessment process. A formative assessment approach, consisting of six key initiatives, is…