Science.gov

Sample records for assessment item format

  1. Formative Assessment in High School Chemistry Teaching: Investigating the Alignment of Teachers' Goals with Their Items

    ERIC Educational Resources Information Center

    Sandlin, Benjamin; Harshman, Jordan; Yezierski, Ellen

    2015-01-01

    A 2011 report by the Department of Education states that understanding how teachers use results from formative assessments to guide their practice is necessary to improve instruction. Chemistry teachers have goals for items in their formative assessments, but the degree of alignment between what is assessed by these items and the teachers' goals…

  2. An Evaluation of Forced-Choice and True-False Item Formats in Personality Assessment.

    ERIC Educational Resources Information Center

    Jackson, Douglas N.; And Others

    In a comparative evaluation of a standard true-false format for personality assessment and a forced-choice format, subjects from college residential units were assigned randomly to respond either to the forced-choice or standard true-false form of the Personality Research Form (PRF). All subjects also rated themselves and the members of their…

  3. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included PARSCALE's G²,…

  4. Reliability, validity and efficiency of multiple choice question and patient management problem item formats in assessment of clinical competence.

    PubMed

    Norcini, J J; Swanson, D B; Grosso, L J; Webster, G D

    1985-05-01

    Despite a lack of face validity, there continues to be heavy reliance on objective paper-and-pencil measures of clinical competence. Among these measures, the most common item formats are patient management problems (PMPs) and three types of multiple choice questions (MCQs): one-best-answer (A-types); matching questions (M-types); and multiple true/false questions (X-types). The purpose of this study is to compare the reliability, validity and efficiency of these item formats with particular focus on whether MCQs and PMPs measure different aspects of clinical competence. Analyses revealed reliabilities of 0.72 or better for all item formats; the MCQ formats were most reliable. Similarly, efficiency analyses (reliability per unit of testing time) demonstrated the superiority of MCQs. Evidence for validity obtained through correlations of both programme directors' ratings and criterion group membership with item format scores also favoured MCQs. More important, however, is whether MCQs and PMPs measure the same or different aspects of clinical competence. Regression analyses of the scores on the validity measures (programme directors' ratings and criterion group membership) indicated that MCQs and PMPs seem to be measuring predominantly the same thing. MCQs contribute a small unique variance component over and above PMPs, while PMPs make the smallest unique contribution. As a whole, these results indicate that MCQs are more efficient, reliable and valid than PMPs. PMID:4010571
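    The efficiency analysis the abstract describes (reliability per unit of testing time) can be sketched as below. The reliabilities and testing times are hypothetical illustrations, not the study's actual values; the Spearman-Brown projection is a standard psychometric formula for reliability under a change in test length.

```python
# Sketch of an "efficiency" comparison: reliability per unit of testing
# time, as in the abstract. All numbers here are hypothetical.

def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def efficiency(reliability, minutes):
    """Reliability delivered per hour of testing time."""
    return reliability / (minutes / 60.0)

# Hypothetical formats: (reliability, testing time in minutes)
formats = {"A-type MCQ": (0.85, 60), "PMP": (0.72, 90)}

for name, (rel, minutes) in formats.items():
    print(name, round(efficiency(rel, minutes), 3))

# Spearman-Brown asks what reliability a format would reach
# if its testing time (i.e., length) were doubled:
print(round(spearman_brown(0.72, 2), 3))
```

    On these hypothetical numbers the MCQ format delivers more reliability per hour, mirroring the study's conclusion about MCQ efficiency.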

  5. MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin

    2010-01-01

    Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…

  6. Assessing the Impact of Characteristics of the Test, Common-Items, and Examinees on the Preservation of Equity Properties in Mixed-Format Test Equating

    ERIC Educational Resources Information Center

    Wolf, Raffaela

    2013-01-01

    Preservation of equity properties was examined using four equating methods--IRT True Score, IRT Observed Score, Frequency Estimation, and Chained Equipercentile--in a mixed-format test under a common-item nonequivalent groups (CINEG) design. Equating of mixed-format tests under a CINEG design can be influenced by factors such as attributes of the…

  7. Formative Assessment

    ERIC Educational Resources Information Center

    Technology & Learning, 2005

    2005-01-01

In today's climate of high-stakes testing and accountability, educators are challenged to continuously monitor student progress to ensure achievement. This article details how formative assessment helps educators meet this challenge and ensure achievement. Formative assessment can influence learning and support achievement, allowing teachers…

  8. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    PubMed

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models. PMID:25106393
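    The general idea behind residual-based item fit can be sketched as follows: compare the observed proportion correct in an ability group against the model-implied item characteristic curve (ICC), and standardize the difference. This is a generic illustration in the spirit of the abstract, not the authors' specific ratio-estimate method; the 2PL parameters and counts are hypothetical.

```python
import math

# Sketch of a standardized item-fit residual: observed proportion
# correct in an ability group vs. the model-implied ICC. Parameters
# and data are hypothetical.

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def standardized_residual(n_correct, n_total, theta, a, b):
    """Approximately N(0, 1) when the IRT model fits the data."""
    p_model = icc_2pl(theta, a, b)
    p_obs = n_correct / n_total
    se = math.sqrt(p_model * (1 - p_model) / n_total)
    return (p_obs - p_model) / se

# 40 of 50 examinees near theta = 0.5 answered the item correctly:
z = standardized_residual(40, 50, theta=0.5, a=1.2, b=-0.3)
print(round(z, 3))
```

    Large absolute values of such residuals across ability groups flag items the model does not fit well.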

  9. A Multilevel Assessment of Differential Item Functioning.

    ERIC Educational Resources Information Center

    Shen, Linjun

    A multilevel approach was proposed for the assessment of differential item functioning and compared with the traditional logistic regression approach. Data from the Comprehensive Osteopathic Medical Licensing Examination for 2,300 freshman osteopathic medical students were analyzed. The multilevel approach used three-level hierarchical generalized…

  10. The Fantastic Four of Mathematics Assessment Items

    ERIC Educational Resources Information Center

    Greenlees, Jane

    2011-01-01

    In this article, the author makes reference to four comic book characters to make the point that together they are a formidable team, but on their own they are vulnerable. She examines the four components of mathematics assessment items and the need for implicit instruction within the classroom for student success. Just like the "Fantastic Four"…

  11. Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

    ERIC Educational Resources Information Center

    Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

    2013-01-01

    We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…

  12. The Psychometric Structure of Items Assessing Autogynephilia.

    PubMed

    Hsu, Kevin J; Rosenthal, A M; Bailey, J Michael

    2015-07-01

Autogynephilia, or paraphilic sexual arousal in a man to the thought or image of himself as a woman, manifests in a variety of different behaviors and fantasies. We examined the psychometric structure of 22 items assessing five known types of autogynephilia by subjecting them to exploratory factor analysis in a sample of 149 autogynephilic men. Results of oblique factor analyses supported the ability to distinguish five group factors with suitable items. Results of hierarchical factor analyses suggest that the five group factors were strongly underlain by a general factor of autogynephilia. Because the general factor accounted for a much greater amount of the total variance of the 22 items than did the group factors, the types of autogynephilia that a man has seem less important than the degree to which he has autogynephilia. However, the five types of autogynephilia remain conceptually useful because meaningful distinctions were found among them, including differential rates of endorsement and differential ability to predict other relevant variables like gender dysphoria. Factor-derived scales and subscales demonstrated good internal consistency reliabilities and validity, with large differences found between autogynephilic men and heterosexual male controls. Future research should attempt to replicate our findings, which were mostly exploratory. PMID:25277693

  13. Using Mutual Information for Adaptive Item Comparison and Student Assessment

    ERIC Educational Resources Information Center

    Liu, Chao-Lin

    2005-01-01

    The author analyzes properties of mutual information between dichotomous concepts and test items. The properties generalize some common intuitions about item comparison, and provide principled foundations for designing item-selection heuristics for student assessment in computer-assisted educational systems. The proposed item-selection strategies…
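    The selection criterion the abstract describes can be sketched for the simplest case: mutual information between a dichotomous concept (mastered or not) and a dichotomous item response, with the highest-information item selected next. The joint probability tables below are hypothetical.

```python
import math

# Sketch of mutual-information-based item selection for dichotomous
# concepts and items, in the spirit of the abstract. Joint
# distributions are hypothetical.

def mutual_information(joint):
    """I(C; X) in bits for a 2x2 joint distribution joint[c][x]."""
    pc = [sum(row) for row in joint]
    px = [sum(joint[c][x] for c in range(2)) for x in range(2)]
    mi = 0.0
    for c in range(2):
        for x in range(2):
            p = joint[c][x]
            if p > 0:
                mi += p * math.log2(p / (pc[c] * px[x]))
    return mi

# Two candidate items: item1 is informative about mastery, item2 barely is.
item1 = [[0.40, 0.10],   # non-masters: mostly incorrect
         [0.10, 0.40]]   # masters: mostly correct
item2 = [[0.26, 0.24],
         [0.24, 0.26]]

scores = {"item1": mutual_information(item1), "item2": mutual_information(item2)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

    An adaptive assessment would administer `best` next, update the mastery distribution from the response, and repeat.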

  14. Primary Science Assessment Item Setters' Misconceptions Concerning Biological Science Concepts

    ERIC Educational Resources Information Center

    Boo, Hong Kwen

    2007-01-01

    Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…

  15. Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd

    The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…

  16. Classification Accuracy of Mixed Format Tests: A Bi-Factor Item Response Theory Approach

    PubMed Central

    Wang, Wei; Drasgow, Fritz; Liu, Liwen

    2016-01-01

    Mixed format tests (e.g., a test consisting of multiple-choice [MC] items and constructed response [CR] items) have become increasingly popular. However, the latent structure of item pools consisting of the two formats is still equivocal. Moreover, the implications of this latent structure are unclear: For example, do constructed response items tap reasoning skills that cannot be assessed with multiple choice items? This study explored the dimensionality of mixed format tests by applying bi-factor models to 10 tests of various subjects from the College Board's Advanced Placement (AP) Program and compared the accuracy of scores based on the bi-factor analysis with scores derived from a unidimensional analysis. More importantly, this study focused on a practical and important question—classification accuracy of the overall grade on a mixed format test. Our findings revealed that the degree of multidimensionality resulting from the mixed item format varied from subject to subject, depending on the disattenuated correlation between scores from MC and CR subtests. Moreover, remarkably small decrements in classification accuracy were found for the unidimensional analysis when the disattenuated correlations exceeded 0.90. PMID:26973568
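    The quantity the abstract hinges on, the disattenuated correlation between MC and CR subtest scores, has a simple closed form: the observed correlation divided by the square root of the product of the two reliabilities. The inputs below are hypothetical, not the AP data.

```python
import math

# Sketch of the disattenuated (error-corrected) correlation between
# two subtest scores. Observed correlation and reliabilities are
# hypothetical.

def disattenuate(r_observed, rel_x, rel_y):
    """Correct an observed correlation for measurement error in both scores."""
    return r_observed / math.sqrt(rel_x * rel_y)

r_true = disattenuate(r_observed=0.78, rel_x=0.88, rel_y=0.75)
print(round(r_true, 3))

# Per the abstract's finding, values above 0.90 were associated with
# remarkably small losses from a unidimensional analysis.
print(r_true > 0.90)
```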

  17. Test item linguistic complexity and assessments for deaf students.

    PubMed

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students. PMID:21941876

  18. Assessing Scientific Reasoning: A Comprehensive Evaluation of Item Features That Affect Item Difficulty

    ERIC Educational Resources Information Center

    Stiller, Jurik; Hartmann, Stefan; Mathesius, Sabrina; Straube, Philipp; Tiemann, Rüdiger; Nordmeier, Volkhard; Krüger, Dirk; Upmeier zu Belzen, Annette

    2016-01-01

    The aim of this study was to improve the criterion-related test score interpretation of a text-based assessment of scientific reasoning competencies in higher education by evaluating factors which systematically affect item difficulty. To provide evidence about the specific demands which test items of various difficulty make on pre-service…

  19. Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…

  20. Development and assessment of floor and ceiling items for the PROMIS physical function item bank

    PubMed Central

    2013-01-01

Introduction: Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods: We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results: In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions: These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. They will help improve instrument performance in populations with broad functional ranges and those concentrated at…

  1. Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

    ERIC Educational Resources Information Center

    Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

    2013-01-01

    Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…

  2. Item Feature Effects in Evolution Assessment

    ERIC Educational Resources Information Center

    Nehm, Ross H.; Ha, Minsu

    2011-01-01

    Despite concerted efforts by science educators to understand patterns of evolutionary reasoning in science students and teachers, the vast majority of evolution education studies have failed to carefully consider or control for item feature effects in knowledge measurement. Our study explores whether robust contextualization patterns emerge within…

  3. Cooperative Industrial/Vocational Education. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; Elias, Julie Whitaker

    This document contains multiple-choice test items and assessment techniques in the form of instructional management plans for Missouri's cooperative industrial-vocational education core curriculum. The test items and techniques are relevant to these 15 occupational duties: (1) career research and planning; (2) computer awareness; (3) employment…

  4. Constructing Items for Assessing English Writing Skills. Technical Note.

    ERIC Educational Resources Information Center

    Humes, Ann

    Specifying and writing appropriate items for student writing assessments is an exacting task. All too frequently, however, teachers approach this task by reading a skill statement and hurriedly writing a few items with correct answers combined with several distractors. This approach disregards the essentials of isolating a single skill for…

  5. The Effects of Item Preview on Video-Based Multiple-Choice Listening Assessments

    ERIC Educational Resources Information Center

    Koyama, Dennis; Sun, Angela; Ockey, Gary J.

    2016-01-01

    Multiple-choice formats remain a popular design for assessing listening comprehension, yet no consensus has been reached on how multiple-choice formats should be employed. Some researchers argue that test takers must be provided with a preview of the items prior to the input (Buck, 1995; Sherman, 1997); others argue that a preview may decrease the…

  6. Item Response Theory Models for Wording Effects in Mixed-Format Scales

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu

    2015-01-01

    Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…

  7. Ethnic DIF in Reading Tests with Mixed Item Formats

    ERIC Educational Resources Information Center

    Taylor, Catherine S.; Lee, Yoonsun

    2011-01-01

    This article presents a study of ethnic Differential Item Functioning (DIF) for 4th-, 7th-, and 10th-grade reading items on a state criterion-referenced achievement test. The tests, administered 1997 to 2001, were composed of multiple-choice and constructed-response items. Item performance by focal groups (i.e., students from Asian/Pacific Island,…

  8. The 4-Item Negative Symptom Assessment (NSA-4) Instrument

    PubMed Central

    Morlock, Robert; Coon, Cheryl; van Willigenburg, Arjen; Panagides, John

    2010-01-01

Objective. To assess the ability of mental health professionals to use the 4-item Negative Symptom Assessment instrument, derived from the Negative Symptom Assessment-16, to rapidly determine the severity of negative symptoms of schizophrenia. Design. Open participation. Setting. Medical education conferences. Participants. Attendees at two international psychiatry conferences. Measurements. Participants read a brief set of the 4-item Negative Symptom Assessment instructions and viewed a videotape of a patient with schizophrenia. Using the instrument's 1-to-6 severity rating scale, they rated the four negative symptom items and the overall global negative symptoms. These ratings were compared with a consensus rating determination using frequency distributions and chi-square tests for the proportion of participant ratings that were within one point of the expert rating. Results. More than 400 medical professionals (293 physicians, 50% with a European practice, and 55% who reported past utilization of schizophrenia rating scales) participated. Between 82.1 and 91.1 percent of the 4-item and global rating determinations by the participants were within one rating point of the consensus expert ratings. The difference between the percentage of participant rating scores within one point of the consensus expert ratings and the percentage that differed by more than one point was significant (p<0.0001). Participants' ratings of negative symptoms using the 4-item Negative Symptom Assessment did not generally differ by geographic region of practice, professional credentialing, or familiarity with the use of schizophrenia symptom rating instruments. Conclusion. These findings suggest that clinicians from a variety of geographic practices can, after brief training, use the 4-item Negative Symptom Assessment effectively to rapidly assess negative symptoms in patients with schizophrenia. PMID:20805916
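    The study's central agreement statistic, the percentage of participant ratings within one point of the expert consensus, is straightforward to compute. The ratings below are hypothetical, not the conference data.

```python
# Sketch of the "within one point of the expert rating" agreement
# computation described in the abstract. Ratings are hypothetical.

def within_one_point_pct(ratings, expert):
    """Percent of ratings within +/-1 of the expert consensus rating."""
    close = sum(1 for r in ratings if abs(r - expert) <= 1)
    return 100.0 * close / len(ratings)

# Hypothetical participant ratings of one NSA-4 item on its 1-6 scale,
# against an expert consensus rating of 4:
ratings = [4, 3, 5, 4, 4, 2, 5, 4, 3, 6]
print(within_one_point_pct(ratings, expert=4))
```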

  9. Alignment of Content and Effectiveness of Mathematics Assessment Items

    ERIC Educational Resources Information Center

    Kulm, Gerald; Dager Wilson, Linda; Kitchen, Richard

    2005-01-01

    Alignment has taken on increased importance given the current high-stakes nature of assessment. To make well-informed decisions about student learning on the basis of test results, assessment items need to be well aligned with standards. Project 2061 of the American Association for the Advancement of Science (AAAS) has developed a procedure for…

  10. Formative Assessment Probes

    ERIC Educational Resources Information Center

    Eberle, Francis; Keeley, Page

    2008-01-01

    Formative assessment probes can be effective tools to help teachers build a bridge between students' initial ideas and scientific ones. In this article, the authors describe how using two formative assessment probes can help teachers determine the extent to which students make similar connections between developing a concept of matter and a…

  11. Formative Assessment in Context

    ERIC Educational Resources Information Center

    Oxenford-O'Brian, Julie

    2013-01-01

This dissertation responds to critical gaps in current research on formative assessment practice which could limit successful implementation of this practice within the K-12 classroom context. The study applies a sociocultural perspective of learning to interpret a cross-case analysis of formative assessment practice occurring during one…

  12. Test Item Linguistic Complexity and Assessments for Deaf Students

    ERIC Educational Resources Information Center

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64…

  13. Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

    ERIC Educational Resources Information Center

    Wan, Lei; Henly, George A.

    2012-01-01

    Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…

  14. Exploring Crossing Differential Item Functioning by Gender in Mathematics Assessment

    ERIC Educational Resources Information Center

    Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas

    2015-01-01

    The purpose of this article is to explore crossing differential item functioning (DIF) in a test drawn from a national examination of mathematics for 11-year-old pupils in England. An empirical dataset was analyzed to explore DIF by gender in a mathematics assessment. A two-step process involving the logistic regression (LR) procedure for…
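    The logistic regression DIF model referred to in the abstract adds group and ability-by-group terms to a logit of the probability of a correct response; a nonzero interaction produces "crossing" (nonuniform) DIF, where the group advantage reverses across the ability range. The coefficients below are hypothetical, chosen only to make the curves cross.

```python
import math

# Sketch of the logistic regression DIF model:
#   logit P(correct) = b0 + b1*ability + b2*group + b3*ability*group
# A nonzero interaction b3 yields crossing DIF. Coefficients are
# hypothetical.

def p_correct(ability, group, b0=0.0, b1=1.0, b2=0.4, b3=-0.8):
    z = b0 + b1 * ability + b2 * group + b3 * ability * group
    return 1.0 / (1.0 + math.exp(-z))

# The group difference flips sign across the ability range: crossing DIF.
for theta in (-1.5, 0.0, 1.5):
    gap = p_correct(theta, group=1) - p_correct(theta, group=0)
    print(theta, round(gap, 3))
```

    In practice the LR procedure tests b2 (uniform DIF) and b3 (nonuniform/crossing DIF) with nested-model likelihood-ratio tests; this sketch only shows what a crossing pattern looks like.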

  15. A Framework for Dimensionality Assessment for Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Svetina, Dubravka; Levy, Roy

    2014-01-01

    A framework is introduced for considering dimensionality assessment procedures for multidimensional item response models. The framework characterizes procedures in terms of their confirmatory or exploratory approach, parametric or nonparametric assumptions, and applicability to dichotomous, polytomous, and missing data. Popular and emerging…

  16. Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…

  17. Fundamentals of Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains multiple choice test items and assessment techniques for Missouri's fundamentals of marketing core curriculum. The core curriculum is divided into these nine occupational duties: (1) communications in marketing; (2) economics and marketing; (3) employment and advancement; (4) human relations in marketing; (5) marketing…

  18. Goodness-of-Fit Assessment of Item Response Theory Models

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto

    2013-01-01

The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate p-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model…

  19. Assessing the factor structure of a role functioning item bank

    PubMed Central

    Ware, John E.; Bjorner, Jakob B.

    2013-01-01

Purpose: Role functioning (RF) is an important part of health-related quality of life, but is hard to measure due to the wide definition of roles and fluctuations in role participation. This study aims to explore the dimensionality of a newly developed item bank assessing the impact of health on RF. Methods: A battery of measures with skip patterns including the new RF bank was completed by 2,500 participants answering only questions on social roles relevant to them. Confirmatory factor analyses were conducted for the participants answering items from all conceptual domains (N = 1193). Conceptually based dimensionality and method effects reflecting positively and negatively worded items were explored in a series of models. Results: A bi-factor model (CFI = .93, RMSEA = .08) with one general and four conceptual factors (social, family, occupation, generic) was retained. Positively worded items were excluded from the final solution due to misfit. While a single factor model with methods factors had a poor fit (CFI = .88, RMSEA = .13), high loadings on the general factor in the bi-factor model suggest that the RF bank is sufficiently unidimensional for IRT analysis. Conclusions: The bank demonstrated sufficient unidimensionality for IRT-based calibration of all the items on a common metric and development of a computerized adaptive test. PMID:21153710

  20. An Analysis of Sex-Related Differential Item Functioning in Attitude Assessment.

    ERIC Educational Resources Information Center

    Dodeen, Hamzeh; Johanson, George A.

    2003-01-01

    Analyzed and classified items that display sex-related differential item functioning (DIF) in attitude assessment. Analyzed 982 items, from 23 real data sets, that measure attitudes. Found that sex DIF is common in attitude scales: more than 27 percent of items showed DIF related to sex, 15 percent of the items exhibited moderate to large DIF, and…

  1. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  2. IRT-Estimated Reliability for Tests Containing Mixed Item Formats

    ERIC Educational Resources Information Center

    Shu, Lianghua; Schwarz, Richard D.

    2014-01-01

As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…
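    The first of the four coefficients named in the abstract, Cronbach's alpha, can be computed directly from an examinee-by-item score matrix. The score matrix below is hypothetical.

```python
# Sketch of Cronbach's alpha, the first of the four reliability
# coefficients named in the abstract. Scores are hypothetical.

def cronbach_alpha(scores):
    """scores: list of examinees, each a list of k item scores."""
    k = len(scores[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Six hypothetical examinees on four dichotomous items:
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(scores), 3))
```

    Stratified alpha and the Feldt-Raju coefficient extend this idea by partitioning the test into parts (e.g., by item format) before combining part reliabilities.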

  3. Descriptive Study of High-Stakes Science Assessments: Prevalence, Content, and the Possible Effect of Incorporating Innovative Item Types

    NASA Astrophysics Data System (ADS)

    Keller, Shani Malaika

Framed by a discussion of the heightened importance of science education in the U.S., this paper describes the prevalence, content, and format of high-stakes science assessments in the U.S. and explores the possibility that differences in assessment format may affect score gaps among student subgroups. An analysis of proficiency rates for 2010-11 high school exit exams in science was inconclusive; however, score gaps among ethnic subgroups on the 2009 grade 12 NAEP science assessment were larger for multiple choice items than for performance-based components. Further, a comparison of subgroup score gaps on the 2009 NAEP science assessment and those on the ACT science subtest suggests that the assessment with more diverse and innovative items resulted in a smaller gap in subgroup test scores. These findings point to the need for greater investigation of the extent to which item type affects subgroup score differences on science assessments.

  4. The Impact of Item Format and Examinee Characteristics on Response Times

    ERIC Educational Resources Information Center

    Hess, Brian J.; Johnston, Mary M.; Lipner, Rebecca S.

    2013-01-01

    Current research on examination response time has focused on tests comprised of traditional multiple-choice items. Consequently, the impact of other innovative or complex item formats on examinee response time is not understood. The present study used multilevel growth modeling to investigate examinee characteristics associated with response time…

  5. Formative Assessment: Simply, No Additives

    ERIC Educational Resources Information Center

    Roskos, Kathleen; Neuman, Susan B.

    2012-01-01

    Among the types of assessment, the one closest to daily reading instruction is formative assessment. In contrast to summative assessment, which occurs after instruction, formative assessment involves forming judgments frequently in the flow of instruction. Key features of formative assessment include identifying gaps between where students are and…

  6. Computerized Tailored Testing: Structured and Calibrated Item Banks for Summative and Formative Evaluation.

    ERIC Educational Resources Information Center

    Leclercq, Dieudonne

    1980-01-01

    Advancements in educational testing, especially in the computerized construction of tests from item banks, are outlined and explained. It is suggested that these methods open the door to more individualized and more formative types of testing. (MSE)

  7. Primary Science Assessment Item Setters' Misconceptions Concerning the State Changes of Water

    ERIC Educational Resources Information Center

    Boo, Hong Kwen

    2006-01-01

    Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the questions setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…

  8. Missouri Assessment Program (MAP), Spring 2000: Intermediate Science, Released Items, Grade 7.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This assessment sample provides information on the Missouri Assessment Program (MAP) for grade 7 science. The sample consists of seven items taken from the test booklet and scoring guides for the seven items. The items assess heat, minerals, graphing, and plant growth. (MM)

  9. Effects of Test Format, Self Concept and Anxiety on Item Response Changing Behaviour

    ERIC Educational Resources Information Center

    Afolabi, E. R. I.

    2007-01-01

    The study examined the effects of item format, self-concept and anxiety on response changing behaviour. Four hundred undergraduate students who took a counseling psychology course at a Nigerian university participated in the study. Students' answers in multiple-choice and true-false formats of an achievement test were observed for response…

  10. Calibration of an Item Bank for the Assessment of Basque Language Knowledge

    ERIC Educational Resources Information Center

    Lopez-Cuadrado, Javier; Perez, Tomas A.; Vadillo, Jose A.; Gutierrez, Julian

    2010-01-01

    The main requisite for a functional computerized adaptive testing system is a calibrated item bank. This text presents the tasks carried out during the calibration of an item bank for assessing knowledge of the Basque language. Calibration was performed using the three-parameter logistic model of item response theory. Besides, this…

  11. Smoothed Standardization Assessment of Testlet Level DIF on a Math Free-Response Item Type.

    ERIC Educational Resources Information Center

    Lyu, C. Felicia; And Others

    A smoothed version of standardization, which merges kernel smoothing with the traditional standardization differential item functioning (DIF) approach, was used to examine DIF for student-produced response (SPR) items on the Scholastic Assessment Test (SAT) I mathematics test at both the item and testlet levels. This nonparametric technique avoids…

  12. Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Wang, Ning; Lane, Suzanne

    This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…

  13. Assessing Acquiescence in Binary Responses: IRT-Related Item-Factor-Analytic Procedures

    ERIC Educational Resources Information Center

    Ferrando, Pere J.; Condon, Lorena

    2006-01-01

    This article proposes procedures for assessing acquiescence in a balanced set of binary personality items. These procedures are based on the bidimensional item-factor analysis model, which is an alternative parameterization of the bidimensional 2-parameter normal-ogive item response theory model. First the rationale and general approach are…

  14. Modified Multiple-Choice Items for Alternate Assessments: Reliability, Difficulty, and Differential Boost

    ERIC Educational Resources Information Center

    Kettler, Ryan J.; Rodriguez, Michael C.; Bolt, Daniel M.; Elliott, Stephen N.; Beddow, Peter A.; Kurz, Alexander

    2011-01-01

    Federal policy on alternate assessment based on modified academic achievement standards (AA-MAS) inspired this research. Specifically, an experimental study was conducted to determine whether tests composed of modified items would have the same level of reliability as tests composed of original items, and whether these modified items helped reduce…

  15. Formative Assessment: A Critical Review

    ERIC Educational Resources Information Center

    Bennett, Randy Elliot

    2011-01-01

    This paper covers six interrelated issues in formative assessment (aka, "assessment for learning"). The issues concern the definition of formative assessment, the claims commonly made for its effectiveness, the limited attention given to domain considerations in its conceptualisation, the under-representation of measurement principles in that…

  16. The DIF-Free-Then-DIF Strategy for the Assessment of Differential Item Functioning

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin; Sun, Guo-Wei

    2012-01-01

    The DIF-free-then-DIF (DFTD) strategy consists of two steps: (a) select a set of items that are the most likely to be DIF-free and (b) assess the other items for DIF (differential item functioning) using the designated items as anchors. The rank-based method together with the computer software IRTLRDIF can select a set of DIF-free polytomous items…

  17. Development of the Assessment Items of Debris Flow Using the Delphi Method

    NASA Astrophysics Data System (ADS)

    Byun, Yosep; Seong, Joohyun; Kim, Mingi; Park, Kyunghan; Yoon, Hyungkoo

    2016-04-01

    In recent years in Korea, typhoons and localized extreme rainfall caused by abnormal climate have increased. Accordingly, debris flows are becoming one of the most dangerous natural disasters. This study aimed to develop assessment items that can be used for conducting damage investigations of debris flows. The Delphi method was applied to classify the realms of the assessment items. As a result, 29 assessment items, classified into 6 groups, were determined.

  18. Formative Assessment Probes: Is It a Rock? Continuous Formative Assessment

    ERIC Educational Resources Information Center

    Keeley, Page

    2013-01-01

    A lesson plan is provided for a formative assessment probe entitled "Is It a Rock?" This probe is designed for teaching elementary school students about rocks through the use of a formative assessment classroom technique (FACT) known as the group Frayer Model. FACT activates students' thinking about a concept and can be used to…

  19. New Frontiers in Formative Assessment

    ERIC Educational Resources Information Center

    Noyce, Pendred E., Ed.; Hickey, Daniel T., Ed.

    2011-01-01

    "Formative assessment is a powerful learning tool that is too seldom, too haphazardly, and too ineffectively used in the United States," Pendred E. Noyce writes in the introduction to this volume. "The purpose of this book is to delve into why this is so and how it can be changed." Formative assessment involves constantly monitoring student…

  20. Assessment of health-related quality of life in arthritis: conceptualization and development of five item banks using item response theory

    PubMed Central

    Kopec, Jacek A; Sayre, Eric C; Davis, Aileen M; Badley, Elizabeth M; Abrahamowicz, Michal; Sherlock, Lesley; Williams, J Ivan; Anis, Aslam H; Esdaile, John M

    2006-01-01

    Background Modern psychometric methods based on item response theory (IRT) can be used to develop adaptive measures of health-related quality of life (HRQL). Adaptive assessment requires an item bank for each domain of HRQL. The purpose of this study was to develop item banks for five domains of HRQL relevant to arthritis. Methods About 1,400 items were drawn from published questionnaires or developed from focus groups and individual interviews and classified into 19 domains of HRQL. We selected the following 5 domains relevant to arthritis and related conditions: Daily Activities, Walking, Handling Objects, Pain or Discomfort, and Feelings. Based on conceptual criteria and pilot testing, 219 items were selected for further testing. A questionnaire was mailed to patients from two hospital-based clinics and a stratified random community sample. Dimensionality of the domains was assessed through factor analysis. Items were analyzed with the Generalized Partial Credit Model as implemented in Parscale. We used graphical methods and a chi-square test to assess item fit. Differential item functioning was investigated using logistic regression. Results Data were obtained from 888 individuals with arthritis. The five domains were sufficiently unidimensional for an IRT-based analysis. Thirty-one items were deleted due to lack of fit or differential item functioning. Daily Activities had the narrowest range for the item location parameter (-2.24 to 0.55) and Handling Objects had the widest range (-1.70 to 2.27). The mean (median) slope parameter for the items ranged from 1.15 (1.07) in Feelings to 1.73 (1.75) in Walking. The final item banks are comprised of 31–45 items each. Conclusion We have developed IRT-based item banks to measure HRQL in 5 domains relevant to arthritis. The items in the final item banks provide adequate psychometric information for a wide range of functional levels in each domain. PMID:16749932
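The logistic-regression DIF screen named in the Methods above can be sketched in a few lines. This is a minimal illustration on simulated data, not the study's analysis: the matching variable, group coding, effect size, and sample size are all made up, and the logistic fit is a hand-rolled Newton-Raphson rather than a statistics package.

```python
# Minimal sketch of uniform-DIF screening via logistic regression:
# compare a model with (intercept, matching score) against one that adds
# group membership, using a likelihood-ratio test. All data are simulated.
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        # Newton step: beta += (X'WX)^-1 X'(y - p)
        beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    ll = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return beta, ll

rng = np.random.default_rng(0)
n = 2000
total = rng.normal(0.0, 1.0, n)          # matching variable (e.g. rest score)
group = rng.integers(0, 2, n)            # 0 = reference group, 1 = focal group
# Simulate an item with uniform DIF: the focal group is 0.8 logits harder.
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(total - 0.8 * group)))).astype(float)

ones = np.ones(n)
_, ll0 = fit_logistic(np.column_stack([ones, total]), y)          # no-DIF model
_, ll1 = fit_logistic(np.column_stack([ones, total, group]), y)   # uniform-DIF model
G2 = 2.0 * (ll1 - ll0)   # ~ chi-square(1) under "no uniform DIF"
print(f"likelihood-ratio G2 = {G2:.1f}")
```

A large G2 (beyond the chi-square(1) critical value, 3.84 at the 5% level) flags the item for DIF; adding a score-by-group interaction term would extend the same test to nonuniform DIF.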

  1. Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example

    ERIC Educational Resources Information Center

    Li, Xiaomin; Wang, Wen-Chung

    2015-01-01

    The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…

  2. Development and community-based validation of eight item banks to assess mental health.

    PubMed

    Batterham, Philip J; Sunderland, Matthew; Carragher, Natacha; Calear, Alison L

    2016-09-30

    There is a need for precise but brief screening of mental health problems in a range of settings. The development of item banks to assess depression and anxiety has resulted in new adaptive and static screeners that accurately assess severity of symptoms. However, expansion to a wider array of mental health problems is required. The current study developed item banks for eight mental health problems: social anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder, adult attention-deficit hyperactivity disorder, drug use, psychosis and suicidality. The item banks were calibrated in a population-based Australian adult sample (N=3175) by administering large item pools (45-75 items) and excluding items on the basis of local dependence or measurement non-invariance. Item response theory parameters were estimated for each item bank using a two-parameter graded response model. Each bank consisted of 19-47 items, demonstrating excellent fit and precision across a range of -1 to 3 standard deviations from the mean. No previous study has developed such a broad range of mental health item banks. The calibrated item banks will form the basis of a new system of static and adaptive measures to screen for a broad array of mental health problems in the community. PMID:27500552
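The two-parameter graded response model used for calibration above defines each response category's probability as the difference of adjacent cumulative logistic curves. A minimal sketch, with made-up item parameters (discrimination `a` and ordered thresholds `b`), not values from the study:

```python
# Graded response model: P(X >= k) = 1 / (1 + exp(-a * (theta - b[k-1]))),
# and category probabilities are differences of adjacent cumulative curves.
import math

def grm_category_probs(theta, a, b):
    """P(X = k), k = 0..len(b), for a graded-response item with thresholds b."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [star[k] - star[k + 1] for k in range(len(b) + 1)]

# Illustrative 4-category item: a = 1.5, thresholds -1.0 < 0.2 < 1.4
probs = grm_category_probs(theta=0.5, a=1.5, b=[-1.0, 0.2, 1.4])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, the category probabilities are guaranteed to be non-negative and to sum to one for any ability value `theta`.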

  3. Formative Assessment as Mediation

    ERIC Educational Resources Information Center

    De Vos, Mark; Belluigi, Dina Zoe

    2011-01-01

    Whilst principles of validity, reliability and fairness should be central concerns for the assessment of student learning in higher education, simplistic notions of "transparency" and "explicitness" in terms of assessment criteria should be critiqued more rigorously. This article examines the inherent tensions resulting from CRA's links to both…

  4. Formative Assessment: A Cybernetic Viewpoint

    ERIC Educational Resources Information Center

    Roos, Bertil; Hamilton, David

    2005-01-01

    This paper considers alternative assessment, feedback and cybernetics. For more than 30 years, debates about the bi-polarity of formative and summative assessment have served as surrogates for discussions about the workings of the mind, the social implications of assessment and, as important, the role of instruction in the advancement of learning.…

  5. Determining When Single Scoring for Constructed-Response Items Is as Effective as Double Scoring in Mixed-Format Licensure Tests

    ERIC Educational Resources Information Center

    Kim, Sooyeon; Moses, Tim

    2013-01-01

    The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…

  6. Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory

    ERIC Educational Resources Information Center

    Lee, Won-Chan

    2010-01-01

    In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…

  7. Assessment of Preference for Edible and Leisure Items in Individuals with Dementia

    ERIC Educational Resources Information Center

    Ortega, Javier Virues; Iwata, Brian A.; Nogales-Gonzalez, Celia; Frades, Belen

    2012-01-01

    We conducted 2 studies on reinforcer preference in patients with dementia. Results of preference assessments yielded differential selections by 14 participants. Unlike prior studies with individuals with intellectual disabilities, all participants showed a noticeable preference for leisure items over edible items. Results of a subsequent analysis…

  8. Using Kernel Equating to Assess Item Order Effects on Test Scores

    ERIC Educational Resources Information Center

    Moses, Tim; Yang, Wen-Ling; Wilson, Christine

    2007-01-01

    This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test…

  9. International Assessment: A Rasch Model and Teachers' Evaluation of TIMSS Science Achievement Items

    ERIC Educational Resources Information Center

    Glynn, Shawn M.

    2012-01-01

    The Trends in International Mathematics and Science Study (TIMSS) is a comparative assessment of the achievement of students in many countries. In the present study, a rigorous independent evaluation was conducted of a representative sample of TIMSS science test items because item quality influences the validity of the scores used to inform…

  10. Assessing the Validity of a Single-Item HIV Risk Stage-of-Change Measure

    ERIC Educational Resources Information Center

    Napper, Lucy E.; Branson, Catherine M.; Fisher, Dennis G.; Reynolds, Grace L.; Wood, Michelle M.

    2008-01-01

    This study examined the validity of a single-item measure of HIV risk stage of change that HIV prevention contractors were required to collect by the California State Office of AIDS. The single-item measure was compared to the more conventional University of Rhode Island Change Assessment (URICA). Participants were members of Los Angeles…

  11. A HO-IRT Based Diagnostic Assessment System with Constructed Response Items

    ERIC Educational Resources Information Center

    Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei

    2011-01-01

    The aim of the present study was to develop an on-line assessment system with constructed-response items in the context of the elementary mathematics curriculum. The system recorded the problem-solving process of constructed-response items and transferred the process to response codes for further analyses. An inference mechanism based on artificial…

  12. Applying Unidimensional and Multidimensional Item Response Theory Models in Testlet-Based Reading Assessment

    ERIC Educational Resources Information Center

    Min, Shangchao; He, Lianzhen

    2014-01-01

    This study examined the relative effectiveness of the multidimensional bi-factor model and multidimensional testlet response theory (TRT) model in accommodating local dependence in testlet-based reading assessment with both dichotomously and polytomously scored items. The data used were 14,089 test-takers' item-level responses to the…

  13. Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Albano, Anthony D.

    2015-01-01

    This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…

  14. Formative Assessment in Dance Education

    ERIC Educational Resources Information Center

    Andrade, Heidi; Lui, Angela; Palma, Maria; Hefferen, Joanna

    2015-01-01

    Feedback is crucial to students' growth as dancers. When used within the framework of formative assessment, or assessment for learning, feedback results in actionable next steps that dancers can use to improve their performances. This article showcases the work of two dance specialists, one elementary and one middle school teacher, who have…

  15. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    NASA Astrophysics Data System (ADS)

    Greenberg, Ariela Caren

    Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach, the Mantel-Haenszel log-odds ratio (MH-LOR), to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard, 1994). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
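The Mantel-Haenszel log-odds ratio used for DIF detection above pools one 2x2 (group by correct/incorrect) table per total-score stratum. A minimal sketch with made-up counts, purely for illustration:

```python
# Mantel-Haenszel common odds ratio across score strata; its log (MH-LOR)
# is the DIF index. All counts below are invented for illustration.
import math

# One 2x2 table per total-score stratum:
# (ref_correct, ref_wrong, focal_correct, focal_wrong)
strata = [
    (40, 20, 30, 30),   # low scorers
    (55, 15, 45, 25),   # middle scorers
    (70,  5, 60, 15),   # high scorers
]

num = den = 0.0
for a, b, c, d in strata:
    n_k = a + b + c + d
    num += a * d / n_k          # reference-correct x focal-wrong
    den += b * c / n_k          # reference-wrong x focal-correct
mh_lor = math.log(num / den)    # > 0: item favours the reference group
print(f"MH-LOR = {mh_lor:.3f}")
```

Values near zero indicate no DIF after conditioning on total score; large positive or negative values flag the item. Extending the same log-odds machinery to each distractor, rather than only correct/incorrect, gives the DDF analysis the study describes.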

  16. Influences of Item Content and Format on the Dimensionality of Tests Combining Multiple-Choice and Open-Response Items: An Application of the Poly-DIMTEST Procedure.

    ERIC Educational Resources Information Center

    Perkhounkova, Yelena; Dunbar, Stephen B.

    The DIMTEST statistical procedure was used in a confirmatory manner to explore the dimensionality structures of three kinds of achievement tests: multiple-choice tests, constructed-response tests, and tests combining both formats. The DIMTEST procedure is based on estimating conditional covariances of the responses to the item pairs. The analysis…

  17. An instrument to assess quality of life in relation to nutrition: item generation, item reduction and initial validation

    PubMed Central

    2010-01-01

    Background It is arguable that modification of diet, given its potential for positive health outcomes, should be widely advocated and adopted. However, food intake, as a basic human need, and its modification may be accompanied by sensations of both pleasure and despondency and may consequently affect quality of life (QoL). Thus, the feasibility and success of dietary changes will depend, at least partly, on whether potential negative influences on QoL can be avoided. This is of particular importance in the context of dietary intervention studies and in the development of new food products to improve health and well-being. Instruments to measure the impact of nutrition on quality of life in the general population, however, are few and far between. Therefore, the aim of this project was to develop an instrument for measuring QoL related to nutrition in the general population. Methods and results We recruited participants from the general population and followed standard methodology for quality of life instrument development (identification of population, item selection, n = 24; item reduction, n = 81; item presentation, n = 12; pretesting of questionnaire and initial validation, n = 2576; construct validation, n = 128; and test-retest reliability, n = 20). Of 187 initial items, 29 were selected for final presentation. Factor analysis revealed an instrument with 5 domains. The instrument demonstrated good cross-sectional divergent and convergent construct validity when correlated with scores of the 8 domains of the SF-36 (correlations ranging from -0.078 to 0.562; 19 of the 40 tested correlations were statistically significant, and 24 were predicted correctly) and good test-retest reliability (intra-class correlation coefficients from 0.71 for symptoms to 0.90). Conclusions We developed and validated an instrument with 29 items across 5 domains to assess quality of life related to nutrition and other aspects of food intake.
The instrument demonstrated good face and

  18. DIF Detection and Interpretation in Large-Scale Science Assessments: Informing Item Writing Practices

    ERIC Educational Resources Information Center

    Zenisky, April L.; Hambleton, Ronald K.; Robin, Frederic

    2004-01-01

    Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content,…

  19. The Effect of the Multiple-Choice Item Format on the Measurement of Knowledge of Language Structure

    ERIC Educational Resources Information Center

    Currie, Michael; Chiramanee, Thanyapa

    2010-01-01

    Noting the widespread use of multiple-choice items in tests in English language education in Thailand, this study compared their effect against that of constructed-response items. One hundred and fifty-two university undergraduates took a test of English structure first in constructed-response format, and later in three, stem-equivalent…

  20. Comparison of Alternate and Original Items on the Montreal Cognitive Assessment

    PubMed Central

    Lebedeva, Elena; Huang, Mei; Koski, Lisa

    2016-01-01

    Background The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
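The Rasch-based difficulty comparison above can be illustrated with a small conditional estimate. For two dichotomous Rasch items, the difficulty gap is identified by the discordant response pairs alone (people who got exactly one of the two items right), independent of the ability distribution. The sketch below simulates data under assumed difficulties; it is not the study's adaptive-administration analysis.

```python
# Conditional pairwise Rasch estimate: among persons with exactly one of two
# items correct, the log-ratio of the two discordant counts estimates the
# difficulty difference. Difficulties and sample are simulated assumptions.
import math
import random

random.seed(1)
b_orig, b_alt = 0.0, 0.7        # true difficulties: the "alternate" item is harder

n10 = n01 = 0                   # (orig right, alt wrong) / (orig wrong, alt right)
for _ in range(20000):
    theta = random.gauss(0.0, 1.0)   # person ability
    x_orig = random.random() < 1.0 / (1.0 + math.exp(-(theta - b_orig)))
    x_alt  = random.random() < 1.0 / (1.0 + math.exp(-(theta - b_alt)))
    n10 += x_orig and not x_alt
    n01 += x_alt and not x_orig

# Conditional estimate of (b_alt - b_orig): log(n10 / n01)
gap = math.log(n10 / n01)
print(f"estimated difficulty gap = {gap:.2f} logits (true 0.70)")
```

A gap near zero is what "comparable difficulty" between an original item and its alternate-form replacement would look like; a sizeable gap, as the study found for all five alternate items, means scores drift between forms even when ability is unchanged.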

  1. Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

    ERIC Educational Resources Information Center

    Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

    2012-01-01

    Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…

  2. Development and Calibration of an Item Bank for PE Metrics Assessments: Standard 1

    ERIC Educational Resources Information Center

    Zhu, Weimo; Fox, Connie; Park, Youngsik; Fisette, Jennifer L.; Dyson, Ben; Graber, Kim C.; Avery, Marybell; Franck, Marian; Placek, Judith H.; Rink, Judy; Raynes, De

    2011-01-01

    The purpose of this study was to develop and calibrate an assessment system, or bank, using the latest measurement theories and methods to promote valid and reliable student assessment in physical education. Using an anchor-test equating design, a total of 30 items or assessments were administered to 5,021 (2,568 boys and 2,453 girls) students in…

  3. An Examination of Differential Item Functioning on the Vanderbilt Assessment of Leadership in Education

    ERIC Educational Resources Information Center

    Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph

    2009-01-01

    The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…

  4. Formative Assessment in Primary Science

    ERIC Educational Resources Information Center

    Loughland, Tony; Kilpatrick, Laetitia

    2015-01-01

    This action learning study in a year three classroom explored the implementation of five formative assessment principles to assist students' understandings of the scientific topic of liquids and solids. These principles were employed to give students a greater opportunity to express their understanding of the concepts. The study found that…

  5. Assessing the Validity of Single-item Life Satisfaction Measures: Results from Three Large Samples

    PubMed Central

    Cheung, Felix; Lucas, Richard E.

    2014-01-01

    Purpose The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS) - a more psychometrically established measure. Methods Two large samples from Washington (N=13,064) and Oregon (N=2,277) recruited by the Behavioral Risk Factor Surveillance System (BRFSS) and a representative German sample (N=1,312) recruited by the Germany Socio-Economic Panel (GSOEP) were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Results Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62 – 0.64; disattenuated r = 0.78 – 0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001 – 0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015 – 0.042). Conclusions Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use. PMID:24890827
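The "disattenuated r" figures above come from the standard correction for attenuation, which divides the observed correlation by the geometric mean of the two measures' reliabilities. A minimal sketch; the reliability values below are illustrative assumptions, not the ones reported in the study:

```python
# Correction for attenuation: r_true = r_xy / sqrt(r_xx * r_yy).
# r_xy is taken from the abstract's mid-range; the reliabilities are assumed.
import math

r_xy = 0.63               # observed correlation, single item vs. SWLS
r_xx, r_yy = 0.87, 0.73   # assumed reliabilities of the two measures

r_disattenuated = r_xy / math.sqrt(r_xx * r_yy)
print(f"disattenuated r = {r_disattenuated:.2f}")
```

Because a single-item measure is typically less reliable than a multi-item scale, its observed correlations are attenuated; the correction estimates how strongly the underlying constructs correlate once that measurement error is removed.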

  6. A Classification Matrix of Examination Items to Promote Transformative Assessment

    ERIC Educational Resources Information Center

    McMahon, Mark; Garrett, Michael

    2016-01-01

    The ability to assess learning hinges on the quality of the instruments that are used. This paper reports on the first stage of the design of software to assist educators in ensuring assessment questions meet educational outcomes. A review of the literature within the field of instructional psychology was undertaken with a view towards…

  7. Dimensionality Assessment of Ordered Polytomous Items with Parallel Analysis

    ERIC Educational Resources Information Center

    Timmerman, Marieke E.; Lorenzo-Seva, Urbano

    2011-01-01

    Parallel analysis (PA) is an often-recommended approach for assessment of the dimensionality of a variable set. PA is known in different variants, which may yield different dimensionality indications. In this article, the authors considered the most appropriate PA procedure to assess the number of common factors underlying ordered polytomously…

  8. Are Grid-In Response Format Items Usable in Secondary Classrooms?

    ERIC Educational Resources Information Center

    Hombo, Catherine M.; Pashley, Katharine; Jenkins, Frank

    The use of grid-in formats, such as those requiring students to solve problems and fill in bubbles, is common on large-scale standardized assessments, but little is known about the use of this format with a more general population of students than high school students taking college entrance examinations, including those attending public schools…

  9. Assessing Dimensionality of Noncompensatory Multidimensional Item Response Theory with Complex Structures

    ERIC Educational Resources Information Center

    Svetina, Dubravka

    2013-01-01

    The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in noncompensatory multidimensional item response models using dimensionality assessment procedures based on DETECT (dimensionality evaluation to enumerate contributing traits) and NOHARM (normal ogive harmonic analysis robust method). Five…

  10. Gender-Related Differential Item Functioning on a Middle-School Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    This study examined gender-related differential item functioning (DIF) using a mathematics performance assessment, the QUASAR Cognitive Assessment Instrument (QCAI), administered to middle school students. The QCAI was developed for the Quantitative Understanding: Amplifying Student Achievement and Reading (QUASAR) project, which focuses on…

  11. Using Item Response Theory to Assess Changes in Student Performance Based on Changes in Question Wording

    ERIC Educational Resources Information Center

    Schurmeier, Kimberly D.; Atwood, Charles H.; Shepler, Carrie G.; Lautenschlager, Gary J.

    2010-01-01

    Five years of longitudinal data for general chemistry student assessments at the University of Georgia have been analyzed using item response theory (IRT). Our analysis indicates that minor changes in question wording on exams can make significant differences in student performance on assessment questions. This analysis encompasses data from over…

  12. An Anthropologist among the Psychometricians: Assessment Events, Ethnography, and Differential Item Functioning in the Mongolian Gobi

    ERIC Educational Resources Information Center

    Maddox, Bryan; Zumbo, Bruno D.; Tay-Lim, Brenda; Qu, Demin

    2015-01-01

    This article explores the potential for ethnographic observations to inform the analysis of test item performance. In 2010, a standardized, large-scale adult literacy assessment took place in Mongolia as part of the United Nations Educational, Scientific and Cultural Organization Literacy Assessment and Monitoring Programme (LAMP). In a novel form…

  13. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    PubMed

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (understanding and communicating, getting around, self-care, getting along with others, life activities, and participation in society). The main purpose of this paper is to evaluate the psychometric properties of each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology. PMID:25524862

  14. Target Rotations and Assessing the Impact of Model Violations on the Parameters of Unidimensional Item Response Theory Models

    ERIC Educational Resources Information Center

    Reise, Steven; Moore, Tyler; Maydeu-Olivares, Alberto

    2011-01-01

    Reise, Cook, and Moore proposed a "comparison modeling" approach to assess the distortion in item parameter estimates when a unidimensional item response theory (IRT) model is imposed on multidimensional data. Central to their approach is the comparison of item slope parameter estimates from a unidimensional IRT model (a restricted model), with…

  15. How Do You Know if They're Getting It? Writing Assessment Items that Reveal Student Understanding

    ERIC Educational Resources Information Center

    Taylor, Melanie; Smith, Sean

    2009-01-01

    Through a project funded by the National Science Foundation, Horizon Research has been developing assessment items for students (in the process, compiling item-writing principles from several sources and adding their own). In this article, the authors share what they have learned about writing items that reveal student understanding, including…

  16. Investigation of Science Inquiry Items for Use on an Alternate Assessment Based on Modified Achievement Standards Using Cognitive Lab Methodology

    ERIC Educational Resources Information Center

    Dickenson, Tammiee S.; Gilmore, Joanna A.; Price, Karen J.; Bennett, Heather L.

    2013-01-01

    This study evaluated the benefits of item enhancements applied to science-inquiry items for incorporation into an alternate assessment based on modified achievement standards for high school students. Six items were included in the cognitive lab sessions involving both students with and without disabilities. The enhancements (e.g., use of visuals,…

  17. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    ERIC Educational Resources Information Center

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is…

  18. Automatic Item Generation of Probability Word Problems

    ERIC Educational Resources Information Center

    Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

    2009-01-01

    Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…

  19. Differential Item Functioning by Gender on a Large-Scale Science Performance Assessment: A Comparison across Grade Levels.

    ERIC Educational Resources Information Center

    Holweger, Nancy; Taylor, Grace

    The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…

  20. Psychometrical assessment and item analysis of the General Health Questionnaire in victims of terrorism.

    PubMed

    Delgado-Gomez, David; Lopez-Castroman, Jorge; de Leon-Martinez, Victoria; Baca-Garcia, Enrique; Cabanas-Arrate, Maria Luisa; Sanchez-Gonzalez, Antonio; Aguado, David

    2013-03-01

    There is a need to assess the psychiatric morbidity that appears as a consequence of terrorist attacks. The General Health Questionnaire (GHQ) has been used to this end, but its psychometric properties have never been evaluated in a population affected by terrorism. A sample of 891 participants included 162 direct victims of terrorist attacks and 729 relatives of the victims. All participants were evaluated using the 28-item version of the GHQ (GHQ-28). We examined the reliability and external validity of scores on the scale using Cronbach's alpha and Pearson correlation with the State-Trait Anxiety Inventory (STAI), respectively. The factor structure of the scale was analyzed with varimax rotation. Samejima's (1969) graded response model was used to explore the item properties. The GHQ-28 scores showed good reliability and item-scale correlations. The factor analysis identified 3 factors: anxious-somatic symptoms, social dysfunction, and depression symptoms. All factors showed good correlation with the STAI. Before rotation, the first, second, and third factor explained 44.0%, 6.4%, and 5.0% of the variance, respectively. Varimax rotation redistributed the percentages of variance accounted for to 28.4%, 13.8%, and 13.2%, respectively. Items with the highest loadings in the first factor measured anxiety symptoms, whereas items with the highest loadings in the third factor measured suicide ideation. Samejima's model found that high scores in suicide-related items were associated with severe depression. The factor structure of the GHQ-28 found in this study underscores the preeminence of anxiety symptoms among victims of terrorism and their relatives. Item response analysis identified the most difficult and significant items for each factor. PMID:23205624

  1. Positive and negative item wording and its influence on the assessment of callous-unemotional traits.

    PubMed

    Ray, James V; Frick, Paul J; Thornton, Laura C; Steinberg, Laurence; Cauffman, Elizabeth

    2016-04-01

    [Correction Notice: An Erratum for this article was reported in Vol 28(4) of Psychological Assessment (see record 2015-33818-001). In the article, the sixth sentence of the second full paragraph in the Data Analyses subsection of the Method section should read "For k response categories, there are k-1 threshold parameters."] This study examined the item functioning of the Inventory of Callous-Unemotional Traits (ICU) in an ethnically diverse sample of 1,190 first-time justice-involved adolescents (mean age = 15.28 years, SD = 1.29). On elimination of 2 items, the total ICU score provided a reliable (internally consistent and stable) and valid (correlated with and predictive of measures of empathy, school conduct problems, delinquency, and aggression) continuous measure of callous and unemotional (CU) traits. A shortened, 10-item version of the total scale, developed from item response theory (IRT) analyses, appeared to show psychometric properties similar to those of the full ICU and, thus, could be used as an abbreviated measure of CU traits. Finally, item analyses and tests of validity suggested that the factor structure of the ICU reported in a large number of past studies could reflect method variance related to the ICU, including equal numbers of positively and negatively worded items. Specifically, positively worded items (i.e., items for which higher ratings are indicative of higher levels of CU traits) were more likely to be rated in the lower response categories, showed higher difficulty levels in IRT analyses (i.e., discriminated best at higher levels of CU traits), and were more highly correlated with measures of antisocial and aggressive behavior. On the basis of these findings, we recommend using the total ICU as a continuous measure of CU traits and do not recommend continued use of the subscale structure that has been reported in multiple past studies. PMID:26121386

  2. Assessing and Testing Interrater Agreement on a Single Target Using Multi-Item Rating Scales.

    ERIC Educational Resources Information Center

    Lindell, Michael K.

    2001-01-01

    Developed an index for assessing interrater agreement with respect to a single target using a multi-item rating scale. The variance of rater mean scale scores is used as the numerator of the agreement index. Studied four variants of a disattenuated agreement index that vary in the random response term used as the denominator. (SLD)
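
Lindell's index belongs to the within-group agreement family that compares observed rating variance against the variance expected under random responding. A rough sketch of one common linear formulation, r*WG(J), assuming a uniform null distribution over the response options (the specific variants evaluated in the paper differ in the random-response term):

```python
def rwg_j(item_variances, n_options):
    """Linear within-group agreement index: mean observed item variance
    compared against the variance expected under uniform random
    responding, sigma_EU^2 = (A^2 - 1) / 12 for A response options.
    Values near 1 indicate strong interrater agreement."""
    mean_var = sum(item_variances) / len(item_variances)
    sigma_eu = (n_options ** 2 - 1) / 12
    return 1 - mean_var / sigma_eu
```

For a 5-point scale, sigma_EU^2 = 2.0, so perfectly agreeing raters (zero item variance) score 1.0 and maximally random raters score near 0.
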

  3. Improving the Memory Sections of the Standardized Assessment of Concussion Using Item Analysis

    ERIC Educational Resources Information Center

    McElhiney, Danielle; Kang, Minsoo; Starkey, Chad; Ragan, Brian

    2014-01-01

    The purpose of the study was to improve the immediate and delayed memory sections of the Standardized Assessment of Concussion (SAC) by identifying a list of more psychometrically sound items (words). A total of 200 participants with no history of concussion in the previous six months (aged 19.60 ± 2.20 years; N = 93 men, N = 107 women)…

  4. Assessing the Discriminating Power of Item and Test Scores in the Linear Factor-Analysis Model

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2012-01-01

    Model-based attempts to rigorously study the broad and imprecise concept of "discriminating power" are scarce, and generally limited to nonlinear models for binary responses. This paper proposes a comprehensive framework for assessing the discriminating power of item and test scores which are analyzed or obtained using Spearman's factor-analytic…

  5. PSSA Released Reading Items, 2000-2001. The Pennsylvania System of School Assessment.

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Education, Harrisburg. Bureau of Curriculum and Academic Services.

    This document contains materials directly related to the actual reading test of the Pennsylvania System of School Assessment (PSSA), including the reading rubric, released passages, selected-response questions with answer keys, performance tasks, and scored samples of students' responses to the tasks. All of these items may be duplicated to…

  6. The Use of Loglinear Models for Assessing Differential Item Functioning across Manifest and Latent Examinee Groups.

    ERIC Educational Resources Information Center

    Kelderman, Henk; Macready, George B.

    1990-01-01

    Loglinear latent class models are used to detect differential item functioning (DIF). Likelihood ratio tests for assessing the presence of various types of DIF are described, and these methods are illustrated through the analysis of a "real world" data set. (TJH)

  7. Assessing Model Data Fit of Unidimensional Item Response Theory Models in Simulated Data

    ERIC Educational Resources Information Center

    Kose, Ibrahim Alper

    2014-01-01

    The purpose of this paper is to give an example of how to assess the model-data fit of unidimensional IRT models in simulated data. Also, the present research aims to explain the importance of fit and the consequences of misfit by using simulated data sets. Responses of 1000 examinees to a dichotomously scoring 20 item test were simulated with 25…

  8. Applying Item Response Theory Methods to Design a Learning Progression-Based Science Assessment

    ERIC Educational Resources Information Center

    Chen, Jing

    2012-01-01

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1)…

  9. The Value of Item Response Theory in Clinical Assessment: A Review

    ERIC Educational Resources Information Center

    Thomas, Michael L.

    2011-01-01

    Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical…

  10. Differentials of a State Reading Assessment: Item Functioning, Distractor Functioning, and Omission Frequency for Disability Categories

    ERIC Educational Resources Information Center

    Kato, Kentaro; Moen, Ross E.; Thurlow, Martha L.

    2009-01-01

    Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional…

  11. Randomised Items in Computer-Based Tests: Russian Roulette in Assessment?

    ERIC Educational Resources Information Center

    Marks, Anthony M.; Cronje, Johannes C.

    2008-01-01

    Computer-based assessments are becoming more commonplace, perhaps as a necessity for faculty to cope with large class sizes. These tests often occur in large computer testing venues in which test security may be compromised. In an attempt to limit the likelihood of cheating in such venues, randomised presentation of items is automatically…

  12. A Study of Item Bias in the Maine Educational Assessment Test.

    ERIC Educational Resources Information Center

    Smith, James Brian

    A study used four statistical item bias analysis strategies to determine the French cross-cultural validity of the Maine Educational Assessment, a standardized test administered in six content areas to students in grades 4, 8, and 11. Analysis was performed on eighth grade pupil performance in test year 1988-89, in the areas of the 100 common…

  13. Assessing the Structure of the GRE General Test Using Confirmatory Multidimensional Item Response Theory.

    ERIC Educational Resources Information Center

    Kingston, Neal M.; McKinley, Robert L.

    Confirmatory multidimensional item response theory (CMIRT) was used to assess the structure of the Graduate Record Examination General Test, about which much information about factorial structure exists, using a sample of 1,001 psychology majors taking the test in 1984 or 1985. Results supported previous findings that, for this population, there…

  14. The Matching Criterion Purification for Differential Item Functioning Analyses in a Large-Scale Assessment

    ERIC Educational Resources Information Center

    Lee, HyeSun; Geisinger, Kurt F.

    2016-01-01

    The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…

  15. Successful Student Writing through Formative Assessment

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover

    2010-01-01

    Use formative assessment to dramatically improve your students' writing. In "Successful Student Writing Through Formative Assessment", educator and international speaker Harry G. Tuttle shows you how to guide middle and high school students through the prewriting, writing, and revision processes using formative assessment techniques that work.…

  16. Formative Assessment: Responding to Your Students

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover

    2009-01-01

    This "how-to" book on formative assessment is filled with practical suggestions for teachers who want to use formative assessment in their classrooms. With practical strategies, tools, and examples for teachers of all subjects and grade levels, this book shows you how to use formative assessment to promote successful student learning. Topics…

  17. Using a Constructed-Response Instrument to Explore the Effects of Item Position and Item Features on the Assessment of Students' Written Scientific Explanations

    NASA Astrophysics Data System (ADS)

    Federer, Meghan Rector; Nehm, Ross H.; Opfer, John E.; Pearl, Dennis

    2015-08-01

    A large body of work has been devoted to reducing assessment biases that distort inferences about students' science understanding, particularly in multiple-choice instruments (MCI). Constructed-response instruments (CRI), however, have invited much less scrutiny, perhaps because of their reputation for avoiding many of the documented biases of MCIs. In this study we explored whether known biases of MCIs—specifically item sequencing and surface feature effects—were also apparent in a CRI designed to assess students' understanding of evolutionary change using written explanation (Assessment of COntextual Reasoning about Natural Selection [ACORNS]). We used three versions of the ACORNS CRI to investigate different aspects of assessment structure and their corresponding effect on inferences about student understanding. Our results identified several sources of (and solutions to) assessment bias in this practice-focused CRI. First, along the instrument item sequence, items with similar surface features produced greater sequencing effects than sequences of items with dissimilar surface features. Second, a counterbalanced design (i.e., Latin Square) mitigated this bias at the population level of analysis. Third, ACORNS response scores were highly correlated with student verbosity, despite verbosity being an intrinsically trivial aspect of explanation quality. Our results suggest that as assessments in science education shift toward the measurement of scientific practices (e.g., explanation), it is critical that biases inherent in these types of assessments be investigated empirically.

  18. Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

    ERIC Educational Resources Information Center

    Parish, Jane A.; Karisch, Brandi B.

    2013-01-01

    Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…

  19. PISA Test Items and School-Based Examinations in Greece: Exploring the Relationship between Global and Local Assessment Discourses

    ERIC Educational Resources Information Center

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-01-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…

  20. Policy considerations based on a cost analysis of alternative test formats in large scale science assessments

    NASA Astrophysics Data System (ADS)

    Lawrenz, Frances; Huffman, Douglas; Welch, Wayne

    2000-08-01

    This article compares the costs of four assessment formats: multiple choice, open ended, laboratory station, and full investigation. The amount of time spent preparing the devices, developing scoring consistency for the devices, and scoring the devices was tracked as the devices were developed. These times are presented by individual item and by complete device. Times are also compared as if 1,000 students completed each assessment. Finally, the times are converted into cost estimates by assuming a potential hourly wage. The data show that a multiple choice item costs the least; an open ended item costs approximately 80 times as much, a content station 300 times as much, and a full investigation item 500 times as much. The very large discrepancies in costs are used as a basis to raise several policy issues related to the inclusion of alternative assessment formats in large scale science achievement testing.
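
The reported multipliers translate into per-format cost estimates in the obvious way. A toy calculation (the baseline per-item cost is an assumed figure for illustration, not a number from the article):

```python
# Relative cost multipliers reported in the abstract (multiple choice = 1x).
multipliers = {
    "multiple choice": 1,
    "open ended": 80,
    "content station": 300,
    "full investigation": 500,
}

base_cost = 0.50  # assumed cost per multiple-choice item, in dollars

for fmt, mult in multipliers.items():
    print(f"{fmt}: ${base_cost * mult:.2f} per item")
```

Scaled to 1,000 students, even modest per-item baselines make the 300x and 500x formats a substantial budget line, which is the policy tension the article raises.
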

  1. Formative Assessment: Policy, Perspectives and Practice

    ERIC Educational Resources Information Center

    Clark, Ian

    2011-01-01

    Proponents of formative assessment (FA) assert that students develop a deeper understanding of their learning when the essential components of formative feedback and cultural responsiveness are effectively incorporated as central features of the formative assessment process. Even with growing international agreement among the research community…

  2. Formative Assessment: Guidance for Early Childhood Policymakers

    ERIC Educational Resources Information Center

    Riley-Ayers, Shannon

    2014-01-01

    This policy report provides a guide and framework to early childhood policymakers considering formative assessment. The report defines formative assessment and outlines its process and application in the context of early childhood. The substance of this document is the issues for consideration in the implementation of the formative assessment…

  3. Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model

    ERIC Educational Resources Information Center

    Baghaei, Purya; Aryadoust, Vahid

    2015-01-01

    Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a…

  4. State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts

    ERIC Educational Resources Information Center

    Swanson, Leonard C.

    2010-01-01

    This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…

  5. APPREND: Formative Assessment Tools for APP

    ERIC Educational Resources Information Center

    Sherborne, Tony

    2009-01-01

    This article discusses how Assessing Pupils' Progress (APP) can be turned into more of a tool for formative assessment. It describes an approach called "APPREND" as a set of APP-based tools for formative assessment. The author provides a glimpse of how APPREND tools can help. (Contains 2 tables.)

  6. Formative and Summative Assessment in the Classroom

    ERIC Educational Resources Information Center

    Dixson, Dante D.; Worrell, Frank C.

    2016-01-01

    In this article, we provide brief overviews of the definitions of formative and summative assessment and a few examples of types of formative and summative assessments that can be used in classroom contexts. We highlight the points that these two types of assessment are complementary and the differences between them are often in the way these…

  7. Gender differences in National Assessment of Educational Progress science items: What does I don't know really mean?

    NASA Astrophysics Data System (ADS)

    Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth

    The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose I don't know rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the I don't know response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the I don't know response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.

  8. Test-retest reliability, internal item consistency, and concurrent validity of the wheelchair seating discomfort assessment tool.

    PubMed

    Crane, Barbara A; Holm, Margo B; Hobson, Douglas; Cooper, Rory A; Reed, Matthew P; Stadelmeier, Steve

    2005-01-01

    Discomfort is a common problem for wheelchair users. Few researchers have investigated discomfort among wheelchair users or potential solutions for this problem. One of the impediments to quantitative research on wheelchair seating discomfort has been the lack of a reliable method for quantifying seat discomfort. The purpose of this study was to establish the test-retest reliability, internal item consistency, and concurrent validity of a newly developed Wheelchair Seating Discomfort Assessment Tool (WcS-DAT). Thirty full-time, active wheelchair users with intact sensation were asked to use this and other tools in order to rate their levels of discomfort in a test-retest reliability study format. Data from these measures were analyzed in SPSS using an intraclass correlation coefficient (ICC) model (2,k) to measure the test-retest reliability. Cronbach's alpha was used to examine the internal consistency of the items within the WcS-DAT. Concurrent validity with similar measures was analyzed using Pearson product-moment correlations. ICC scores for all analyses were above the established lower bound of .80, indicating a highly stable and reliable tool. In addition, alpha scores indicated good consistency of all items without redundancy. Finally, correlations with similar tools, such as the Chair Evaluation Checklist and the Short Form of the McGill Pain Questionnaire, were significant at the .05 level, and many were significant at the .001 level. These results support the use of the WcS-DAT as a reliable and stable tool for quantifying wheelchair seating discomfort. Its application will enhance the ability to assess and to research this important problem and will provide a means to validate the outcomes of specialized seating interventions for the study population of wheelchair users. PMID:16392714
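
The internal-consistency statistic named in this abstract, Cronbach's alpha, is straightforward to compute from an items-by-respondents score matrix. A minimal sketch (the data below are made up, not from the WcS-DAT study):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # per-item sample variance
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy data: 4 respondents x 3 discomfort items on a 5-point scale.
data = np.array([[1, 2, 1],
                 [2, 3, 2],
                 [4, 4, 5],
                 [5, 5, 4]])
print(round(cronbach_alpha(data), 2))
```

Alpha near 1 indicates the items move together, as the abstract reports for the WcS-DAT; redundancy would show up as alpha approaching 1 while item-total correlations become nearly identical.
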

  9. A Tree-Based Analysis of Items from an Assessment of Basic Mathematics Skills.

    ERIC Educational Resources Information Center

    Sheehan, Kathleen; Mislevy, Robert J.

    The operating characteristics of 114 mathematics pretest items from the Praxis I: Computer Based Test were analyzed in terms of item attributes and test developers' judgments of item difficulty. Item operating characteristics were defined as the difficulty, discrimination, and asymptote parameters of a three parameter logistic item response theory…

  10. Item Difficulty Modeling of Paragraph Comprehension Items

    ERIC Educational Resources Information Center

    Gorin, Joanna S.; Embretson, Susan E.

    2006-01-01

    Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…

  11. The Development of Multiple-Choice Items Consistent with the AP Chemistry Curriculum Framework to More Accurately Assess Deeper Understanding

    ERIC Educational Resources Information Center

    Domyancich, John M.

    2014-01-01

    Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…

  12. An Item-Level Psychometric Analysis of the Personality Assessment Inventory: Clinical Scales in a Psychiatric Inpatient Unit

    ERIC Educational Resources Information Center

    Siefert, Caleb J.; Sinclair, Samuel J.; Kehl-Fie, Kendra A.; Blais, Mark A.

    2009-01-01

    Multi-item multiscale self-report measures are increasingly used in inpatient assessments. When considering a measure for this setting, it is important to evaluate the psychometric properties of the clinical scales and items to ensure that they are functioning as intended in a highly distressed clinical population. The present study examines scale…

  13. [Psychometric properties of the quality of life assessment instrument: 12-item health survey (SF-12)].

    PubMed

    Silveira, Marise Fagundes; Almeida, Júlio César; Freire, Rafael Silveira; Haikal, Desirrê Sant'Ana; Martins, Andrea Eleutério de Barros Lima

    2013-07-01

    This article aims to assess the psychometric properties of the 12-Item Health Survey (SF-12). Data from an epidemiological oral health survey conducted in 2008/2009 in the municipality of Montes Claros, MG were used, comprising 2157 individuals of both sexes. The relational structure of the SF-12 was assessed by exploratory factor analysis (EFA), reliability was assessed using Cronbach's alpha coefficient, and Pearson's correlation coefficient was used to assess the correlations between each questionnaire item and the final scores. Construct validity was investigated by comparing the physical (PCS) and mental (MCS) component scores of the SF-12 among population subgroups, using the Mann-Whitney and Kruskal-Wallis tests. The PCS and MCS domains presented means (standard deviations) of 49.6 (9.0) and 51.9 (8.6), respectively. Cronbach's alpha coefficient (α = 0.836) indicated a high degree of reliability. The relational structure was represented by two latent factors, which explained 58.36% of the total variance. The psychometric properties of the SF-12 suggest that it is a sensitive tool for assessing different QoL levels, is reliable, has satisfactory internal consistency, and is fast and easy to use. PMID:23827896
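
    The subgroup comparisons described in this abstract rely on the Mann-Whitney test. A rough pure-Python sketch using the normal approximation without tie correction; the data in the test are invented, and a real analysis would use an exact test or a statistics package:

    ```python
    import math

    def _ranks(values):
        """Average (mid) ranks, 1-based, with ties shared."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0.0] * len(values)
        i = 0
        while i < len(values):
            j = i
            while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        return ranks

    def _phi(t):
        """Standard normal CDF."""
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

    def mann_whitney_u(x, y):
        """Mann-Whitney U with a two-sided normal-approximation p-value
        (no tie correction). Returns (U, p)."""
        n1, n2 = len(x), len(y)
        r = _ranks(list(x) + list(y))
        u1 = sum(r[:n1]) - n1 * (n1 + 1) / 2
        u = min(u1, n1 * n2 - u1)
        mu = n1 * n2 / 2
        sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
        z = (u - mu) / sigma if sigma > 0 else 0.0
        return u, min(1.0, 2.0 * _phi(-abs(z)))
    ```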

  14. Formative Assessment: Not Just Another Test

    ERIC Educational Resources Information Center

    Education Digest: Essential Readings Condensed for Quick Review, 2011

    2011-01-01

    "Many educators think of formative assessment as another kind of test. Instead, it is a process to help instructors understand their students' day-to-day learning and develop appropriate interventions to improve that learning," says Nancy Gerzon, Senior Research Associate at WestEd. "We know from research that effective formative assessment has…

  15. The Political Dilemmas of Formative Assessment

    ERIC Educational Resources Information Center

    Dorn, Sherman

    2010-01-01

    The literature base on using formative assessment for instructional and intervention decisions is formidable, but the history of the practice of formative assessment is spotty. Even with the pressures of high-stakes accountability, its definition is fuzzy, its adoption is inconsistent, and the prognosis for future use is questionable. A historical…

  16. Formative Assessment in the High School IMC

    ERIC Educational Resources Information Center

    Edwards, Valerie A.

    2007-01-01

    In this article, the author discusses how she uses formative assessments of information literacy skills in the high school IMC. As a result of informal observation and conversations with individual students--a form of formative assessment itself--the author learned that students were not using indexes to locate relevant information in nonfiction…

  17. Formative E-Assessment: Practitioner Cases

    ERIC Educational Resources Information Center

    Pachler, Norbert; Daly, Caroline; Mor, Yishay; Mellar, Harvey

    2010-01-01

    This paper reports on one aspect of the Joint Information Systems Committee (JISC)-funded project "Scoping a vision of formative e-assessment", namely on cases of formative e-assessment developed iteratively with the UK education practitioner community. The project, which took place from June 2008 to January 2009, aimed to identify current…

  18. Formative Assessments in a Professional Learning Community

    ERIC Educational Resources Information Center

    Stanley, Todd; Moore, Betsy

    2011-01-01

    The ideas and examples in this book help teachers successfully collaborate to raise student achievement through the use of formative assessments. Here, Todd Stanley and Betsy Moore, educators with over 40 years of combined experience, offer proven formative assessment strategies to teachers in a professional learning community. Contents include:…

  19. Improving Foreign Language Speaking through Formative Assessment

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover; Tuttle, Alan Robert

    2012-01-01

    Want a quick way to get your students happily conversing more in the target language? This practical book shows you how to use formative assessments to gain immediate and lasting improvement in your students' fluency. You'll learn how to: (1) Imbed the 3-minute formative assessment into every lesson with ease; (2) Engage students in peer formative…

  20. Implementation of Formative Assessment in the Classroom

    ERIC Educational Resources Information Center

    Edman, Elaina; Gilbreth, Stephen G.; Wynn, Sheila

    2010-01-01

    This report details the work defined by a doctoral team looking at the literacy and implementation of formative assessment in classrooms in Southwest Missouri. The mission of this project was to identify the formative assessment literacy levels and the degree of classroom implementation of these strategies in districts and the resulting…

  1. Learning Progressions that Support Formative Assessment Practices

    ERIC Educational Resources Information Center

    Alonzo, Alicia C.

    2011-01-01

    Black, Wilson, and Yao (this issue) lay out a comprehensive vision for the way that learning progressions (or other "road maps") might be used to inform and coordinate formative and summative purposes of assessment. As Black, Wilson, and others have been arguing for over a decade, the effective use of formative assessment has great potential to…

  2. Harnessing Collaborative Annotations on Online Formative Assessments

    ERIC Educational Resources Information Center

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…

  3. Issues, Examples, and Challenges in Formative Assessment.

    ERIC Educational Resources Information Center

    Hunt, Earl; Pellegrino, James W.

    2002-01-01

    Describes new developments in formative assessment and challenges for the educational community. Asserts that many current assessment practices that serve certification and prediction functions well are not well suited for improving learning. Calls for alternative approaches to assessment, rooted in cognitive theories of knowledge and learning,…

  4. Examining Increased Flexibility in Assessment Formats

    ERIC Educational Resources Information Center

    Irwin, Brian; Hepplestone, Stuart

    2012-01-01

    There have been calls in the literature for changes to assessment practices in higher education, to increase flexibility and give learners more control over the assessment process. This article explores the possibilities of allowing student choice in the format used to present their work, as a starting point for changing assessment, based on…

  5. A Comparison of Traditional Test Blueprinting and Item Development to Assessment Engineering in a Licensure Context

    ERIC Educational Resources Information Center

    Masters, James S.

    2010-01-01

    With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…

  6. Exploring Formative Assessment as a Tool for Learning: Students' Experiences of Different Methods of Formative Assessment

    ERIC Educational Resources Information Center

    Weurlander, Maria; Soderberg, Magnus; Scheja, Max; Hult, Hakan; Wernerson, Annika

    2012-01-01

    This study aims to provide a greater insight into how formative assessments are experienced and understood by students. Two different formative assessment methods, an individual, written assessment and an oral group assessment, were components of a pathology course within a medical curriculum. In a cohort of 70 students, written accounts were…

  7. Formative Assessment at the Crossroads: Conformative, Deformative and Transformative Assessment

    ERIC Educational Resources Information Center

    Torrance, Harry

    2012-01-01

    The theory and practice of formative assessment seems to be at a crossroads, even an impasse. Different theoretical justifications for the development of formative assessment, and different empirical exemplifications, have been apparent for many years. Yet practice, while quite widespread, is often limited in terms of its scope and its utilisation…

  8. Formative Assessment: Assessment Is for Self-Regulated Learning

    ERIC Educational Resources Information Center

    Clark, Ian

    2012-01-01

    The article draws from 199 sources on assessment, learning, and motivation to present a detailed decomposition of the values, theories, and goals of formative assessment. This article will discuss the extent to which formative feedback actualizes and reinforces self-regulated learning (SRL) strategies among students. Theoreticians agree that SRL…

  9. Gender, Assessment and Students' Literacy Learning: Implications for Formative Assessment

    ERIC Educational Resources Information Center

    Murphy, Patricia; Ivinson, Gabrielle

    2005-01-01

    Formative assessment is intended to develop students' capacity to learn and increase the effectiveness of teaching. However, the extent to which formative assessment can meet these aims depends on the relationship between its conception and current conceptions of learning. In recent years concern about sex group differences in achievement has led…

  10. [Assessment of the distance between categories in rating scales by using the item response model].

    PubMed

    Wakita, Takafumi

    2004-10-01

    This study aimed to assess the distance between adjacent categories of rating scales. It is common practice to treat ordinal variables as interval-scaled variables in the analysis of rating scales. Strictly speaking, however, ordinal scale data should be treated as such, since there is little reason or assurance that they are equivalent to interval variables. In view of this practice, this study proposes a method to assess the intervals of rating scales and analyzes empirical data to examine the results obtained by the method. The method is based upon the generalized partial credit model, one of the item response theory (IRT) models. The experiment was carried out on two data sets that differed only in the verbal phrasing of the rating. The main results of the study were: 1) the difference in item content (positive or negative) affects the width of the neutral category; and 2) the distance between categories differs significantly, reflecting the difference in verbal phrasing. PMID:15747553
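
    The generalized partial credit model underlying this method defines category probabilities from cumulative step logits. A minimal sketch with hypothetical parameter values; the parameterization shown (step thresholds b_1..b_m with category 0 fixed at logit 0) is one common convention, not necessarily the authors':

    ```python
    import math

    def gpcm_probs(theta, a, thresholds):
        """Category response probabilities under the generalized partial
        credit model. thresholds are step parameters b_1..b_m;
        categories run 0..m."""
        logits = [0.0]
        s = 0.0
        for b in thresholds:
            s += a * (theta - b)       # cumulative step logit
            logits.append(s)
        m = max(logits)                # subtract max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return [e / z for e in exps]
    ```

    Estimated threshold spacings from such a model are what allow the authors to quantify the distance between adjacent rating categories.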

  11. Promoting proximal formative assessment with relational discourse

    NASA Astrophysics Data System (ADS)

    Scherr, Rachel E.; Close, Hunter G.; McKagan, Sarah B.

    2012-02-01

    The practice of proximal formative assessment - the continual, responsive attention to students' developing understanding as it is expressed in real time - depends on students' sharing their ideas with instructors and on teachers' attending to them. Rogerian psychology presents an account of the conditions under which proximal formative assessment may be promoted or inhibited: (1) Normal classroom conditions, characterized by evaluation and attention to learning targets, may present threats to students' sense of their own competence and value, causing them to conceal their ideas and reducing the potential for proximal formative assessment. (2) In contrast, discourse patterns characterized by positive anticipation and attention to learner ideas increase the potential for proximal formative assessment and promote self-directed learning. We present an analysis methodology based on these principles and demonstrate its utility for understanding episodes of university physics instruction.

  12. Developing an item bank and short forms that assess the impact of asthma on quality of life.

    PubMed

    Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena

    2014-02-01

    The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) while avoiding confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032), we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses, to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test, which suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. PMID:24411842
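
    Computer adaptive tests like the one referenced here typically select, at each step, the unused item with maximum Fisher information at the current ability estimate. A minimal sketch using a 2PL information function; this is an illustrative simplification, not the RAND-IAQL procedure, and the item pool in the test is hypothetical:

    ```python
    import math

    def info_2pl(theta, a, b):
        """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def next_item(theta, pool, administered):
        """Index of the unused item with maximum information at theta.
        pool is a list of (a, b) tuples; administered is a set of indices."""
        candidates = [i for i in range(len(pool)) if i not in administered]
        return max(candidates, key=lambda i: info_2pl(theta, *pool[i]))
    ```

    Because information peaks near θ = b, the selector keeps choosing items whose difficulty tracks the examinee's current estimate, which is why a CAT can reach high precision with very few items.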

  13. Developing the Theory of Formative Assessment

    ERIC Educational Resources Information Center

    Black, Paul; Wiliam, Dylan

    2009-01-01

    Whilst many definitions of formative assessment have been offered, there is no clear rationale to define and delimit it within broader theories of pedagogy. This paper aims to offer such a rationale, within a framework which can also unify the diverse set of practices which have been described as formative. The analysis is used to relate formative…

  14. A Comparison of Three Test Formats to Assess Word Difficulty

    ERIC Educational Resources Information Center

    Culligan, Brent

    2015-01-01

    This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…

  15. Assessment of the Assessment Tool: Analysis of Items in a Non-MCQ Mathematics Exam

    ERIC Educational Resources Information Center

    Khoshaim, Heba Bakr; Rashid, Saima

    2016-01-01

    Assessment is one of the vital steps in the teaching and learning process. The reported action research examines the effectiveness of an assessment process and inspects the validity of exam questions used for the assessment purpose. The instructors of a college-level mathematics course studied questions used in the final exams during the academic…

  16. Teachers' Self-Assessment of the Effects of Formative and Summative Electronic Portfolios on Professional Development

    ERIC Educational Resources Information Center

    Beck, Robert J.; Livne, Nava L.; Bear, Sharon L.

    2005-01-01

    This study compared the effects of four electronic portfolio curricula on pre-service and beginning teachers' self-ratings of their professional development (n =207), using a 34 item electronic Portfolio Assessment Scale (ePAS). Three formative portfolios, A, C and D, had teacher development as a primary objective and used participants' narrative…

  17. Item parameters dissociate between expectation formats: a regression analysis of time-frequency decomposed EEG data

    PubMed Central

    Monsalve, Irene F.; Pérez, Alejandro; Molinaro, Nicola

    2014-01-01

    During language comprehension, semantic contextual information is used to generate expectations about upcoming items. This has been commonly studied through the N400 event-related potential (ERP), as a measure of facilitated lexical retrieval. However, the associative relationships in multi-word expressions (MWE) may enable the generation of a categorical expectation, leading to lexical retrieval before target word onset. Processing of the target word would thus reflect a target-identification mechanism, possibly indexed by a P3 ERP component. However, given their time overlap (200–500 ms post-stimulus onset), differentiating between N400/P3 ERP responses (averaged over multiple linguistically variable trials) is problematic. In the present study, we analyzed EEG data from a previous experiment, which compared ERP responses to highly expected words that were placed either in an MWE or a regular non-fixed compositional context, and to low-predictability controls. We focused on oscillatory dynamics and regression analyses, in order to dissociate between the two contexts by modeling the electrophysiological response as a function of item-level parameters. A significant interaction between word position and condition was found in the regression model for power in a theta range (~7–9 Hz), providing evidence for the presence of qualitative differences between conditions. Power levels within this band were lower for MWE than for compositional contexts when the target word appeared later on in the sentence, confirming that in the former lexical retrieval would have taken place before word onset. On the other hand, gamma-power (~50–70 Hz) was also modulated by predictability of the item in all conditions, which is interpreted as an index of a similar “matching” sub-step for both types of contexts, binding an expected representation and the external input. PMID:25161630

  18. Formative Assessment Probes: Representing Microscopic Life

    ERIC Educational Resources Information Center

    Keeley, Page

    2011-01-01

    This column focuses on promoting learning through assessment. The author discusses the formative assessment probe "Pond Water," which reveals how elementary children will often apply what they know about animal structures to newly discovered microscopic organisms, connecting their knowledge of the familiar to the unfamiliar through…

  19. Screencasts: Formative Assessment for Mathematical Thinking

    ERIC Educational Resources Information Center

    Soto, Melissa; Ambrose, Rebecca

    2016-01-01

    Increased attention to reasoning and justification in mathematics classrooms requires the use of more authentic assessment methods. Particularly important are tools that allow teachers and students opportunities to engage in formative assessment practices such as gathering data, interpreting understanding, and revising thinking or instruction.…

  20. Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations

    NASA Astrophysics Data System (ADS)

    Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly

    2014-09-01

    A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.

  1. Data Collection Design for Equivalent Groups Equating: Using a Matrix Stratification Framework for Mixed-Format Assessment

    ERIC Educational Resources Information Center

    Mbella, Kinge Keka

    2012-01-01

    Mixed-format assessments are increasingly being used in large-scale standardized assessments to measure a continuum of skills ranging from basic recall to higher-order thinking skills. These assessments usually comprise a combination of (a) multiple-choice items, which can be efficiently scored, have stable psychometric properties, and…

  2. TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

    ERIC Educational Resources Information Center

    Brese, Falk, Ed.

    2012-01-01

    The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…

  3. A Pearson-Type-VII Item Response Model for Assessing Person Fluctuation

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2007-01-01

    Using Lumsden's Thurstonian fluctuation model as a starting point, this paper attempts to develop a unidimensional item response theory model intended for binary personality items. Under some additional assumptions, a new model is obtained in which the item characteristic curves are defined by a cumulative Pearson-Type-VII distribution, and the…

  4. Missouri Assessment Program (MAP), Spring 2000: High School Communication Arts, Released Items, Grade 11.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document deals with testing in communication arts for 11th graders in Missouri public schools. The document contains the following items from Session 1 in the Test Booklet: "Thomas Hart Benton: Champion of the American Scene" (Jan Greenberg and Sandra Jordan) (Items 5, 6, and 7); "Rhythms of the River" (Rebecca Christian) (Item 9), a writing…

  5. Sex Differences in Item Functioning in the Comprehensive Inventory of Basic Skills-II Vocabulary Assessments

    ERIC Educational Resources Information Center

    French, Brian F.; Gotch, Chad M.

    2013-01-01

    The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1 through 6. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…

  6. Teachers' Use of Test-Item Banks for Student Assessment in North Carolina Secondary Agricultural Education Programs

    ERIC Educational Resources Information Center

    Marshall, Joy Morgan

    2014-01-01

    Higher expectations are placed on all parties to ensure that students perform successfully on standardized tests. Specifically, in North Carolina agriculture classes, students are given a CTE Post Assessment to measure knowledge gained and proficiency. Prior to students taking the CTE Post Assessment, teachers have access to a test item bank system that…

  7. Developing Parallel Career and Occupational Development Objectives and Exercise (Test) Items in Spanish for Assessment and Evaluation.

    ERIC Educational Resources Information Center

    Muratti, Jose E.; And Others

    A parallel Spanish edition of released objectives and objective-referenced items used in the National Assessment of Educational Progress (NAEP) in the field of Career and Occupational Development (COD) was developed. The Spanish edition was designed to assess the identical skills, attitudes, concepts, and knowledge of Spanish-dominant students…

  8. NAEP Validity Studies: Improving the Information Value of Performance Items in Large Scale Assessments. Working Paper No. 2003-08

    ERIC Educational Resources Information Center

    Pearson, P. David; Garavaglia, Diane R.

    2003-01-01

    The purpose of this essay is to explore both what is known and what needs to be learned about the information value of performance items "when they are used in large scale assessments." Within the context of the National Assessment of Educational Progress (NAEP), there is substantial motivation for answering these questions. Over the…

  9. Measuring Teaching Best Practice in the Induction Years: Development and Validation of an Item-Level Assessment

    ERIC Educational Resources Information Center

    Kingsley, Laurie; Romine, William

    2014-01-01

    Schools and teacher induction programs around the world routinely assess teaching best practice to inform accreditation, tenure/promotion, and professional development decisions. Routine assessment is also necessary to ensure that teachers entering the profession get the assistance they need to develop and succeed. We introduce the Item-Level…

  10. Innovative Application of a Multidimensional Item Response Model in Assessing the Influence of Social Desirability on the Pseudo-Relationship between Self-Efficacy and Behavior

    ERIC Educational Resources Information Center

    Watson, Kathy; Baranowski, Tom; Thompson, Debbe; Jago, Russell; Baranowski, Janice; Klesges, Lisa M.

    2006-01-01

    This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10-14 years). SocD (nine items), PASE (19 items) and FVSE (21…

  11. The Impact of Varied Discrimination Parameters on Mixed-Format Item Response Theory Model Selection

    ERIC Educational Resources Information Center

    Whittaker, Tiffany A.; Chang, Wanchen; Dodd, Barbara G.

    2013-01-01

    Whittaker, Chang, and Dodd compared the performance of model selection criteria when selecting among mixed-format IRT models and found that the criteria did not perform adequately when selecting the more parameterized models. It was suggested by M. S. Johnson that the problems when selecting the more parameterized models may be because of the low…

  12. Online Formative Assessments with Social Network Awareness

    ERIC Educational Resources Information Center

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    Social network awareness (SNA) has been used extensively as one of the strategies to increase knowledge sharing and collaboration opportunities. However, most SNA studies focus either on awareness of a peer's knowledge context or on the social context. This work proposes online formative assessments with SNA, trying to address the problems of online…

  13. Formative Assessment Probes: To Hypothesize or Not

    ERIC Educational Resources Information Center

    Keeley, Page

    2010-01-01

    Formative assessment probes are used not only to uncover the ideas students bring to their learning, they can also be used to reveal teachers' common misconceptions. Consider a process widely used in inquiry science--developing hypotheses. In this article, the author features the probe "Is It a Hypothesis?", which serves as an example of how…

  14. Assessment formats in dental medicine: An overview

    PubMed Central

    Gerhard-Szep, Susanne; Güntsch, Arndt; Pospiech, Peter; Söhnel, Andreas; Scheutzel, Petra; Wassmann, Torsten; Zahn, Tugba

    2016-01-01

    Aim: At the annual meeting of German dentists in Frankfurt am Main in 2013, the Working Group for the Advancement of Dental Education (AKWLZ) initiated an interdisciplinary working group to address assessments in dental education. This paper presents an overview of the current work being done by this working group, some of whose members are also actively involved in the German Association for Medical Education's (GMA) working group for dental education. The aim is to present a summary of the current state of research on this topic for all those who participate in the design, administration and evaluation of university-specific assessments in dentistry. Method: Based on a systematic literature review, the testing scenarios listed in the National Competency-based Catalogue of Learning Objectives (NKLZ) have been compiled and presented in tables according to assessment value. Results: Different assessment scenarios are described briefly in table form addressing validity (V), reliability (R), acceptance (A), cost (C), feasibility (F), and the influence on teaching and learning (EI) as presented in the current literature. Infoboxes were deliberately chosen to allow readers quick access to the information and to facilitate comparisons between the various assessment formats. Following each description is a list summarizing the uses in dental and medical education. Conclusion: This overview provides a summary of competency-based testing formats. It is meant to have a formative effect on dental and medical schools and provide support for developing workplace-based strategies in dental education for learning, teaching and testing in the future. PMID:27579365

  15. A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.

    2015-01-01

    Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three…

  16. An Application of Cognitive Diagnostic Assessment on TIMSS-2007 8th Grade Mathematics Items

    ERIC Educational Resources Information Center

    Toker, Turker; Green, Kathy

    2012-01-01

    The least squares distance method (LSDM) was used in a cognitive diagnostic analysis of TIMSS (Trends in International Mathematics and Science Study) items administered to 4,498 8th-grade students from seven geographical regions of Turkey, extending analysis of attributes from content to process and skill attributes. Logit item positions were…

  17. Assessing Impact, DIF, and DFF in Accommodated Item Scores: A Comparison of Multilevel Measurement Model Parameterizations

    ERIC Educational Resources Information Center

    Beretvas, S. Natasha; Cawthon, Stephanie W.; Lockhart, L. Leland; Kaye, Alyssa D.

    2012-01-01

    This pedagogical article is intended to explain the similarities and differences between the parameterizations of two multilevel measurement model (MMM) frameworks. The conventional two-level MMM that includes item indicators and models item scores (Level 1) clustered within examinees (Level 2) and the two-level cross-classified MMM (in which item…

  18. Some Issues in Item Response Theory: Dimensionality Assessment and Models for Guessing

    ERIC Educational Resources Information Center

    Smith, Jessalyn

    2009-01-01

    Currently, standardized tests are widely used as a method to measure how well schools and students meet academic standards. As a result, measurement issues have become an increasingly popular topic of study. Unidimensional item response models are used to model latent abilities and specific item characteristics. This class of models makes…

  19. DIFFERENTIAL ITEM FUNCTIONING AT POST ASSESSMENT BETWEEN TREATMENT AND CONTROL GROUPS FROM AN INCREASE IN KNOWLEDGE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    There has been some concern that participation in an intervention and exposure to a measurement instrument can distort subsequent responses to a questionnaire, thereby biasing results. Differential Item Functioning (DIF) analysis with Item Response Modeling (IRM) can test these effects by testing f...

  20. Assessing the Interpretive Component of Criterion-Referenced Test Item Validity.

    ERIC Educational Resources Information Center

    Secolsky, Charles

    Undergraduates responded to an objective test in electronics and classified each item by domain (one of 14 topics covered in their text), and by type of knowledge (definition, fact, principle, or interpretation). These judgments were compared to their instructor's "standard" judgments. From these data, an index of item-domain divergence in…

  1. Efficiently Assessing Negative Cognition in Depression: An Item Response Theory Analysis of the Dysfunctional Attitude Scale

    ERIC Educational Resources Information Center

    Beevers, Christopher G.; Strong, David R.; Meyer, Bjorn; Pilkonis, Paul A.; Miller, Ivan R.

    2007-01-01

    Despite a central role for dysfunctional attitudes in cognitive theories of depression and the widespread use of the Dysfunctional Attitude Scale, form A (DAS-A; A. Weissman, 1979), the psychometric development of the DAS-A has been relatively limited. The authors used nonparametric item response theory methods to examine the DAS-A items and…

  2. Demonstrating the Utility of a Multilevel Model in the Assessment of Differential Item Functioning.

    ERIC Educational Resources Information Center

    Pommerich, Mary

    When tests contain few items, observed score may not be an accurate reflection of true score, and the Mantel Haenszel (MH) statistic may perform poorly in detecting differential item functioning. Applications of the MH procedure in such situations require an alternate strategy; one such strategy is to include background variables in the matching…
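
    The Mantel-Haenszel statistic referenced above pools 2x2 tables of group-by-response counts across matching strata (typically total-score levels) into a common odds ratio; a value near 1 suggests no uniform DIF. A minimal sketch in plain Python, with made-up illustrative counts:

    ```python
    # Mantel-Haenszel common odds ratio for screening uniform DIF.
    # Each stratum is a 2x2 table (a, b, c, d) =
    # (reference correct, reference incorrect, focal correct, focal incorrect).
    def mh_odds_ratio(strata):
        num = 0.0
        den = 0.0
        for a, b, c, d in strata:
            n = a + b + c + d
            num += a * d / n  # reference-correct x focal-incorrect
            den += b * c / n  # reference-incorrect x focal-correct
        return num / den

    # Illustrative strata matched on total score (hypothetical data):
    strata = [(40, 10, 35, 15), (30, 20, 28, 22), (15, 35, 12, 38)]
    print(round(mh_odds_ratio(strata), 2))
    ```

    In practice the ratio is accompanied by the MH chi-square test and a standard error; library implementations (e.g., stratified-table routines in statistics packages) handle those details.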

  3. Comparing Methods of Assessing Differential Item Functioning in a Computerized Adaptive Testing Environment

    ERIC Educational Resources Information Center

    Lei, Pui-Wa; Chen, Shu-Ying; Yu, Lan

    2006-01-01

    Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional…

  4. A Historical Investigation into Item Formats of ACS Exams and Their Relationships to Science Practices

    ERIC Educational Resources Information Center

    Brandriet, Alexandra; Reed, Jessica J.; Holme, Thomas

    2015-01-01

    The release of the "NRC Framework for K-12 Science Education" and the "Next Generation Science Standards" has important implications for classroom teaching and assessment. Of particular interest is the implementation of science practices in the chemistry classroom, and the definitions established by the NRC make these…

  5. The Effect of Response Format on the Psychometric Properties of the Narcissistic Personality Inventory: Consequences for Item Meaning and Factor Structure.

    PubMed

    Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris

    2016-04-01

    The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. PMID:25616401

  6. Development of an Item Bank for Assessing Generic Competences in a Higher-Education Institute: A Rasch Modelling Approach

    ERIC Educational Resources Information Center

    Xie, Qin; Zhong, Xiaoling; Wang, Wen-Chung; Lim, Cher Ping

    2014-01-01

    This paper describes the development and validation of an item bank designed for students to assess their own achievements across an undergraduate-degree programme in seven generic competences (i.e., problem-solving skills, critical-thinking skills, creative-thinking skills, ethical decision-making skills, effective communication skills, social…

  7. Using Data Mining to Predict K-12 Students' Performance on Large-Scale Assessment Items Related to Energy

    ERIC Educational Resources Information Center

    Liu, Xiufeng; Ruiz, Miguel E.

    2008-01-01

    This article reports a study on using data mining to predict K-12 students' competence levels on test items related to energy. Data sources are the 1995 Third International Mathematics and Science Study (TIMSS), 1999 TIMSS-Repeat, 2003 Trend in International Mathematics and Science Study (TIMSS), and the National Assessment of Educational…

  8. Examination of the Assumptions and Properties of the Graded Item Response Model: An Example Using a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    1995-01-01

    Over 5,000 students participated in a study of the dimensionality and stability of the item parameter estimates of a mathematics performance assessment developed for the Quantitative Understanding: Amplifying Student Achievement and Reasoning (QUASAR) Project. Results demonstrate the test's dimensionality and illustrate ways to examine use of the…

  9. PISA Test Items and School-Based Examinations in Greece: Exploring the relationship between global and local assessment discourses

    NASA Astrophysics Data System (ADS)

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-03-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek school-based biology examinations' test items in terms of the nature of their textual construction. This nature is determined by the interplay of the notions of classification (content specialisation) and formality (code specialisation) modulated by both the linguistic and the visual expressive modes. The results of the analysis reveal disparities between assessment discourses promoted at the global and the local level. In particular, while PISA test items convey their scientific message (specialised content and code) principally through their visual mode, the specialised scientific meaning of school-based examination test items is mainly conveyed through their linguistic mode. On the other hand, the linguistic mode of PISA test items is mainly compatible with textual practices of the public domain (non-specialised content and code). Such a mismatch between assessment discourses at local and global level is expected to place Greek students at different discursive positions, promoting different types of knowledge. The expected shift from the epistemic positioning promoted in Greece to the one promoted by PISA could significantly restrict Greek students' ability to infer the PISA discursive context and produce appropriate responses. This factor could provide a meaningful contribution to the discussion of the relatively low achievement of Greek students in PISA scientific literacy assessment.

  10. Assessing Middle and High School Mathematics & Science: Differentiating Formative Assessment

    ERIC Educational Resources Information Center

    Waterman, Sheryn Spencer

    2010-01-01

    For middle and high school teachers of mathematics and science, this book is filled with examples of instructional strategies that address students' readiness levels, interests, and learning preferences. It shows teachers how to formatively assess their students by addressing differentiated learning targets. Included are detailed examples of…

  11. Review of Formative Assessment Use and Training in Africa

    ERIC Educational Resources Information Center

    Perry, Lindsey

    2013-01-01

    This literature review examines formative assessment education practices currently being utilized in Africa, as well as recent research regarding professional development on such assessments. Two main conclusions about formative assessment use and training, as well as a set of recommendations about teacher training on formative assessment, can be…

  12. Reducing the item number to obtain same-length self-assessment scales: a systematic approach using result of graphical loglinear Rasch modeling.

    PubMed

    Nielsen, Tine; Kreiner, Svend

    2011-01-01

    The Revised Danish Learning Styles Inventory (R-D-LSI) (Nielsen 2005), which is an adaptation of Sternberg-Wagner Thinking Styles Inventory (Sternberg, 1997), comprises 14 subscales, each measuring a separate learning style. Of these 14 subscales, 9 are eight items long and 5 are seven items long. For self-assessment, self-scoring and self-interpretational purposes it is deemed prudent that subscales measuring comparable constructs are of the same item length. Consequently, in order to obtain a self-assessment version of the R-D-LSI with an equal number of items in each subscale, a systematic approach to item reduction based on results of graphical loglinear Rasch modeling (GLLRM) was designed. This approach was then used to reduce the number of items in the subscales of the R-D-LSI which had an item-length of more than seven items, thereby obtaining the Danish Self-Assessment Learning Styles Inventory (D-SA-LSI) comprising 14 subscales each with an item length of seven. The systematic approach to item reduction based on results of GLLRM will be presented and exemplified by its application to the R-D-LSI. PMID:22357154
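
    The Rasch family of models underlying this item-reduction approach expresses the probability of endorsing a dichotomous item as a logistic function of the difference between the person location theta and the item location b. A minimal sketch (the theta and b values are made up for illustration; the paper's graphical loglinear extension adds interaction terms not shown here):

    ```python
    import math

    def rasch_prob(theta, b):
        """Dichotomous Rasch model: P(X = 1 | theta, b)."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # When the person location equals the item location,
    # the endorsement probability is exactly 0.5.
    print(rasch_prob(0.0, 0.0))  # 0.5
    ```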

  13. A 14-Item Mediterranean Diet Assessment Tool and Obesity Indexes among High-Risk Subjects: The PREDIMED Trial

    PubMed Central

    Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón

    2012-01-01

    Objective Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Design Cross-sectional assessment of all participants in the “PREvención con DIeta MEDiterránea” (PREDIMED) trial. Subjects 7,447 participants (55–80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Results Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were −0.0066 (95% confidence interval, –0.0088 to −0.0049) for women and –0.0059 (–0.0079 to –0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥10 points versus ≤7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. Conclusions A brief 14-item tool was able to capture a strong monotonic inverse association between

  14. A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

    ERIC Educational Resources Information Center

    Yao, Lihua; Schwarz, Richard D.

    2006-01-01

    Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…

  15. Item Order, Response Format, and Examinee Sex and Handedness and Performance on a Multiple-Choice Test.

    ERIC Educational Resources Information Center

    Kleinke, David J.

    Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…

  16. Construct and Differential Item Functioning in the Assessment of Prescription Opioid Use Disorders among American Adolescents

    ERIC Educational Resources Information Center

    Wu, Li-Tzy; Ringwalt, Christopher L.; Yang, Chongming; Reeve, Bryce B.; Pan, Jeng-Jong; Blazer, Dan G.

    2009-01-01

    DSM-IV's hierarchical distinction between abuse of and dependence on prescription opioids is not supported since the symptoms of abuse in adolescents are not less severe than dependence. The finding is based on the examination of the DSM-IV criteria for opioid use disorders using item response theory.

  17. Analysis of Sources of Latent Class Differential Item Functioning in International Assessments

    ERIC Educational Resources Information Center

    Oliveri, Maria Elena; Ercikan, Kadriye; Zumbo, Bruno

    2013-01-01

    In this study, we investigated differential item functioning (DIF) and its sources using a latent class (LC) modeling approach. Potential sources of LC DIF related to instruction and teacher-related variables were investigated using substantive and three statistical approaches: descriptive discriminant function, multinomial logistic regression,…

  18. Two Prophecy Formulas for Assessing the Reliability of Item Response Theory-Based Ability Estimates

    ERIC Educational Resources Information Center

    Raju, Nambury S.; Oshima, T.C.

    2005-01-01

    Two new prophecy formulas for estimating item response theory (IRT)-based reliability of a shortened or lengthened test are proposed. Some of the relationships between the two formulas, one of which is identical to the well-known Spearman-Brown prophecy formula, are examined and illustrated. The major assumptions underlying these formulas are…
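
    The classical Spearman-Brown prophecy formula mentioned here predicts the reliability of a test whose length is changed by a factor k: rho' = k*rho / (1 + (k - 1)*rho). A minimal sketch (the 0.72 reliability value is purely illustrative):

    ```python
    def spearman_brown(rho, k):
        """Predicted reliability after changing test length by factor k."""
        return k * rho / (1.0 + (k - 1.0) * rho)

    # Doubling a test whose current reliability is 0.72:
    print(round(spearman_brown(0.72, 2.0), 3))  # 0.837
    ```

    The IRT-based formulas proposed in the article generalize this idea to ability estimates; the classical version above applies under classical test theory assumptions (parallel added items).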

  19. An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.

    2014-01-01

    As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…

  20. Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

    ERIC Educational Resources Information Center

    Wells, Craig S.; Bolt, Daniel M.

    2008-01-01

    Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

  1. Assessing the Feasibility of a Test Item Bank and Assessment Clearinghouse: Strategies to Measure Technical Skill Attainment of Career and Technical Education Participants

    ERIC Educational Resources Information Center

    Derner, Seth; Klein, Steve; Hilber, Don

    2008-01-01

    This report documents strategies that can be used to initiate development of a technical skill test item bank and/or assessment clearinghouse and quantifies the cost of creating and maintaining such a system. It is intended to inform state administrators on the potential uses and benefits of system participation, test developers on the needs and…

  2. Student Perceptions of Formative Assessment in the Chemistry Classroom

    ERIC Educational Resources Information Center

    Haroldson, Rachelle Ann

    2012-01-01

    Research on formative assessment has focused on the ways teachers implement and use formative assessment to check student understanding in order to guide their instruction. This study shifted emphasis away from teachers to look at how students use and perceive formative assessment in the science classroom. Four key strategies of formative…

  3. Hitting the Reset Button: Using Formative Assessment to Guide Instruction

    ERIC Educational Resources Information Center

    Dirksen, Debra J.

    2011-01-01

    Using formative assessment gives students a second chance to learn material they didn't master the first time around. It lets failure become a learning experience rather than something to fear. Several types of formative assessment are discussed, including how to use summative assessments formatively. (Contains 2 figures.)

  4. A Socio-Cultural Theorisation of Formative Assessment

    ERIC Educational Resources Information Center

    Pryor, John; Crossouard, Barbara

    2008-01-01

    Formative assessment has attracted increasing attention from both practitioners and scholars over the last decade. This paper draws on the authors' empirical research conducted over eleven years in educational situations ranging from infant schools to postgraduate education to propose a theorisation of formative assessment. Formative assessment is…

  5. Development of a Simple 12-Item Theory-Based Instrument to Assess the Impact of Continuing Professional Development on Clinical Behavioral Intentions

    PubMed Central

    Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

    2014-01-01

    Background Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Methods and Findings Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. Conclusion A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral
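
    Cronbach's alpha, reported above per construct, can be computed from item-level scores as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch with made-up respondent data (not the study's data):

    ```python
    def cronbach_alpha(scores):
        """Cronbach's alpha; scores is a list of respondents,
        each a list of k item scores."""
        k = len(scores[0])

        def var(xs):  # population variance
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / len(xs)

        item_vars = [var([row[i] for row in scores]) for i in range(k)]
        total_var = var([sum(row) for row in scores])
        return k / (k - 1) * (1 - sum(item_vars) / total_var)

    # Illustrative 4-respondent, 3-item data:
    data = [[3, 4, 3], [2, 2, 3], [5, 4, 4], [1, 2, 2]]
    print(round(cronbach_alpha(data), 2))
    ```

    Values in the 0.77-0.85 range reported for the instrument's constructs indicate acceptable-to-good internal consistency by common rules of thumb.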

  6. A Faculty Toolkit for Formative Assessment in Pharmacy Education

    PubMed Central

    Alston, Greg L.; Bird, Eleanora; Buring, Shauna M.; Kelley, Katherine A.; Murphy, Nanci L.; Schlesselman, Lauren S.; Stowe, Cindy D.; Szilagyi, Julianna E.

    2014-01-01

    This paper aims to increase understanding and appreciation of formative assessment and its role in improving student outcomes and the instructional process, while educating faculty on formative techniques readily adaptable to various educational settings. Included are a definition of formative assessment and the distinction between formative and summative assessment. Various formative assessment strategies to evaluate student learning in classroom, laboratory, experiential, and interprofessional education settings are discussed. The role of reflective writing and portfolios, as well as the role of technology in formative assessment, are described. The paper also offers advice for formative assessment of faculty teaching. In conclusion, the authors emphasize the importance of creating a culture of assessment that embraces the concept of 360-degree assessment in both the development of a student’s ability to demonstrate achievement of educational outcomes and a faculty member’s ability to become an effective educator. PMID:26056399

  7. Helping Poor Readers Demonstrate Their Science Competence: Item Characteristics Supporting Text-Picture Integration

    ERIC Educational Resources Information Center

    Saß, Steffani; Schütte, Kerstin

    2016-01-01

    Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…

  8. Reducing the Item Number to Obtain Same-Length Self-Assessment Scales: A Systematic Approach Using Result of Graphical Loglinear Rasch Modeling

    ERIC Educational Resources Information Center

    Nielsen, Tine; Kreiner, Svend

    2011-01-01

    The Revised Danish Learning Styles Inventory (R-D-LSI) (Nielsen 2005), which is an adaptation of Sternberg-Wagner Thinking Styles Inventory (Sternberg, 1997), comprises 14 subscales, each measuring a separate learning style. Of these 14 subscales, 9 are eight items long and 5 are seven items long. For self-assessment, self-scoring and…

  9. Formative Assessment Probes: Is It Melting? Formative Assessment for Teacher Learning

    ERIC Educational Resources Information Center

    Keeley, Page

    2013-01-01

    Formative assessment probes are effective tools for uncovering students' ideas about the various concepts they encounter when learning science. They are used to build a bridge from where the student is in his or her thinking to where he or she needs to be in order to construct and understand the scientific explanation for observed phenomena.…

  10. Written formative assessment and silence in the classroom

    NASA Astrophysics Data System (ADS)

    Lee Hang, Desmond Mene; Bell, Beverley

    2015-09-01

    In this commentary, we build on Xinying Yin and Gayle Buck's discussion by exploring the cultural practices which are integral to formative assessment, when it is viewed as a sociocultural practice. First we discuss the role of assessment and in particular oral and written formative assessments in both western and Samoan cultures, building on the account of assessment practices in the Chinese culture given by Yin and Buck. Secondly, we document the cultural practice of silence in Samoan classrooms, which has led to the use of written formative assessment as in the Yin and Buck article. We also discuss the use of written formative assessment as a scaffold for teacher development for formative assessment. Finally, we briefly discuss both studies on formative assessment as a sociocultural practice.

  11. Development and calibration of an item bank for the assessment of activities of daily living in cardiovascular patients using Rasch analysis

    PubMed Central

    2013-01-01

    Background To develop and calibrate the activities of daily living item bank (ADLib-cardio) as a prerequisite for a Computer-adaptive test (CAT) for the assessment of ADL in patients with cardiovascular diseases (CVD). Methods After pre-testing for relevance and comprehension, a pool of 181 ADL items was answered on a five-point Likert scale by 720 CVD patients, who were recruited in fourteen German cardiac rehabilitation centers. To verify that the relationship between the items is due to one factor, a confirmatory factor analysis (CFA) was conducted. A Mokken analysis was computed to examine the double monotonicity (i.e. every item generates an equivalent order of person traits, and every person generates an equivalent order of item difficulties). Finally, a Rasch analysis based on the partial credit model was conducted to test for unidimensionality and to calibrate the item bank. Results Results of CFA and Mokken analysis confirmed a one factor structure and double monotonicity. In Rasch analysis, merging response categories and removing items with misfit, differential item functioning or local response dependency reduced the ADLib-cardio to 33 items. The ADLib-cardio fitted to the Rasch model with a nonsignificant item-trait interaction (chi-square=105.42, df=99; p=0.31). Person-separation reliability was 0.81 and unidimensionality could be verified. Conclusions The ADLib-cardio is the first calibrated, unidimensional item bank that allows for the assessment of ADL in rehabilitation patients with CVD. As such, it provides the basis for the development of a CAT for the assessment of ADL in patients with cardiovascular diseases. Calibrating the ADLib-cardio in other than rehabilitation cardiovascular patient settings would further increase its generalizability. PMID:23914735

  12. A Comparison of Methods for Estimating Conditional Item Score Differences in Differential Item Functioning (DIF) Assessments. Research Report. ETS RR-10-15

    ERIC Educational Resources Information Center

    Moses, Tim; Miao, Jing; Dorans, Neil

    2010-01-01

    This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…

  13. Conjoint Community Resiliency Assessment Measure-28/10 items (CCRAM28 and CCRAM10): A self-report tool for assessing community resilience.

    PubMed

    Leykin, Dmitry; Lahad, Mooli; Cohen, Odeya; Goldberg, Avishay; Aharonson-Daniel, Limor

    2013-12-01

    Community resilience is used to describe a community's ability to deal with crises or disruptions. The Conjoint Community Resiliency Assessment Measure (CCRAM) was developed in order to attain an integrated, multidimensional instrument for the measurement of community resiliency. The tool was developed using an inductive, exploratory, sequential mixed methods design. The objective of the present study was to portray and evaluate the CCRAM's psychometric features. A large community sample (N = 1,052) was assessed with the CCRAM tool, and the data were subjected to exploratory and confirmatory factor analysis. A five-factor model (21 items) was obtained, explaining 67.67% of the variance. This scale was later reduced to a brief 10-item instrument. Both scales showed good internal consistency coefficients (α = .92 and α = .85, respectively) and acceptable fit indices to the data. Seven additional items correspond to information requested by leaders, forming the CCRAM28. The CCRAM has been shown to be an acceptable practical tool for assessing community resilience. Both internal and external validity have been demonstrated, as all factors obtained in the factor analytical process were tightly linked to previous literature on community resilience. The CCRAM facilitates the estimation of an overall community resiliency score, but furthermore, it detects the strength of five important constructs of community function following disaster: Leadership, Collective Efficacy, Preparedness, Place Attachment and Social Trust. Consequently, the CCRAM can serve as an aid for community leaders to assess, monitor, and focus actions to enhance and restore community resilience for crisis situations. PMID:24091563

  14. Written Formative Assessment and Silence in the Classroom

    ERIC Educational Resources Information Center

    Lee Hang, Desmond Mene; Bell, Beverley

    2015-01-01

    In this commentary, we build on Xinying Yin and Gayle Buck's discussion by exploring the cultural practices which are integral to formative assessment, when it is viewed as a sociocultural practice. First we discuss the role of assessment and in particular oral and written formative assessments in both western and Samoan cultures, building on the…

  15. Exploring Elementary Teachers' Implementation of Formative Assessment Practices for Reading

    ERIC Educational Resources Information Center

    Richardson, Irving

    2010-01-01

    The purpose of the study was to determine whether or not elementary classroom teachers' exploration of an integrated theoretical model of formative assessment would change participants' understandings of formative assessment and whether or not participants would apply this newly-acquired knowledge to their classroom assessment practices. After…

  16. Making Room for Formative Assessment Processes: A Multiple Case Study

    ERIC Educational Resources Information Center

    McEntarffer, Robert E.

    2012-01-01

    This qualitative instrumental multiple case study (Stake, 2005) explored how teachers made room for formative assessment processes in their classrooms, and how thinking about assessment changed during those formative assessment experiences. Data were gathered from six teachers over three months and included teacher interviews, student interviews,…

  17. The School Age Gender Gap in Reading Achievement: Examining the Influences of Item Format and Intrinsic Reading Motivation

    ERIC Educational Resources Information Center

    Schwabe, Franziska; McElvany, Nele; Trendtel, Matthias

    2015-01-01

    The importance of reading competence for both individuals and society underlines the strong need to understand the gender gap in reading achievement. Beyond mean differences in reading comprehension, research has indicated that girls possess specific advantages on constructed-response items compared with boys of the same reading ability. Moreover,…

  18. The 4-Item Negative Symptom Assessment (NSA-4) Instrument: A Simple Tool for Evaluating Negative Symptoms in Schizophrenia Following Brief Training.

    PubMed

    Alphs, Larry; Morlock, Robert; Coon, Cheryl; van Willigenburg, Arjen; Panagides, John

    2010-07-01

Objective. To assess the ability of mental health professionals to use the 4-item Negative Symptom Assessment (NSA-4) instrument, derived from the Negative Symptom Assessment-16, to rapidly determine the severity of negative symptoms of schizophrenia. Design. Open participation. Setting. Medical education conferences. Participants. Attendees at two international psychiatry conferences. Measurements. Participants read a brief set of NSA-4 instructions and viewed a videotape of a patient with schizophrenia. Using the NSA-4's 1-to-6 severity rating scale, they rated the four negative symptom items and overall global negative symptoms. These ratings were compared with a consensus expert rating using frequency distributions and chi-square tests for the proportion of participant ratings within one point of the expert rating. Results. More than 400 medical professionals (293 physicians; 50% with a European practice; 55% reporting past use of schizophrenia rating scales) participated. Between 82.1% and 91.1% of participants' ratings of the four items and the global rating were within one point of the consensus expert ratings. The difference between the percentage of participant ratings within one point of the consensus expert ratings and the percentage differing by more than one point was significant (p < 0.0001). Participants' ratings of negative symptoms using the NSA-4 did not generally differ by geographic region of practice, professional credentialing, or familiarity with schizophrenia symptom rating instruments. Conclusion. These findings suggest that clinicians from a variety of geographic practices can, after brief training, use the NSA-4 effectively to rapidly assess negative symptoms in patients with schizophrenia. PMID:20805916
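The agreement criterion used in this study, the proportion of ratings falling within one point of the expert consensus, reduces to a single comparison. A small sketch with hypothetical ratings on the 1-6 scale (the values below are invented, not the study's data):

```python
import numpy as np

def within_one_point(ratings, expert):
    """Proportion of participant ratings within +/-1 of the expert consensus."""
    return float(np.mean(np.abs(np.asarray(ratings) - expert) <= 1))

# Hypothetical ratings of one NSA-4 item; expert consensus rating = 4.
ratings = [4, 5, 3, 4, 6, 4, 5, 3, 4, 2]
print(f"within one point: {within_one_point(ratings, 4):.0%}")
```

The study's chi-square test then compares this proportion against the proportion differing by more than one point.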

  19. Motivating student learning using a formative assessment journey.

    PubMed

    Evans, Darrell J R; Zeun, Paul; Stanier, Robert A

    2014-03-01

Providing formative assessment opportunities has been recognised as a significant benefit to student learning. The outcome of any formative assessment should ultimately help improve student learning by familiarising students with the levels of learning required, informing them about gaps in their learning and providing feedback to guide the direction of learning. This article provides an example of how formative assessments can be developed into a formative assessment journey, in which a number of different assessments are offered to students during a module of teaching, thus utilising a spaced-education approach. As well as incorporating the specific drivers of formative assessment, we demonstrate how approaches deemed to be stimulating, interactive and entertaining can be incorporated, with the aim of maximising enthusiasm and engagement. We provide an example of a mixed approach to evaluating elements of the assessment journey that focuses on student reaction, appraisal of qualitative and quantitative feedback from student questionnaires, focus group analysis and teacher observations. Whilst it is not possible to determine a quantifiable effect of the assessment journey on student learning, usage data and student feedback show that formative assessment can achieve high engagement and positive responses to different assessments. Those assessments incorporating an active learning element and a quiz-based approach appear to be particularly popular. A spaced-education format encourages a building-block approach to learning that is continuous in nature rather than focussed on an intense period of study prior to summative examinations. PMID:24111930

  1. Multilevel Item Response Modeling: Applications to Large-Scale Assessment of Academic Achievement

    ERIC Educational Resources Information Center

    Zheng, Xiaohui

    2009-01-01

    The call for standards-based reform and educational accountability has led to increased attention to large-scale assessments. Over the past two decades, large-scale assessments have been providing policymakers and educators with timely information about student learning and achievement to facilitate their decisions regarding schools, teachers and…

  2. Exploring Plausible Causes of Differential Item Functioning in the PISA Science Assessment: Language, Curriculum or Culture

    ERIC Educational Resources Information Center

    Huang, Xiaoting; Wilson, Mark; Wang, Lei

    2016-01-01

    In recent years, large-scale international assessments have been increasingly used to evaluate and compare the quality of education across regions and countries. However, measurement variance between different versions of these assessments often posts threats to the validity of such cross-cultural comparisons. In this study, we investigated the…

  3. Category Scoring Techniques from National Assessment: Applications to Free Response Items from Career and Occupational Development.

    ERIC Educational Resources Information Center

    Phillips, Donald L.

    The Career and Occupational Development (COD) assessment of the National Assessment of Educational Progress (NAEP) was made up of about 70 percent free response exercises requiring hand scoring. This paper describes the techniques used in developing the "scoring guides" for these exercises and summarizes the results of two empirical studies of the…

  4. Virginia Standards of Learning Assessments. Grade 8 Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that approximately 79,000 students in grade 8 were required to answer as part of the SOL assessments. These…

  5. Virginia Standards of Learning Assessments. Grade 3 Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) Assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that approximately 83,000 students in grade 3 were required to answer as part of the SOL assessments. These…

  6. Virginia Standards of Learning Assessments. Grade 5 Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that approximately 80,000 students in grade 5 were required to answer as part of the SOL assessments. These…

  7. Virginia Standards of Learning Assessments. End of Course Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that students were required to answer as part of the SOL End-of-Course assessments. These questions are…

  8. A Multidimensional Assessment of the Validity and Utility of Alcohol Use Disorder Severity as Determined by Item Response Theory Models

    PubMed Central

    Dawson, Deborah A.; Saha, Tulshi D.; Grant, Bridget F.

    2010-01-01

    Background. The relative severity of each of the 11 DSM-IV alcohol use disorder (AUD) criteria is represented by its severity threshold score, an item response theory (IRT) model parameter inversely proportional to its prevalence. These scores can be used to create a continuous severity measure comprising the total number of criteria endorsed, each weighted by its relative severity. Methods. This paper assesses the validity of the severity ranking of the 11 criteria and of the overall severity score with respect to known AUD correlates, including alcohol consumption, psychological functioning, family history, antisociality, and early initiation of drinking, in a representative population sample of U.S. past-year drinkers (n=26,946). Results. The unadjusted mean values of all validating measures increased steadily with the severity threshold score, except that legal problems, the criterion with the highest score, was associated with lower values than expected. After adjusting for the total number of criteria endorsed, this direct relationship was no longer evident. The overall severity score was no more highly correlated with the validating measures than a simple count of criteria endorsed, nor did the two measures yield different risk curves. This reflects both within-criterion variation in severity and the fact that the number of criteria endorsed and their severity are so highly correlated that severity is essentially redundant. Conclusions. Attempts to formulate a scalar measure of AUD will do as well by relying on simple counts of criteria or symptom items as by using scales weighted by IRT measures of severity. PMID:19782481
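The paper's redundancy argument, that a severity-weighted sum of endorsed criteria tracks a simple count almost perfectly, can be illustrated by simulation. The prevalences and weights below are invented for illustration, not the DSM-IV estimates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented endorsement probabilities for 11 criteria; rarer criteria receive
# larger IRT-style severity weights, inverse to prevalence as in the abstract.
prevalence = np.linspace(0.60, 0.05, 11)
severity_weight = -np.log(prevalence)

endorsed = rng.random((5000, 11)) < prevalence        # respondents x criteria
simple_count = endorsed.sum(axis=1)                   # unweighted criterion count
weighted_score = (endorsed * severity_weight).sum(axis=1)

r = np.corrcoef(simple_count, weighted_score)[0, 1]
print(f"count vs. severity-weighted score: r = {r:.3f}")
```

Even with weights spanning a sixfold range, the two scores correlate around .9, which is the sense in which the weighting is "essentially redundant".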

  9. Formative Assessment in the Visual Arts

    ERIC Educational Resources Information Center

    Andrade, Heidi; Hefferen, Joanna; Palma, Maria

    2014-01-01

    Classroom assessment is a hot topic in K-12 education because of compelling evidence that assessment in the form of feedback is a powerful teaching and learning tool (Hattie & Timperley, 2007). Although formal evaluation has been anathema to many art specialists and teachers (Colwell, 2004), informal assessment in the form of feedback is not.…

  10. Formative Assessment Jump-Starts a Middle Grades Differentiation Initiative

    ERIC Educational Resources Information Center

    Doubet, Kristina J.

    2012-01-01

    A rural middle level school had stalled in its third year of a district-wide differentiation initiative. This article describes the way teachers and the leadership team engaged in collaborative practices to put a spotlight on formative assessment. Teachers learned to systematically gather formative assessment data from their students and to use…

  11. Construct Validity in Formative Assessment: Purpose and Practices

    ERIC Educational Resources Information Center

    Rix, Samantha

    2012-01-01

    This paper examines the utilization of construct validity in formative assessment for classroom-based purposes. Construct validity pertains to the notion that interpretations are made by educators who analyze test scores during formative assessment. The purpose of this paper is to note the challenges that educators face when interpreting these…

  12. Revisiting the Impact of Formative Assessment Opportunities on Student Learning

    ERIC Educational Resources Information Center

    Peat, Mary; Franklin, Sue; Devlin, Marcia; Charles, Margaret

    2005-01-01

    This project developed as a result of some inconclusive data from an investigation of whether a relationship existed between the use of formative assessment opportunities and performance, as measured by final grade. We were expecting to show our colleagues and students that use of formative assessment resources had the potential to improve…

  13. Connected Classroom Technology Facilitates Multiple Components of Formative Assessment Practice

    ERIC Educational Resources Information Center

    Shirley, Melissa L.; Irving, Karen E.

    2015-01-01

    Formative assessment has been demonstrated to result in increased student achievement across a variety of educational contexts. When using formative assessment strategies, teachers engage students in instructional tasks that allow the teacher to uncover levels of student understanding so that the teacher may change instruction accordingly. Tools…

  14. Formative Assessment and Teachers' Sensitivity to Student Responses

    ERIC Educational Resources Information Center

    Haug, Berit S.; Ødegaard, Marianne

    2015-01-01

    Formative assessment, and especially feedback, is considered essential to student learning. To provide effective feedback, however, teachers must act upon the information that students reveal during instruction. In this study, we apply a framework of formative assessment to explore how sensitive teachers are to students' thoughts and ideas when…

  15. Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior

    ERIC Educational Resources Information Center

    Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia

    2016-01-01

    The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…

  16. Psychometric Evaluation of 5- and 4-Item Versions of the LATCH Breastfeeding Assessment Tool during the Initial Postpartum Period among a Multiethnic Population

    PubMed Central

    Htun, Tha Pyai; Lim, Peng Im; Ho-Lim, Sarah

    2016-01-01

    Objectives. The aim of this study was to evaluate the internal consistency, structural validity, sensitivity and specificity of the 5- and 4-item versions of the LATCH assessment tool among a multiethnic population in Singapore. Methods. The study was a secondary analysis of a subset of data (n = 907) from our previous breastfeeding survey from 2013 to 2014. The internal consistency of the LATCH was examined using Cronbach's alpha. Structural validity was assessed using exploratory factor analysis (EFA), and the proposed factors were confirmed by confirmatory factor analysis (CFA) using separate samples. Receiver operating characteristic analysis was used to evaluate the sensitivity and specificity of LATCH score thresholds for predicting non-exclusive breastfeeding. Results. The Cronbach's alpha values of the 5- and 4-item LATCH assessments were 0.70 and 0.74, respectively. The EFA demonstrated a one-factor structure for both versions in a randomized split of 334 vaginally delivered women. Two CFA of the 4-item LATCH demonstrated better model fit indices than the corresponding CFA of the 5-item LATCH in another randomized split of 335 vaginally delivered women and in 238 cesarean delivered women. Cutoffs of 5.5 and 3.5 were recommended for predicting non-exclusive breastfeeding with the 5- and 4-item versions among vaginally delivered women (n = 669), with satisfactory sensitivities (94% and 95%), low specificities (0% and 2%), low positive predictive values (25%) and low negative predictive values (20% and 47%). A cutoff of 5.5 was recommended for predicting non-exclusive breastfeeding with both versions among cesarean delivered women (n = 238), with satisfactory sensitivities (93% and 98%), low specificities (4% and 9%), low positive predictive values (41%) and negative predictive values (65% and 75%). The tool therefore has good sensitivity but poor specificity and low positive and negative predictive values.
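Sensitivity and specificity at a score cutoff, as used in the ROC analysis above, reduce to two ratios. A sketch with hypothetical LATCH totals and breastfeeding statuses (invented values, not the study's data; low scores flag non-exclusive breastfeeding):

```python
import numpy as np

def sens_spec(scores, non_exclusive, cutoff):
    """Sensitivity and specificity when scores at or below `cutoff` flag
    the condition of interest (here: non-exclusive breastfeeding)."""
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(non_exclusive, dtype=bool)
    flagged = scores <= cutoff
    sensitivity = (flagged & positive).sum() / positive.sum()
    specificity = (~flagged & ~positive).sum() / (~positive).sum()
    return float(sensitivity), float(specificity)

# Hypothetical LATCH totals (0-10) and non-exclusive breastfeeding status.
scores = [3, 5, 6, 8, 4, 9, 5, 7]
status = [1, 1, 1, 0, 1, 0, 0, 0]
sens, spec = sens_spec(scores, status, cutoff=5.5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A very permissive cutoff, like the 5.5 recommended here, flags almost everyone, which is how sensitivity can reach ~94% while specificity falls to nearly 0%.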

  17. A Third-Order Item Response Theory Model for Modeling the Effects of Domains and Subdomains in Large-Scale Educational Assessment Surveys

    ERIC Educational Resources Information Center

    Rijmen, Frank; Jeon, Minjeong; von Davier, Matthias; Rabe-Hesketh, Sophia

    2014-01-01

    Second-order item response theory models have been used for assessments consisting of several domains, such as content areas. We extend the second-order model to a third-order model for assessments that include subdomains nested in domains. Using a graphical model framework, it is shown how the model does not suffer from the curse of…

  18. Assessing the Dimensionality of Item Response Matrices with Small Sample Sizes and Short Test Lengths.

    ERIC Educational Resources Information Center

    De Champlain, Andre; Gessaroli, Marc E.

    1998-01-01

    Type I error rates and rejection rates for three dimensionality-assessment procedures were studied with data sets simulated to reflect short tests and small samples. Results show that the G-squared difference test (D. Bock, R. Gibbons, and E. Muraki, 1988) suffered from a severely inflated Type I error rate in all conditions simulated. (SLD)

  19. e-GovQual: A Multiple-Item Scale for Assessing e-Government Service Quality

    ERIC Educational Resources Information Center

    Papadomichelaki, Xenia; Mentzas, Gregoris

    2012-01-01

    A critical element in the evolution of governmental services through the internet is the development of sites that better serve citizens' needs. To deliver superior service quality, we must first understand how citizens perceive and evaluate online services. Citizen assessment is built on defining quality, identifying underlying dimensions, and…

  20. Exploring Proficiency-Based vs. Performance-Based Items with Elicited Imitation Assessment

    ERIC Educational Resources Information Center

    Cox, Troy L.; Bown, Jennifer; Burdis, Jacob

    2015-01-01

    This study investigates the effect of proficiency- vs. performance-based elicited imitation (EI) assessment. EI requires test-takers to repeat sentences in the target language. The accuracy at which test-takers are able to repeat sentences highly correlates with test-takers' language proficiency. However, in EI, the factors that render an item…

  1. Psychometrical Assessment and Item Analysis of the General Health Questionnaire in Victims of Terrorism

    ERIC Educational Resources Information Center

    Delgado-Gomez, David; Lopez-Castroman, Jorge; de Leon-Martinez, Victoria; Baca-Garcia, Enrique; Cabanas-Arrate, Maria Luisa; Sanchez-Gonzalez, Antonio; Aguado, David

    2013-01-01

    There is a need to assess the psychiatric morbidity that appears as a consequence of terrorist attacks. The General Health Questionnaire (GHQ) has been used to this end, but its psychometric properties have never been evaluated in a population affected by terrorism. A sample of 891 participants included 162 direct victims of terrorist attacks and…

  2. Using Systematic Item Selection Methods to Improve Universal Design of Assessments. Policy Directions. Number 18

    ERIC Educational Resources Information Center

    Johnstone, Christopher; Thurlow, Martha; Moore, Michael; Altman, Jason

    2006-01-01

    The No Child Left Behind Act of 2001 (NCLB) and other recent changes in federal legislation have placed greater emphasis on accountability in large-scale testing. Included in this emphasis are regulations that require assessments to be accessible. States are accountable for the success of all students, and tests should be designed in a way that…

  3. Test Item Construction and Validation: Developing a Statewide Assessment for Agricultural Science Education

    ERIC Educational Resources Information Center

    Rivera, Jennifer E.

    2011-01-01

    The State of New York Agriculture Science Education secondary program is required to have a certification exam for students to assess their agriculture science education experience as a Regent's requirement towards graduation. This paper focuses on the procedure used to develop and validate two content sub-test questions within a…

  4. Informal Formative Assessment: The Role of Instructional Dialogues in Assessing Students' Learning

    ERIC Educational Resources Information Center

    Ruiz-Primo, Maria Araceli

    2011-01-01

    This paper focuses on an unceremonious type of formative assessment--"informal formative assessment"--in which much of what teachers and students do in the classroom can be described as potential assessments that can provide evidence about the students' level of understanding. More specifically, the paper focuses on assessment conversations, or…

  5. Does Computer-Aided Formative Assessment Improve Learning Outcomes?

    ERIC Educational Resources Information Center

    Hannah, John; James, Alex; Williams, Phillipa

    2014-01-01

    Two first-year engineering mathematics courses used computer-aided assessment (CAA) to provide students with opportunities for formative assessment via a series of weekly quizzes. Most students used the assessment until they achieved very high (>90%) quiz scores. Although there is a positive correlation between these quiz marks and the final…

  6. Using Concept Cartoons in Formative Assessment: Scaffolding Students' Argumentation

    ERIC Educational Resources Information Center

    Chin, Christine; Teou, Lay-Yen

    2009-01-01

    The purpose of this study was to investigate how concept cartoons, together with other diagnostic and scaffolding tools, could be used in formative assessment, to stimulate talk and argumentation among students in small groups, as part of peer-assessment and self-assessment; and to provide diagnostic feedback about students' misconceptions to the…

  7. Formative Assessment Probes: How Far Did It Go?

    ERIC Educational Resources Information Center

    Keeley, Page

    2011-01-01

    Assessment serves many purposes in the elementary classroom. Formative assessment, often called assessment for learning, is characterized by its primary purpose--promoting learning. It takes place both formally and informally, is embedded in various stages of an instructional cycle, informs the teacher about appropriate next steps for instruction,…

  8. Mathematics Formative Assessment: 75 Practical Strategies for Linking Assessment, Instruction, and Learning

    ERIC Educational Resources Information Center

    Keeley, Page; Tobey, Cheryl Rose

    2011-01-01

    Award-winning author Page Keeley and mathematics expert Cheryl Rose Tobey apply the successful format of Keeley's best-selling "Science Formative Assessment" to mathematics. They provide 75 formative assessment strategies and show teachers how to use them to inform instructional planning and better meet the needs of all students. Research shows…

  9. Determining if Active Learning through a Formative Assessment Process Translates to Better Performance in Summative Assessment

    ERIC Educational Resources Information Center

    Grosas, Aidan Bradley; Raju, Shiwani Rani; Schuett, Burkhardt Siegfried; Chuck, Jo-Anne; Millar, Thomas James

    2016-01-01

    Formative assessment used in a level 2 unit, Immunology, gave outcomes that were both surprising and applicable across disciplines. Four formative tests were given and reviewed during class time. The students' attitudes to formative assessment were evaluated using questionnaires and its effectiveness in closing the gap was measured by the…

  10. Assessing reliability and validity of the Arabic language version of the Post-traumatic Diagnostic Scale (PDS) symptom items.

    PubMed

    Norris, Anne E; Aroian, Karen J

    2008-09-30

    Arab immigrant women are vulnerable to post-traumatic stress disorder (PTSD) because of gender, a higher probability of exposure to war-related violence, traditional cultural values, and immigration stressors. A valid and reliable screen is needed to assess PTSD incidence in this population. This study evaluated the reliability and validity of an Arabic language version of the symptom items in Foa et al.'s Post-traumatic Diagnostic Scale (PDS) [Foa, E.B., Cashman, L., Jaycox, L., and Perry, K., 1997. The validation of a self-report measure of posttraumatic stress disorder: the Posttraumatic Diagnostic Scale. Psychological Assessment 9(4), 445-451] in a sample of Arab immigrant women (n=453). Reliability was supported by Cronbach's alpha values for the Arabic language version (0.93) and its subscales (0.77-0.91). Results of group comparisons supported validity: women who had lived in a refugee camp or emigrated from Iraq - a country where exposure to war and torture is common - who were exhibiting depressive symptoms (Center for Epidemiological Studies-Depression Scale (CES-D) score above 18), or who reported moderately to severely impaired functioning had significantly higher mean PDS total and symptom subscale scores than women who had not had these experiences or were not exhibiting depressive symptoms. Scores on the PDS and its subscales were also positively correlated with the Profile of Mood States (POMS) depression and anxiety subscales and negatively correlated with the POMS vigor subscale (r = -.29 to -.39). PMID:18718671
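The convergent/divergent validity pattern reported here (positive correlations with POMS depression and anxiety, negative with vigor) can be sketched with simulated scores; the scales' names are taken from the abstract, but the effect sizes and data below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated totals: depression tracks the PTSD symptom score (convergent),
# vigor runs opposite to it (divergent). Coefficients are invented.
pds_total = rng.normal(20.0, 8.0, n)
poms_depression = 0.60 * pds_total + rng.normal(0.0, 6.0, n)
poms_vigor = -0.35 * pds_total + rng.normal(0.0, 7.0, n)

r_dep = np.corrcoef(pds_total, poms_depression)[0, 1]
r_vig = np.corrcoef(pds_total, poms_vigor)[0, 1]
print(f"r(PDS, depression) = {r_dep:.2f}, r(PDS, vigor) = {r_vig:.2f}")
```

Validity evidence of this kind rests on the sign pattern matching theory: measures of related distress correlate positively with the PDS, while a well-being measure correlates negatively.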