Sample records for chemistry test item

  1. ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

    ERIC Educational Resources Information Center

    Australian Council for Educational Research, Hawthorn.

    The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; an answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…

  2. Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.

    ERIC Educational Resources Information Center

    Commons, C., Ed.; Martin, P., Ed.

    Volume 1 of the two-volume Australian Chemistry Test Item Bank contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. The items, written during 1979 and 1980, were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…

  3. ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).

    ERIC Educational Resources Information Center

    Australian Council for Educational Research, Hawthorn.

    This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…

  4. Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.

    ERIC Educational Resources Information Center

    Commons, C., Ed.; Martin, P., Ed.

    The second volume of the two-volume Australian Chemistry Test Item Bank contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. The items, written during 1979 and 1980, were initially published in the "ACER Chemistry Test Item Collection" and in the…

  5. Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  6. Demonstrating the Difference between Classical Test Theory and Item Response Theory Using Derived Test Data

    ERIC Educational Resources Information Center

    Magno, Carlo

    2009-01-01

    The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. The CTT and IRT approaches were compared across two samples and two forms of a test on their item difficulty, internal consistency, and measurement errors. The specific…
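
    For orientation, the two frameworks compared here quantify item difficulty and measurement error differently; in sketch form (textbook definitions, not equations taken from Magno's report):

    ```latex
    % CTT: item difficulty is the proportion of examinees answering item j
    % correctly, and the standard error of measurement is one constant for
    % the whole test (X = total score, \rho_{XX'} = reliability).
    p_j = \frac{1}{N}\sum_{i=1}^{N} u_{ij}, \qquad
    \mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}

    % IRT: the standard error of an ability estimate varies with ability,
    % through the test information function I(\theta).
    \mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}
    ```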

  7. The Development of Multiple-Choice Items Consistent with the AP Chemistry Curriculum Framework to More Accurately Assess Deeper Understanding

    ERIC Educational Resources Information Center

    Domyancich, John M.

    2014-01-01

    Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…

  8. Chemistry, Grades 7-12. Annotated Bibliography of Tests.

    ERIC Educational Resources Information Center

    Educational Testing Service, Princeton, NJ. Test Collection.

    Among the 22 tests cited in this bibliography are end-of-course tests, tests of the American Chemical Society, and chemistry item banks. This document is one in a series of topical bibliographies from the Test Collection (TC) at the Educational Testing Service (ETS) containing descriptions of more than 18,000 tests and other measurement devices…

  9. A Valid and Reliable Instrument for Cognitive Complexity Rating Assignment of Chemistry Exam Items

    ERIC Educational Resources Information Center

    Knaus, Karen; Murphy, Kristen; Blecking, Anja; Holme, Thomas

    2011-01-01

    The design and use of a valid and reliable instrument for the assignment of cognitive complexity ratings to chemistry exam items is described in this paper. Use of such an instrument provides a simple method to quantify the cognitive demands of chemistry exam items. Instrument validity was established in two different ways: statistically…

  10. Answering Fixed Response Items in Chemistry: A Pilot Study.

    ERIC Educational Resources Information Center

    Hateley, R. J.

    1979-01-01

    Presents a pilot study on student thinking in chemistry. Verbal comments of a group of six college students were recorded and analyzed to identify how each student arrives at the correct answer in fixed response items in chemistry. (HM)

  11. Science Library of Test Items. Volume Twenty-Three. Geology (Part One). Free Response Testing Program.

    ERIC Educational Resources Information Center

    Hopley, Ken; And Others

    The first of several planned volumes of Free Response Test Items contains geology questions developed by the Assessment and Evaluation Unit of the New South Wales Department of Education. Two additional geology volumes and biology and chemistry volumes are in preparation. The questions in this volume were written and reviewed by practicing…

  12. Evolution of a Test Item

    ERIC Educational Resources Information Center

    Spaan, Mary

    2007-01-01

    This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…

  13. Selecting Items for Criterion-Referenced Tests.

    ERIC Educational Resources Information Center

    Mellenbergh, Gideon J.; van der Linden, Wim J.

    1982-01-01

    Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)

  14. New decision criteria for selecting delta check methods based on the ratio of the delta difference to the width of the reference range can be generally applicable for each clinical chemistry test item.

    PubMed

    Park, Sang Hyuk; Kim, So-Young; Lee, Woochang; Chun, Sail; Min, Won-Ki

    2012-09-01

    Many laboratories use 4 delta check methods: delta difference, delta percent change, rate difference, and rate percent change. However, guidelines regarding decision criteria for selecting delta check methods have not yet been provided. We present new decision criteria for selecting delta check methods for each clinical chemistry test item. We collected 811,920 and 669,750 paired (present and previous) test results for 27 clinical chemistry test items from inpatients and outpatients, respectively. We devised new decision criteria for the selection of delta check methods based on the ratio of the delta difference to the width of the reference range (DD/RR). Delta check methods based on these criteria were compared with those based on the CV% of the absolute delta difference (ADD) as well as those reported in 2 previous studies. The delta check methods suggested by new decision criteria based on the DD/RR ratio corresponded well with those based on the CV% of the ADD except for only 2 items each in inpatients and outpatients. Delta check methods based on the DD/RR ratio also corresponded with those suggested in the 2 previous studies, except for 1 and 7 items in inpatients and outpatients, respectively. The DD/RR method appears to yield more feasible and intuitive selection criteria and can easily explain changes in the results by reflecting both the biological variation of the test item and the clinical characteristics of patients in each laboratory. We suggest this as a measure to determine delta check methods.
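
    As a sketch of the core quantity described above (the function name and example values are hypothetical, and the cutoffs the authors used to map a DD/RR value onto one of the four delta check methods are not given in this abstract):

    ```python
    def dd_rr(present, previous, ref_low, ref_high):
        """Ratio of the absolute delta difference (DD) to the width of the
        reference range (RR) for one clinical chemistry test item."""
        return abs(present - previous) / (ref_high - ref_low)

    # Hypothetical paired sodium results (mmol/L), reference range 136-145:
    print(f"DD/RR = {dd_rr(128.0, 139.0, 136.0, 145.0):.2f}")  # 11/9 = 1.22
    ```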

  15. Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

    ERIC Educational Resources Information Center

    Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

    2016-01-01

    The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…

  16. Criterion-Referenced Test Items for Welding.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…

  17. Test Bias: An Objective Definition for Test Items.

    ERIC Educational Resources Information Center

    Durovic, Jerry J.

    A test bias definition, applicable at the item level of a test, is presented. The definition conceptually equates test bias with measuring different things in different groups, and operationally equates test bias with a between-group difference in item fit to the Rasch model greater than one. It is suggested that the proposed definition avoids…

  18. Interactions Between Item Content And Group Membership on Achievement Test Items.

    ERIC Educational Resources Information Center

    Linn, Robert L.; Harnisch, Delwyn L.

    The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three-parameter logistic model were obtained on the 46-item math test for the sample of eighth-grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…

  19. Agriculture Library of Test Items.

    ERIC Educational Resources Information Center

    Sutherland, Duncan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  20. Validity and Reliability Testing of an e-learning Questionnaire for Chemistry Instruction

    NASA Astrophysics Data System (ADS)

    Guspatni, G.; Kurniawati, Y.

    2018-04-01

    The aim of this paper is to examine the validity and reliability of a questionnaire used to evaluate e-learning implementation in chemistry instruction. 48 questionnaires were filled in by students who had studied chemistry through an e-learning system. The questionnaire consisted of 20 indicators evaluating students’ perception of using e-learning. Parametric testing was done as the data were assumed to follow a normal distribution. Item validity of the questionnaire was examined through item-total correlation using Pearson’s formula, while its reliability was assessed with Cronbach’s alpha formula. Moreover, convergent validity was assessed to see whether the indicators building a factor had theoretically the same underlying construct. The result of validity testing revealed 19 valid indicators, while the result of reliability testing revealed a Cronbach’s alpha value of .886. The result of factor analysis showed that the questionnaire consisted of five factors, each with indicators building the same construct. This article shows the importance of factor analysis in obtaining a construct-valid questionnaire before it is used as a research instrument.
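
    The two statistics named above have standard computations; a minimal sketch, assuming a score matrix with one row per respondent and one column per indicator (hypothetical data shape, not the authors' code):

    ```python
    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for an item-score matrix (rows = respondents)."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                                / scores.sum(axis=1).var(ddof=1))

    def item_total_correlation(scores, j):
        """Pearson correlation of indicator j with the total score."""
        scores = np.asarray(scores, dtype=float)
        return np.corrcoef(scores[:, j], scores.sum(axis=1))[0, 1]
    ```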

  1. Item difficulty and item validity for the Children's Group Embedded Figures Test.

    PubMed

    Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

    1994-02-01

    The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).

  2. A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

    ERIC Educational Resources Information Center

    Lau, C. Allen; Wang, Tianyou

    This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…

  3. Item Analysis in Introductory Economics Testing.

    ERIC Educational Resources Information Center

    Tinari, Frank D.

    1979-01-01

    Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
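
    Classical item analysis of multiple-choice data conventionally reports a difficulty index and a discrimination index; a minimal sketch under that assumption (the article's own procedure may differ in detail):

    ```python
    import numpy as np

    def item_analysis(responses):
        """responses: 0/1 matrix, rows = students, columns = items.
        Returns per-item difficulty (proportion correct) and discrimination
        (point-biserial correlation of each item with the total score)."""
        responses = np.asarray(responses, dtype=float)
        difficulty = responses.mean(axis=0)
        total = responses.sum(axis=1)
        discrimination = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                                   for j in range(responses.shape[1])])
        return difficulty, discrimination
    ```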

  4. Computerized Numerical Control Test Item Bank.

    ERIC Educational Resources Information Center

    Reneau, Fred; And Others

    This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…

  5. Criterion-Referenced Test Items for Small Engines.

    ERIC Educational Resources Information Center

    Herd, Amon

    This notebook contains criterion-referenced test items for testing students' knowledge of small engines. The test items are based upon competencies found in the Missouri Small Engine Competency Profile. The test item bank is organized in 18 sections that cover the following duties: shop procedures; tools and equipment; fasteners; servicing fuel…

  6. Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  7. Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  8. Formative Assessment in High School Chemistry Teaching: Investigating the Alignment of Teachers' Goals with Their Items

    ERIC Educational Resources Information Center

    Sandlin, Benjamin; Harshman, Jordan; Yezierski, Ellen

    2015-01-01

    A 2011 report by the Department of Education states that understanding how teachers use results from formative assessments to guide their practice is necessary to improve instruction. Chemistry teachers have goals for items in their formative assessments, but the degree of alignment between what is assessed by these items and the teachers' goals…

  9. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    ERIC Educational Resources Information Center

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The fifty (50) economics test items developed were…

  10. Unidimensional Interpretations for Multidimensional Test Items

    ERIC Educational Resources Information Center

    Kahraman, Nilufer

    2013-01-01

    This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…

  11. Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2006-01-01

    This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…

  12. Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

    ERIC Educational Resources Information Center

    Aybek, Eren Can; Demirtasli, R. Nukhet

    2017-01-01

    This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…

  13. Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  14. Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  15. Some Effects of Changes in Question Structure and Sequence on Performance in a Multiple Choice Chemistry Test.

    ERIC Educational Resources Information Center

    Hodson, D.

    1984-01-01

    Investigated the effect on student performance of changes in question structure and sequence on a GCE O-level multiple-choice chemistry test. One finding noted is that there was virtually no change in test reliability when the number of options per test item was reduced (from five). (JN)

  16. Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2013-01-01

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…

  17. Detecting Gender Bias Through Test Item Analysis

    NASA Astrophysics Data System (ADS)

    González-Espada, Wilson J.

    2009-03-01

    Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
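
    One widely taught DIF screen an instructor could run on classroom data (named here as an illustration; this abstract does not specify the author's exact method) is the Mantel-Haenszel common odds ratio, computed across strata of examinees matched on total score:

    ```python
    import numpy as np

    def mantel_haenszel_or(correct, group, total_score):
        """MH common odds ratio for one item.
        correct: 0/1 item responses; group: 0 = reference, 1 = focal;
        total_score: matching variable (e.g., total test score)."""
        correct, group = np.asarray(correct), np.asarray(group)
        total_score = np.asarray(total_score)
        num = den = 0.0
        for s in np.unique(total_score):
            stratum = total_score == s
            ref, foc = stratum & (group == 0), stratum & (group == 1)
            a = correct[ref].sum()   # reference, correct
            b = ref.sum() - a        # reference, incorrect
            c = correct[foc].sum()   # focal, correct
            d = foc.sum() - c        # focal, incorrect
            n = stratum.sum()
            num += a * d / n
            den += b * c / n
        return num / den if den else float("nan")  # far from 1 -> possible DIF
    ```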

  18. Optimal Test Design with Rule-Based Item Generation

    ERIC Educational Resources Information Center

    Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.

    2013-01-01

    Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…

  19. Electronics. Criterion-Referenced Test (CRT) Item Bank.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…

  20. Science Library of Test Items. Volume Two.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…

  1. An Item Response Theory Model for Test Bias.

    ERIC Educational Resources Information Center

    Shealy, Robin; Stout, William

    This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…

  2. Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  3. The Effect of the Position of an Item within a Test on the Item Difficulty Value.

    ERIC Educational Resources Information Center

    Rubin, Lois S.; Mott, David E. W.

    An investigation was made of the effect of an item's position within a test on its difficulty value. Using a 60-item operational test composed of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…

  4. Test item linguistic complexity and assessments for deaf students.

    PubMed

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.

  5. Machine Shop. Criterion-Referenced Test (CRT) Item Bank.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This machine shop criterion-referenced test item bank is keyed to the machine shop competency profile developed by industry and education professionals in Missouri. The 16 references used in drafting the test items are listed. Test items are arranged under these categories: orientation to machine shop; performing mathematical calculations; performing…

  6. Criterion-Referenced Test Items for Auto Body.

    ERIC Educational Resources Information Center

    Tannehill, Dana, Ed.

    This test item bank on auto body repair contains criterion-referenced test questions based upon competencies found in the Missouri Auto Body Competency Profile. Some test items are keyed for multiple competencies. The tests cover the following 26 competency areas in the auto body curriculum: auto body careers; measuring and mixing; tools and…

  7. Optimal Bayesian Adaptive Design for Test-Item Calibration.

    PubMed

    van der Linden, Wim J; Ren, Hao

    2015-06-01

    An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
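
    For reference, the two optimality criteria compared in the study are the standard ones from optimal design theory; in generic sketch form (not the authors' exact Bayesian formulation), for a design \eta with item-parameter information matrix I(\eta):

    ```latex
    % D-optimality: maximize the determinant of the information matrix.
    \eta_D^{*} = \arg\max_{\eta} \, \det I(\eta)

    % A-optimality (per the abstract, a special case of c-optimality):
    % minimize the trace of the inverse information matrix, i.e., the sum
    % of the parameters' asymptotic variances.
    \eta_A^{*} = \arg\min_{\eta} \, \operatorname{tr}\!\left( I(\eta)^{-1} \right)
    ```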

  8. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    ERIC Educational Resources Information Center

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprising 50 items was administered to 6,288 students. Data from this test were used to obtain data sets of…

  9. A Process for Reviewing and Evaluating Generated Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2016-01-01

    Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…

  10. Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock.

    These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…

  11. Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock.

    These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…

  12. Mathematics Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Fraser, Graham, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from previous tests are made available to teachers for the construction of pretests or posttests, reference tests for inter-class comparisons and general assignments. The collection was reviewed for content…

  13. Geography Library of Test Items. Volume Four.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  14. Geography Library of Test Items. Volume Three.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  15. Commerce Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Meeve, Brian, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  16. Geography Library of Test Items. Volume Five.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  17. Commerce Library of Test Items. Volume Two.

    ERIC Educational Resources Information Center

    Meeve, Brian, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  18. Geography Library of Test Items. Volume Six.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  19. Geography: Library of Test Items. Volume II.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  20. Geography Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  1. Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

    PubMed

    Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

    2017-06-15

    Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.

  2. The Selection of Test Items for Decision Making with a Computer Adaptive Test.

    ERIC Educational Resources Information Center

    Spray, Judith A.; Reckase, Mark D.

    The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…

  3. Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.

    2010-01-01

    Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…

  4. Influence of Fallible Item Parameters on Test Information During Adaptive Testing.

    ERIC Educational Resources Information Center

    Wetzel, C. Douglas; McBride, James R.

    Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…

  5. Auto Mechanics. Criterion-Referenced Test (CRT) Item Bank.

    ERIC Educational Resources Information Center

    Tannehill, Dana, Ed.

    This document contains 546 criterion-referenced multiple choice and true or false test items for a course in auto mechanics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 35 units covering the…

  6. Detecting Item Drift in Large-Scale Testing

    ERIC Educational Resources Information Center

    Guo, Hongwen; Robin, Frederic; Dorans, Neil

    2017-01-01

    The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on-demand testing. Building on existing residual analyses, the authors propose…

  7. Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.

    2006-01-01

    Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information that violate the content…

  8. Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

    ERIC Educational Resources Information Center

    Baghaei, Purya; Ravand, Hamdollah

    2016-01-01

    In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…

  9. Ethnic Group Bias in Intelligence Test Items.

    ERIC Educational Resources Information Center

    Scheuneman, Janice

    In previous studies of ethnic group bias in intelligence test items, the question of bias has been confounded with ability differences between the ethnic group samples compared. The present study is based on a conditional probability model in which an unbiased item is defined as one where the probability of a correct response to an item is the…

  10. Item Response Theory Models for Performance Decline during Testing

    ERIC Educational Resources Information Center

    Jin, Kuan-Yu; Wang, Wen-Chung

    2014-01-01

    Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…

  11. Testing enhances both encoding and retrieval for both tested and untested items.

    PubMed

    Cho, Kit W; Neely, James H; Crocco, Stephanie; Vitrano, Deana

    2017-07-01

    In forward testing effects, taking a test enhances memory for subsequently studied material. These effects have been observed for previously studied and tested items, a potentially item-specific testing effect, and newly studied untested items, a purely generalized testing effect. We directly compared item-specific and generalized forward testing effects using procedures to separate testing benefits due to encoding versus retrieval. Participants studied two lists of Swahili-English word pairs, with the second study list containing "new" pairs intermixed with the previously studied "old" pairs. Participants completed a review phase in which they took a cued-recall test on only the "old" pairs or restudied them. In Experiments 1a, 1b, and 2, the review phase was given either before or after the second study list. Testing benefited memory to the same degree for both "new" and "old" pairs, suggesting that there were no pair-specific benefits of testing. The larger benefit from testing when review was given before rather than after the second study list suggests that the memory enhancement was due to both testing-enhanced encoding and testing-enhanced retrieval. To better equate generalized testing effects for "new" and "old" pairs, Experiment 3 intermixed them in the review phase. A statistically significant pair-specific testing effect for "old" items was now observed. Overall, these results show that forward testing effects are due to both testing-enhanced encoding and retrieval effects and that direct, pair-specific forward testing benefits are considerably smaller than indirect, generalized forward testing benefits.

  12. A Chemistry Concept Reasoning Test

    ERIC Educational Resources Information Center

    Cloonan, Carrie A.; Hutchinson, John S.

    2011-01-01

    A Chemistry Concept Reasoning Test was created and validated providing an easy-to-use tool for measuring conceptual understanding and critical scientific thinking of general chemistry models and theories. The test is designed to measure concept understanding comparable to that found in free-response questions requiring explanations over…

  13. Item response theory analysis of the mechanics baseline test

    NASA Astrophysics Data System (ADS)

    Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

    2012-02-01

    Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
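
    The two item parameters named here are those of a logistic item response function; in the common two-parameter (2PL) form, shown for orientation (this snippet does not state the paper's exact model choice):

    ```latex
    % Probability that a student of ability \theta answers item j correctly;
    % a_j is the item's discrimination and b_j its difficulty.
    P_j(\theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}}
    ```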

  14. Chemistry, College Level. Annotated Bibliography of Tests.

    ERIC Educational Resources Information Center

    Educational Testing Service, Princeton, NJ. Test Collection.

    Most of the 30 tests cited in this bibliography are those of the American Chemical Society. Subjects covered include physical chemistry, organic chemistry, inorganic chemistry, analytical chemistry, and other specialized areas. The tests are designed for advanced high school students and for college students at both the bachelor's and graduate levels. This…

  15. Samejima Items in Multiple-Choice Tests: Identification and Implications

    ERIC Educational Resources Information Center

    Rahman, Nazia

    2013-01-01

    Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…

  16. Home Science Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Smith, Jan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  17. Quality of dry chemistry testing.

    PubMed

    Nakamura, H; Tatsumi, N

    1999-01-01

    Since the development of the qualitative test paper for urine in the 1950s, several kinds of dry-state reagents and their automated analyzers have been developed. The term "dry chemistry" has been in use since the report, at the end of the 1960s, of a quantitative test paper for serum bilirubin read with a reflectometer, and dry chemistry became known worldwide after Eastman Kodak Co. presented multilayer film reagents for serum biochemical analytes at the 10th IFCC Meeting at the end of the 1970s. Here we report the test menu, results in external quality assessment, merits and demerits, and future possibilities of dry chemistry.

  18. Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

    2016-01-01

    Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…

  19. Procedures for Selecting Items for Computerized Adaptive Tests.

    ERIC Educational Resources Information Center

    Kingsbury, G. Gage; Zara, Anthony R.

    1989-01-01

    Several classical approaches and alternative approaches to item selection for computerized adaptive testing (CAT) are reviewed and compared. The study also describes procedures for constrained CAT that may be added to classical item selection approaches to allow them to be used for applied testing. (TJH)

  20. Effect of Differential Item Functioning on Test Equating

    ERIC Educational Resources Information Center

    Kabasakal, Kübra Atalay; Kelecioglu, Hülya

    2015-01-01

    This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…

  1. Tree versus Geometric Representation of Tests and Items.

    ERIC Educational Resources Information Center

    Beller, Michael

    1990-01-01

    Geometric approaches to representing interrelations among tests and items are compared with an additive tree model (ATM), using 2,644 examinees and 2 other data sets. The ATM's close fit to the data and its coherence of presentation indicate that it is the best means of representing tests and items. (TJH)

  2. The Role of Item Feedback in Self-Adapted Testing.

    ERIC Educational Resources Information Center

    Roos, Linda L.; And Others

    1997-01-01

    The importance of item feedback in self-adapted testing was studied by comparing feedback and no feedback conditions for computerized adaptive tests and self-adapted tests taken by 363 college students. Results indicate that item feedback is not necessary to realize score differences between self-adapted and computerized adaptive testing. (SLD)

  3. Validation of Physics Standardized Test Items

    NASA Astrophysics Data System (ADS)

    Marshall, Jill

    2008-10-01

    The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000-student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation, and we make recommendations for increasing the validity of standardized physics testing.

  4. Investigating Item Exposure Control Methods in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Ozturk, Nagihan Boztunc; Dogan, Nuri

    2015-01-01

    This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…

  5. Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Li, Johnson

    2013-01-01

    The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…

  6. Using Distractor-Driven Standards-Based Multiple-Choice Assessments and Rasch Modeling to Investigate Hierarchies of Chemistry Misconceptions and Detect Structural Problems with Individual Items

    ERIC Educational Resources Information Center

    Herrmann-Abell, Cari F.; DeBoer, George E.

    2011-01-01

    Distractor-driven multiple-choice assessment items and Rasch modeling were used as diagnostic tools to investigate students' understanding of middle school chemistry ideas. Ninety-one items were developed according to a procedure that ensured content alignment to the targeted standards and construct validity. The items were administered to 13360…
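
    The Rasch model used here as a diagnostic tool has the standard one-parameter logistic form (textbook definition, given for orientation):

    ```latex
    % Probability that examinee i answers item j correctly, with ability
    % \theta_i and item difficulty b_j on the same logit scale.
    P(u_{ij} = 1 \mid \theta_i, b_j) = \frac{e^{\theta_i - b_j}}{1 + e^{\theta_i - b_j}}
    ```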

  7. Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

    ERIC Educational Resources Information Center

    Kim, Jihye; Oshima, T. C.

    2013-01-01

    In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…

  8. Multistage Computerized Adaptive Testing with Uniform Item Exposure

    ERIC Educational Resources Information Center

    Edwards, Michael C.; Flora, David B.; Thissen, David

    2012-01-01

    This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…

  9. An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

    ERIC Educational Resources Information Center

    Ito, Kyoko; Sykes, Robert C.

    This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…

  10. Omani Twelfth Grade Students' Most Common Misconceptions in Chemistry

    ERIC Educational Resources Information Center

    Al-Balushi, Sulaiman M.; Ambusaidi, Abdullah K.; Al-Shuaili, Ali H.; Taylor, Neil

    2012-01-01

    The current study, undertaken in the Sultanate of Oman, explored twelfth grade students' common misconceptions in seven chemistry conceptual areas. The sample included 786 twelfth grade students in Oman while the instrument was a two-tier test called Chemistry Misconceptions Diagnostic Test (CMDT), consisting of 25 items with 12 items…

  11. Effects of Test Item Disclosure on Medical Licensing Examination

    ERIC Educational Resources Information Center

    Yang, Eunbae B.; Lee, Myung Ae; Park, Yoon Soo

    2018-01-01

    In 2012, the National Health Personnel Licensing Examination Board of Korea decided to publicly disclose all test items and answers to satisfy the test takers' right to know and enhance the transparency of tests administered by the government. This study investigated the effects of item disclosure on the medical licensing examination (MLE),…

  12. A Procedure To Detect Test Bias Present Simultaneously in Several Items.

    ERIC Educational Resources Information Center

    Shealy, Robin; Stout, William

    A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…

  13. Languages Library of Test Items. Volume Two: German, Latin.

    ERIC Educational Resources Information Center

    Campbell, Thomas; And Others

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  14. Languages Library of Test Items. Volume One: French, Indonesian.

    ERIC Educational Resources Information Center

    Campbell, Thomas; And Others

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  15. Textiles and Design Library of Test Items. Volume I.

    ERIC Educational Resources Information Center

    Smith, Jan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  16. A Quantum Chemistry Concept Inventory for Physical Chemistry Classes

    ERIC Educational Resources Information Center

    Dick-Perez, Marilu; Luxford, Cynthia J.; Windus, Theresa L.; Holme, Thomas

    2016-01-01

    A 14-item, multiple-choice diagnostic assessment tool, the quantum chemistry concept inventory or QCCI, is presented. Items were developed based on published student misconceptions and content coverage and then piloted and used in advanced physical chemistry undergraduate courses. In addition to the instrument itself, data from both a pretest,…

  17. Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

    ERIC Educational Resources Information Center

    Wang, Wei

    2013-01-01

    Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…

  18. The Multidimensional Structure of Verbal Comprehension Test Items.

    ERIC Educational Resources Information Center

    Peled, Zimra

    1984-01-01

    The multidimensional structure of verbal comprehension test items was investigated. Empirical evidence was provided to support the theory that item tasks are multivariate-multiordered composites of faceted components: language, contextual knowledge, and cognitive operation. Linear and circular properties of cylindrical manifestation were…

  19. Field Test of Expedient Pavement Repairs (Test Items 16-35).

    DTIC Science & Technology

    1980-11-01

    [Scanned front matter only; no abstract text is recoverable. The legible fragments are list-of-figures entries ("Surface Profiles After Repairs, Item 34"; "Cracking of Bond, Item 34"), list-of-tables entries ("Limestone Base Course"; "Summary of Test Results"), an abbreviations and nomenclature section (e.g., AFESC), and residue of a traffic-zone surface-deflection figure.]

  20. Differential item functioning analysis of the Vanderbilt Expertise Test for cars

    PubMed Central

    Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel

    2015-01-01

    The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499

  1. Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

    PubMed

    Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

    2015-01-01

    The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.

  2. Bayes Factor Covariance Testing in Item Response Models.

    PubMed

    Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

    2017-12-01

    Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
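
    For reference, a compound symmetry covariance structure for an $n$-item test has a single common variance and a single common covariance (a standard definition, stated here for orientation rather than taken from the paper):

    $$\Sigma_{\mathrm{CS}} = \sigma^2\left[(1-\rho)\,\mathbf{I}_n + \rho\,\mathbf{J}_n\right],$$

    where $\mathbf{I}_n$ is the identity matrix and $\mathbf{J}_n$ the all-ones matrix. Hypotheses such as local independence or differential item functioning then become tests on components of this structure.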

  3. Are Learning Disabled Students "Test-Wise?": An Inquiry into Reading Comprehension Test Items.

    ERIC Educational Resources Information Center

    Scruggs, Thomas E.; Lifson, Steve

    The ability to correctly answer reading comprehension test items, without having read the accompanying reading passage, was compared for third grade learning disabled students and their peers from a regular classroom. In the first experiment, fourteen multiple choice items were selected from the Stanford Achievement Test. No reading passages were…

  4. Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

    2013-01-01

    Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…

  5. Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

    ERIC Educational Resources Information Center

    Hewitt, Margaret A.; Homan, Susan P.

    2004-01-01

    Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…

  6. Controlling Item Exposure Conditional on Ability in Computerized Adaptive Testing.

    ERIC Educational Resources Information Center

    Stocking, Martha L.; Lewis, Charles

    1998-01-01

    Ensuring item and pool security in a continuous testing environment is explored through a new method of controlling exposure rate of items conditional on ability level in computerized testing. Properties of this conditional control on exposure rate, when used in conjunction with a particular adaptive testing algorithm, are explored using simulated…

  7. Examination of Polytomous Items' Psychometric Properties According to Nonparametric Item Response Theory Models in Different Test Conditions

    ERIC Educational Resources Information Center

    Sengul Avsar, Asiye; Tavsancil, Ezel

    2017-01-01

    This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…

  8. What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

    ERIC Educational Resources Information Center

    Banerjee, Jayanti; Papageorgiou, Spiros

    2016-01-01

    The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…

  9. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    NASA Astrophysics Data System (ADS)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
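
    For reference, the three-parameter logistic model fitted in such analyses gives the probability of a correct response to item $i$ as a function of ability $\theta$ (standard form):

    $$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}},$$

    where $a_i$ is the discrimination parameter, $b_i$ the difficulty, and $c_i$ the pseudo-guessing lower asymptote.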

  10. A Person Fit Test for IRT Models for Polytomous Items

    ERIC Educational Resources Information Center

    Glas, C. A. W.; Dagohoy, Anna Villa T.

    2007-01-01

    A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier…

  11. An Effect Size Measure for Raju's Differential Functioning for Items and Tests

    ERIC Educational Resources Information Center

    Wright, Keith D.; Oshima, T. C.

    2015-01-01

    This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…

  12. A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

    ERIC Educational Resources Information Center

    Guo, Rui; Zheng, Yi; Chang, Hua-Hua

    2015-01-01

    An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…

  13. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating

    PubMed Central

    Michaelides, Michalis P.

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230

  14. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

    PubMed

    Michaelides, Michalis P

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.

  15. An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Han, Kyung T.

    2012-01-01

    Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…

  16. Screening Test Items for Differential Item Functioning

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    2014-01-01

    A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…

  17. Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

    ERIC Educational Resources Information Center

    Hertz, Norman R.; Chinn, Roberta N.

    This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…

  18. Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Yen, Yung-Chin; Ho, Rong-Guey; Liao, Wen-Wei; Chen, Li-Ju

    2012-01-01

    In a test, the score comes closer to the examinee's actual ability when careless mistakes are corrected. In a CAT, however, changing the answer to one item might make the items that follow no longer appropriate for estimating the examinee's ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability…

  19. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    PubMed

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.

  20. A teaching intervention for reading laboratory experiments in college-level introductory chemistry

    NASA Astrophysics Data System (ADS)

    Kirk, Maria Kristine

    The purpose of this study was to determine the effects that a pre-laboratory guide, conceptualized as a "scientific story grammar," has on college chemistry students' learning when they read an introductory chemistry laboratory manual and perform the experiments in the chemistry laboratory. The participants (N = 56) were students enrolled in four existing general chemistry laboratory sections taught by two instructors at a women's liberal arts college. The pre-laboratory guide consisted of eight questions about the experiment, including the purpose, chemical species, variables, chemical method, procedure, and hypothesis. The effects of the intervention were compared with those of the traditional pre-laboratory assignment for the eight chemistry experiments. Measures included quizzes, tests, chemistry achievement test, science process skills test, laboratory reports, laboratory average, and semester grade. The covariates were mathematical aptitude and prior knowledge of chemistry and science processes, on which the groups differed significantly. The study captured students' perceptions of their experience in general chemistry through a survey and interviews with eight students. The only significant differences in the treatment group's performance were in some subscores on lecture items and laboratory items on the quizzes. An apparent induction period was noted, in that significant measures occurred in mid-semester. Voluntary study with the pre-laboratory guide by control students precluded significant differences on measures given later in the semester. The groups' responses to the survey were similar. Significant instructor effects on three survey items were corroborated by the interviews. The researcher's students were more positive about their pre-laboratory tasks, enjoyed the laboratory sessions more, and were more confident about doing chemistry experiments than the laboratory instructor's groups due to differences in scaffolding by the instructors.

  1. Item analysis of three Spanish naming tests: a cross-cultural investigation.

    PubMed

    Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro

    2009-01-01

    Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.

  2. Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

    ERIC Educational Resources Information Center

    Matlock, Ki Lynn; Turner, Ronna

    2016-01-01

    When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…

  3. A Model-Based Method for Content Validation of Automatically Generated Test Items

    ERIC Educational Resources Information Center

    Zhang, Xinxin; Gierl, Mark

    2016-01-01

    The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…

  4. Item Pool Design for an Operational Variable-Length Computerized Adaptive Test

    ERIC Educational Resources Information Center

    He, Wei; Reckase, Mark D.

    2014-01-01

    For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…

  5. ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION

    PubMed Central

    de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro

    2009-01-01

    Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960

  6. Difficulty and Discriminability of Introductory Psychology Test Items.

    ERIC Educational Resources Information Center

    Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis

    2001-01-01

    Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…

  7. Sleep can reduce the testing effect: it enhances recall of restudied items but can leave recall of retrieved items unaffected.

    PubMed

    Bäuml, Karl-Heinz T; Holterman, Christoph; Abel, Magdalena

    2014-11-01

    The testing effect refers to the finding that retrieval practice in comparison to restudy of previously encoded contents can improve memory performance and reduce time-dependent forgetting. Naturally, long retention intervals include both wake and sleep delay, which can influence memory contents differently. In fact, sleep immediately after encoding can induce a mnemonic benefit, stabilizing and strengthening the encoded contents. We investigated in a series of 5 experiments whether sleep influences the testing effect. After initial study of categorized item material (Experiments 1, 2, and 4A), paired associates (Experiment 3), or educational text material (Experiment 4B), subjects were asked to restudy encoded contents or engage in active retrieval practice. A final recall test was conducted after a 12-hr delay that included diurnal wakefulness or nocturnal sleep. The results consistently showed typical testing effects after the wake delay. However, these testing effects were reduced or even eliminated after sleep, because sleep benefited recall of restudied items but left recall of retrieved items unaffected. The findings are consistent with the bifurcation model of the testing effect (Kornell, Bjork, & Garcia, 2011), according to which the distribution of memory strengths across items is shifted differentially by retrieving and restudying, with retrieval strengthening items to a much higher degree than restudy does. On the basis of this model, most of the retrieved items already fall above recall threshold in the absence of sleep, so additional sleep-induced strengthening may not improve recall of retrieved items any further. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  8. The Impact of Settable Test Item Exposure Control Interface Format on Postsecondary Business Student Test Performance

    ERIC Educational Resources Information Center

    Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.

    2005-01-01

    The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…

  9. An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

    ERIC Educational Resources Information Center

    Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

    2006-01-01

    In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…

  10. Item Response Theory Modeling of the Philadelphia Naming Test

    ERIC Educational Resources Information Center

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.

    2015-01-01

    Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…

  11. Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2017-01-01

    Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…

  12. The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.

    ERIC Educational Resources Information Center

    O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith

    2000-01-01

    Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…

  13. The development of an instrument to assess chemistry perceptions

    NASA Astrophysics Data System (ADS)

    Wells, Raymond R.

    The instrument, developed in this study, attempted to correct the deficiencies of previous instruments. Statements of belief and opinion can be validly included under the construct of chemistry perceptions. Further, statements that might be better characterized as science attitudes, math attitudes, or attitudes toward a specific course or program were not included. Eliminating statements of math anxiety and test anxiety insured that responses to statements of anxiety were perceptions of anxiety solely related to chemistry. The results of the expert judges' responses to the Validation of Proposed Perception Statements forms were detailed to establish construct and content validity. The nature of Likert scale construction and calculation of internal consistency also supported the validity of the instrument. A pilot Chemistry Perception Questionnaire (CPQ) was then constructed based on agreement of the appropriate subscale and mean importance of the perception statements. The pilot CPQ results were subjected to an item analysis based on three sets of statistics: the frequency of each response and the percentage of respondents making each response for each perception statement, the mean and standard deviations for each item, and the item discrimination index which correlated the item scores with the subscale scores. With no zero or negative correlations to the subscale scores, it was not necessary to replace any of the perception statements contained in the pilot instrument. Therefore, the piloted Chemistry Perception Questionnaire became the final instrument. Factor analysis confirmed the multidimensionality of the instrument. The instrument was administered twice with a separation interval of approximately one month in order to perform a test-retest reliability analysis. One hundred and forty-one pairs were matched and results detailed. The correlation between forms, for the total instrument, was 0.9342. The mean coefficient alpha, for the total instrument, was 0

  14. Using Response Times for Item Selection in Adaptive Testing

    ERIC Educational Resources Information Center

    van der Linden, Wim J.

    2008-01-01

    Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the…

  15. V-TECS Criterion-Referenced Test Item Bank for Radiologic Technology Occupations.

    ERIC Educational Resources Information Center

    Reneau, Fred; And Others

    This Vocational-Technical Education Consortium of States (V-TECS) criterion-referenced test item bank provides 696 multiple-choice items and 33 matching items for radiologic technology occupations. These job titles are included: radiologic technologist, chief; radiologic technologist; nuclear medicine technologist; radiation therapy technologist;…

  16. Science Literacy: How do High School Students Solve PISA Test Items?

    NASA Astrophysics Data System (ADS)

    Wati, F.; Sinaga, P.; Priyandoko, D.

    2017-09-01

    The Programme for International Student Assessment (PISA) assesses students' science literacy in real-life contexts and a wide variety of situations. Its results alone therefore do not give teachers adequate information for probing students' science literacy, because the range of material taught in school depends on the curriculum used. This study investigated how junior high school students in Indonesia solve PISA test items. Data were collected by administering the PISA test items of the greenhouse unit to 36 ninth-grade students. Students' answers were analyzed qualitatively, item by item, against the competence each problem tests. How students answer a problem reveals their ability in a particular competence, and that ability is influenced by several factors: unfamiliarity with the test construction, weak reading performance, difficulty connecting the available information to the question, and limited ability to express ideas effectively and legibly. Accordingly, selected PISA test items matched to the topics being taught can be used to familiarize students with science literacy.

  17. Automatic Generation of Rasch-Calibrated Items: Figural Matrices Test GEOM and Endless-Loops Test EC

    ERIC Educational Resources Information Center

    Arendasy, Martin

    2005-01-01

    The future of test construction for certain psychological ability domains that can be analyzed well in a structured manner may lie--at the very least for reasons of test security--in the field of automatic item generation. In this context, a question that has not been explicitly addressed is whether it is possible to embed an item response theory…

  18. Factors Affecting Item Difficulty in English Listening Comprehension Tests

    ERIC Educational Resources Information Center

    Sung, Pei-Ju; Lin, Su-Wei; Hung, Pi-Hsia

    2015-01-01

    Task difficulty is a critical issue for test developers. Controlling or balancing the item difficulty of an assessment improves its validity and discrimination. Test developers who construct tests from a cognitive perspective make the test construction process more scientific and efficient; thus, the scores obtained more precisely…

  19. Optimal Stratification of Item Pools in a-Stratified Computerized Adaptive Testing.

    ERIC Educational Resources Information Center

    Chang, Hua-Hua; van der Linden, Wim J.

    2003-01-01

    Developed a method based on 0-1 linear programming to stratify an item pool optimally for use in alpha-stratified adaptive testing. Applied the method to a previous item pool from the computerized adaptive test of the Graduate Record Examinations. Results show the new method performs well in practical situations. (SLD)
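
    As background, a minimal sketch of plain a-stratification, the design the paper's 0-1 linear programming method optimizes: sort the pool by discrimination and cut it into strata, so that early in the test items are drawn from the low-a stratum. The pool below is hypothetical, and this simple equal-size partition is a simplification, not the paper's optimal method.

    ```python
    # Plain a-stratified partitioning of an item pool (background sketch only;
    # the paper itself derives the stratification via 0-1 linear programming).
    import random

    def a_stratify(items, n_strata):
        """Partition items into n_strata equal lists by ascending discrimination a."""
        ranked = sorted(items, key=lambda item: item["a"])
        size = len(ranked) // n_strata
        return [ranked[k * size:(k + 1) * size] for k in range(n_strata)]

    # Hypothetical pool: 12 items with random discrimination parameters.
    pool = [{"id": i, "a": random.uniform(0.5, 2.5)} for i in range(12)]
    strata = a_stratify(pool, n_strata=3)
    # Early in the CAT, select from strata[0] (low a); later, from strata[-1].
    print([round(item["a"], 2) for item in strata[0]])
    ```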

  20. The Development and Management of Banks of Performance Based Test Items.

    ERIC Educational Resources Information Center

    Curtis, H. A., Ed.

    Symposium papers presented at an Annual Meeting of the National Council on Measurement in Education (Chicago, 1972), all of which concern banks of test items for use in constructing criterion referenced tests, comprise this document. The first paper, "Locally Produced Item Banks" by Thomas J. Slocum, presents information on the…

  1. Item validity vs. item discrimination index: a redundancy?

    NASA Astrophysics Data System (ADS)

    Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

    2018-03-01

    In much of the literature on evaluation and test analysis, item validity and the item discrimination index (D) are calculated with a different formula for each. Other resources, however, state that the item discrimination index can be obtained by calculating the correlation between a testee's score on a particular item and that testee's score on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in their instrument analysis. These concepts appear to overlap, since both reflect how well the test measures examinees' ability. In this paper, examples of data-processing results for item validity and the item discrimination index are compared, and we discuss whether one of the two indices suffices or whether both calculations are better presented in simple test analyses, especially in undergraduate theses that include test analysis.
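
    A minimal sketch of the computation at issue: the discrimination index obtained as the correlation between each 0/1 item score and the total test score (the point-biserial correlation). The score matrix below is hypothetical.

    ```python
    # Item discrimination as the item-total (point-biserial) correlation.
    # Rows are examinees, columns are dichotomously scored items (hypothetical data).
    from statistics import correlation  # Python 3.10+

    scores = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [1, 0, 1, 0],
    ]
    totals = [sum(row) for row in scores]

    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        print(f"item {j}: D = {correlation(item, totals):.2f}")
    ```

    In practice the item itself is often excluded from the total before correlating (the corrected item-total correlation), since including it inflates the coefficient.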

  2. Implications of Changing Answers on Objective Test Items

    ERIC Educational Resources Information Center

    Mueller, Daniel J.; Wasser, Virginia

    1977-01-01

    Eighteen studies of the effects of changing initial answers to objective test items are reviewed. While students throughout the total test score range tended to gain more points than they lost, higher scoring students gained more than did lower scoring students. Suggestions for further research are made. (Author/JKS)

  3. [Difference analysis among majors in medical parasitology exam papers by test item bank proposition].

    PubMed

    Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu

    2012-04-30

    The quality indices of "Medical Parasitology" exam papers, and measured data for students in three majors at the university in 2010, were compared and analyzed. The exam papers were generated from a test item bank. The alpha reliability coefficients of the three exam papers were above 0.70, and their knowledge and ability structures were basically balanced. The alpha reliability coefficient for the second major, however, was the lowest, mainly because of the quality of the test items on that paper and the failure to revise the test item bank's indices in time. This observation demonstrates that revising the test items and their indices in the bank according to measured data can improve the quality of propositions drawn from a test item bank and reduce differences among exam papers.

  4. Item Response Theory Modeling of the Philadelphia Naming Test.

    PubMed

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

    2015-06-01

    In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
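
    For reference, the two models compared are, in standard form, the 2PL,

    $$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},$$

    and the 1PL, which constrains the discrimination to a single common value ($a_i = a$ for all items), leaving only the item difficulties $b_i$ free.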

  5. Statistical Indexes for Monitoring Item Behavior under Computer Adaptive Testing Environment.

    ERIC Educational Resources Information Center

    Zhu, Renbang; Yu, Feng; Liu, Su

    A computerized adaptive test (CAT) administration usually requires a large supply of items with accurately estimated psychometric properties, such as item response theory (IRT) parameter estimates, to ensure the precision of examinee ability estimation. However, an estimated IRT model of a given item in any given pool does not always correctly…

  6. Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

    PubMed Central

    Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

    2014-01-01

    We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930

  7. Mutual Information Item Selection in Adaptive Classification Testing

    ERIC Educational Resources Information Center

    Weissman, Alexander

    2007-01-01

    A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local-…
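
    For orientation, the underlying quantity is the standard mutual information between an item response $X$ and the classification category $C$ (a general definition; the specific form used in the article may differ):

    $$MI(X; C) = \sum_{x}\sum_{c} p(x, c)\,\log\frac{p(x, c)}{p(x)\,p(c)}.$$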

  8. Test-Wiseness Cues in the Options of Mathematics Items.

    ERIC Educational Resources Information Center

    Kuntz, Patricia

    The quality of mathematics multiple choice items and their susceptibility to test wiseness were examined. Test wiseness was defined as "a subject's capacity to utilize the characteristics and formats of the test and/or test taking situation to receive a high score." The study used results of the Graduate Record Examinations Aptitude Test (GRE) and…

  9. Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

    ERIC Educational Resources Information Center

    Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

    2012-01-01

    Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…

  10. Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests

    ERIC Educational Resources Information Center

    Bryant, William

    2017-01-01

    As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…

  11. Criterion-Referenced Test (CRT) Items for Building Trades.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This test item bank is intended to help instructors construct criterion-referenced tests for secondary-level courses in building trades. The bank is keyed to the Missouri Building Trades Competency Profile, which was developed by industry and education professionals in Missouri, and is designed to be used in conjunction with the Vocational…

  12. Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing

    ERIC Educational Resources Information Center

    Deng, Hui; Ansley, Timothy; Chang, Hua-Hua

    2010-01-01

    In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with…

  13. Precision-Based Item Selection for Exposure Control in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Carroll, Ian A.

    2017-01-01

    Relative to adaptive testing itself, item exposure control is a nascent concept, having emerged only in the last two to three decades as both an academic topic and a practical issue in high-stakes computerized adaptive tests. This study aims to implement a new strategy for item exposure control by incorporating the standard error of the ability estimate into…

  14. The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi

    2013-01-01

    Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be "tailored" to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends…
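
    A minimal sketch of an SE-based termination rule of the kind described: under a 2PL model, test information at the current ability estimate is the sum of the administered items' informations, and the CAT stops once the standard error falls below a cutoff. The item parameters and the 0.30 cutoff below are illustrative assumptions.

    ```python
    # Variable-length CAT termination on the standard error of the ability estimate.
    import math

    def item_information(theta, a, b):
        """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def should_stop(theta_hat, administered, se_cutoff=0.30):
        """Stop when SE(theta_hat) = 1 / sqrt(test information) drops below cutoff."""
        info = sum(item_information(theta_hat, a, b) for a, b in administered)
        return info > 0 and 1.0 / math.sqrt(info) < se_cutoff

    # Hypothetical (a, b) parameters of items administered so far:
    items = [(1.2, -0.5), (0.9, 0.1), (1.5, 0.4), (1.1, -0.2)]
    print(should_stop(0.0, items))  # False: information is still too low, keep testing
    ```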

  15. Science Library of Test Items. Volume Four: Practical Testing Guide.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test items collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, the guide gives a wide range of questions and activities for the manipulation of scientific equipment to allow assessment of students' practical laboratory skills. Instructions are given to make norm-referenced or…

  16. Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

    PubMed

    Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

    2018-02-01

    The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned, and HSI psychometric properties are in need of further investigation. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of the FTCD. A Graded Response Model was applied to the FTCD and the HSI to verify the fit to the data. Both item and test functioning were analyzed, and item statistics, the Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity, and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. The HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, and 5 of the FTCD and for both items of the HSI. The HSI seems highly recommended in clinical settings that serve heavy smokers, while the FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Application of Item Response Theory to Tests of Substance-related Associative Memory

    PubMed Central

    Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

    2015-01-01

    A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WATs, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051

  18. Preliminary development of an ultrabrief two-item bedside test for delirium.

    PubMed

    Fick, Donna M; Inouye, Sharon K; Guess, Jamey; Ngo, Long H; Jones, Richard N; Saczynski, Jane S; Marcantonio, Edward R

    2015-10-01

    Delirium is common, morbid, and costly, yet is greatly under-recognized among hospitalized older adults. The objective of this study was to identify the single mental status test item, and the pair of items, that best predict the presence of delirium. In this diagnostic test evaluation study, medicine inpatients aged 75 years or older at an academic medical center underwent a clinical reference standard assessment involving a patient interview, medical record review, and interviews with family members and nurses to determine the presence or absence of Diagnostic and Statistical Manual of Mental Disorders, 4th Edition defined delirium. Participants also underwent the three-dimensional Confusion Assessment Method (3D-CAM), a brief, validated assessment for delirium. Individual items and pairs of items from the 3D-CAM were evaluated to determine sensitivity and specificity relative to the reference standard delirium diagnosis. Of the 201 participants (mean age 84 years, 62% female), 42 (21%) had delirium based on the clinical reference standard. The single item with the best test characteristics was "months of the year backwards," with a sensitivity of 83% (95% confidence interval [CI]: 69%-93%) and specificity of 69% (95% CI: 61%-76%). The best 2-item screen was the combination of "months of the year backwards" and "what is the day of the week?", with a sensitivity of 93% (95% CI: 81%-99%) and specificity of 64% (95% CI: 56%-70%). We identified a single item with >80% sensitivity and a pair of items with >90% sensitivity for delirium. If validated prospectively, these items will serve as an initial innovative screening step for delirium identification in hospitalized older adults. © 2015 Society of Hospital Medicine.
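
    A minimal sketch of how the reported operating characteristics are computed from a 2x2 table of screen result against the reference standard. The cell counts below are assumptions chosen to be consistent with the reported 93% sensitivity and 64% specificity of the two-item screen (42 delirium and 159 non-delirium cases); the study's actual cell counts are not given here.

    ```python
    # Sensitivity and specificity from a 2x2 screen-vs-reference table.
    # Cell counts are assumed for illustration, not taken from the study.

    def sens_spec(tp, fn, tn, fp):
        sensitivity = tp / (tp + fn)  # P(screen positive | delirium present)
        specificity = tn / (tn + fp)  # P(screen negative | delirium absent)
        return sensitivity, specificity

    sens, spec = sens_spec(tp=39, fn=3, tn=102, fp=57)
    print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # 93%, 64%
    ```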

  19. Development and evaluation of a thermochemistry concept inventory for college-level general chemistry

    NASA Astrophysics Data System (ADS)

    Wren, David A.

    The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined the boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12) to provide expert feedback on item content, format, and consensus on the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used to evaluate testing data at the test and item level. The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with

  20. A Paradox in the Study of the Benefits of Test-Item Review

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Jeon, Minjeong; Ferrara, Steve

    2011-01-01

    According to a popular belief, test takers should trust their initial instinct and retain their initial responses when they have the opportunity to review test items. More than 80 years of empirical research on item review, however, has contradicted this belief and shown minor but consistently positive score gains for test takers who changed…

  1. Test Item Linguistic Complexity and Assessments for Deaf Students

    ERIC Educational Resources Information Center

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64…

  2. Item Structural Properties as Predictors of Item Difficulty and Item Association.

    ERIC Educational Resources Information Center

    Solano-Flores, Guillermo

    1993-01-01

    Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)

  3. Mathematical-Programming Approaches to Test Item Pool Design. Research Report.

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.; van der Linden, Wim J.; Ariel, Adelaide

    This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and thus to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item…

  4. Differential Item Functioning Amplification and Cancellation in a Reading Test

    ERIC Educational Resources Information Center

    Bao, Han; Dayton, C. Mitchell; Hendrickson, Amy B.

    2009-01-01

    When testlet effects and item idiosyncratic features are both considered to be the reasons for DIF in educational tests using testlets (Wainer & Kiely, 1987) or item bundles (Rosenbaum, 1988), it is interesting to investigate the phenomena of DIF amplification and cancellation due to the interactive effects of these two factors. This research…

  5. Five-Point Likert Items: t Test versus Mann-Whitney-Wilcoxon

    ERIC Educational Resources Information Center

    de Winter, Joost C. F.; Dodou, Dimitra

    2010-01-01

    Likert questionnaires are widely used in survey research, but it is unclear whether the item data should be investigated by means of parametric or nonparametric procedures. This study compared the Type I and II error rates of the "t" test versus the Mann-Whitney-Wilcoxon (MWW) for five-point Likert items. Fourteen population…

  6. Testing for Differential Item Functioning with Measures of Partial Association

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2009-01-01

    Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on the construct. There are many methods available for DIF assessment. The present article is focused on indices of partial association. A family of average…

  7. What We Don't Test: What an Analysis of Unreleased ACS Exam Items Reveals about Content Coverage in General Chemistry Assessments

    ERIC Educational Resources Information Center

    Reed, Jessica J.; Villafañe, Sachel M.; Raker, Jeffrey R.; Holme, Thomas A.; Murphy, Kristen L.

    2017-01-01

    General chemistry courses are often the foundation for the study of other science disciplines and upper-level chemistry concepts. Students who take introductory chemistry courses are more often from health and science-related fields than chemistry. As such, the content taught and assessed in general chemistry courses is envisioned as building…

  8. Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study

    ERIC Educational Resources Information Center

    Yi, Qing; Zhang, Jinming; Chang, Hua-Hua

    2008-01-01

    Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive…

  9. Chemistry inside an epistemological community box! Discursive exclusions and inclusions in Swedish National tests in Chemistry

    NASA Astrophysics Data System (ADS)

    Ståhl, Marie; Hussénius, Anita

    2017-06-01

    This study examined the Swedish national tests in chemistry for implicit and explicit values. The chemistry subject is understudied compared to biology and physics, and students view chemistry as their least interesting science subject. The Swedish national science assessments aim to support equitable and fair evaluation of students, to concretize the goals in the chemistry syllabus, and to increase student achievement. Discourse and multimodal analyses, based on feminist and critical didactic theories, were used to examine the test's norms and values. The results revealed that the chemistry discourse presented in the tests showed a traditional view of science in the topics discussed (for example, oil and metal), in the way women, men and youth are portrayed, and in how their science interests are highlighted or neglected. An elitist view of science emerges from the test, with distinct gender and age biases. Students could interpret these biases as a message that only "the right type" of person may come into the chemistry epistemological community, that is, into this special sociocultural group that harbours a common view about this knowledge. This perspective may have an impact on students' achievement and thereby prevent support for an equitable and fair evaluation. Understanding the underlying evaluative meanings that come with science teaching is a question of democracy since it may affect students' feelings of inclusion or exclusion. The norms and values harboured in the tests will also affect teaching since the teachers are given examples of how the goals in the syllabus can be concretized.

  10. Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

    ERIC Educational Resources Information Center

    Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

    2010-01-01

    This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…

  11. Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

    PubMed Central

    Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

    2014-01-01

    Introduction: The U.S. Food and Drug Administration's patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods: We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion: Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID: 24811753

  12. Using Out-of-Level Items in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wei, Hua; Lin, Jie

    2015-01-01

    Out-of-level testing refers to the practice of assessing a student with a test that is intended for students at a higher or lower grade level. Although the appropriateness of out-of-level testing for accountability purposes has been questioned by educators and policymakers, incorporating out-of-level items in formative assessments for accurate…

  13. Chemistry inside an Epistemological Community Box! Discursive Exclusions and Inclusions in Swedish National Tests in Chemistry

    ERIC Educational Resources Information Center

    Ståhl, Marie; Hussénius, Anita

    2017-01-01

    This study examined the Swedish national tests in chemistry for implicit and explicit values. The chemistry subject is understudied compared to biology and physics and students view chemistry as their least interesting science subject. The Swedish national science assessments aim to support equitable and fair evaluation of students, to concretize…

  14. Development and testing of item response theory-based item banks and short forms for eye, skin and lung problems in sarcoidosis.

    PubMed

    Victorson, David E; Choi, Seung; Judson, Marc A; Cella, David

    2014-05-01

    Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with few disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms for lung, skin and eye problems, which are part of a new patient-reported outcome (PRO) instrument called the sarcoidosis assessment tool. After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches. From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms covering the most common constitutional symptoms and negative effects of corticosteroids. Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.

  15. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

    PubMed

    Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

    2014-05-01

    The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including the frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which the hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has limited qualitative data and wants preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) also grows. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, classical test theory and/or IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
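
    As a concrete illustration of the classical-test-theory checks listed above (response frequencies, floor/ceiling effects, item-total relationships, internal consistency), here is a small sketch on simulated Likert-style data; none of it is drawn from the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 1))                      # simulated respondents
    items = np.clip(np.round(2 + latent + rng.normal(size=(200, 6))), 0, 4)
    total = items.sum(axis=1)

    floor = np.mean(total == 0)                             # share at the minimum possible score
    ceiling = np.mean(total == 4 * items.shape[1])          # share at the maximum possible score

    # Corrected item-total correlation: each item vs. the sum of the other items
    item_total = [np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                  for j in range(items.shape[1])]

    k = items.shape[1]                                      # Cronbach's alpha
    alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum() / total.var(ddof=1))
    print(f"floor={floor:.2f} ceiling={ceiling:.2f} alpha={alpha:.2f}")
    print(np.round(item_total, 2))
    ```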

  16. An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test

    ERIC Educational Resources Information Center

    Kahraman, Nilüfer

    2014-01-01

    Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…

  17. Examination of Different Item Response Theory Models on Tests Composed of Testlets

    ERIC Educational Resources Information Center

    Kogar, Esin Yilmaz; Kelecioglu, Hülya

    2017-01-01

    The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…

  18. Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

    NASA Astrophysics Data System (ADS)

    Ilich, Maria O.

    Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, the Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets: groups of items referring to a scientific scenario that measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance comprised native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
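
    Of the two DIF methods named above, logistic regression is easy to sketch: regress item correctness on the matching score, group membership, and their interaction, and read DIF off the group terms. The snippet below is a generic illustration with simulated data and hypothetical variable names, not the study's analysis.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 1000
    df = pd.DataFrame({
        "score": rng.normal(size=n),          # matching criterion, e.g., rest score
        "group": rng.integers(0, 2, size=n),  # 0 = non-ELL, 1 = ELL (hypothetical labels)
    })
    # Simulate an item with uniform DIF: the intercept shifts for group 1
    logit = 0.8 * df["score"] - 0.5 * df["group"]
    df["correct"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

    fit = smf.logit("correct ~ score + group + score:group", data=df).fit(disp=0)
    # A sizable 'group' coefficient suggests uniform DIF; 'score:group', non-uniform DIF
    print(fit.params)
    ```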

  19. Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

    ERIC Educational Resources Information Center

    Pohl, Steffi; Gräfe, Linda; Rose, Norman

    2014-01-01

    Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…

  20. Detection of Item Preknowledge Using Likelihood Ratio Test and Score Test

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2017-01-01

    An increasing concern of producers of educational assessments is fraudulent behavior during the assessment (van der Linden, 2009). Benefiting from item preknowledge (e.g., Eckerly, 2017; McLeod, Lewis, & Thissen, 2003) is one type of fraudulent behavior. This article suggests two new test statistics for detecting individuals who may have…

  1. Comparing and Combining Dichotomous and Polytomous Items with SPRT Procedure in Computerized Classification Testing.

    ERIC Educational Resources Information Center

    Lau, C. Allen; Wang, Tianyou

    The purposes of this study were to: (1) extend the sequential probability ratio testing (SPRT) procedure to polytomous item response theory (IRT) models in computerized classification testing (CCT); (2) compare polytomous items with dichotomous items using the SPRT procedure for their accuracy and efficiency; (3) study a direct approach in…

  2. Weighted Maximum-a-Posteriori Estimation in Tests Composed of Dichotomous and Polytomous Items

    ERIC Educational Resources Information Center

    Sun, Shan-Shan; Tao, Jian; Chang, Hua-Hua; Shi, Ning-Zhong

    2012-01-01

    For mixed-type tests composed of dichotomous and polytomous items, polytomous items often yield more information than dichotomous items. To reflect the difference between the two types of items and to improve the precision of ability estimation, an adaptive weighted maximum-a-posteriori (WMAP) estimation is proposed. To evaluate the performance of…

  3. Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

    PubMed

    Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

    2014-01-01

    Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. The Patient-Reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level and on the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to the Health Assessment Questionnaire and Medical Outcomes Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
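
    The calibration-and-selection logic described here is commonly formalized with the two-parameter logistic (2PL) model, where the next CAT item is the one with maximum Fisher information at the current ability estimate. A minimal sketch with invented parameters (not PROMIS calibrations):

    ```python
    import numpy as np

    def p_2pl(theta, a, b):
        """2PL response probability: discrimination a, difficulty b."""
        return 1 / (1 + np.exp(-a * (theta - b)))

    def info_2pl(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = p_2pl(theta, a, b)
        return a**2 * p * (1 - p)

    a = np.array([1.2, 0.8, 2.0, 1.5])    # discriminations of the remaining items
    b = np.array([-1.0, 0.0, 0.3, 1.2])   # difficulties
    theta_hat = 0.4                       # current ability estimate
    print("administer item", int(np.argmax(info_2pl(theta_hat, a, b))))
    ```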

  4. Implementing Sympson-Hetter Item-Exposure Control in a Shadow-Test Approach to Constrained Adaptive Testing

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.; van der Linden, Wim J.

    2008-01-01

    In most operational computerized adaptive testing (CAT) programs, the Sympson-Hetter (SH) method is used to control the exposure of the items. Several modifications and improvements of the original method have been proposed. The Stocking and Lewis (1998) version of the method uses a multinomial experiment to select items. For severely constrained…

  5. Science Library of Test Items. Volume Eight. Mastery Testing Program. Series 3 & 4 Supplements to Introduction and Manual.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    Continuing a series of short tests aimed at measuring student mastery of specific skills in the natural sciences, this supplementary volume includes teachers' notes, a users' guide and inspection copies of test items 27 to 50. Answer keys and test scoring statistics are provided. The items are designed for grades 7 through 10, and a list of the…

  6. Criterion-Referenced Test (CRT) Items for Air Conditioning, Heating and Refrigeration.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    These criterion-referenced test (CRT) items for air conditioning, heating, and refrigeration are keyed to the Missouri Air Conditioning, Heating, and Refrigeration Competency Profile. The items are designed to work with both the Vocational Instructional Management System and Vocational Administrative Management System. For word processing and…

  7. The Rasch Model and Missing Data, with an Emphasis on Tailoring Test Items.

    ERIC Educational Resources Information Center

    de Gruijter, Dato N. M.

    Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the…

  8. Evaluating innovative items for the NCLEX, part I: usability and pilot testing.

    PubMed

    Wendt, Anne; Harmes, J Christine

    2009-01-01

    The National Council of State Boards of Nursing (NCSBN) recently conducted preliminary research on the feasibility of including various types of innovative test questions (items) on the NCLEX. This article focuses on participants' reactions to, and strategies for interacting with, various types of innovative items. Part 2, in the May/June issue, will focus on the innovative item templates and on evaluation of the statistical characteristics and the level of cognitive processing required to answer the examination items.

  9. Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

    ERIC Educational Resources Information Center

    Quaigrain, Kennedy; Arhin, Ato Kwamina

    2017-01-01

    Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…

  10. The Role of Item Models in Automatic Item Generation

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2012-01-01

    Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…

  11. Development of an item bank and computer adaptive test for role functioning.

    PubMed

    Anatchkova, Milena D; Rose, Matthias; Ware, John E; Bjorner, Jakob B

    2012-11-01

    Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure the impact of health on role functioning. A set of different instruments, including 75 newly developed items asking about the impact of health on role functioning, was completed by 2,500 participants. Established item response theory methods were used to develop an item bank based on the generalized partial credit model. Comparison of group mean bank scores of participants with different self-reported general health status and chronic conditions was used to test the external validity of the bank. After excluding items that did not meet established requirements, the final item bank consisted of a total of 64 items covering three areas of role functioning (family, social, and occupational). Slopes in the bank ranged between .93 and 4.37; the mean threshold range was -1.09 to -2.25. Item bank-based scores were significantly different for participants with and without chronic conditions and with different levels of self-reported general health. An item bank assessing health impact on RF across three content areas has been successfully developed. The bank can be used for development of short forms or computerized adaptive tests to be applied in the assessment of role functioning as one of the common denominators across applications of generic health assessment.
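
    The generalized partial credit model mentioned above can be sketched in a few lines; the slope and step parameters here are invented, merely falling inside the ranges the abstract reports.

    ```python
    import numpy as np

    def gpcm_probs(theta: float, a: float, steps: np.ndarray) -> np.ndarray:
        """Category probabilities for one polytomous item under the GPCM.

        Category k (k = 0..m) has numerator exp(sum_{j<=k} a*(theta - steps[j-1])),
        with the empty sum for k = 0 fixed at 0.
        """
        terms = np.concatenate(([0.0], a * (theta - steps)))
        num = np.exp(np.cumsum(terms))
        return num / num.sum()

    print(gpcm_probs(theta=0.5, a=1.8, steps=np.array([-1.5, -0.2, 0.9])).round(3))
    ```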

  12. Methodology in diagnostic laboratory test research in clinical chemistry and clinical chemistry and laboratory medicine.

    PubMed

    Lumbreras-Lacarra, Blanca; Ramos-Rincón, José Manuel; Hernández-Aguado, Ildefonso

    2004-03-01

    The application of epidemiologic principles to clinical diagnosis has been less developed than in other clinical areas. Knowledge of the main flaws affecting diagnostic laboratory test research is the first step toward improving its quality. We assessed the methodologic aspects of articles on laboratory tests. We included articles that estimated indexes of diagnostic accuracy (sensitivity and specificity) and were published in Clinical Chemistry or Clinical Chemistry and Laboratory Medicine in 1996, 2001, and 2002. Clinical Chemistry has paid special attention to this field of research since 1996 by publishing recommendations, checklists, and reviews. Articles were identified through electronic searches in Medline. The strategy combined the MeSH term "sensitivity and specificity" (exploded) with the text words "specificity", "false negative", and "accuracy". We examined adherence to seven methodologic criteria used in the study by Reid et al. (JAMA 1995;274:645-51) of papers published in general medical journals. Three observers evaluated each article independently. Seventy-nine articles fulfilled the inclusion criteria. The percentage of studies that satisfied each criterion improved from 1996 to 2002. Substantial improvement was observed in reporting of the statistical uncertainty of indices of diagnostic accuracy, in criteria based on clinical information from the study population (spectrum composition), and in avoidance of workup bias. Analytical reproducibility was reported frequently (68%), whereas information about indeterminate results was rarely provided. The mean number of methodologic criteria satisfied showed a statistically significant increase over the 3 years in Clinical Chemistry but not in Clinical Chemistry and Laboratory Medicine. The methodologic quality of articles on diagnostic test research published in Clinical Chemistry and Clinical Chemistry and Laboratory Medicine is comparable to the quality observed in the best general medical journals.

  13. Development of an item bank for computerized adaptive test (CAT) measurement of pain.

    PubMed

    Petersen, Morten Aa; Aaronson, Neil K; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Hammerlid, Eva; Hjermstad, Marianne J; Kaasa, Stein; Loge, Jon H; Velikova, Galina; Young, Teresa; Groenvold, Mogens

    2016-01-01

    Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting, and (4) field-testing and psychometric analyses for the final selection of items. In step 1, we identified 337 pain items from the literature. Twenty-nine new items fitting the QLQ-C30 item style were formulated in step 2 and reduced to 26 items by expert evaluations. Based on interviews with 31 patients from Denmark, France and the UK, the list was further reduced to 21 items in step 3. In step 4, responses were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale. We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ-C30 pain items. The EORTC pain CAT is currently available for "experimental" purposes.

  14. The Sequential Probability Ratio Test and Binary Item Response Models

    ERIC Educational Resources Information Center

    Nydick, Steven W.

    2014-01-01

    The sequential probability ratio test (SPRT) is a common method for terminating item response theory (IRT)-based adaptive classification tests. To decide whether a classification test should stop, the SPRT compares a simple log-likelihood ratio, based on the classification bound separating two categories, to prespecified critical values. As has…
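
    The comparison the abstract describes is easy to make concrete: accumulate the log-likelihood ratio of the responses under abilities just above versus just below the classification bound, and stop once it crosses the Wald critical values. A sketch under a Rasch model, with all parameters invented for illustration:

    ```python
    import numpy as np

    alpha, beta = 0.05, 0.05                   # nominal error rates
    upper = np.log((1 - beta) / alpha)         # Wald critical values
    lower = np.log(beta / (1 - alpha))
    theta0, theta1 = -0.2, 0.2                 # abilities just below/above the cut

    def rasch_p(theta, b):
        return 1 / (1 + np.exp(-(theta - b)))

    llr = 0.0
    b_items = [0.1, -0.3, 0.5, 0.0]            # difficulties of administered items
    responses = [1, 1, 0, 1]                   # 1 = correct
    for b, u in zip(b_items, responses):
        p0, p1 = rasch_p(theta0, b), rasch_p(theta1, b)
        llr += np.log(p1 if u else 1 - p1) - np.log(p0 if u else 1 - p0)
        if llr >= upper:
            print("classify above the cut; stop"); break
        if llr <= lower:
            print("classify below the cut; stop"); break
    else:
        print(f"continue testing (log-likelihood ratio = {llr:.2f})")
    ```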

  15. Using Kernel Equating to Assess Item Order Effects on Test Scores

    ERIC Educational Resources Information Center

    Moses, Tim; Yang, Wen-Ling; Wilson, Christine

    2007-01-01

    This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test…

  16. Home Economics. Sample Test Items. Levels I and II.

    ERIC Educational Resources Information Center

    New York State Education Dept., Albany. Bureau of Elementary and Secondary Educational Testing.

    A sample of behavioral objectives and related test items that could be developed for content modules in Home Economics levels I and II, this book is intended to enable teachers to construct more valid and reliable test materials. Forty-eight one-page modules are presented, and opposite each module are listed two to seven specific behavioral…

  17. Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model

    ERIC Educational Resources Information Center

    Baghaei, Purya; Aryadoust, Vahid

    2015-01-01

    Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…

  18. Fostering a student's skill for analyzing test items through an authentic task

    NASA Astrophysics Data System (ADS)

    Setiawan, Beni; Sabtiawan, Wahyu Budi

    2017-08-01

    Analyzing test items is a skill that must be mastered by prospective teachers in order to determine the quality of the test questions they have written. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill in analyzing test items, covering validity, reliability, item discrimination index, level of difficulty, and distractor functioning. The participants were students in the science education study program of the science and mathematics faculty, Universitas Negeri Surabaya, enrolled in an assessment course. The research design was a one-group posttest design. The treatment consisted of an authentic task in which students developed test items and then analyzed them like professional assessors using Microsoft Excel and Anates software. The data were analyzed descriptively: students' skill levels were presented and then related to theories and previous empirical studies. The results showed that the task fostered these skills. Thirty-one students got a perfect score for the analysis, five students achieved 97% mastery, two students 92%, and another two students 89% and 79%. The implication of the finding is that when students are given authentic tasks that require them to perform like professionals, they are more likely to achieve professional-level skills by the end of learning.
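
    The indices the students computed are all short formulas; below is a sketch of the two most common ones (difficulty index p and upper-lower discrimination index D) on simulated scored responses, not the study's data.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    scored = rng.integers(0, 2, size=(50, 10))   # 50 examinees x 10 items; 1 = correct
    totals = scored.sum(axis=1)

    difficulty = scored.mean(axis=0)             # p-value: proportion answering correctly

    # Upper-lower discrimination: top 27% minus bottom 27% success rates
    order = np.argsort(totals)
    k = max(1, int(0.27 * len(totals)))
    discrimination = scored[order[-k:]].mean(axis=0) - scored[order[:k]].mean(axis=0)

    print("p:", difficulty.round(2))
    print("D:", discrimination.round(2))
    ```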

  19. Assessing Student Preparation through Placement Tests

    NASA Astrophysics Data System (ADS)

    McFate, Craig; Olmsted, John, III

    1999-04-01

    The chemistry department at California State University, Fullerton, uses a placement test of its own design to assess student readiness to enroll in General Chemistry. This test contains items designed to test cognitive skills more than factual knowledge. We have analyzed the ability of this test to predict student success (defined as passing the first-semester course with a C or better) using data for 845 students from four consecutive semesters. In common with other placement tests, we find a weak but statistically significant correlation between test performance and course grades. More meaningfully, there is a strong correlation (R² = 0.82) between test score and course success, sufficient to use for counseling purposes. An item analysis was conducted to determine what types of questions provide the best predictability. Six questions from the full set of 25 were identified as strong predictors, on the basis of discrimination indices and coefficients of determination that were more than one standard deviation above the mean values for test items. These questions had little in common except for requiring multistep mathematical operations and formal reasoning.

  20. Item Analysis Appropriate for Domain-Referenced Classroom Testing. (Project Technical Report Number 1).

    ERIC Educational Resources Information Center

    Nitko, Anthony J.; Hsu, Tse-chi

    Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…

  1. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

    PubMed Central

    Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy to person ability estimates. Two behavioral strategies were examined: (a) skipping items in order to save time on timed tests, and (b) selecting two responses to an item in the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure covering math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based analysis to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis to identify the parts of the measure responsible for potential group differentiation. For (a), person abilities were estimated with the 2-PL model and later with the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b), differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated significant differences between groups, with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and for measure evaluation, revision, and construction. PMID:27790174

  2. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

    PubMed

    Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy to person ability estimates. Two behavioral strategies were examined: (a) skipping items in order to save time on timed tests, and (b) selecting two responses to an item in the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure covering math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based analysis to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis to identify the parts of the measure responsible for potential group differentiation. For (a), person abilities were estimated with the 2-PL model and later with the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b), differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated significant differences between groups, with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and for measure evaluation, revision, and construction.

  3. Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

    ERIC Educational Resources Information Center

    Marie, S. Maria Josephine Arokia; Edannur, Sreekala

    2015-01-01

    This paper focused on the analysis of test items constructed for the Teaching of Physical Science paper in the B.Ed. syllabus. It involved the analysis of the difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test; more importantly, item analysis is a tool to help the item writer improve…

  4. Designing P-Optimal Item Pools in Computerized Adaptive Tests with Polytomous Items

    ERIC Educational Resources Information Center

    Zhou, Xuechun

    2012-01-01

    Current CAT applications consist of predominantly dichotomous items, and CATs with polytomously scored items are limited. To ascertain the best approach to polytomous CAT, a significant amount of research has been conducted on item selection, ability estimation, and impact of termination rules based on polytomous IRT models. Few studies…

  5. The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

    PubMed

    Smolen, Tomasz; Chuderski, Adam

    2015-01-01

    Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if storage capacity underlay the WM-Gf correlation, then that correlation should increase with an increasing number of items or rules (load) in a Gf test. Because often no such link is observed, the storage-capacity account is rejected on that basis, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf test depends on its overall difficulty and the distribution of difficulty across its items. When the early test items yield a large ceiling effect but the late items do not approach floor, that correlation will increase throughout the test. If the early items fall between ceiling and floor but the late items approach floor, the respective correlation will decrease. For a hallmark Gf test, the Raven test, whose items span from ceiling to floor, a quadratic relationship is expected, and it was shown empirically using a large sample and two types of working memory capacity (WMC) tasks. In consequence, neither changes in correlation due to varying WM/Gf load, nor the lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
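
    The floor/ceiling mechanism argued here can be reproduced in a toy simulation: hold the latent WM-Gf correlation fixed, dichotomize a Gf item at different thresholds, and watch the item-wise correlation shrink near ceiling or floor. Everything below is illustrative, not the paper's code.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    wm = rng.normal(size=n)
    gf = 0.6 * wm + np.sqrt(1 - 0.6**2) * rng.normal(size=n)   # fixed latent r = .6

    for thresh in [-2.0, -1.0, 0.0, 1.0, 2.0]:  # low threshold = easy item (ceiling)
        item = (gf > thresh).astype(float)
        r = np.corrcoef(wm, item)[0, 1]
        print(f"threshold {thresh:+.1f}: pass rate {item.mean():.2f}, item-WM r = {r:.2f}")
    ```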

  6. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

    Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included PARSCALE's G²,…

  7. Optimizing the Use of Response Times for Item Selection in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Choe, Edison M.; Kern, Justin L.; Chang, Hua-Hua

    2018-01-01

    Despite common operationalization, measurement efficiency of computerized adaptive testing should not only be assessed in terms of the number of items administered but also the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response…

  8. The Impact of Test Dimensionality, Common-Item Set Format, and Scale Linking Methods on Mixed-Format Test Equating

    ERIC Educational Resources Information Center

    Öztürk-Gübes, Nese; Kelecioglu, Hülya

    2016-01-01

    The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…

  9. Treatment of Not-Administered Items on Individually Administered Intelligence Tests

    ERIC Educational Resources Information Center

    He, Wei; Wolfe, Edward W.

    2012-01-01

    In administration of individually administered intelligence tests, items are commonly presented in a sequence of increasing difficulty, and test administration is terminated after a predetermined number of incorrect answers. This practice produces stochastically censored data, a form of nonignorable missing data. By manipulating four factors…

  10. Measuring change for a multidimensional test using a generalized explanatory longitudinal item response model.

    PubMed

    Cho, Sun-Joo; Athay, Michele; Preacher, Kristopher J

    2013-05-01

    Even though many educational and psychological tests are known to be multidimensional, little research has been done to address how to measure individual differences in change within an item response theory framework. In this paper, we suggest a generalized explanatory longitudinal item response model to measure individual differences in change. New longitudinal models for multidimensional tests and existing models for unidimensional tests are presented within this framework and implemented with software developed for generalized linear models. In addition to the measurement of change, the longitudinal models we present can also be used to explain individual differences in change scores for person groups (e.g., learning disabled students versus non-learning disabled students) and to model differences in item difficulties across item groups (e.g., number operation, measurement, and representation item groups in a mathematics test). An empirical example illustrates the use of the various models for measuring individual differences in change when there are person groups and multiple skill domains which lead to multidimensionality at a time point. © 2012 The British Psychological Society.

  11. Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework.

    PubMed

    Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A

    2006-11-01

    To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish samples using Samejima's 2-parameter graded response model. The DFIT framework was then used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and two nearby streets; recall three objects; repeat the phrase "no ifs, no ands, no buts"; follow the command "close your eyes"; and follow the command "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are therefore comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.

  12. Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Albano, Anthony D.

    2015-01-01

    This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…

  13. Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

    ERIC Educational Resources Information Center

    Perkins, Kyle; And Others

    This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…

  14. The Psychological Effect of Errors in Standardized Language Test Items on EFL Students' Responses to the Following Item

    ERIC Educational Resources Information Center

    Khaksefidi, Saman

    2017-01-01

    This study investigates the psychological effect of a wrong question with wrong answer options on answering the next question in a test of structure. Forty students selected through stratified random sampling were given 15 questions from a standardized test, namely a TOEFL structure test, in which questions 7 and 11 are wrong and their answers…

  15. Sex Differences in the Tendency to Omit Items on Multiple-Choice Tests: 1980-2000

    ERIC Educational Resources Information Center

    von Schrader, Sarah; Ansley, Timothy

    2006-01-01

    Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…

  16. Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

    PubMed

    Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

    2015-12-01

    The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.

  17. Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

    ERIC Educational Resources Information Center

    Alsadaawi, Abdullah Saleh

    2017-01-01

    The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…

  18. Item Selection in Multidimensional Computerized Adaptive Testing--Gaining Information from Different Angles

    ERIC Educational Resources Information Center

    Wang, Chun; Chang, Hua-Hua

    2011-01-01

    Over the past thirty years, obtaining diagnostic information from examinees' item responses has become an increasingly important feature of educational and psychological testing. The objective can be achieved by sequentially selecting multidimensional items to fit the class of latent traits being assessed, and therefore Multidimensional…

  19. Three Essays on Teacher Education Programs and Test-Takers' Response Times on Test Items

    ERIC Educational Resources Information Center

    Qian, Hong

    2013-01-01

    This dissertation includes three essays: one essay focuses on the effect of teacher preparation programs on teacher knowledge while the other two focus on test-takers' response times on test items. Essay One addresses the problem of how opportunities to learn in teacher preparation programs influence future elementary mathematics teachers'…

  20. The effects of linguistic modification on ESL students' comprehension of nursing course test items.

    PubMed

    Bosher, Susan; Bowles, Melissa

    2008-01-01

    Recent research has indicated that language may be a source of construct-irrelevant variance for non-native speakers of English, or English as a second language (ESL) students, when they take exams. As a result, exams may not accurately measure knowledge of nursing content. One accommodation often used to level the playing field for ESL students is linguistic modification, a process by which the reading load of test items is reduced while the content and integrity of the item are maintained. Research on the effects of linguistic modification has been conducted on examinees in the K-12 population, but is just beginning in other areas. This study describes the collaborative process by which items from a pathophysiology exam were linguistically modified and subsequently evaluated for comprehensibility by ESL students. Findings indicate that in a majority of cases, modification improved examinees' comprehension of test items. Implications for test item writing and future research are discussed.

  1. How Small the Number of Test Items Can Be for the Basis of Estimating the Operating Characteristics of the Discrete Responses to Unknown Test Items.

    ERIC Educational Resources Information Center

    Samejima, Fumiko; Changas, Paul S.

    The methods and approaches for estimating the operating characteristics of discrete item responses without assuming any mathematical form have been developed and expanded. These methods make it possible to use a given test as the Old Test even when its test information function is not constant over the ability interval of interest.…

  2. Evaluation of Floors and Item Gradients for Reading and Math Tests for Young Children

    ERIC Educational Resources Information Center

    Bradley-Johnson, Sharon; Durmusoglu, Gokce

    2005-01-01

    Ignoring the adequacy of floors and item gradients for tests used with young children can have serious consequences. Thus, because of the importance of early intervention for reading and math problems, we used the criteria suggested by Bracken for adequate floors and item gradients, and reviewed 15 reading tests and 12 math tests for ages 4-0…

  3. Computerized Adaptive Testing with Item Clones. Research Report.

    ERIC Educational Resources Information Center

    Glas, Cees A. W.; van der Linden, Wim J.

    To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…

  4. Prediction of true test scores from observed item scores and ancillary data.

    PubMed

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
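
    For orientation, the single-score special case of such best linear prediction is Kelley's classical formula, which shrinks each observed score toward the group mean by the reliability. A minimal sketch with hypothetical numbers (the paper's method generalizes this to multiple measurements per item and ancillary scores):

        import numpy as np

        def kelley_true_score(x, reliability, group_mean):
            """Classical best linear predictor of the true score from one observed score."""
            return reliability * np.asarray(x) + (1.0 - reliability) * group_mean

        scores = np.array([3.0, 4.5, 5.0])   # hypothetical essay-scale scores
        print(kelley_true_score(scores, reliability=0.80, group_mean=4.0))
        # regressed estimates: [3.2 4.4 4.8]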

  5. Item Selection and Ability Estimation Procedures for a Mixed-Format Adaptive Test

    ERIC Educational Resources Information Center

    Ho, Tsung-Han; Dodd, Barbara G.

    2012-01-01

    In this study we compared five item selection procedures using three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures used were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler…

  6. Item Response Models for Examinee-Selected Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

    2012-01-01

    In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…

  7. The Development of a Visual-Perceptual Chemistry Specific (VPCS) Assessment Tool

    ERIC Educational Resources Information Center

    Oliver-Hoyo, Maria; Sloan, Caroline

    2014-01-01

    The development of the Visual-Perceptual Chemistry Specific (VPCS) assessment tool is based on items that align to eight visual-perceptual skills considered as needed by chemistry students. This tool includes a comprehensive range of visual operations and presents items within a chemistry context without requiring content knowledge to solve…

  8. Comparing Recent Organizing Templates for Test Content between ACS Exams in General Chemistry and AP Chemistry

    ERIC Educational Resources Information Center

    Holme, Thomas

    2014-01-01

    Two different versions of "big ideas" rooted content maps have recently been published for general chemistry. As embodied in the content outline from the College Board, one of these maps is designed to guide curriculum development and testing for advanced placement (AP) chemistry. The Anchoring Concepts Content Map for general chemistry…

  9. Estimating the Reliability of a Test Battery Composite or a Test Score Based on Weighted Item Scoring

    ERIC Educational Resources Information Center

    Feldt, Leonard S.

    2004-01-01

    In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
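
    A minimal numerical sketch of the standard identity underlying such estimates: with weights w, part variances, and part reliabilities, the error variance of the composite is the weighted sum of the part error variances, so composite reliability is one minus that quantity over the composite variance. All values below are hypothetical.

        import numpy as np

        # Hypothetical three-part battery: weights, part SDs, part reliabilities,
        # and the covariance matrix of the observed part scores.
        w = np.array([2.0, 1.0, 1.0])
        sd = np.array([5.0, 4.0, 3.0])
        rel = np.array([0.85, 0.80, 0.75])
        cov = np.array([[25.0, 10.0,  6.0],
                        [10.0, 16.0,  5.0],
                        [ 6.0,  5.0,  9.0]])

        var_composite = w @ cov @ w                      # variance of the weighted total
        error_var = np.sum(w**2 * sd**2 * (1.0 - rel))   # weighted sum of part error variances
        print(1.0 - error_var / var_composite)           # composite reliability, about 0.90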

  10. Factor analysis for instruments of science learning motivation and its implementation for the chemistry and biology teacher candidates

    NASA Astrophysics Data System (ADS)

    Prasetya, A. T.; Ridlo, S.

    2018-03-01

    The purpose of this study was to validate an instrument for measuring science learning motivation and to compare the science learning motivation of chemistry and biology teacher candidates. The Kuesioner Motivasi Sains (KMS) is an Indonesian adaptation of the Science Motivation Questionnaire II (SMQ II), consisting of 25 items on a 5-point Likert scale. The number of respondents for the Exploratory Factor Analysis (EFA) was 312. The Kaiser-Meyer-Olkin (KMO), determinant, Bartlett's sphericity, and Measures of Sampling Adequacy (MSA) tests applied to the KMS using SPSS 20.0 and Lisrel 8.51 indicated that the data were suitable for factor analysis. However, the communalities test showed that 4 items did not qualify, so those items were discarded. In the second test, all eligibility parameters were met: the Root Mean Square Error of Approximation (RMSEA), the P-Value for the Test of Close Fit (RMSEA < 0.05), and the Goodness of Fit Index (GFI) were good. The new KMS, with 21 valid items and a composite reliability of 0.9329, can be used to measure science learning motivation, comprising Intrinsic Motivation, Self-Efficacy, Self-Determination, Grade Motivation, and Career Motivation, for students who have mastered the Indonesian language. KMS trials with chemistry and biology teacher candidates found no significant difference in learning motivation between the two groups.
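
    For readers who want to reproduce the KMO step, a minimal Python sketch computing the Kaiser-Meyer-Olkin measure from simulated stand-in data (partial correlations are taken from the inverted correlation matrix; this is not the authors' SPSS output):

        import numpy as np

        def kmo(data):
            """Kaiser-Meyer-Olkin sampling adequacy from a persons-by-items matrix."""
            r = np.corrcoef(data, rowvar=False)
            q = np.linalg.inv(r)
            d = np.sqrt(np.outer(np.diag(q), np.diag(q)))
            partial = -q / d                         # anti-image (partial) correlations
            off = ~np.eye(r.shape[0], dtype=bool)    # off-diagonal mask
            return np.sum(r[off]**2) / (np.sum(r[off]**2) + np.sum(partial[off]**2))

        rng = np.random.default_rng(1)
        latent = rng.normal(size=(312, 1))           # one common factor, n = 312 as in the study
        items = latent + rng.normal(scale=0.8, size=(312, 25))  # 25 simulated items
        print(round(kmo(items), 3))                  # values above 0.6 are usually deemed acceptable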

  11. Evaluation of Item Candidates: The PROMIS Qualitative Item Review

    PubMed Central

    DeWalt, Darren A.; Rothrock, Nan; Yount, Susan; Stone, Arthur A.

    2009-01-01

    One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures. PMID:17443114

  12. Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models

    ERIC Educational Resources Information Center

    Woods, Carol M.; Grimm, Kevin J.

    2011-01-01

    In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…

  13. Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

    ERIC Educational Resources Information Center

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-01-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
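
    The three-parameter logistic item response function referred to here is, in a minimal sketch (parameters below are hypothetical, not the TUV calibrations):

        import numpy as np

        def p_3pl(theta, a, b, c):
            """Three-parameter logistic item response function."""
            return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

        theta = np.linspace(-3, 3, 7)
        print(p_3pl(theta, a=1.2, b=0.5, c=0.20))   # guessing floor of 0.20 at low theta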

  14. A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.

    ERIC Educational Resources Information Center

    Marzano, Robert J.

    To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…

  15. The Development and Validation of a Formula for Measuring Single-Sentence Test Item Readability.

    ERIC Educational Resources Information Center

    Homan, Susan; And Others

    1994-01-01

    A study was conducted with 782 elementary school students to determine whether the Homan-Hewitt Readability Formula could identify the readability of a single-sentence test item. Results indicate that a relationship exists between students' reading grade levels and responses to test items written at higher readability levels. (SLD)

  16. Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS(®)) in acute coronary syndrome patients: differential functioning of items and test.

    PubMed

    Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E

    2015-08-01

    The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed for comparison with the "standardized" IRT item parameters. Differences in IRT model-expected item, scale, and theta scores were examined. The DFIT results were also compared with a standard logistic regression differential item functioning analysis. Items calibrated in the ACS sample showed lower discrimination parameters than the standardized ones, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) than chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of the QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
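
    The standard logistic regression DIF analysis mentioned above compares nested models, adding a group main effect (uniform DIF) and a group-by-ability interaction (nonuniform DIF). A hedged sketch on simulated data, using statsmodels rather than the authors' software:

        import numpy as np
        import statsmodels.api as sm
        from scipy.stats import chi2

        rng = np.random.default_rng(2)
        n = 1000
        group = rng.integers(0, 2, n)                 # 0 = reference, 1 = focal
        ability = rng.normal(size=n)
        # Simulate one item with uniform DIF: harder for the focal group.
        logit = 1.5 * ability - 0.5 * group
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

        base = sm.Logit(y, sm.add_constant(np.column_stack([ability]))).fit(disp=0)
        unif = sm.Logit(y, sm.add_constant(np.column_stack([ability, group]))).fit(disp=0)
        nonu = sm.Logit(y, sm.add_constant(
            np.column_stack([ability, group, ability * group]))).fit(disp=0)

        lr_uniform = 2 * (unif.llf - base.llf)        # likelihood-ratio test, df = 1
        lr_nonuniform = 2 * (nonu.llf - unif.llf)     # df = 1
        print("uniform DIF p =", chi2.sf(lr_uniform, 1))
        print("nonuniform DIF p =", chi2.sf(lr_nonuniform, 1))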

  17. A Rapid Item-Search Procedure for Bayesian Adaptive Testing.

    DTIC Science & Technology

    1977-05-01

    [Scanned DTIC report; only fragments of the abstract are legible.] A Rapid Item-Search Procedure for Bayesian Adaptive Testing, C. David Vale and David J. Weiss, Research Report 77-…, University of Minnesota, Department of Psychology, Psychometric Methods Program. The legible fragments note that, despite the statistical properties of the procedure, it might well introduce undesirable psychological effects on test scores (e.g., Betz & Weiss, 1976a, 1976b).

  18. IRT-Estimated Reliability for Tests Containing Mixed Item Formats

    ERIC Educational Resources Information Center

    Shu, Lianghua; Schwarz, Richard D.

    2014-01-01

    As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's a, Feldt-Raju, stratified a, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…

  19. Applications of NLP Techniques to Computer-Assisted Authoring of Test Items for Elementary Chinese

    ERIC Educational Resources Information Center

    Liu, Chao-Lin; Lin, Jen-Hsiang; Wang, Yu-Chun

    2010-01-01

    The authors report an implemented environment for computer-assisted authoring of test items and provide a brief discussion about the applications of NLP techniques for computer assisted language learning. Test items can serve as a tool for language learners to examine their competence in the target language. The authors apply techniques for…

  20. The High Energy Lightning Simulator (HELS) Test Facility for Testing Explosive Items

    DTIC Science & Technology

    1996-08-01

    [Scanned DTIC report; only fragments of the abstract are legible.] Thomas E. Roy and David W. Bagwell, AMTEC Corporation, Huntsville, AL; … Center, Redstone Arsenal, AL. The fragments describe the High Energy Lightning Simulator: the purpose of simulated lightning testing of inerted missiles and inerted explosive items containing electrically initiated explosive trains is to determine the …; the current must penetrate the safety cages, which are electrically conductive and grounded, without loss of current. This transmission system consists of six large …

  1. Redefining diagnostic symptoms of depression using Rasch analysis: testing an item bank suitable for DSM-V and computer adaptive testing.

    PubMed

    Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S

    2011-10-01

    We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects) living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank, and differential item functioning (DIF) was explored for known subgroups. A symptom-level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). The remaining optimal item bank therefore consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items, accuracy was still high (AUC = 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing, and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
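
    The Partial Credit Model used here assigns category probabilities from cumulative step comparisons. A minimal sketch for one hypothetical 4-category symptom item (step difficulties are illustrative, not the calibrated values):

        import numpy as np

        def pcm_probs(theta, deltas):
            """Partial Credit Model category probabilities for one polytomous item.

            deltas are the step difficulties; categories run 0..len(deltas)."""
            steps = np.concatenate([[0.0], np.cumsum(theta - np.asarray(deltas))])
            expx = np.exp(steps - steps.max())       # stabilized exponentials
            return expx / expx.sum()

        # Hypothetical 4-category item (3 step difficulties).
        print(pcm_probs(theta=0.5, deltas=[-1.0, 0.0, 1.5]))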

  2. Relationships among Classical Test Theory and Item Response Theory Frameworks via Factor Analytic Models

    ERIC Educational Resources Information Center

    Kohli, Nidhi; Koran, Jennifer; Henn, Lisa

    2015-01-01

    There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…

  3. Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…

  4. The Relationship of Item-Level Response Times with Test-Taker and Item Variables in an Operational CAT Environment. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    Swygert, Kimberly A.

    In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…

  5. Item-focussed Trees for the Identification of Items in Differential Item Functioning.

    PubMed

    Tutz, Gerhard; Berger, Moritz

    2016-09-01

    A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

  6. A Comparative Study of Online Item Calibration Methods in Multidimensional Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Chen, Ping

    2017-01-01

    Calibration of new items online has been an important topic in item replenishment for multidimensional computerized adaptive testing (MCAT). Several online calibration methods have been proposed for MCAT, such as multidimensional "one expectation-maximization (EM) cycle" (M-OEM) and multidimensional "multiple EM cycles"…

  7. Geography, Years 7-10, Library of Test Items. Volume Eight. Junior Secondary Items To Be Used With 1976 to 1980 H.S.C. Geography Exam. Broadsheets.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  8. Development of a National Item Bank for Tests of Driving Knowledge.

    ERIC Educational Resources Information Center

    Pollock, William T.; McDole, Thomas L.

    Materials intended for driving knowledge test development use by operational licensing and education agencies were prepared. Candidate test items were developed, using literature and operational practice sources, to reflect current state-of-knowledge with respect to principles of safe, efficient driving, to legal regulations, and to traffic…

  9. Test Score Equating Using Discrete Anchor Items versus Passage-Based Anchor Items: A Case Study Using "SAT"® Data. Research Report. ETS RR-14-14

    ERIC Educational Resources Information Center

    Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill

    2014-01-01

    The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data.This study compares an "SAT"® critical reading anchor that contains more discrete items proportionally, compared to the total tests to be equated, to another anchor that…

  10. Solving the measurement invariance anchor item problem in item response theory.

    PubMed

    Meade, Adam W; Wright, Natalie A

    2012-09-01

    The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.

  11. Validating Translation Test Items via the Many-Facet Rasch Model.

    PubMed

    Tseng, Wen-Ta; Su, Tzi-Ying; Nix, John-Michael L

    2018-01-01

    This study applied the many-facet Rasch model to assess learners' translation ability in an English as a foreign language context. Few attempts have been made in extant research to detect and calibrate rater severity in the domain of translation testing. To fill the research gap, this study documented the process of validating a test of Chinese-to-English sentence translation and modeled raters' scoring propensity defined by harshness or leniency, expert/novice effects on severity, and concomitant effects on item difficulty. Two hundred twenty-five third-year senior high school Taiwanese students and six educators from tertiary and secondary educational institutions served as participants. The students' mean age was 17.80 years (SD = 1.20, range 17-19). The exam consisted of 10 translation items adapted from two entrance exam tests. The results showed that this subjectively scored performance assessment exhibited robust unidimensionality, thus reliably measuring translation ability free from unmodeled disturbances. Furthermore, discrepancies in ratings between novice and expert raters were also identified and modeled by the many-facet Rasch model. The implications for applying the many-facet Rasch model in translation tests at the tertiary level were discussed.

  12. On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

    PubMed

    Raykov, Tenko; Marcoulides, George A

    2016-04-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
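
    One well-known instance of this correspondence maps a standardized factor loading and item threshold from the CTT/factor-analytic side to normal-ogive IRT parameters (unidimensional, binary items, no guessing). A minimal sketch; the article's treatment is more general:

        import numpy as np

        def loading_to_irt(loading, threshold):
            """Normal-ogive discrimination and difficulty from a standardized
            factor loading and item threshold."""
            a = loading / np.sqrt(1.0 - loading**2)   # discrimination
            b = threshold / loading                   # difficulty
            return a, b

        print(loading_to_irt(loading=0.6, threshold=0.3))  # approximately a = 0.75, b = 0.50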

  13. Refinement of a Chemistry Attitude Measure for College Students

    ERIC Educational Resources Information Center

    Xu, Xiaoying; Lewis, Jennifer E.

    2011-01-01

    This work presents the evaluation and refinement of a chemistry attitude measure, Attitude toward the Subject of Chemistry Inventory (ASCI), for college students. The original 20-item and revised 8-item versions of ASCI (V1 and V2) were administered to different samples. The evaluation for ASCI had two main foci: reliability and validity. This…

  14. An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.

    ERIC Educational Resources Information Center

    Marco, Gary L.; And Others

    Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…

  15. Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

    ERIC Educational Resources Information Center

    Perkins, Kyle; And Others

    1995-01-01

    This article reports the results of using a three-layer back propagation artificial neural network to predict item difficulty in a reading comprehension test. Three classes of variables were examined: text structure, propositional analysis, and cognitive demand. Results demonstrate that the networks can consistently predict item difficulty. (JL)

  16. Detection of Person Misfit in Computerized Adaptive Tests with Polytomous Items.

    ERIC Educational Resources Information Center

    van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.

    2002-01-01

    Compared the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items for paper-and-pencil (P&P) and computerized adaptive tests (CATs). Results show that the empirical distribution of the statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Also…
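
    The statistic in question is the standardized log-likelihood (lz); for orientation, a minimal sketch of its simpler dichotomous version (hypothetical probabilities; the study itself concerns the polytomous extension):

        import numpy as np

        def lz(responses, p):
            """Standardized log-likelihood person-fit statistic (dichotomous case)."""
            u, p = np.asarray(responses, float), np.asarray(p, float)
            l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
            expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
            variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
            return (l0 - expected) / np.sqrt(variance)

        # Model-implied success probabilities at the examinee's theta estimate.
        p = np.array([0.9, 0.8, 0.7, 0.5, 0.3])
        print(lz([1, 1, 1, 0, 0], p))   # positive: pattern consistent with the model
        print(lz([0, 0, 0, 1, 1], p))   # strongly negative: misfitting pattern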

  17. Australian Biology Test Item Bank, Years 11 and 12. Volume II: Year 12.

    ERIC Educational Resources Information Center

    Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

    This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…

  18. A Feedback Control Strategy for Enhancing Item Selection Efficiency in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Weissman, Alexander

    2006-01-01

    A computerized adaptive test (CAT) may be modeled as a closed-loop system, where item selection is influenced by trait level ([theta]) estimation and vice versa. When discrepancies exist between an examinee's estimated and true [theta] levels, nonoptimal item selection is a likely result. Nevertheless, examinee response behavior consistent with…

  19. Australian Biology Test Item Bank, Years 11 and 12. Volume I: Year 11.

    ERIC Educational Resources Information Center

    Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

    This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…

  20. Quality Multiple-Choice Test Questions: Item-Writing Guidelines and an Analysis of Auditing Testbanks.

    ERIC Educational Resources Information Center

    Hansen, James D.; Dexter, Lee

    1997-01-01

    Analysis of test item banks in 10 auditing textbooks found that 75% of questions violated one or more guidelines for multiple-choice items. In comparison, 70% of a certified public accounting exam bank had no violations. (SK)

  1. Thai Grade 11 Students' Alternative Conceptions for Acid-Base Chemistry

    ERIC Educational Resources Information Center

    Artdej, Romklao; Ratanaroutai, Thasaneeya; Coll, Richard Kevin; Thongpanchang, Tienthong

    2010-01-01

    This study involved the development of a two-tier diagnostic instrument to assess Thai high school students' understanding of acid-base chemistry. The acid-base diagnostic test (ABDT) comprising 18 items was administered to 55 Grade 11 students in a science and mathematics programme during the second semester of the 2008 academic year. Analysis of…

  2. An Approach to Scoring and Equating Tests with Binary Items: Piloting With Large-Scale Assessments

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    2016-01-01

    This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…

  3. Do Self Concept Tests Test Self Concept? An Evaluation of the Validity of Items on the Piers Harris and Coopersmith Measures.

    ERIC Educational Resources Information Center

    Lynch, Mervin D.; Chaves, John

    Items from the Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures of three self-constructs: idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children in grades 4-6. For each test, item ratings and construct measures…

  4. Using Item Response Theory and Adaptive Testing in Online Career Assessment

    ERIC Educational Resources Information Center

    Betz, Nancy E.; Turner, Brandon M.

    2011-01-01

    The present article describes the potential utility of item response theory (IRT) and adaptive testing for scale evaluation and for web-based career assessment. The article describes the principles of both IRT and adaptive testing and then illustrates these with reference to data analyses and simulation studies of the Career Confidence Inventory…

  5. Strategies for Controlling Item Exposure in Computerized Adaptive Testing with the Generalized Partial Credit Model

    ERIC Educational Resources Information Center

    Davis, Laurie Laughlin

    2004-01-01

    Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline…

  6. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement.

    PubMed

    McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H

    2018-01-23

    Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. The 27-item

  7. International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

    ERIC Educational Resources Information Center

    Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

    2016-01-01

    We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Programme for International Student Assessment (PISA)-2009 data, we examined the correlation between the difficulty of science items and the complexity of their illustrations. We observed statistically…

  8. Fundamentals of Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains multiple choice test items and assessment techniques for Missouri's fundamentals of marketing core curriculum. The core curriculum is divided into these nine occupational duties: (1) communications in marketing; (2) economics and marketing; (3) employment and advancement; (4) human relations in marketing; (5) marketing…

  9. Development and analysis of an instrument to assess student understanding of GOB chemistry knowledge relevant to clinical nursing practice.

    PubMed

    Brown, Corina E; Hyslop, Richard M; Barbera, Jack

    2015-01-01

    The General, Organic, and Biological Chemistry Knowledge Assessment (GOB-CKA) is a multiple-choice instrument designed to assess students' understanding of the chemistry topics deemed important to clinical nursing practice. This manuscript describes the development process of the individual items along with a psychometric evaluation of the final version of the items and instrument. In developing items for the GOB-CKA, essential topics were identified through a series of expert interviews (with practicing nurses, nurse educators, and GOB chemistry instructors) and confirmed through a national survey. Individual items were tested in qualitative studies with students from the target population for clarity and wording. Data from pilot and beta studies were used to evaluate each item and narrow the total item count to 45. A psychometric analysis performed on data from the 45-item final version was used to provide evidence of validity and reliability. The final version of the instrument has a Cronbach's alpha value of 0.76. Feedback from an expert panel provided evidence of face and content validity. Convergent validity was estimated by comparing the results from the GOB-CKA with the General-Organic-Biochemistry Exam (Form 2007) of the American Chemical Society. Instructors who wish to use the GOB-CKA for teaching and research may contact the corresponding author for a copy of the instrument. © 2014 Wiley Periodicals, Inc.
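
    The reported Cronbach's alpha follows the standard formula; a minimal sketch on simulated stand-in data (not the GOB-CKA data themselves):

        import numpy as np

        def cronbach_alpha(scores):
            """Cronbach's alpha from a persons-by-items score matrix."""
            scores = np.asarray(scores, float)
            k = scores.shape[1]
            item_vars = scores.var(axis=0, ddof=1).sum()
            total_var = scores.sum(axis=1).var(ddof=1)
            return k / (k - 1) * (1.0 - item_vars / total_var)

        rng = np.random.default_rng(3)
        ability = rng.normal(size=(200, 1))
        # Simulated 45-item binary instrument, mirroring the final item count.
        responses = (rng.random((200, 45)) <
                     1 / (1 + np.exp(-(ability - rng.normal(size=45))))).astype(int)
        print(round(cronbach_alpha(responses), 2))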

  10. Item Development and Validity Testing for a Self- and Proxy Report: The Safe Driving Behavior Measure

    PubMed Central

    Classen, Sherrilene; Winter, Sandra M.; Velozo, Craig A.; Bédard, Michel; Lanford, Desiree N.; Brumback, Babette; Lutz, Barbara J.

    2010-01-01

    OBJECTIVE We report on item development and validity testing of a self-report older adult safe driving behaviors measure (SDBM). METHOD On the basis of theoretical frameworks (Precede–Proceed Model of Health Promotion, Haddon’s matrix, and Michon’s model), existing driving measures, and previous research and guided by measurement theory, we developed items capturing safe driving behavior. Item development was further informed by focus groups. We established face validity using peer reviewers and content validity using expert raters. RESULTS Peer review indicated acceptable face validity. Initial expert rater review yielded a scale content validity index (CVI) rating of 0.78, with 44 of 60 items rated ≥0.75. Sixteen unacceptable items (≤0.5) required major revision or deletion. The next CVI scale average was 0.84, indicating acceptable content validity. CONCLUSION The SDBM has relevance as a self-report to rate older drivers. Future pilot testing of the SDBM comparing results with on-road testing will define criterion validity. PMID:20437917

  11. A Comparison of Traditional Test Blueprinting and Item Development to Assessment Engineering in a Licensure Context

    ERIC Educational Resources Information Center

    Masters, James S.

    2010-01-01

    With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…

  12. Easy and Informative: Using Confidence-Weighted True-False Items for Knowledge Tests in Psychology Courses

    ERIC Educational Resources Information Center

    Dutke, Stephan; Barenberg, Jonathan

    2015-01-01

    We introduce a specific type of item for knowledge tests, confidence-weighted true-false (CTF) items, and review experiences of its application in psychology courses. A CTF item is a statement about the learning content to which students respond whether the statement is true or false, and they rate their confidence level. Previous studies using…

  13. Strategies for controlling item exposure in computerized adaptive testing with the partial credit model.

    PubMed

    Davis, Laurie Laughlin; Dodd, Barbara G

    2008-01-01

    Exposure control research with polytomous item pools has determined that randomization procedures can be very effective for controlling test security in computerized adaptive testing (CAT). The current study investigated the performance of four procedures for controlling item exposure in a CAT under the partial credit model. In addition to a no exposure control baseline condition, the Kingsbury-Zara, modified-within-.10-logits, Sympson-Hetter, and conditional Sympson-Hetter procedures were implemented to control exposure rates. The Kingsbury-Zara and the modified-within-.10-logits procedures were implemented with 3 and 6 item candidate conditions. The results show that the Kingsbury-Zara and modified-within-.10-logits procedures with 6 item candidates performed as well as the conditional Sympson-Hetter in terms of exposure rates, overlap rates, and pool utilization. These two procedures are strongly recommended for use with partial credit CATs due to their simplicity and strength of their results.
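
    The Kingsbury-Zara procedure referenced here is "randomesque": instead of always administering the single most informative item, it draws at random from the m best candidates. A minimal sketch with hypothetical item information values:

        import numpy as np

        def kingsbury_zara(info, administered, m=6, rng=None):
            """Randomesque selection: draw at random from the m most informative
            unadministered items rather than always taking the single best."""
            rng = rng or np.random.default_rng()
            available = [j for j in range(len(info)) if j not in administered]
            ranked = sorted(available, key=lambda j: info[j], reverse=True)
            return rng.choice(ranked[:m])

        rng = np.random.default_rng(4)
        info = rng.uniform(0.1, 2.0, size=100)   # item information at the current theta estimate
        print(kingsbury_zara(info, administered=set(), m=6, rng=rng))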

  14. Measuring the ICF components of impairment, activity limitation and participation restriction: an item analysis using classical test theory and item response theory

    PubMed Central

    Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie

    2009-01-01

    Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty-four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus combining CTT and IRT appears to be a valuable tool in

  15. Item Selection Criteria with Practical Constraints for Computerized Classification Testing

    ERIC Educational Resources Information Center

    Lin, Chuan-Ju

    2011-01-01

    This study compares four item selection criteria for a two-category computerized classification testing: (1) Fisher information (FI), (2) Kullback-Leibler information (KLI), (3) weighted log-odds ratio (WLOR), and (4) mutual information (MI), with respect to the efficiency and accuracy of classification decision using the sequential probability…

  16. Analysis of Individual "Test Of Astronomy STandards" (TOAST) Item Responses

    ERIC Educational Resources Information Center

    Slater, Stephanie J.; Schleigh, Sharon Price; Stork, Debra J.

    2015-01-01

    The development of valid and reliable strategies to efficiently determine the knowledge landscape of introductory astronomy college students is an effort of great interest to the astronomy education community. This study examines individual item response rates from a widely used conceptual understanding survey, the Test Of Astronomy Standards…

  17. National Chemistry Week 2000: JCE Resources in Food Chemistry

    NASA Astrophysics Data System (ADS)

    Jacobsen, Erica K.

    2000-10-01

    November brings another National Chemistry Week, and this year's theme is food chemistry. I was asked to collect and evaluate JCE resources for use with this theme, a project that took me deep into past issues of JCE and yielded many treasures. Here we present the results of searches for food chemistry information and activities. While the selected articles are mainly at the high school and college levels, there are some excellent ones for the elementary school level and some that can be adapted for younger students. The focus of all articles is on the chemistry of food itself. Activities that only use food to demonstrate a principle other than food chemistry are not included. Articles that cover household products such as cleansers and pharmaceuticals are also not included. Each article has been characterized as a demonstration, experiment, calculation, activity, or informational item; several fit more than one classification. Also included are keywords and an evaluation as to which levels the article may serve.

  18. A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

    ERIC Educational Resources Information Center

    Lee, Guemin; Park, In-Yong

    2012-01-01

    Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

  19. Repeated retrieval practice and item difficulty: does criterion learning eliminate item difficulty effects?

    PubMed

    Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A

    2013-12-01

    A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.

  20. IRT Item Parameter Scaling for Developing New Item Pools

    ERIC Educational Resources Information Center

    Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua

    2017-01-01

    Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. The three scaling procedures are considered: (a) concurrent…

  1. Cognitive Diagnostic Models for Tests with Multiple-Choice and Constructed-Response Items

    ERIC Educational Resources Information Center

    Kuo, Bor-Chen; Chen, Chun-Hua; Yang, Chih-Wei; Mok, Magdalena Mo Ching

    2016-01-01

    Traditionally, teachers evaluate students' abilities via their total test scores. Recently, cognitive diagnostic models (CDMs) have begun to provide information about the presence or absence of students' skills or misconceptions. Nevertheless, CDMs are typically applied to tests with multiple-choice (MC) items, which provide less diagnostic…

  2. Assessing the Performance of Classical Test Theory Item Discrimination Estimators in Monte Carlo Simulations

    ERIC Educational Resources Information Center

    Bazaldua, Diego A. Luna; Lee, Young-Sun; Keller, Bryan; Fellers, Lauren

    2017-01-01

    The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, resulting in mixed results regarding the preference of some discrimination estimators over others. This study analyzes the performance of various item discrimination estimators in CTT:…
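
    The most familiar of these CTT discrimination estimators is the corrected item-total (point-biserial) correlation, which correlates each item with the total score excluding that item. A minimal sketch on simulated responses:

        import numpy as np

        def corrected_item_total(scores):
            """Corrected item-total (point-biserial) discrimination per item."""
            scores = np.asarray(scores, float)
            total = scores.sum(axis=1)
            return np.array([
                np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                for j in range(scores.shape[1])
            ])

        rng = np.random.default_rng(5)
        theta = rng.normal(size=(500, 1))
        items = (rng.random((500, 20)) <
                 1 / (1 + np.exp(-(theta - rng.normal(size=20))))).astype(int)
        print(corrected_item_total(items).round(2))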

  3. Gender-Based Differential Item Performance in Mathematics Achievement Items.

    ERIC Educational Resources Information Center

    Doolittle, Allen E.; Cleary, T. Anne

    1987-01-01

    Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)

  4. Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

    PubMed

    Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

    2015-08-19

    The four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy for Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach to validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA), and the local dependency (LD) statistic. The graded response model was fitted to the data. The contribution of each item to the scale was assessed by the item information function (IIF). Reliability of the scale was assessed by the test information function (TIF). Differential item functioning (DIF) across gender was identified by the Wald test and expected score functions. Neither CPQ11-14 RSF:8 nor ISF:8 deviated much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation, with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed for the oral symptoms items, suggesting little contribution of information to the scale; item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to the oral symptoms items, the item "Concerned with what other people think" demonstrated uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. Items related to oral symptoms were not informative to OHRQoL and deletion of these
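
    For orientation, item information under the graded response model can be computed from the boundary response curves; the test information function is simply the sum of item information functions over items. A minimal sketch with hypothetical parameters (not the CPQ11-14 calibrations):

        import numpy as np

        def grm_information(theta, a, b):
            """Item information under the graded response model.

            a: discrimination; b: ordered category thresholds (K-1 of them)."""
            pstar = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))
            pstar = np.concatenate([[1.0], pstar, [0.0]])   # boundary curves run 1 .. 0
            pk = pstar[:-1] - pstar[1:]                     # category probabilities
            dpk = a * (pstar[:-1] * (1 - pstar[:-1]) - pstar[1:] * (1 - pstar[1:]))
            return np.sum(dpk**2 / pk)

        # Hypothetical 5-category item evaluated at three ability levels.
        for t in (-2.0, 0.0, 2.0):
            print(t, round(grm_information(t, a=1.8, b=[-1.5, -0.5, 0.5, 1.5]), 3))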

  5. Fairness in Computerized Testing: Detecting Item Bias Using CATSIB with Impact Present

    ERIC Educational Resources Information Center

    Chu, Man-Wai; Lai, Hollis

    2013-01-01

    In educational assessment, there is an increasing demand for tailoring assessments to individual examinees through computer adaptive tests (CAT). As such, it is particularly important to investigate the fairness of these adaptive testing processes, which require the investigation of differential item function (DIF) to yield information about item…

  6. Interest Inventory Items as Reinforcing Stimuli: A Test of the A-R-D Theory.

    ERIC Educational Resources Information Center

    Staats, Arthur W.; And Others

    An experiment was conducted to test the hypothesis that interest inventory items would function as reinforcing stimuli in a visual discrimination task. When previously rated liked and disliked items from the Strong Vocational Interest Blank were differentially presented following one of two responses, subjects learned to respond to the stimulus…

  7. Methodologies for Investigating Item- and Test-Level Measurement Equivalence in International Large-Scale Assessments

    ERIC Educational Resources Information Center

    Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D.

    2012-01-01

    In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…

  8. Negative and Positive Testing Effects in Terms of Item-Specific and Relational Information

    ERIC Educational Resources Information Center

    Mulligan, Neil W.; Peterson, Daniel J.

    2015-01-01

    Though retrieving information typically results in improved memory on a subsequent test (the testing effect), Peterson and Mulligan (2013) outlined the conditions under which retrieval practice results in poorer recall relative to restudy, a phenomenon dubbed the "negative testing effect." The item-specific-relational account proposes…

  9. Item Type and Gender Differences on the Mental Rotations Test

    ERIC Educational Resources Information Center

    Voyer, Daniel; Doyle, Randi A.

    2010-01-01

    This study investigated gender differences on the Mental Rotations Test (MRT) as a function of item and response types. Accordingly, 86 male and 109 female undergraduate students completed the MRT without time limits. Responses were coded as reflecting two correct (CC), one correct and one wrong (CW), two wrong (WW), one correct and one blank…

  10. Item Order, Response Format, and Examinee Sex and Handedness and Performance on a Multiple-Choice Test.

    ERIC Educational Resources Information Center

    Kleinke, David J.

    Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…

  11. A Preliminary Analysis of the Linguistic Complexity of Numeracy Skills Test Items for Pre Service Teachers

    ERIC Educational Resources Information Center

    O'Keeffe, Lisa

    2016-01-01

    Language is frequently discussed as a barrier to mathematics word problems. Hence this paper presents the initial findings of a linguistic analysis of numeracy skills test sample items. The theoretical perspective of multi-modal text analysis underpinned this study, in which data were extracted from the ten sample numeracy test items released by the…

  12. Component Identification and Item Difficulty of Raven's Matrices Items.

    ERIC Educational Resources Information Center

    Green, Kathy E.; Kluever, Raymond C.

    Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…

  13. Undergraduate chemistry students' conceptions of atomic structure, molecular structure and chemical bonding

    NASA Astrophysics Data System (ADS)

    Campbell, Erin Roberts

    The process of chemical education should facilitate students' construction of meaningful conceptual structures about the concepts and processes of chemistry. It is evident, however, that students at all levels possess concepts that are inconsistent with currently accepted scientific views. The purpose of this study was to examine undergraduate chemistry students' conceptions of atomic structure, chemical bonding and molecular structure. A diagnostic instrument to evaluate students' conceptions of atomic and molecular structure was developed by the researcher. The instrument incorporated multiple-choice items and reasoned explanations based upon relevant literature and a categorical summarization of student responses (Treagust, 1988, 1995). A covalent bonding and molecular structure diagnostic instrument developed by Peterson and Treagust (1989) was also employed. The ex post facto portion of the study examined the conceptual understanding of undergraduate chemistry students using descriptive statistics to summarize the results obtained from the diagnostic instruments. In addition to the descriptive portion of the study, a total score for each student was calculated based on the combination of correct and incorrect choices made for each item. A comparison of scores obtained on the diagnostic instruments by the upper and lower classes of undergraduate students was made using a t-test. This study also examined an axiomatic assumption that an understanding of atomic structure is important in understanding bonding and molecular structure. A Pearson correlation coefficient, r, was calculated to provide a measure of the strength of this association. Additionally, this study gathered information regarding expectations of undergraduate chemistry students' understanding held by the chemical community. Two questionnaires were developed with items based upon the propositional knowledge statements used in the development of the diagnostic instruments. Subgroups of items from…

  14. A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing.

    PubMed

    van Rijn, Peter W; Ali, Usama S

    2017-05-01

    We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures. © 2017 The British Psychological Society.
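
    The hierarchical framework's structure is easy to state in simulation form: a two-parameter logistic (2PL) model for accuracy and a log-normal model for response time, linked through correlated person parameters. The sketch below (numpy, with invented parameter values) generates data under that structure; it follows van der Linden's general setup but is not the authors' estimation code.

        import numpy as np

        rng = np.random.default_rng(7)
        n_persons, n_items = 1000, 20

        # Person parameters: ability (theta) and speed (tau), allowed to correlate
        cov = np.array([[1.0, 0.4], [0.4, 1.0]])   # the 0.4 correlation is an assumption
        theta, tau = rng.multivariate_normal([0.0, 0.0], cov, size=n_persons).T

        # Item parameters: 2PL discrimination/difficulty, time intensity, time precision
        a = rng.uniform(0.8, 2.0, n_items)
        b = rng.normal(0.0, 1.0, n_items)
        beta = rng.normal(0.5, 0.3, n_items)       # time intensity
        alpha = rng.uniform(1.0, 2.0, n_items)     # 1 / sd of log response time

        # Accuracy: marginal 2PL, as noted in the abstract
        p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
        accuracy = (rng.random((n_persons, n_items)) < p).astype(int)

        # Speed: log T_pi ~ Normal(beta_i - tau_p, 1/alpha_i^2)
        log_t = rng.normal(beta[None, :] - tau[:, None], 1.0 / alpha[None, :])
        response_times = np.exp(log_t)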

  15. Covariates of the Rating Process in Hierarchical Models for Multiple Ratings of Test Items

    ERIC Educational Resources Information Center

    Mariano, Louis T.; Junker, Brian W.

    2007-01-01

    When constructed response test items are scored by more than one rater, the repeated ratings allow for the consideration of individual rater bias and variability in estimating student proficiency. Several hierarchical models based on item response theory have been introduced to model such effects. In this article, the authors demonstrate how these…

  16. The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory

    ERIC Educational Resources Information Center

    Anil, Duygu

    2008-01-01

    In this study, the predictive power of experts' judgments of item characteristics, intended for conditions in which try-out practices cannot be applied, was examined against item characteristics computed under classical test theory and the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…

  17. Computerized Adaptive Testing for Polytomous Motivation Items: Administration Mode Effects and a Comparison with Short Forms

    ERIC Educational Resources Information Center

    Hol, A. Michiel; Vorst, Harrie C. M.; Mellenbergh, Gideon J.

    2007-01-01

    In a randomized experiment (n = 515), a conventional computerized test and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible…

  18. Item Parameter Invariance of the Kaufman Adolescent and Adult Intelligence Test across Male and Female Samples

    ERIC Educational Resources Information Center

    Immekus, Jason C.; Maller, Susan J.

    2009-01-01

    The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…

  19. Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

    ERIC Educational Resources Information Center

    Jones, Andrew T.

    2011-01-01

    Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…

  20. The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.

    ERIC Educational Resources Information Center

    Kaskowitz, Gary S.; De Ayala, R. J.

    2001-01-01

    Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…

  1. Mutual Information Item Selection Method in Cognitive Diagnostic Computerized Adaptive Testing with Short Test Length

    ERIC Educational Resources Information Center

    Wang, Chun

    2013-01-01

    Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to combine the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models aim at classifying examinees into the correct mastery profile group so as to pinpoint the strengths and weaknesses of each examinee, whereas CAT algorithms choose items to determine those…

  2. A Method for Generating Educational Test Items That Are Aligned to the Common Core State Standards

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Hogan, James B.; Matovinovic, Donna

    2015-01-01

    The demand for test items far outstrips the current supply. This increased demand can be attributed, in part, to the transition to computerized testing, but it is also linked to dramatic changes in how 21st century educational assessments are designed and administered. One way to address this growing demand is with automatic item generation.…

  3. A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests.

    ERIC Educational Resources Information Center

    Kingsbury, G. Gage; Zara, Anthony R.

    1991-01-01

    This simulation investigated two procedures that reduce differences between paper-and-pencil testing and computerized adaptive testing (CAT) by making CAT content sensitive. Results indicate that the price in terms of additional test items of using constrained CAT for content balancing is much smaller than that of using testlets. (SLD)

  4. Promoting the hydrostatic conceptual change test (HCCT) with four-tier diagnostic test item

    NASA Astrophysics Data System (ADS)

    Purwanto, M. G.; Nurliani, R.; Kaniawati, I.; Samsudin, A.

    2018-05-01

    The Hydrostatic Conceptual Change Test (HCCT) is a diagnostic test instrument for identifying students' conceptions of hydrostatics, which is very important for supporting the learning process in the classroom. On that basis, the researchers decided to develop the HCCT into four-tier diagnostic test items. This research is planned as the first step of four-tier-formatted HCCT development as an investigative test instrument on hydrostatics. The research method used the 4D model, which has four comprehensive steps: 1) defining, 2) designing, 3) developing, and 4) disseminating. The developed instrument was tried out with 30 students at a senior high school. The data showed that the four-tier-formatted HCCT is able to identify students' conception levels of hydrostatics. In conclusion, the four-tier-formatted HCCT is a promising diagnostic test instrument able to classify students into misconception, no understanding, understanding, partial understanding, and not codeable categories with respect to hydrostatic concepts.
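
    Four-tier items pair an answer tier and a reason tier, each with a confidence rating, and classify the combination. The exact decision rules for the HCCT are not given in the abstract, so the mapping below is an illustrative assumption of how such a classifier is typically written, not the authors' scheme.

        def classify_four_tier(answer_correct, answer_sure, reason_correct, reason_sure):
            """Illustrative four-tier classification rules (assumed, not the HCCT's)."""
            if None in (answer_correct, answer_sure, reason_correct, reason_sure):
                return "not codeable"          # a tier was left blank
            if answer_correct and reason_correct:
                return ("understanding" if answer_sure and reason_sure
                        else "partial understanding")
            if (not answer_correct) and (not reason_correct) and answer_sure and reason_sure:
                return "misconception"         # confidently wrong on both tiers
            if not (answer_sure or reason_sure):
                return "no understanding"      # unsure throughout
            return "partial understanding"

        print(classify_four_tier(True, True, True, True))    # understanding
        print(classify_four_tier(False, True, False, True))  # misconception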

  5. Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test

    PubMed Central

    Moore, Mariah; Gordon, Peter C.

    2015-01-01

    In the Author Recognition Test (ART) participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and to optimize scoring of the ART. Factor analysis suggests a potential two factor structure of the ART differentiating between literary vs. popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405

  6. Practical Implications of Test Dimensionality for Item Response Theory Calibration of the Medical College Admission Test. MCAT Monograph.

    ERIC Educational Resources Information Center

    Childs, Ruth A.; Oppler, Scott H.

    The use of item response theory (IRT) in the Medical College Admission Test (MCAT) testing program has been limited. This study provides a basis for future IRT analyses of the MCAT by exploring the dimensionality of each of the MCAT's three multiple-choice test sections (Verbal Reasoning, Physical Sciences, and Biological Sciences) and the…

  7. Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd; Gerritz, Kalle

    1990-01-01

    Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)

  8. Audio Adapted Assessment Data: Does the Addition of Audio to Written Items Modify the Item Calibration?

    ERIC Educational Resources Information Center

    Snyder, James

    2010-01-01

    This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…

  9. A Procedure to Detect Item Bias Present Simultaneously in Several Items

    DTIC Science & Technology

    1991-04-25

    exhibit a coherent and major biasing influence at the test level. In particular, this can be true even if each individual item displays only a minor...response functions (IRFs) without the use of item parameter estimation algorithms when the sample size is too small for their use. Thissen, Steinberg...convention). A random sample of examinees is drawn from each group, and a test of N items is administered to them. Typically it is suspected that a

  10. Analyzing Item Generation with Natural Language Processing Tools for the "TOEIC"® Listening Test. Research Report. ETS RR-17-52

    ERIC Educational Resources Information Center

    Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin

    2017-01-01

    In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…

  11. Developing multiple-choices test items as tools for measuring the scientific-generic skills on solar system

    NASA Astrophysics Data System (ADS)

    Bhakti, Satria Seto; Samsudin, Achmad; Chandra, Didi Teguh; Siahaan, Parsaoran

    2017-05-01

    The aim of this research is to develop multiple-choice test items as tools for measuring scientific generic skills on the solar system. To achieve this aim, the researchers used the ADDIE model, consisting of Analysis, Design, Development, Implementation, and Evaluation. The scientific generic skills were limited to five indicators: (1) indirect observation, (2) awareness of scale, (3) logical inference, (4) causal relations, and (5) mathematical modeling. The participants were 32 students at a junior high school in Bandung. The results showed that the constructed multiple-choice test items were declared valid by expert validators and, after testing, were able to measure scientific generic skills on the solar system.

  12. Item-Writing Guidelines for Physics

    ERIC Educational Resources Information Center

    Regan, Tom

    2015-01-01

    A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…

  13. The Dysexecutive Questionnaire advanced: item and test score characteristics, 4-factor solution, and severity classification.

    PubMed

    Bodenburg, Sebastian; Dopslaff, Nina

    2008-01-01

    The Dysexecutive Questionnaire (DEX; Behavioural Assessment of the Dysexecutive Syndrome, 1996) is a standardized instrument to measure possible behavioral changes resulting from the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now, however, there have been no more fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports on data relating to the quality of the items, the reliability, and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 factors found and of the questionnaire as a whole is carried out on the basis of quartile standards.

  14. Development of Test Items Related to Selected Concepts Within the Scheme the Particle Nature of Matter.

    ERIC Educational Resources Information Center

    Doran, Rodney L.; Pella, Milton O.

    The purpose of this study was to develop test items with a minimum reading demand for use with pupils at grade levels two through six. An item was judged to be acceptable if it satisfied at least four of six criteria. Approximately 250 students in grades 2-6 participated in the study. Half of the students were given instruction to develop…

  15. Item Selection and Pre-equating with Empirical Item Characteristic Curves.

    ERIC Educational Resources Information Center

    Livingston, Samuel A.

    An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…

  16. Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Kang, Hyeon-Ah; Zhang, Susu; Chang, Hua-Hua

    2017-01-01

    The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery…

  17. Expertise sensitive item selection.

    PubMed

    Chow, P; Russell, H; Traub, R E

    2000-12-01

    In this paper we describe and illustrate a procedure for selecting items from a large pool for a certification test. The proposed procedure, which is intended to improve the alignment of the certification test with on-the-job performance, is based on an expertise sensitive index. This index for an item is the difference between the item's p values for experts and novices. An example is provided of the application of the index for selecting items to be used in certifying bakers.
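
    The index is fully specified by the abstract: for each item, subtract the novices' p value (proportion correct) from the experts'. A minimal sketch with simulated 0/1 response matrices (the data and group sizes are invented):

        import numpy as np

        def expertise_sensitive_index(expert_responses, novice_responses):
            """Per-item index = p(experts) - p(novices); inputs are 0/1 arrays
            of shape (n_examinees, n_items)."""
            return expert_responses.mean(axis=0) - novice_responses.mean(axis=0)

        rng = np.random.default_rng(0)
        experts = (rng.random((50, 30)) < 0.85).astype(int)   # hypothetical data
        novices = (rng.random((80, 30)) < 0.55).astype(int)
        esi = expertise_sensitive_index(experts, novices)
        selected = np.argsort(esi)[::-1][:10]   # ten items most sensitive to expertise
        print(esi[selected])

    Items with a large positive index separate experts from novices and are the ones favored for the certification form.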

  18. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

    PubMed Central

    2016-01-01

    Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810

  19. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

    PubMed

    Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

    2016-01-01

    The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
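
    The reported comparison (whether randomly drawn skill items differ significantly in difficulty) can be framed as a test of equal pass proportions across items. A minimal sketch with hypothetical pass/fail counts, not the KEMTLE data:

        from scipy.stats import chi2_contingency

        # Hypothetical pass/fail counts for 4 randomly assigned skill items
        counts = [[310, 40],    # item 1: pass, fail
                  [280, 70],    # item 2
                  [330, 20],    # item 3
                  [250, 100]]   # item 4

        chi2, p, dof, _ = chi2_contingency(counts)
        print(f"chi2={chi2:.1f}, df={dof}, p={p:.4f}")  # small p: difficulty varies across items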

  20. POREWATER CHEMISTRY: EFFECTS OF SAMPLING, STORAGE, HANDLING, AND TOXICITY TESTING

    EPA Science Inventory

    As a general principle, it is nearly impossible to remove a porewater sample from sediment, use it in a toxicity testing vessel with test organisms, and prevent changes in the chemistry of the natural and anthropogenic organic and inorganic constituents. The degree of change in t...

  1. A leukocyte activation test identifies food items which induce release of DNA by innate immune peripheral blood leucocytes.

    PubMed

    Garcia-Martinez, Irma; Weiss, Theresa R; Yousaf, Muhammad N; Ali, Ather; Mehal, Wajahat Z

    2018-01-01

    Leukocyte activation (LA) testing identifies food items that induce a patient specific cellular response in the immune system, and has recently been shown in a randomized double blinded prospective study to reduce symptoms in patients with irritable bowel syndrome (IBS). We hypothesized that test reactivity to particular food items, and the systemic immune response initiated by these food items, is due to the release of cellular DNA from blood immune cells. We tested this by quantifying total DNA concentration in the cellular supernatant of immune cells exposed to positive and negative foods from 20 healthy volunteers. To establish if the DNA release by positive samples is a specific phenomenon, we quantified myeloperoxidase (MPO) in cellular supernatants. We further assessed if a particular immune cell population (neutrophils, eosinophils, and basophils) was activated by the positive food items by flow cytometry analysis. To identify the signaling pathways that are required for DNA release we tested if specific inhibitors of key signaling pathways could block DNA release. Foods with a positive LA test result gave a higher supernatant DNA content when compared to foods with a negative result. This was specific as MPO levels were not increased by foods with a positive LA test. Protein kinase C (PKC) inhibitors resulted in inhibition of positive food stimulated DNA release. Positive foods resulted in CD63 levels greater than negative foods in eosinophils in 76.5% of tests. LA test identifies food items that result in release of DNA and activation of peripheral blood innate immune cells in a PKC dependent manner, suggesting that this LA test identifies food items that result in release of inflammatory markers and activation of innate immune cells. This may be the basis for the improvement in symptoms in IBS patients who followed an LA test guided diet.

  2. Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life.

    PubMed

    Paap, Muirne C S; Kroeze, Karel A; Terwee, Caroline B; van der Palen, Job; Veldkamp, Bernard P

    2017-11-01

    Examining item usage is an important step in evaluating the performance of a computerized adaptive test (CAT). We study item usage for a newly developed multidimensional CAT which draws items from three PROMIS domains, as well as a disease-specific one. The multidimensional item bank used in the current study contained 194 items from four domains: the PROMIS domains fatigue, physical function, and ability to participate in social roles and activities, and a disease-specific domain (the COPD-SIB). The item bank was calibrated using the multidimensional graded response model and data of 795 patients with chronic obstructive pulmonary disease. To evaluate the item usage rates of all individual items in our item bank, CAT simulations were performed on responses generated based on a multivariate uniform distribution. The outcome variables included active bank size and item overuse (usage rate larger than the expected item usage rate). For average θ-values, the overall active bank size was 9-10%; this number quickly increased as θ-values became more extreme. For θ-values of -2 and +2, the overall active bank size equaled 39-40%. There was 78% overlap between overused items and active bank size for average θ-values. For more extreme θ-values, the overused items made up a much smaller part of the active bank size: here the overlap was only 35%. Our results strengthen the claim that relatively short item banks may suffice when using polytomous items (and no content constraints/exposure control mechanisms), especially when using MCAT.
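
    Both outcome variables are straightforward to derive from an administration log. The sketch below simulates a log with deliberately skewed item selection (all values invented) and computes active bank size and overuse exactly as defined in the abstract, with the uniform expectation of test length divided by bank size as the benchmark.

        import numpy as np

        rng = np.random.default_rng(1)
        bank_size, n_admins, test_length = 194, 2000, 15

        # Skewed selection probabilities stand in for information-driven CAT selection
        weights = rng.gamma(1.0, 1.0, bank_size) ** 4
        p_select = weights / weights.sum()
        log = np.array([rng.choice(bank_size, size=test_length, replace=False, p=p_select)
                        for _ in range(n_admins)])

        usage = np.bincount(log.ravel(), minlength=bank_size) / n_admins
        expected_rate = test_length / bank_size      # uniform benchmark
        active = usage > 0                           # item administered at least once
        overused = usage > expected_rate             # usage above expectation
        print(f"active: {active.mean():.0%}, overused: {overused.mean():.0%}")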

  3. Reading ability and print exposure: item response theory analysis of the author recognition test.

    PubMed

    Moore, Mariah; Gordon, Peter C

    2015-12-01

    In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
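
    The scoring change is a one-line formula: the standard ART score is hits minus foils, and the proposed variant weights foil selections more heavily. In the sketch below the penalty weight is a free parameter; the value 2.0 and the foil names are illustrative, not taken from the study.

        def art_score(selected, authors, foils, foil_penalty=2.0):
            """ART score = author hits - foil_penalty * foil selections.
            foil_penalty=1.0 reproduces the standard scoring (hits minus foils)."""
            hits = len(selected & authors)
            false_alarms = len(selected & foils)
            return hits - foil_penalty * false_alarms

        authors = {"Toni Morrison", "Ian McEwan", "A. S. Byatt"}
        foils = {"Hiram Walker", "Norbert Kerr"}       # made-up foil names
        print(art_score({"Toni Morrison", "Hiram Walker"}, authors, foils))  # 1 - 2.0 = -1.0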

  4. How do video-based demonstration assessment tasks affect problem-solving process, test anxiety, chemistry anxiety and achievement in general chemistry students?

    NASA Astrophysics Data System (ADS)

    Terrell, Rosalind Stephanie

    2001-12-01

    Because paper-and-pencil testing provides limited knowledge about what students know about chemical phenomena, we have developed video-based demonstrations to broaden measurement of student learning. For example, students might be shown a video demonstrating equilibrium shifts. Two methods for viewing equilibrium shifts are changing the concentration of the reactants and changing the temperature of the system. The students are required to combine the data collected from the video and their knowledge of chemistry to determine which way the equilibrium shifts. Video-based demonstrations are important techniques for measuring student learning because they require students to apply conceptual knowledge learned in class to a specific chemical problem. This study explores how video-based demonstration assessment tasks affect problem-solving processes, test anxiety, chemistry anxiety and achievement in general chemistry students. Several instruments were used to determine students' knowledge about chemistry, students' test and chemistry anxiety before and after treatment. Think-aloud interviews were conducted to determine students' problem-solving processes after treatment. The treatment group was compared to a control group and a group watching video demonstrations. After treatment students' anxiety increased and achievement decreased. There were also no significant differences found in students' problem-solving processes following treatment. These negative findings may be attributed to several factors that will be explored in this study.

  5. Performance of Certification and Recertification Examinees on Multiple Choice Test Items: Does Physician Age Have an Impact?

    PubMed

    Shen, Linjun; Juul, Dorthea; Faulkner, Larry R

    2016-01-01

    The development of recertification programs (now referred to as Maintenance of Certification or MOC) by the members of the American Board of Medical Specialties provides the opportunity to study knowledge base across the professional lifespan of physicians. Research results to date are mixed with some studies finding negative associations between age and various measures of competency and others finding no or minimal relationships. Four groups of multiple choice test items that were independently developed for certification and MOC examinations in psychiatry and neurology were administered to certification and MOC examinees within each specialty. Percent correct scores were calculated for each examinee. Differences between certification and MOC examinees were compared using unpaired t tests, and logistic regression was used to compare MOC and certification examinee performance on the common test items. Except for the neurology certification test items that addressed basic neurology concepts, the performance of the certification and MOC examinees was similar. The differences in performance on individual test items did not consistently favor one group or the other and could not be attributed to any distinguishable content or format characteristics of those items. The findings of this study are encouraging in that physicians who had recently completed residency training possessed clinical knowledge that was comparable to that of experienced physicians, and the experienced physicians' clinical knowledge was equivalent to that of recent residency graduates. The role testing can play in enhancing expertise is described.

  6. Food and Nutrition (Intermediate). Performance Objectives and Criterion-Referenced Test Items.

    ERIC Educational Resources Information Center

    Missouri Univ., Columbia. Instructional Materials Lab.

    This document contains competencies and criterion-referenced test items for the Intermediate Food and Nutrition semester course in Missouri that were derived from the duties and tasks of the Missouri homemaker and identified and validated by home economics teachers and subject matter specialists. The guide is designed to assist home economics…

  7. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    PubMed

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
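
    A simplified version of the idea, not the authors' exact estimator, can be written in a few lines: group examinees by estimated ability, compare the observed proportion correct in each group with the model-implied item characteristic curve, and standardize the difference. Under a fitting model the residuals are approximately standard normal.

        import numpy as np

        def icc_2pl(theta, a, b):
            return 1.0 / (1.0 + np.exp(-a * (theta - b)))

        def item_fit_residuals(theta_hat, responses, a, b, n_groups=10):
            """Standardized residuals of observed vs model-implied proportion
            correct within ability groups (a simplified residual statistic)."""
            order = np.argsort(theta_hat)
            out = []
            for g in np.array_split(order, n_groups):
                obs = responses[g].mean()
                exp = icc_2pl(theta_hat[g], a, b).mean()
                out.append((obs - exp) / np.sqrt(exp * (1 - exp) / len(g)))
            return np.array(out)

        rng = np.random.default_rng(3)
        theta = rng.normal(size=2000)
        y = (rng.random(2000) < icc_2pl(theta, 1.2, 0.3)).astype(int)
        print(item_fit_residuals(theta, y, 1.2, 0.3))   # roughly N(0,1) when the model fits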

  8. Investigating Measurement Invariance in Computer-Based Personality Testing: The Impact of Using Anchor Items on Effect Size Indices

    ERIC Educational Resources Information Center

    Egberink, Iris J. L.; Meijer, Rob R.; Tendeiro, Jorge N.

    2015-01-01

    A popular method to assess measurement invariance of a particular item is based on likelihood ratio tests with all other items as anchor items. The results of this method are often only reported in terms of statistical significance, and researchers proposed different methods to empirically select anchor items. It is unclear, however, how many…

  9. The Role of Psychometric Modeling in Test Validation: An Application of Multidimensional Item Response Theory

    ERIC Educational Resources Information Center

    Schilling, Stephen G.

    2007-01-01

    In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…

  10. Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…

  11. Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

    ERIC Educational Resources Information Center

    Lee, Woo-yeol; Cho, Sun-Joo

    2017-01-01

    Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…

  12. Optimal Item Pool Design for a Highly Constrained Computerized Adaptive Test

    ERIC Educational Resources Information Center

    He, Wei

    2010-01-01

    Item pool quality has been regarded as one important factor to help realize enhanced measurement quality for the computerized adaptive test (CAT) (e.g., Flaugher, 2000; Jensema, 1977; McBride & Wise, 1976; Reckase, 1976; 2003; van der Linden, Ariel, & Veldkamp, 2006; Veldkamp & van der Linden, 2000; Xing & Hambleton, 2004). However, studies are…

  13. Automatic Item Generation: A More Efficient Process for Developing Mathematics Achievement Items?

    ERIC Educational Resources Information Center

    Embretson, Susan E.; Kingston, Neal M.

    2018-01-01

    The continual supply of new items is crucial to maintaining quality for many tests. Automatic item generation (AIG) has the potential to rapidly increase the number of items that are available. However, the efficiency of AIG will be mitigated if the generated items must be submitted to traditional, time-consuming review processes. In two studies,…

  14. Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test.

    PubMed

    Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi; Chen, Kuan-Lin

    2018-01-01

    The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of typically developing (TD) children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings.
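
    The two stopping rules translate directly into a CAT loop. Under a standard-normal prior, marginal reliability above 0.9 corresponds to a posterior standard error below sqrt(0.1) ≈ 0.316, so the loop below stops at that SE or at 14 items, whichever comes first. The bank here is a dichotomous 2PL stand-in with invented parameters, not the CAT-SC's graded-response bank.

        import numpy as np

        rng = np.random.default_rng(5)
        n_items = 56                                   # bank size from the study
        a = rng.uniform(1.0, 2.5, n_items)
        b = rng.normal(0.0, 1.0, n_items)

        grid = np.linspace(-4, 4, 121)
        prior = np.exp(-0.5 * grid ** 2)               # N(0,1) prior, unnormalized

        def p2pl(theta, i):
            return 1.0 / (1.0 + np.exp(-a[i] * (theta - b[i])))

        def cat_session(true_theta, max_items=14, se_stop=np.sqrt(0.1)):
            posterior, used = prior.copy(), []
            while len(used) < max_items:               # stop rule (b): item cap
                post = posterior / posterior.sum()
                eap = float((grid * post).sum())
                se = float(np.sqrt(((grid - eap) ** 2 * post).sum()))
                if se < se_stop:                       # stop rule (a): reliability > 0.9
                    break
                pi = p2pl(eap, np.arange(n_items))
                info = a ** 2 * pi * (1 - pi)          # 2PL Fisher information at the EAP
                info[used] = -np.inf                   # never reuse an item
                i = int(np.argmax(info))
                used.append(i)
                y = rng.random() < p2pl(true_theta, i) # simulated response
                posterior = posterior * (p2pl(grid, i) if y else 1 - p2pl(grid, i))
            post = posterior / posterior.sum()
            eap = float((grid * post).sum())
            se = float(np.sqrt(((grid - eap) ** 2 * post).sum()))
            return eap, se, len(used)

        print(cat_session(true_theta=0.8))             # (estimate, SE, items used)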

  15. What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

    ERIC Educational Resources Information Center

    Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

    Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…

  16. Effects of Content Balancing and Item Selection Method on Ability Estimation in Computerized Adaptive Tests

    ERIC Educational Resources Information Center

    Sahin, Alper; Ozbasi, Durmus

    2017-01-01

    Purpose: This study aims to reveal effects of content balancing and item selection method on ability estimation in computerized adaptive tests by comparing Fisher's maximum information (FMI) and likelihood weighted information (LWI) methods. Research Methods: Four groups of examinees (250, 500, 750, 1000) and a bank of 500 items with 10 different…

  17. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    ERIC Educational Resources Information Center

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
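
    The models underlying those power calculations are compact enough to demonstrate directly. The sketch below simulates an item with known uniform DIF and runs the nested likelihood ratio tests with statsmodels; the coefficients and sample size are invented.

        import numpy as np
        import statsmodels.api as sm
        from scipy.stats import chi2

        rng = np.random.default_rng(11)
        n = 2000
        group = rng.integers(0, 2, n)                   # 0 = reference, 1 = focal
        score = rng.normal(size=n)                      # matching variable (e.g., total score)
        logit = 0.2 + 1.0 * score - 0.5 * group         # true uniform DIF effect of -0.5
        y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

        X0 = sm.add_constant(score)                                          # base model
        X1 = sm.add_constant(np.column_stack([score, group]))                # + uniform DIF
        X2 = sm.add_constant(np.column_stack([score, group, score * group])) # + nonuniform DIF

        m0 = sm.Logit(y, X0).fit(disp=0)
        m1 = sm.Logit(y, X1).fit(disp=0)
        m2 = sm.Logit(y, X2).fit(disp=0)

        lr_u = 2 * (m1.llf - m0.llf)                    # 1-df LR test for uniform DIF
        lr_n = 2 * (m2.llf - m1.llf)                    # 1-df LR test for nonuniform DIF
        print(f"uniform: LR={lr_u:.1f}, p={chi2.sf(lr_u, 1):.4g}")
        print(f"nonuniform: LR={lr_n:.1f}, p={chi2.sf(lr_n, 1):.4g}")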

  18. Geography library of Test Items. Volume Seven: A Selection of Test Items to Accompany the Resource Kit: Rice Growing & Rice Milling in South-Western New South Wales.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    Accompanying a multimedia resource unit on aspects of rice growing, volume eight of the geography collection includes a section introducing terminology, a viewing guide to the filmstrips and unit test items. Rice farming and marketing in Australia and growing methods in several countries are presented with regional studies in southeast Australia.…

  19. Effects of Item Parameter Drift on Vertical Scaling with the Nonequivalent Groups with Anchor Test (NEAT) Design

    ERIC Educational Resources Information Center

    Ye, Meng; Xin, Tao

    2014-01-01

    The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…

  20. What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

    PubMed

    Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme

    2014-01-01

    The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: the answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.

  1. Model Choice and Sample Size in Item Response Theory Analysis of Aphasia Tests

    ERIC Educational Resources Information Center

    Hula, William D.; Fergadiotis, Gerasimos; Martin, Nadine

    2012-01-01

    Purpose: The purpose of this study was to identify the most appropriate item response theory (IRT) measurement model for aphasia tests requiring 2-choice responses and to determine whether small samples are adequate for estimating such models. Method: Pyramids and Palm Trees (Howard & Patterson, 1992) test data that had been collected from…

  2. Measuring the development of conceptual understanding in chemistry

    NASA Astrophysics Data System (ADS)

    Claesgens, Jennifer Marie

    The purpose of this dissertation research is to investigate and characterize how students learn chemistry, from pre-instruction to deeper understanding of the subject matter, in their general chemistry coursework. Based on preliminary work, I believe that students have a general pathway of learning across the "big ideas," or concepts, in chemistry that can be characterized over the course of instruction. My hypothesis is that as students learn chemistry they build from experience and logical reasoning, then relate chemistry-specific ideas in a pairwise fashion before making more complete multi-relational links for deeper understanding of the subject matter. This proposed progression of student learning, which starts at Notions, moves to Recognition, and then to Formulation, is described in the ChemQuery Perspectives framework. My research continues the development of ChemQuery, an NSF-funded assessment system that uses a framework of the key ideas in the discipline and criterion-referenced analysis using item response theory (IRT) to map student progress. Specifically, this research investigates the potential for using criterion-referenced analysis to describe and measure how students learn chemistry, followed by more detailed task analysis of patterns in student responses found in the data. My research question asks: does IRT work to describe and measure how students learn chemistry, and if so, what is discovered about how students learn? Although my findings seem neither to entirely support nor to entirely refute the pathway of student understanding proposed in the ChemQuery Perspectives framework, my research does provide an indication of trouble spots. For example, it seems that the pathway from Notions to Recognition holds, but there are difficulties around the transition from Recognition to Formulation that cannot be resolved with this data. Nevertheless, this research has produced the following, which has contributed to the development of the Chem…

  3. Technical Characteristics of the Peabody Individual Achievement Test as a Function of Item Arrangement and Basal and Ceiling Rules.

    ERIC Educational Resources Information Center

    Browning, Robert; And Others

    1979-01-01

    Effects that item order and basal and ceiling rules have on test means, variances, and internal consistency estimates for the Peabody Individual Achievement Test mathematics and reading recognition subtests were examined. Items on the math and reading recognition subtests were significantly easier or harder than test placements indicated. (Author)

  4. Item Analyses of Memory Differences

    PubMed Central

    Salthouse, Timothy A.

    2017-01-01

    Objective: Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method: The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results: The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion: Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
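
    The memorability analysis described here is a simple item-level scatter: per-item accuracy in one group against per-item accuracy in the other, with the identity line as reference. Points shifted uniformly below the line indicate a uniform group difference; crossing patterns indicate item-by-group interactions. A sketch with simulated data, not the study's:

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(4)
        n_items = 40
        group_a = rng.uniform(0.2, 0.9, n_items)       # item accuracies, group A
        # Group B: a uniform deficit plus noise (invented, for illustration)
        group_b = np.clip(group_a - 0.12 + rng.normal(0, 0.03, n_items), 0, 1)

        plt.scatter(group_a, group_b)
        plt.plot([0, 1], [0, 1], linestyle="--")       # identity line
        plt.xlabel("item accuracy, group A")
        plt.ylabel("item accuracy, group B")
        plt.show()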

  5. A Bayesian Method for the Detection of Item Preknowledge in CAT. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    McLeod, Lori D.; Lewis, Charles; Thissen, David.

    With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…

  6. Testing Three-Item Versions for Seven of Young's Maladaptive Schema

    ERIC Educational Resources Information Center

    Blau, Gary; DiMino, John; Sheridan, Natalie; Pred, Robert S.; Beverly, Clyde; Chessler, Marcy

    2015-01-01

    The Young Schema Questionnaire (YSQ) in either long-form (205-item) or short-form (75-item or 90-item) versions has demonstrated its clinical usefulness for assessing early maladaptive schemas. However, even a 75- or 90-item "short form", particularly when combined with other measures, can represent a lengthy…

  7. Ramsay-Curve Differential Item Functioning

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2011-01-01

    Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…

  8. Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test

    PubMed Central

    Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi

    2018-01-01

    Objective: The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods: The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results: The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion: The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of typically developing (TD) children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with…

  9. On the Relationship between Classical Test Theory and Item Response Theory: From One to the Other and Back

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2016-01-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…

  10. Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

    PubMed

    Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

    2017-01-01

    The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. We surmised that the low reliability of the…
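
    The classical indices quoted (percent-correct difficulty, a discrimination index) are quick to compute from a 0/1 response matrix. The sketch below uses the common upper-27%/lower-27% grouping for discrimination; that convention is an assumption, since the abstract does not state how its discrimination index was derived, and the responses are simulated rather than the mock-examination data.

        import numpy as np

        def classical_item_analysis(responses, tail=0.27):
            """responses: 0/1 array (n_examinees, n_items).
            Difficulty = proportion correct; discrimination = p(upper) - p(lower)."""
            total = responses.sum(axis=1)
            order = np.argsort(total)
            k = int(len(total) * tail)
            lower, upper = order[:k], order[-k:]
            difficulty = responses.mean(axis=0)
            discrimination = responses[upper].mean(axis=0) - responses[lower].mean(axis=0)
            return difficulty, discrimination

        rng = np.random.default_rng(2)
        theta = rng.normal(size=741)                   # cohort size from the study
        b = rng.normal(0.0, 1.0, 50)                   # 50 items, as in the mock exam
        p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
        data = (rng.random((741, 50)) < p).astype(int)
        difficulty, discrimination = classical_item_analysis(data)
        print(difficulty.mean().round(2), discrimination.mean().round(2))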

  11. Differential Item Functioning in While-Listening Performance Tests: The Case of the International English Language Testing System (IELTS) Listening Module

    ERIC Educational Resources Information Center

    Aryadoust, Vahid

    2012-01-01

    This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…

  12. Investigating Linguistic Sources of Differential Item Functioning Using Expert Think-Aloud Protocols in Science Achievement Tests

    NASA Astrophysics Data System (ADS)

    Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye

    2013-03-01

    Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is examining the presence of DIF and, if DIF is found, investigating its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. Expert reviewers' think-aloud protocols were used to compare the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multilingual versions of curriculum and testing materials for government purposes participated in this study. Although there was considerable agreement in the identification of differentially functioning items, the experts did not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of identifying DIF prior to the actual administration of tests at national and international levels.

  13. Evaluating the validity of the Work Role Functioning Questionnaire (Canadian French version) using classical test theory and item response theory.

    PubMed

    Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal

    2017-01-01

    The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Still, few studies on its validity can be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. Four-factor and three-factor models were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were also flagged by CTT. Of the models tested, the three-factor model contained fewer problematic items. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method for enhancing the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.

  14. Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

    PubMed

    Sinharay, Sandip

    2017-09-01

    Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detecting item preknowledge and showed its performance to be better, on average, than that of seven other statistics when the set of compromised items is known. Sinharay suggested a statistic based on the likelihood ratio test for detecting item preknowledge; its advantage is that its null distribution is known. Results from simulated and real data, and from adaptive and nonadaptive tests, are used to demonstrate that the Type I error rate and power of the likelihood-ratio-based statistic are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising for detecting item preknowledge when the set of compromised items is known.

  15. An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

    ERIC Educational Resources Information Center

    Ali, Usama S.; Chang, Hua-Hua

    2014-01-01

    Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…

  16. Chemistry for whom? Gender awareness in teaching and learning chemistry

    NASA Astrophysics Data System (ADS)

    Andersson, Kristina

    2017-06-01

    Marie Ståhl and Anita Hussénius have defined the discourses that dominate national tests in chemistry for Grade 9 in Sweden by using feminist, critical didactic perspectives. This response seeks to expand the results of Ståhl and Hussénius's article "Chemistry inside an epistemological community box! Discursive exclusions and inclusions in the Swedish national tests in chemistry" by using different facets of gender awareness. The first facet, gender awareness in relation to the test designers' own conceptions, highlighted how the gender order in which women are subordinated to men becomes visible in the national tests as a consequence of the test designers' internalized conceptions. The second facet, gender awareness in relation to chemistry, discussed the hierarchy between discourses within chemistry. The third facet, gender awareness in relation to students, problematized chemistry in relation to the students' identity formation. In summary, I suggest that the different discourses can open up new ways to interpret chemistry and perhaps dismantle the hegemonic chemistry discourse.

  17. Student Perceptions of Chemistry Laboratory Learning Environments, Student-Teacher Interactions and Attitudes in Secondary School Gifted Education Classes in Singapore

    NASA Astrophysics Data System (ADS)

    Lang, Quek Choon; Wong, Angela F. L.; Fraser, Barry J.

    2005-09-01

    This study investigated the chemistry laboratory classroom environment, teacher-student interactions and student attitudes towards chemistry among 497 gifted and non-gifted secondary-school students in Singapore. The data were collected using the 35-item Chemistry Laboratory Environment Inventory (CLEI), the 48-item Questionnaire on Teacher Interaction (QTI) and the 30-item Questionnaire on Chemistry-Related Attitudes (QOCRA). Results supported the validity and reliability of the CLEI and QTI for this sample. Stream (gifted versus non-gifted) and gender differences were found in actual and preferred chemistry laboratory classroom environments and teacher-student interactions. Some statistically significant associations of modest magnitude were found between students' attitudes towards chemistry and both the laboratory classroom environment and the interpersonal behaviour of chemistry teachers. Suggestions for improving chemistry laboratory classroom environments and the teacher-student interactions for gifted students are provided.

  18. Reliability of a science admission test (HAM-Nat) at Hamburg medical school.

    PubMed

    Hissbach, Johanna; Klusmann, Dietrich; Hampe, Wolfgang

    2011-01-01

    The University Hospital in Hamburg (UKE) started to develop a test of knowledge in natural sciences for admission to medical school in 2005 (Hamburger Auswahlverfahren für Medizinische Studiengänge, Naturwissenschaftsteil, HAM-Nat). This study is a step towards establishing the HAM-Nat. We investigated parallel-forms reliability, the effect of a crash course in chemistry on test results, and correlations of HAM-Nat test results with a test of scientific reasoning (similar to a subtest of the "Test for Medical Studies", TMS). 316 first-year students participated in the study in 2007. They completed different versions of the HAM-Nat test consisting of items that had already been used (HN2006) and new items (HN2007). Four weeks later, half of the participants were retested on the HN2007 version of the HAM-Nat, while the other half completed the test of scientific reasoning. Within this four-week interval, students were offered a five-day chemistry course. Parallel-forms reliability for four different test versions ranged from r(tt)=.53 to r(tt)=.67. The retest reliabilities of the HN2007 halves were r(tt)=.54 and r(tt)=.61. Correlations of the two HAM-Nat versions with the test of scientific reasoning were r=.34 and r=.21. The crash course in chemistry had no effect on HAM-Nat scores. The results suggest that further versions of the test of natural sciences will not easily conform to the standards of internal consistency, parallel-forms reliability, and retest reliability. Much care has to be taken to assemble items that could be used interchangeably in the construction of new test versions. The test of scientific reasoning and the HAM-Nat tap different constructs. Participation in the chemistry course did not improve students' achievement, probably because the content of the course was not coordinated with the test and many students lacked motivation to do well in the second test.

  19. Reliability of a science admission test (HAM-Nat) at Hamburg medical school

    PubMed Central

    Hissbach, Johanna; Klusmann, Dietrich; Hampe, Wolfgang

    2011-01-01

    Objective: The University Hospital in Hamburg (UKE) started to develop a test of knowledge in natural sciences for admission to medical school in 2005 (Hamburger Auswahlverfahren für Medizinische Studiengänge, Naturwissenschaftsteil, HAM-Nat). This study is a step towards establishing the HAM-Nat. We investigated parallel-forms reliability, the effect of a crash course in chemistry on test results, and correlations of HAM-Nat test results with a test of scientific reasoning (similar to a subtest of the "Test for Medical Studies", TMS). Methods: 316 first-year students participated in the study in 2007. They completed different versions of the HAM-Nat test consisting of items that had already been used (HN2006) and new items (HN2007). Four weeks later, half of the participants were retested on the HN2007 version of the HAM-Nat, while the other half completed the test of scientific reasoning. Within this four-week interval, students were offered a five-day chemistry course. Results: Parallel-forms reliability for four different test versions ranged from rtt=.53 to rtt=.67. The retest reliabilities of the HN2007 halves were rtt=.54 and rtt=.61. Correlations of the two HAM-Nat versions with the test of scientific reasoning were r=.34 and r=.21. The crash course in chemistry had no effect on HAM-Nat scores. Conclusions: The results suggest that further versions of the test of natural sciences will not easily conform to the standards of internal consistency, parallel-forms reliability, and retest reliability. Much care has to be taken to assemble items that could be used interchangeably in the construction of new test versions. The test of scientific reasoning and the HAM-Nat tap different constructs. Participation in the chemistry course did not improve students' achievement, probably because the content of the course was not coordinated with the test and many students lacked motivation to do well in the second test. PMID:21866246

  20. Item Content of the Group Personality Projective Test

    ERIC Educational Resources Information Center

    Boudreaux, Ronald F.; Dreger, Ralph M.

    1974-01-01

    Examined the content factors of the GPPT using factor analytic procedures based on item intercorrelations, in contrast to the published version's use of part scores from prior groupings of items. In terms of what it proposes to measure, it was concluded that the GPPT has very limited utility. (Author/RC)

  1. PISA Test Items and School-Based Examinations in Greece: Exploring the Relationship between Global and Local Assessment Discourses

    ERIC Educational Resources Information Center

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-01-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…

  2. Using Response-Time Constraints in Item Selection To Control for Differential Speededness in Computerized Adaptive Testing. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.

    This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…

  3. Item Banking. ERIC/AE Digest.

    ERIC Educational Resources Information Center

    Rudner, Lawrence

    This digest discusses the advantages and disadvantages of using item banks, and it provides useful information for those who are considering implementing an item banking project in their school districts. The primary advantage of item banking is in test development. Using an item response theory method, such as the Rasch model, items from multiple…

  4. Clinical utility of a single-item test for DSM-5 alcohol use disorder among outpatients with anxiety and depressive disorders.

    PubMed

    Bartoli, Francesco; Crocamo, Cristina; Biagi, Enrico; Di Carlo, Francesco; Parma, Francesca; Madeddu, Fabio; Capuzzi, Enrico; Colmegna, Fabrizia; Clerici, Massimo; Carrà, Giuseppe

    2016-08-01

    There is a lack of studies testing the accuracy of fast screening methods for alcohol use disorder in mental health settings. We aimed to estimate the clinical utility of a standard single-item test for case finding and screening of DSM-5 alcohol use disorder among individuals suffering from anxiety and mood disorders. We recruited adults consecutively referred, over a 12-month period, to an outpatient clinic for anxiety and depressive disorders. We assessed the National Institute on Alcohol Abuse and Alcoholism (NIAAA) single-item test, using the Mini-International Neuropsychiatric Interview (MINI), plus an additional item of the Composite International Diagnostic Interview (CIDI) for craving, as the reference standard to diagnose a current DSM-5 alcohol use disorder. We estimated sensitivity and specificity of the single-item test, as well as positive and negative Clinical Utility Indexes (CUIs). 242 subjects with anxiety and mood disorders were included. The NIAAA single-item test showed high sensitivity (91.9%) and specificity (91.2%) for DSM-5 alcohol use disorder. The positive CUI was 0.601, whereas the negative one was 0.898, with excellent values also when accounting for the main individual characteristics (age, gender, diagnosis, psychological distress levels, smoking status). Testing the relevant indexes, we found excellent clinical utility of the NIAAA single-item test for screening out true negative cases. Our findings support the routine use of reliable methods for rapid screening in similar mental health settings.
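
    The Clinical Utility Indexes referred to here are commonly defined as CUI+ = sensitivity x PPV and CUI- = specificity x NPV. A minimal sketch computing them from a 2x2 table; the counts below are illustrative, not the study's data:

```python
def diagnostic_utility(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, predictive values, and Clinical Utility
    Indexes (CUI+ = Se * PPV, CUI- = Sp * NPV) from a 2x2 table."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {"sensitivity": se, "specificity": sp,
            "CUI+": se * ppv, "CUI-": sp * npv}

# Illustrative counts only; CUI values >= 0.81 are conventionally labelled "excellent".
print(diagnostic_utility(tp=34, fp=18, fn=3, tn=187))
```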

  5. Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice versus Constructed-Response Training Items

    ERIC Educational Resources Information Center

    Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.

    2016-01-01

    Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…

  6. Learning to Think Spatially: What Do Students "See" in Numeracy Test Items?

    ERIC Educational Resources Information Center

    Diezmann, Carmel M.; Lowrie, Tom

    2012-01-01

    Learning to think spatially in mathematics involves developing proficiency with graphics. This paper reports on 2 investigations of spatial thinking and graphics. The first investigation explored the importance of graphics as 1 of 3 communication systems (i.e. text, symbols, graphics) used to provide information in numeracy test items. The results…

  7. Application of a Multidimensional Nested Logit Model to Multiple-Choice Test Items

    ERIC Educational Resources Information Center

    Bolt, Daniel M.; Wollack, James A.; Suh, Youngsuk

    2012-01-01

    Nested logit models have been presented as an alternative to multinomial logistic models for multiple-choice test items (Suh and Bolt in "Psychometrika" 75:454-473, 2010) and possess a mathematical structure that naturally lends itself to evaluating the incremental information provided by attending to distractor selection in scoring. One potential…

  8. Domain-General and Domain-Specific Creative-Thinking Tests: Effects of Gender and Item Content on Test Performance

    ERIC Educational Resources Information Center

    Hong, Eunsook; Peng, Yun; O'Neil, Harold F., Jr.; Wu, Junbin

    2013-01-01

    The study examined the effects of gender and item content of domain-general and domain-specific creative-thinking tests on four subscale scores of creative-thinking (fluency, flexibility, originality, and elaboration). Chinese tenth-grade students (234 males and 244 females) participated in the study. Domain-general creative thinking was measured…

  9. Effects of Calibration Sample Size and Item Bank Size on Ability Estimation in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Sahin, Alper; Weiss, David J.

    2015-01-01

    This study aimed to investigate the effects of calibration sample size and item bank size on examinee ability estimation in computerized adaptive testing (CAT). For this purpose, a 500-item bank pre-calibrated using the three-parameter logistic model with 10,000 examinees was simulated. Calibration samples of varying sizes (150, 250, 350, 500,…

  10. A study of the factors affecting the attitudes of young female students toward chemistry at the high school level

    NASA Astrophysics Data System (ADS)

    Banya, Santonino K.

    usefulness of chemistry on their attitudes toward the study of chemistry. Both quantitative (a Likert-type scale questionnaire) and qualitative (open-ended questions) items were used to investigate the views of young female students. Results of the survey were analysed using a correlation test. Significant differences were found in the Likert-type scale scores, providing evidence supporting literature suggesting that self-confidence toward chemistry, the influence of role models, and knowledge about the usefulness of chemistry affect the decisions of young female students about the study of chemistry. Interview responses corroborated the results from the survey. Strategies for addressing the problems and recommendations for further studies are suggested.

  11. Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank.

    PubMed

    Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; de Beurs, Edwin

    2017-03-01

    We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As the item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 of the added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development.
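
    A CAT simulation of the kind described typically loops select, administer, rescore until a stopping rule fires. A minimal sketch, assuming a Rasch-calibrated bank and EAP scoring on a grid; for the Rasch model, choosing the item whose difficulty is closest to the current estimate is the maximum-information rule. All names and values are illustrative, not the study's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def rasch_p(theta, b):
    """Rasch probability of a correct/endorsed response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def simulate_cat(true_theta, b_bank, se_stop=0.32, max_items=28):
    """One simulated CAT run: max-info selection, EAP scoring, SE/max-length stop."""
    grid = np.linspace(-4, 4, 161)
    post = np.exp(-0.5 * grid**2)            # N(0,1) prior, unnormalized
    available = list(range(len(b_bank)))
    used, theta_hat, se = 0, 0.0, np.inf
    while used < max_items and se > se_stop:
        j = min(available, key=lambda i: abs(b_bank[i] - theta_hat))
        available.remove(j)
        u = rng.random() < rasch_p(true_theta, b_bank[j])   # simulated response
        p = rasch_p(grid, b_bank[j])
        post *= p if u else (1 - p)                         # Bayesian update
        post /= post.sum()
        theta_hat = (grid * post).sum()                     # EAP estimate
        se = np.sqrt(((grid - theta_hat) ** 2 * post).sum())
        used += 1
    return theta_hat, se, used

bank = rng.normal(size=48)   # a 48-item bank, mirroring the extended bank's size
th, se, n = simulate_cat(true_theta=0.5, b_bank=bank)
print(f"theta={th:.2f}, SE={se:.2f}, items={n}")
```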

  12. Applying Multidimensional Item Response Theory Models in Validating Test Dimensionality: An Example of K-12 Large-Scale Science Assessment

    ERIC Educational Resources Information Center

    Li, Ying; Jiao, Hong; Lissitz, Robert W.

    2012-01-01

    This study investigated the application of multidimensional item response theory (IRT) models to validate test structure and dimensionality. Multiple content areas or domains within a single subject often exist in large-scale achievement tests. Such areas or domains may cause multidimensionality or local item dependence, which both violate the…

  13. A Review of Classical Methods of Item Analysis.

    ERIC Educational Resources Information Center

    French, Christine L.

    Item analysis is a very important consideration in the test development process. It is a statistical procedure that combines methods for evaluating the important characteristics of test items, such as the difficulty, discrimination, and distractor performance of the items in a test. This paper reviews some of the classical methods for…
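
    Two of the classical statistics such a review covers, item difficulty (proportion correct) and item discrimination (here as the corrected item-total correlation), can be computed directly from a scored response matrix. A minimal sketch on simulated data, not taken from the paper:

```python
import numpy as np

def item_analysis(responses: np.ndarray):
    """Classical item statistics for a 0/1 response matrix (persons x items).

    p:    proportion correct (classical difficulty)
    r_pb: corrected item-total point-biserial correlation (discrimination)
    """
    total = responses.sum(axis=1)
    stats = []
    for j in range(responses.shape[1]):
        item = responses[:, j]
        rest = total - item                 # total score excluding the item itself
        r_pb = np.corrcoef(item, rest)[0, 1]
        stats.append({"item": j, "p": item.mean(), "r_pb": round(float(r_pb), 2)})
    return stats

# Illustrative data generated from a latent ability and five difficulties.
rng = np.random.default_rng(2)
theta = rng.normal(size=200)
b = np.linspace(-1, 1, 5)
prob = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
data = (rng.random((200, 5)) < prob).astype(int)
for row in item_analysis(data):
    print(row)
```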

  14. Modeling Item-Position Effects within an IRT Framework

    ERIC Educational Resources Information Center

    Debeer, Dries; Janssen, Rianne

    2013-01-01

    Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…

  15. Language Effects in International Testing: The Case of PISA 2006 Science Items

    ERIC Educational Resources Information Center

    El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art

    2016-01-01

    We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…

  16. Linking Outcomes from Peabody Picture Vocabulary Test Forms Using Item Response Models

    ERIC Educational Resources Information Center

    Hoffman, Lesa; Templin, Jonathan; Rice, Mabel L.

    2012-01-01

    Purpose: The present work describes how vocabulary ability as assessed by 3 different forms of the Peabody Picture Vocabulary Test (PPVT; Dunn & Dunn, 1997) can be placed on a common latent metric through item response theory (IRT) modeling, by which valid comparisons of ability between samples or over time can then be made. Method: Responses…

  17. An Empirical Investigation of the Potential Impact of Item Misfit on Test Scores. Research Report. ETS RR-17-60

    ERIC Educational Resources Information Center

    Kim, Sooyeon; Robin, Frederic

    2017-01-01

    In this study, we examined the potential impact of item misfit on the reported scores of an admission test from the subpopulation invariance perspective. The target population of the test consisted of 3 major subgroups with different geographic regions. We used the logistic regression function to estimate item parameters of the operational items…

  18. Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

    ERIC Educational Resources Information Center

    Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

    2013-01-01

    We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…

  19. Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory.

    PubMed

    Fajrianthi; Zein, Rizqy Amelia

    2017-01-01

    This study aimed to develop an emotional intelligence (EI) test suitable to the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consists of 120 items, with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric quality, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (ability level = 0), 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (ability level = -2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and the dimensionality of each TKEA subset.

  20. Automated Item Banking and Test Development. Final Technical Paper for Period October 1987-April 1988.

    ERIC Educational Resources Information Center

    Lee, William M.; And Others

    Projects to develop an automated item banking and test development system have been undertaken on several occasions at the Air Force Human Resources Laboratory (AFHRL) throughout the past 10 years. Such a system permits the construction of tests in far less time and with a higher degree of accuracy than earlier test construction procedures. This…

  1. A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

    PubMed

    Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

    2018-04-10

    To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT, were used for item evaluation and calibration of the final item bank. Out of the initial pool of 131 items, we excluded 64 (54 with factor loadings <.5, 4 with residual correlations >.3, 2 with non-discriminative item response curves, and 4 with differential item functioning). The final set of 67 items showed sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 over a theta range of -1.8 to +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready for use within a computerized adaptive testing framework for comprehensive assessment of the stress construct.

  2. Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

    ERIC Educational Resources Information Center

    Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

    2015-01-01

    Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…

  3. Gender and Minority Achievement Gaps in Science in Eighth Grade: Item Analyses of Nationally Representative Data. Research Report. ETS RR-17-36

    ERIC Educational Resources Information Center

    Qian, Xiaoyu; Nandakumar, Ratna; Glutting, Joseoph; Ford, Danielle; Fifield, Steve

    2017-01-01

    In this study, we investigated gender and minority achievement gaps on 8th-grade science items employing a multilevel item response methodology. Both gaps were wider on physics and earth science items than on biology and chemistry items. Larger gender gaps were found on items with specific topics favoring male students than other items, for…

  4. Corrections of clinical chemistry test results in a laboratory information system.

    PubMed

    Wang, Sihe; Ho, Virginia

    2004-08-01

    The recently released reports by the Institute of Medicine, To Err Is Human and Patient Safety, have received national attention because of their focus on the problem of medical errors. Although a small number of studies have reported on errors in general clinical laboratories, there are, to our knowledge, no reported studies that focus on errors in pediatric clinical laboratory testing. To characterize the errors that led to corrections of pediatric clinical chemistry results in the laboratory information system, Misys, and to provide initial data on the errors detected in pediatric clinical chemistry laboratories in order to improve patient safety in pediatric health care. All clinical chemistry staff members were informed of the study and were requested to report in writing whenever a correction was made in the laboratory information system. Errors were detected either by clinicians (the results did not fit the patients' clinical conditions) or by laboratory technologists (the results were double-checked, and the worksheets were carefully examined twice a day). No incident discovered before or during final validation was included. On each Monday of the study, we generated a report from Misys listing all corrections made during the previous week. We then categorized the corrections according to the types and stages of the incidents that led to them. A total of 187 incidents were detected during the 10-month study, representing a 0.26% error detection rate per requisition. The detected incidents comprised 31 (17%) preanalytic, 46 (25%) analytic, and 110 (59%) postanalytic incidents. Errors related to noninterfaced tests accounted for 50% of the total incidents and 37% of the affected tests and orderable panels, while noninterfaced tests and panels accounted for 17% of the total test volume in our laboratory. This pilot study provided the rate and…

  5. Using Classical Test Theory and Item Response Theory to Evaluate the LSCI

    NASA Astrophysics Data System (ADS)

    Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS

    2011-01-01

    Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and to provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
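
    The three parameters named here (ability, difficulty, and guessing) correspond to the three-parameter logistic (3PL) model, in which the guessing parameter sets a lower asymptote on the item characteristic curve. A minimal sketch of the 3PL response function; the parameter values are illustrative, not LSCI estimates:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Three-parameter logistic ICC: slope a, difficulty b, guessing floor c."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# A hard, discriminating item with a 20% guessing floor:
for th in (-2, 0, 2):
    print(th, round(float(p_3pl(th, a=1.5, b=1.0, c=0.20)), 3))
# Even very low-ability examinees answer correctly ~20% of the time.
```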

  6. The Effect of Background Knowledge and Reading Comprehension Test Items on Male and Female Performance

    ERIC Educational Resources Information Center

    Yazdanpanah, Khatereh

    2007-01-01

    The present article reports on a study investigating the interaction of a reading comprehension test with gender in a formal testing context, and the performance of males and females on reading test items with regard to demands on strategy use. Participants were 187 international students (59 female and 128 male) ranging from 17 to 20 years of…

  7. The Dependence on Mathematical Theory in TIMSS, PISA and TIMSS Advanced Test Items and Its Relation to Student Achievement

    ERIC Educational Resources Information Center

    Hole, Arne; Grønmo, Liv Sissel; Onstad, Torgeir

    2018-01-01

    Background: This paper discusses a framework for analyzing the dependence on mathematical theory in test items, that is, a framework for discussing to what extent knowledge of mathematical theory is helpful for the student in solving the item. The framework can be applied to any test in which some knowledge of mathematical theory may be useful,…

  8. Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

    ERIC Educational Resources Information Center

    Kim, Seonghoon

    2013-01-01

    With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
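
    The Lord-Wingersky recursion itself is compact: it builds the conditional score distribution one item at a time. A minimal sketch of the classic dichotomous version (the article's real-number generalization extends this idea; the probabilities below are illustrative):

```python
def lord_wingersky(p):
    """Conditional distribution of the number-correct score at a fixed theta.

    p: list of P(correct | theta) for each item.
    Returns f where f[s] = P(summed score == s | theta).
    """
    f = [1.0]                                  # before any item: score 0 w.p. 1
    for pi in p:
        f = [(f[s] if s < len(f) else 0.0) * (1 - pi) +
             (f[s - 1] if s >= 1 else 0.0) * pi
             for s in range(len(f) + 1)]       # add one item per pass
    return f

# Three items at some fixed theta:
dist = lord_wingersky([0.8, 0.6, 0.4])
print(dist, sum(dist))   # distribution over scores 0..3; sums to 1
```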

  9. Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

    ERIC Educational Resources Information Center

    Cao, Yi; Lu, Ru; Tao, Wei

    2014-01-01

    The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…

  10. Elicited Speech from Graph Items on the Test of Spoken English[TM]. Research Reports. Report 74. RR-04-06

    ERIC Educational Resources Information Center

    Katz, Irvin R.; Xi, Xiaoming; Kim, Hyun-Joo; Cheng, Peter C. H.

    2004-01-01

    This research applied a cognitive model to identify item features that lead to irrelevant variance on the Test of Spoken English[TM] (TSE[R]). The TSE is an assessment of English oral proficiency and includes an item that elicits a description of a statistical graph. This item type sometimes appears to tap graph-reading skills--an irrelevant…

  11. How Do Undergraduate Students Conceptualize Acid-Base Chemistry? Measurement of a Concept Progression

    ERIC Educational Resources Information Center

    Romine, William L.; Todd, Amber N.; Clark, Travis B.

    2016-01-01

    We developed and validated a new instrument, called "Measuring Concept progressions in Acid-Base chemistry" (MCAB) and used it to better understand the progression of undergraduate students' understandings about acid-base chemistry. Items were developed based on an existing learning progression for acid-base chemistry. We used the Rasch…

  12. Firestar-"D": Computerized Adaptive Testing Simulation Program for Dichotomous Item Response Theory Models

    ERIC Educational Resources Information Center

    Choi, Seung W.; Podrabsky, Tracy; McKinney, Natalie

    2012-01-01

    Computerized adaptive testing (CAT) enables efficient and flexible measurement of latent constructs. The majority of educational and cognitive measurement constructs are based on dichotomous item response theory (IRT) models. An integral part of developing various components of a CAT system is conducting simulations using both known and empirical…

  13. Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Wilson, Mark

    2005-01-01

    This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…

  14. The Orion Test Capsule and a number of other items used in the c

    NASA Image and Video Library

    2013-08-07

    The Orion Test Capsule and a number of other items used in the capsule recovery are being transported down the James River on a Navy Improved Navy Lighterage System (INLS) from Fort Eustis, where they were loaded. The water route will take them to Little Creek Amphibious Base in Norfolk, where they will stay until the scheduled recovery test is performed.

  15. Relationship of students' conceptual representations and problem-solving abilities in acid-base chemistry

    NASA Astrophysics Data System (ADS)

    Powers, Angela R.

    2000-10-01

    This study explored the relationship between secondary chemistry students' conceptual representations of acid-base chemistry, as shown in student-constructed concept maps, and their ability to solve acid-base problems, represented by their score on an 18-item paper-and-pencil test, the Acid-Base Concept Assessment (ABCA). The ABCA, consisting of both multiple-choice and short-answer items, was originally designed using a question-type by subtopic matrix, validated by a panel of experts, and refined through pilot studies and factor analysis to create the final instrument. The concept-map task included a short introduction to concept mapping, a prototype concept map, a practice concept-mapping activity, and the instructions for the acid-base concept map task. The instruments were administered to chemistry students at two high schools; 108 subjects completed both instruments for this study. Factor analysis of ABCA results indicated that the test was unifactorial for these students, despite the intention to create an instrument with multiple "question-type" scales. Concept maps were scored both holistically and by counting valid concepts. The two approaches were highly correlated (r = 0.75). The correlation between ABCA score and concept-map score was 0.29 for holistically scored concept maps and 0.33 for counted-concept maps. Although both correlations were significant, they accounted for only 8.8% and 10.2% of the variance in ABCA scores, respectively. However, when the reliability of the instruments is considered, more than 20% of the variance in ABCA scores may be explained by concept-map scores. MANOVAs for ABCA and concept-map scores by instructor, student gender, and year in school showed significant differences for both holistic and counted concept-map scores. Discriminant analysis revealed that the source of these differences was the instruction variable. Significant differences between classes receiving different instruction were found in the frequency of…

  16. Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory

    PubMed Central

    Fajrianthi; Zein, Rizqy Amelia

    2017-01-01

    This study aimed to develop an emotional intelligence (EI) test suitable to the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consists of 120 items, with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric quality, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (ability level = 0), 12.183 for subset 2 (ability level = −2), and 2.398 for subset 3 (ability level = −2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and the dimensionality of each TKEA subset. PMID:29238234

  17. Modeling Item-Level and Step-Level Invariance Effects in Polytomous Items Using the Partial Credit Model

    ERIC Educational Resources Information Center

    Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D.

    2012-01-01

    Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…

  18. Effects of Test Format, Self Concept and Anxiety on Item Response Changing Behaviour

    ERIC Educational Resources Information Center

    Afolabi, E. R. I.

    2007-01-01

    The study examined the effects of item format, self-concept, and anxiety on response-changing behaviour. Four hundred undergraduate students who took a counseling psychology course at a Nigerian university participated in the study. Students' answers in multiple-choice and true-false formats of an achievement test were observed for response…

  19. Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

    PubMed

    Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

    2018-02-23

    The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish-speaking adults from Colombia. They were administered the 60- and 15-item Spanish versions of the BNT. A Rasch analysis was conducted to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number, and a number of items did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy, with 7 poorly fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English-speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate for assessing high naming ability.

  20. Test of Achievement in Quantitative Economics for Secondary Schools: Construction and Validation Using Item Response Theory

    ERIC Educational Resources Information Center

    Eleje, Lydia I.; Esomonu, Nkechi P. M.

    2018-01-01

    A test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up of 20 multiple-choice items constructed on the basis of quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…

  1. Anatomy of a physics test: Validation of the physics items on the Texas Assessment of Knowledge and Skills

    NASA Astrophysics Data System (ADS)

    Marshall, Jill A.; Hagedorn, Eric A.; O'Connor, Jerry

    2009-06-01

    We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS) designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.

  2. Collaborative learning and testing in introductory general chemistry

    NASA Astrophysics Data System (ADS)

    Amaral, Katie Elizabeth

    Students taking general chemistry at the University of Florida are either well prepared or underprepared. To meet the needs of the underprepared students, an introductory course (CHM 1025) was developed. An accurate method of placement into CHM 1025 or the mainstream course (CHM 2045) was needed. The Chemistry Readiness Assessment Exam was written and tested, and students are advised to take either course based upon their scores. The accuracy of the cutoff scores was examined, with the minimum passing chemistry score lowered to six correct out of 18 and the math score raised to six correct out of eight. Collaborative problem-solving sessions were held during every CHM 1025 class. These sessions were shown to increase student achievement in CHM 1025. Group placement was also shown to have an effect on student achievement in the course. Students placed randomly into collaborative groups had the highest average GPA, while students placed by achievement had the lowest average GPA. The efficacy of CHM 1025 was examined to determine whether the students who required the course do as well in CHM 2045 as those who did not need it. Students who had taken CHM 1025 had a higher GPA in CHM 2045 than students who went directly into CHM 2045. Students in the spring semester of 2004 took collaborative exams. Achievement levels of students who took collaborative exams were compared with those of students who took traditional exams to determine whether collaborative testing had an effect on student achievement and retention in CHM 1025. There was no significant difference in achievement, although the collaborative exams were harder. Percentages of students taking each exam were also compared, with more students taking the collaborative exams. Finally, undergraduate students called peer mentors, who had taken CHM 1025, were recruited to assist with the course. Mentors helped CHM 1025 students with the collaborative problems. The mentors' presence helped lower students' withdrawal rates in…

  3. Development of Abbreviated Nine-Item Forms of the Raven's Standard Progressive Matrices Test

    ERIC Educational Resources Information Center

    Bilker, Warren B.; Hansen, John A.; Brensinger, Colleen M.; Richard, Jan; Gur, Raquel E.; Gur, Ruben C.

    2012-01-01

    The Raven's Standard Progressive Matrices (RSPM) is a 60-item test for measuring abstract reasoning, considered a nonverbal estimate of fluid intelligence, and often included in clinical assessment batteries and research on patients with cognitive deficits. The goal was to develop and apply a predictive model approach to reduce the number of items…

  4. Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

    PubMed

    Andriessen, Teuntje M J C; de Jong, Ben; Jacobs, Bram; van der Werf, Sieberen P; Vos, Pieter E

    2009-04-01

    To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). Daily testing was performed in 64 consecutively admitted traumatic brain injured patients, 22 orthopedically injured patients and 26 healthy controls until criteria for resolution of PTA were reached. Subjects were randomly assigned to a test with visual or verbal stimuli. Short delay reproduction was tested after an interval of 3-5 minutes, long delay reproduction was tested after 24 hours. Sensitivity and specificity were calculated over the first 4 test days. The 3-word test showed higher sensitivity than the 3-picture test, while specificity of the two tests was equally high. Free recall was a more effortful task than recognition for both patients and controls. In patients, a longer delay between registration and recall resulted in a significant decrease in the number of items reproduced. Presence of PTA is best assessed with a memory test that incorporates the free recall of words after a long delay.

  5. Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

    ERIC Educational Resources Information Center

    Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

    2010-01-01

    This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
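
    A kernel-smoothed item response function estimates P(correct | theta) nonparametrically as a locally weighted average of observed responses; comparing the smoothed curves of reference and focal groups is the basis for the DIF comparison described here. A minimal sketch with a Gaussian kernel; the specific smoother and test statistic in the study may differ, and the data are illustrative:

```python
import numpy as np

def kernel_smoothed_irf(theta_hat, u, grid, h=0.3):
    """Nonparametric (kernel-smoothed) item response function.

    theta_hat: examinee ability estimates, shape (n,)
    u:         0/1 responses to one item, shape (n,)
    grid:      evaluation points for the smoothed P(theta)
    h:         Gaussian kernel bandwidth
    """
    w = np.exp(-0.5 * ((grid[:, None] - theta_hat[None, :]) / h) ** 2)
    return (w * u).sum(axis=1) / w.sum(axis=1)

# For DIF inspection, smooth the same item separately in the reference and
# focal groups and compare the two curves pointwise (data illustrative).
rng = np.random.default_rng(3)
theta = rng.normal(size=1000)
u = (rng.random(1000) < 1 / (1 + np.exp(-(theta + 0.3)))).astype(int)
grid = np.linspace(-2, 2, 9)
print(np.round(kernel_smoothed_irf(theta, u, grid), 2))
```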

  6. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  7. Cognitive testing of tobacco use items for administration to patients with cancer and cancer survivors in clinical research.

    PubMed

    Land, Stephanie R; Warren, Graham W; Crafts, Jennifer L; Hatsukami, Dorothy K; Ostroff, Jamie S; Willis, Gordon B; Chollette, Veronica Y; Mitchell, Sandra A; Folz, Jasmine N M; Gulley, James L; Szabo, Eva; Brandon, Thomas H; Duffy, Sonia A; Toll, Benjamin A

    2016-06-01

    To the authors' knowledge, there are currently no standardized measures of tobacco use and secondhand smoke exposure in patients diagnosed with cancer, and this gap hinders the conduct of studies examining the impact of tobacco on cancer treatment outcomes. The objective of the current study was to evaluate and refine questionnaire items proposed by an expert task force to assess tobacco use. Trained interviewers conducted cognitive testing with cancer patients aged ≥21 years with a history of tobacco use and a cancer diagnosis of any stage and organ site who were recruited at the National Institutes of Health Clinical Center in Bethesda, Maryland. Iterative rounds of testing and item modification were conducted to identify and resolve cognitive issues (comprehension, memory retrieval, decision/judgment, and response mapping) and instrument navigation issues until no items warranted further significant modification. Thirty participants (6 current cigarette smokers, 1 current cigar smoker, and 23 former cigarette smokers) were enrolled from September 2014 to February 2015. The majority of items functioned well. However, qualitative testing identified wording ambiguities related to cancer diagnosis and treatment trajectory, such as "treatment" and "surgery"; difficulties with lifetime recall; errors in estimating quantities; and difficulties with instrument navigation. Revisions to item wording, format, order, response options, and instructions resulted in a questionnaire that demonstrated navigational ease as well as good question comprehension and response accuracy. The Cancer Patient Tobacco Use Questionnaire (C-TUQ) can be used as a standardized item set to accelerate the investigation of tobacco use in the cancer setting. Cancer 2016;122:1728-34. © 2016 American Cancer Society.

  8. Phase Equilibrium, Chemical Equilibrium, and a Test of the Third Law: Experiments for Physical Chemistry.

    ERIC Educational Resources Information Center

    Dannhauser, Walter

    1980-01-01

    Described is an experiment designed to provide an experimental basis for a unifying point of view (utilizing theoretical framework and chemistry laboratory experiments) for physical chemistry students. Three experiments are described: phase equilibrium, chemical equilibrium, and a test of the third law of thermodynamics. (Author/DS)

  9. Instructional Sensitivity Statistics Appropriate for Objectives-Based Test Items. CSE Report No. 91.

    ERIC Educational Resources Information Center

    Kosecoff, Jacqueline B.; Klein, Stephen P.

    Two types of sensitivity indices were developed in this paper, one internal to the total test and the second external. To evaluate the success of these statistics the three criteria suggested for a satisfactory index of item quality were considered. The Internal Sensitivity Index appears to meet these demands. Certainly it is easily computed. In…

  10. Development of the Flame Test Concept Inventory: Measuring Student Thinking about Atomic Emission

    ERIC Educational Resources Information Center

    Bretz, Stacey Lowery; Murata Mayo, Ana Vasquez

    2018-01-01

    This study reports the development of a 19-item Flame Test Concept Inventory, an assessment tool to measure students' understanding of atomic emission. Fifty-two students enrolled in secondary and postsecondary chemistry courses were interviewed about atomic emission and explicitly asked to explain flame test demonstrations and energy level…

  11. The psychometric properties of the "Reading the Mind in the Eyes" Test: an item response theory (IRT) analysis.

    PubMed

    Preti, Antonio; Vellante, Marcello; Petretto, Donatella R

    2017-05-01

    The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-takingthat is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants to the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.

  12. A Guide to Item Banking in Education. (Third Edition).

    ERIC Educational Resources Information Center

    Naccarato, Richard W.

    The current status of banks of test items existing across the United States was determined through a survey conducted between September and December 1987. Item "bank" in this context does not imply that the test items are available in computerized form, but simply that "deposited" test items can be withdrawn for use. Emphasis…

  13. A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Barrada, Juan Ramon; Olea, Julio; Ponsoda, Vicente; Abad, Francisco Jose

    2010-01-01

    In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or…

  14. Comparison of Procedures for Detecting Test-Item Bias with Both Internal and External Ability Criteria.

    ERIC Educational Resources Information Center

    Shepard, Lorrie; And Others

    1981-01-01

    Sixteen approaches for detecting item bias were compared on samples of Black, White, and Chicano elementary school pupils using the Lorge-Thorndike and Raven's Coloured Progressive Matrices tests. Recommendations for practical use are made. (JKS)

  15. [Perceptions on item disclosure for the Korean medical licensing examination].

    PubMed

    Yang, Eunbae B

    2015-09-01

    This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who had participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. Students expressed concern that disclosure of test items would prompt an increase in item difficulty (48.3%), and few found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public, on the condition that students are provided with a sufficient amount of information regarding the examination, so that the exam can continue to identify candidates with the required qualifications.

  16. Enhanced Automatic Question Creator--EAQC: Concept, Development and Evaluation of an Automatic Test Item Creation Tool to Foster Modern e-Education

    ERIC Educational Resources Information Center

    Gutl, Christian; Lankmayr, Klaus; Weinhofer, Joachim; Hofler, Margit

    2011-01-01

    Research on the automated creation of test items for assessment purposes has become increasingly important in recent years. Automatic question creation makes it possible to support personalized and self-directed learning activities by preparing appropriate and individualized test items quite easily, with relatively little effort or even fully…

  17. Using Set Covering with Item Sampling to Analyze the Infeasibility of Linear Programming Test Assembly Models

    ERIC Educational Resources Information Center

    Huitzing, Hiddo A.

    2004-01-01

    This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests, fulfilling a set of constraints set by the test assembler. Sometimes, no solution to the LPTA model exists. The model is then said to be…

  18. The Impact of Escape Alternative Position Change in Multiple-Choice Test on the Psychometric Properties of a Test and Its Items Parameters

    ERIC Educational Resources Information Center

    Hamadneh, Iyad Mohammed

    2015-01-01

    This study aimed at investigating the impact of changing the escape alternative's position in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination & guessing), as well as on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple-choice achievement test…

  19. Memory for Items and Relationships among Items Embedded in Realistic Scenes: Disproportionate Relational Memory Impairments in Amnesia

    PubMed Central

    Hannula, Deborah E.; Tranel, Daniel; Allen, John S.; Kirchhoff, Brenda A.; Nickel, Allison E.; Cohen, Neal J.

    2014-01-01

    Objective: The objective of this study was to examine the dependence of item memory and relational memory on medial temporal lobe (MTL) structures. Patients with amnesia, who either had extensive MTL damage or damage that was relatively restricted to the hippocampus, were tested, as was a matched comparison group. Disproportionate relational memory impairments were predicted for both patient groups, and those with extensive MTL damage were also expected to have impaired item memory. Method: Participants studied scenes, and were tested with interleaved two-alternative forced-choice probe trials. Probe trials were either presented immediately after the corresponding study trial (lag 1), five trials later (lag 5), or nine trials later (lag 9) and consisted of the studied scene along with a manipulated version of that scene in which one item was replaced with a different exemplar (item memory test) or was moved to a new location (relational memory test). Participants were to identify the exact match of the studied scene. Results: As predicted, patients were disproportionately impaired on the test of relational memory. Item memory performance was marginally poorer among patients with extensive MTL damage, but both groups were impaired relative to matched comparison participants. Impaired performance was evident at all lags, including the shortest possible lag (lag 1). Conclusions: The results are consistent with the proposed role of the hippocampus in relational memory binding and representation, even at short delays, and suggest that the hippocampus may also contribute to successful item memory when items are embedded in complex scenes. PMID:25068665

  20. Rasch Measurement and Item Banking: Theory and Practice.

    ERIC Educational Resources Information Center

    Nakamura, Yuji

    The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test item is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
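
    For reference, the one-parameter (Rasch) model described above is conventionally written as follows; this is the standard textbook formulation, not a formula taken from this record.

    ```latex
    % Rasch (one-parameter logistic) model: probability that a person with
    % ability \theta answers an item of difficulty b correctly.
    P(X = 1 \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}
    ```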

  1. Demand Characteristics of Multiple-Choice Items.

    ERIC Educational Resources Information Center

    Diamond, James J.; Williams, David V.

    Thirteen graduate students were asked to indicate for each of 24 multiple-choice items whether the item tested "recall of specific information," a "higher order skill," or "don't know." The students were also asked to state their general basis for judging the items. The 24 items had been previously classified according to Bloom's cognitive-skills…

  2. Automated Item Generation with Recurrent Neural Networks.

    PubMed

    von Davier, Matthias

    2018-03-12

    Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
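
    A probabilistic language model in the sense used here assigns probabilities to token sequences and generates text by repeatedly sampling the next token given what has been generated so far. The toy character-bigram sampler below (invented corpus) illustrates only that sampling loop; the deep recurrent models the author describes are far more expressive but generate in the same token-by-token fashion.

    ```python
    import random
    from collections import Counter, defaultdict

    # Train a character-bigram model on a tiny invented corpus of item stems,
    # then sample a new character sequence from the learned probabilities.
    corpus = "which compound is an acid? which gas is noble? which metal is alkali?"

    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def sample_next(prev):
        chars, weights = zip(*counts[prev].items())
        return random.choices(chars, weights=weights)[0]

    ch, out = "w", ["w"]
    for _ in range(40):
        ch = sample_next(ch)
        out.append(ch)
    print("".join(out))  # nonsense, but statistically corpus-like
    ```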

  3. A Study on Detecting of Differential Item Functioning of PISA 2006 Science Literacy Items in Turkish and American Samples

    ERIC Educational Resources Information Center

    Çikirikçi Demirtasli, Nükhet; Ulutas, Seher

    2015-01-01

    Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…

  4. Application of a Method of Estimating DIF for Polytomous Test Items.

    ERIC Educational Resources Information Center

    Camilli, Gregory; Congdon, Peter

    1999-01-01

    Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)

  5. Effect of Item Arrangement, Knowledge of Arrangement, and Test Anxiety on Two Scoring Methods.

    ERIC Educational Resources Information Center

    Plake, Barbara S.; And Others

    1981-01-01

    Number right and elimination scores were analyzed on a college level mathematics exam assembled from pretest data. Anxiety measures were administered along with the experimental forms to undergraduates. Results suggest that neither test scores nor attitudes are influenced by item order, knowledge thereof, or anxiety level. (Author/GK)

  6. A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests

    ERIC Educational Resources Information Center

    Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.

    2010-01-01

    Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…

  7. Friendship chemistry: An examination of underlying factors☆.

    PubMed

    Campbell, Kelly; Holderness, Nicole; Riggs, Matt

    2015-06-01

    Interpersonal chemistry refers to a connection between two individuals that exists upon first meeting. The goal of the current study is to identify beliefs about the underlying components of friendship chemistry. Individuals respond to an online Friendship Chemistry Questionnaire containing items that are derived from interdependence theory and the friendship formation literature. Participants are randomly divided into two subsamples. A principal axis factor analysis with promax rotation is performed on subsample 1 and produces 5 factors: Reciprocal candor, mutual interest, personableness, similarity, and physical attraction. A confirmatory factor analysis is conducted using subsample 2 and provides support for the 5-factor model. Participants with agreeable, open, and conscientious personalities more commonly report experiencing friendship chemistry, as do those who are female, young, and European/white. Responses from participants who have never experienced chemistry are qualitatively analyzed. Limitations and directions for future research are discussed.

  8. Friendship chemistry: An examination of underlying factors☆

    PubMed Central

    Campbell, Kelly; Holderness, Nicole; Riggs, Matt

    2015-01-01

    Interpersonal chemistry refers to a connection between two individuals that exists upon first meeting. The goal of the current study is to identify beliefs about the underlying components of friendship chemistry. Individuals respond to an online Friendship Chemistry Questionnaire containing items that are derived from interdependence theory and the friendship formation literature. Participants are randomly divided into two subsamples. A principal axis factor analysis with promax rotation is performed on subsample 1 and produces 5 factors: Reciprocal candor, mutual interest, personableness, similarity, and physical attraction. A confirmatory factor analysis is conducted using subsample 2 and provides support for the 5-factor model. Participants with agreeable, open, and conscientious personalities more commonly report experiencing friendship chemistry, as do those who are female, young, and European/white. Responses from participants who have never experienced chemistry are qualitatively analyzed. Limitations and directions for future research are discussed. PMID:26097283

  9. Item response theory - A first approach

    NASA Astrophysics Data System (ADS)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    The Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years, due both to increasing computer power and to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
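
    The one-, two-, and three-parameter logistic models named in the abstract are conventionally nested as follows (standard notation, not the authors'): the 3PL form is shown below; fixing c_i = 0 gives the 2PL, and additionally fixing a_i = 1 gives the 1PL (Rasch) model.

    ```latex
    % 3PL model for item i: a_i = discrimination, b_i = difficulty,
    % c_i = pseudo-guessing parameter.
    P_i(\theta) = c_i + (1 - c_i)\,\frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}}
    ```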

  10. Limited-information goodness-of-fit testing of diagnostic classification item response models.

    PubMed

    Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen

    2016-11-01

    Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and for realistic sample sizes, full-information test statistics such as Pearson's X² and the likelihood ratio statistic G² suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ statistic to diagnostic classification models. Through a series of simulation studies, we found that M₂ is well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, M₂ was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of the overall model goodness of fit using M₂, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic X²_LD for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The X²_LD statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful

  11. A Case Study on an Item Writing Process: Use of Test Specifications, Nature of Group Dynamics, and Individual Item Writers' Characteristics

    ERIC Educational Resources Information Center

    Kim, Jiyoung; Chi, Youngshin; Huensch, Amanda; Jun, Heesung; Li, Hongli; Roullion, Vanessa

    2010-01-01

    This article discusses a case study on an item writing process that reflects on our practical experience in an item development project. The purpose of the article is to share our lessons from the experience, aiming to demystify the item writing process. The study investigated three issues that naturally emerged during the project: how item writers use…

  12. Teachers' Use of Test-Item Banks for Student Assessment in North Carolina Secondary Agricultural Education Programs

    ERIC Educational Resources Information Center

    Marshall, Joy Morgan

    2014-01-01

    Higher expectations are placed on all parties to ensure that students perform successfully on standardized tests. Specifically, in North Carolina agriculture classes, students are given a CTE Post Assessment to measure knowledge gained and proficiency. Prior to students taking the CTE Post Assessment, teachers have access to a test item bank system that…

  13. Differential Item Functioning Analysis of High-Stakes Test in Terms of Gender: A Rasch Model Approach

    ERIC Educational Resources Information Center

    Alavi, Seyed Mohammad; Bordbar, Soodeh

    2017-01-01

    Differential Item Functioning (DIF) analysis is a key element in evaluating educational test fairness and validity. One frequently cited source of construct-irrelevant variance is gender, which plays an important role in the university entrance exam and can therefore introduce bias and undermine test validity. The present study aims…

  14. How Gender and College Chemistry Experience Influence Mental Rotation Ability.

    ERIC Educational Resources Information Center

    Brownlow, Sheila; Miderski, Carol Ann

    Deficits in spatial abilities, particularly Mental Rotation (MR), may contribute to women's avoidance of areas of study (such as chemistry) that rely on MR. Those women who do succeed in chemistry may do so because they have MR skills that are on par with those of their male peers. We examined MR ability on 12 items from the Vandenberg and Kuse MR test…

  15. Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients.

    PubMed

    Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa

    2017-11-01

    The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim of enhancing measurement precision. Here we present the results of the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.

  16. The Nursing Home Physical Performance Test: A Secondary Data Analysis of Women in Long-Term Care Using Item Response Theory.

    PubMed

    Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L

    2017-04-11

    The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at the item level and opportunities for improvements. We used data from women in long-term care. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making the sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. Participants were, on average, 86 years old, had more than three comorbidities, and had an NHPPT score of 19.4. All items had high discrimination and were targeted toward the lower middle range of the performance continuum. After revision, the sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or a greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. Concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL and explained variability (R²) from 22% to 36% for frailty. The NHPPT has good measurement characteristics at the item level. The NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
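
    The graded response model fitted in this study is usually parameterized through cumulative category probabilities (Samejima's standard formulation; the notation below is generic rather than taken from the article):

    ```latex
    % Graded response model: P*_{ik} is the probability of scoring in
    % category k or higher on item i (discrimination a_i, ordered
    % thresholds b_{ik}); P_{ik} is the probability of scoring exactly k.
    P^{*}_{ik}(\theta) = \frac{e^{a_i(\theta - b_{ik})}}{1 + e^{a_i(\theta - b_{ik})}},
    \qquad
    P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta)
    ```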

  17. The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

    ERIC Educational Resources Information Center

    Bennett, Randy Elliot; And Others

    1990-01-01

    The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)

  18. 7 CFR 2902.5 - Item designation.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ..., USDA will use life cycle cost information only from tests using the BEES analytical method. (c... availability of such items and the economic and technological feasibility of using such items, including life cycle costs. USDA will gather information on individual products within an item and extrapolate that...

  19. Assessing Conceptual and Algorithmic Knowledge in General Chemistry with ACS Exams

    ERIC Educational Resources Information Center

    Holme, Thomas; Murphy, Kristen

    2011-01-01

    In 2005, the ACS Examinations Institute released an exam for first-term general chemistry in which items are intentionally paired with one conceptual and one traditional item. A second-term, paired-questions exam was released in 2007. This paper presents an empirical study of student performances on these two exams based on national samples of…

  20. The role of attention in item-item binding in visual working memory.

    PubMed

    Peterson, Dwight J; Naveh-Benjamin, Moshe

    2017-09-01

    An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  1. Examining the Impact of Drifted Polytomous Anchor Items on Test Characteristic Curve (TCC) Linking and IRT True Score Equating. Research Report. ETS RR-12-09

    ERIC Educational Resources Information Center

    Li, Yanmei

    2012-01-01

    In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…

  2. The terminator "toy" chemistry test: A simple tool to assess errors in transport schemes

    DOE PAGES

    Lauritzen, P. H.; Conley, A. J.; Lamarque, J. -F.; ...

    2015-05-04

    This test extends the evaluation of transport schemes from prescribed advection of inert scalars to reactive species. The test consists of transporting two interacting chemical species in the Nair and Lauritzen 2-D idealized flow field. The sources and sinks for these two species are given by a simple, but non-linear, "toy" chemistry that represents combination (X + X → X₂) and dissociation (X₂ → X + X). This chemistry mimics photolysis-driven conditions near the solar terminator, where strong gradients in the spatial distribution of the species develop near its edge. Despite the large spatial variations in each species, the weighted sum X_T = X + 2X₂ should always be preserved at spatial scales at which molecular diffusion is excluded. The terminator test demonstrates how well the advection–transport scheme preserves linear correlations. Chemistry–transport (physics–dynamics) coupling can also be studied with this test. Examples of the results of this test are shown for illustration.
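
    The conservation property is straightforward to verify numerically: each combination event removes two X and creates one X₂, while dissociation does the reverse, so the tendencies cancel exactly in X_T = X + 2X₂. A minimal Python sketch (the rate constants and initial mixing ratios are illustrative placeholders, not values from the paper):

    ```python
    # Toy terminator chemistry: X + X -> X2 (rate k1) and X2 -> X + X (rate k2).
    k1, k2 = 1.0, 0.1            # illustrative rate constants

    def tendencies(x, x2):
        """Time derivatives of X and X2 under the toy chemistry."""
        dx = -2.0 * k1 * x * x + 2.0 * k2 * x2   # two X lost/gained per event
        dx2 = k1 * x * x - k2 * x2
        return dx, dx2

    # Forward-Euler integration; X_T is conserved because dX/dt + 2*dX2/dt = 0.
    x, x2, dt = 0.4, 0.1, 0.01
    xt0 = x + 2.0 * x2
    for _ in range(10_000):
        dx, dx2 = tendencies(x, x2)
        x, x2 = x + dt * dx, x2 + dt * dx2
    print(abs((x + 2.0 * x2) - xt0))  # ~0, up to floating-point round-off
    ```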

  3. Evaluation of item candidates for a diabetic retinopathy quality of life item bank.

    PubMed

    Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L

    2013-09-01

    We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended, resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.

  4. An Internal Construct Validation Study of the "Iowa Tests of Basic Skills" (Level 12, Form G) Reading Comprehension Test Items.

    ERIC Educational Resources Information Center

    Perkins, Kyle; Duncan, Ann

    An assessment analysis was performed to determine whether sets of items designed to measure three different subskills of reading comprehension on the Iowa Tests of Basic Skills (ITBS) did, in fact, distinguish among these subskills. The three major skills objectives were: (1) facts; (2) generalizations; and (3) inferences. Data from…

  5. Item-method directed forgetting: Effects at retrieval?

    PubMed

    Taylor, Tracy L; Cutmore, Laura; Pries, Lotta

    2018-02-01

    In an item-method directed forgetting paradigm, words are presented one at a time, each followed by an instruction to Remember or Forget; a directed forgetting effect is measured as better subsequent memory for Remember words than Forget words. The dominant view is that the directed forgetting effect arises during encoding due to selective rehearsal of Remember over Forget items. In three experiments we attempted to falsify a strong view that directed forgetting effects in recognition are due only to encoding mechanisms when an item method is used. Across the three experiments we tested for retrieval-based processes by colour-coding the recognition test items: black colour provided no information; green colour cued a potential Remember item; and red colour cued a potential Forget item. Recognition cues were mixed within blocks in Experiment 1 and between blocks in Experiments 2 and 3; Experiment 3 added explicit feedback on the accuracy of the recognition decision. Although overall recognition improved with cuing when explicit test performance feedback was added in Experiment 3, in no case was the magnitude of the directed forgetting effect influenced by recognition cueing. Our results argue against a role for retrieval-based strategies that limit recognition of Forget items at test and posit a role for encoding intentions only. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Do Chemistry-Climate Models Project the Same Greenhouse Gas Chemistry if Initialized with Observations of the Trace Gases: A Pre-ATom Test

    NASA Astrophysics Data System (ADS)

    Flynn, C. M.; Prather, M. J.; Zhu, X.; Strode, S. A.; Steenrod, S. D.; Strahan, S. E.; Lamarque, J. F.; Fiore, A. M.; Horowitz, L. W.; Mao, J.; Murray, L. T.; Shindell, D. T.

    2016-12-01

    …sensitivity tests in which each of the major precursors (e.g., NOx, HOOH, HCHO, CO) is perturbed by 10%. Such sensitivity tests can help determine the causes of model differences. Overall, this new approach allows us to characterize each model's chemistry package for a wide range of designated chemical composition. The real test will be with ATom data next year.

  7. Applications of Multidimensional Item Response Theory Models with Covariates to Longitudinal Test Data. Research Report. ETS RR-16-21

    ERIC Educational Resources Information Center

    Fu, Jianbin

    2016-01-01

    The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…

  8. Analysis of Item Response Patterns: Consistency Indices and Their Application to Criterion-Referenced Tests.

    ERIC Educational Resources Information Center

    Harnisch, Delwyn L.

    The major emphasis of this paper is on the examination of test item response patterns. Tatsuoka and Tatsuoka (1980) have developed two indices of response consistency: the norm-conformity index (NCI) and the individual consistency index (ICI). The NCI provides a measure of the degree of consistency between the response pattern of an individual and…

  9. Nursing Faculty Decision Making about Best Practices in Test Construction, Item Analysis, and Revision

    ERIC Educational Resources Information Center

    Killingsworth, Erin Elizabeth

    2013-01-01

    With the widespread use of classroom exams in nursing education, there is a great need for research on current practices regarding this form of assessment. The purpose of this study was to explore how nursing faculty members make decisions about using best practices in classroom test construction, item analysis, and revision in…

  10. 42 CFR 493.929 - Chemistry.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 5 2010-10-01 2010-10-01 false Chemistry. 493.929 Section 493.929 Public Health... Proficiency Testing Programs by Specialty and Subspecialty § 493.929 Chemistry. The subspecialties under the specialty of chemistry for which a proficiency testing program may offer proficiency testing are routine...

  11. 42 CFR 493.929 - Chemistry.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 42 Public Health 5 2012-10-01 2012-10-01 false Chemistry. 493.929 Section 493.929 Public Health... Proficiency Testing Programs by Specialty and Subspecialty § 493.929 Chemistry. The subspecialties under the specialty of chemistry for which a proficiency testing program may offer proficiency testing are routine...

  12. 42 CFR 493.929 - Chemistry.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 42 Public Health 5 2014-10-01 2014-10-01 false Chemistry. 493.929 Section 493.929 Public Health... Proficiency Testing Programs by Specialty and Subspecialty § 493.929 Chemistry. The subspecialties under the specialty of chemistry for which a proficiency testing program may offer proficiency testing are routine...

  13. 42 CFR 493.929 - Chemistry.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 42 Public Health 5 2013-10-01 2013-10-01 false Chemistry. 493.929 Section 493.929 Public Health... Proficiency Testing Programs by Specialty and Subspecialty § 493.929 Chemistry. The subspecialties under the specialty of chemistry for which a proficiency testing program may offer proficiency testing are routine...

  14. 42 CFR 493.929 - Chemistry.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 5 2011-10-01 2011-10-01 false Chemistry. 493.929 Section 493.929 Public Health... Proficiency Testing Programs by Specialty and Subspecialty § 493.929 Chemistry. The subspecialties under the specialty of chemistry for which a proficiency testing program may offer proficiency testing are routine...

  15. Developing a Numerical Ability Test for Students of Education in Jordan: An Application of Item Response Theory

    ERIC Educational Resources Information Center

    Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader

    2016-01-01

    The purpose of the present study is to develop a test to measure the numerical ability of students of education. The sample of the study consisted of (504) students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed acceptable psychometric properties of the test;…

  16. Set of Criteria for Efficiency of the Process Forming the Answers to Multiple-Choice Test Items

    ERIC Educational Resources Information Center

    Rybanov, Alexander Aleksandrovich

    2013-01-01

    A set of criteria is offered for assessing the efficiency of the process of forming answers to multiple-choice test items. To increase the accuracy of computer-assisted testing results, it is suggested that the dynamics of the process of forming the final answer be assessed using the following factors: a loss-of-time factor and a correct-choice factor. The model…

  17. Three controversies over item disclosure in medical licensure examinations

    PubMed Central

    Park, Yoon Soo; Yang, Eunbae B.

    2015-01-01

    In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including South Korea, with prior discussions among North American countries dating over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences on setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration. PMID:26374693

  18. Three controversies over item disclosure in medical licensure examinations.

    PubMed

    Park, Yoon Soo; Yang, Eunbae B

    2015-01-01

    In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including South Korea, with prior discussions among North American countries dating over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences on setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration.

  19. Development of a Computerized Adaptive Test for Anxiety Based on the Dutch-Flemish Version of the PROMIS Item Bank.

    PubMed

    Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; Spinhoven, Philip; de Beurs, Edwin

    2017-12-01

    We used the Dutch-Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has the psychometric properties required for a CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average of 8.64 items administered for the clinical sample and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch-Flemish version of the PROMIS Anxiety item bank.
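
    A post hoc CAT simulation of the kind reported here can be sketched as follows: repeatedly administer the unused item with maximum Fisher information at the current ability estimate, re-estimate ability, and stop once the standard error falls below a target. This is a generic 2PL sketch with invented item parameters (only the bank size of 29 is taken from the abstract), not the PROMIS procedure itself.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.uniform(1.0, 2.5, size=29)   # invented discriminations
    b = rng.normal(0.0, 1.0, size=29)    # invented difficulties

    def p(theta, i):
        """2PL probability of endorsing item i at ability theta."""
        return 1.0 / (1.0 + np.exp(-a[i] * (theta - b[i])))

    def fisher_info(theta, i):
        pi = p(theta, i)
        return a[i] ** 2 * pi * (1.0 - pi)

    def simulate_cat(true_theta, se_target=0.32):
        theta, admin, resp = 0.0, [], []
        grid = np.linspace(-4, 4, 161)      # grid for EAP estimation
        prior = np.exp(-0.5 * grid ** 2)    # standard normal prior
        while True:
            # Administer the unused item that is most informative at theta.
            remaining = [i for i in range(len(a)) if i not in admin]
            item = max(remaining, key=lambda i: fisher_info(theta, i))
            admin.append(item)
            resp.append(rng.random() < p(true_theta, item))
            # EAP update of theta given all responses so far.
            post = prior.copy()
            for it, r in zip(admin, resp):
                pr = 1.0 / (1.0 + np.exp(-a[it] * (grid - b[it])))
                post *= pr if r else (1.0 - pr)
            post /= post.sum()
            theta = float((grid * post).sum())
            se = float(np.sqrt((((grid - theta) ** 2) * post).sum()))
            if se < se_target or len(admin) == len(a):
                return theta, len(admin)

    print(simulate_cat(true_theta=1.0))  # (estimate, items administered)
    ```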

  20. Item Performance Across Native Language Groups on the Test of English as a Foreign Language. TOEFL Research Reports, 9.

    ERIC Educational Resources Information Center

    Alderman, Donald L.; Holland, Paul W.

    The Test of English as a Foreign Language (TOEFL) was examined for instances in which the item performance of examinees with comparable scores differed according to their native languages. A chi-square procedure, sensitive to deviations of less than ten percent from the expected frequencies of correct item responses across several language groups,…

  1. Real and Artificial Differential Item Functioning

    ERIC Educational Resources Information Center

    Andrich, David; Hagquist, Curt

    2012-01-01

    The literature in modern test theory on procedures for identifying items with differential item functioning (DIF) between two groups of persons includes the Mantel-Haenszel (MH) procedure. Generally, it is not recognized explicitly that if there is real DIF in some items which favor one group, then as an artifact of this procedure, artificial DIF…
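
    The Mantel-Haenszel procedure mentioned here stratifies examinees on total test score and, within each stratum, compares the odds of answering the studied item correctly in the two groups. A minimal sketch with invented counts (the -2.35 rescaling to the ETS delta metric is the conventional one):

    ```python
    import math

    # Each stratum (examinees matched on total score) gives a 2x2 table:
    # (A, B) = reference group correct/incorrect,
    # (C, D) = focal group correct/incorrect.  Counts are invented.
    strata = [
        (30, 10, 22, 18),
        (45, 15, 35, 25),
        (50, 10, 46, 14),
    ]

    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha_mh = num / den                    # MH common odds-ratio estimate
    delta_mh = -2.35 * math.log(alpha_mh)   # ETS delta metric (MH D-DIF)
    print(f"MH odds ratio = {alpha_mh:.2f}, MH D-DIF = {delta_mh:.2f}")
    ```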

  2. Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.

    ERIC Educational Resources Information Center

    Brutten, Sheila R.; And Others

    A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…

  3. Concreteness effects in short-term memory: a test of the item-order hypothesis.

    PubMed

    Roche, Jaclynn; Tolan, G Anne; Tehan, Gerald

    2011-12-01

    The following experiments explore word length and concreteness effects in short-term memory within an item-order processing framework. This framework asserts that order memory is better for those items that are relatively easy to process at the item level, whereas words that are difficult to process benefit at the item level from the increased attention/resources applied to them. The prediction of the model is that differential item and order processing can be detected in episodic tasks that differ in the degree to which item or order memory is required by the task. The item-order account has been applied to the word length effect such that there is a short word advantage in serial recall but a long word advantage in item recognition. The current experiment considered the possibility that concreteness effects might be explained within the same framework. In two experiments, word length (Experiment 1) and concreteness (Experiment 2) are examined using forward serial recall, backward serial recall, and item recognition. The results for word length replicate previous studies showing the dissociation in item and order tasks. The same was not true for the concreteness effect: in all three tasks concrete words were better remembered than abstract words. The concreteness effect cannot be explained in terms of an item-order trade-off. PsycINFO Database Record (c) 2011 APA, all rights reserved.

  4. The Combined Effects of Classroom Teaching and Learning Strategy Use on Students' Chemistry Self-Efficacy

    NASA Astrophysics Data System (ADS)

    Cheung, Derek

    2015-02-01

    For students to be successful in school chemistry, a strong sense of self-efficacy is essential. Chemistry self-efficacy can be defined as students' beliefs about the extent to which they are capable of performing specific chemistry tasks. According to Bandura (Psychol. Rev. 84:191-215, 1977), students acquire information about their level of self-efficacy from four sources: performance accomplishments, vicarious experiences, verbal persuasion, and physiological states. No published studies have investigated how instructional strategies in chemistry lessons can provide students with positive experiences with these four sources of self-efficacy information and how the instructional strategies promote students' chemistry self-efficacy. In this study, questionnaire items were constructed to measure student perceptions about instructional strategies, termed efficacy-enhancing teaching, which can provide positive experiences with the four sources of self-efficacy information. Structural equation modeling was then applied to test a hypothesized mediation model, positing that efficacy-enhancing teaching positively affects students' chemistry self-efficacy through their use of deep learning strategies such as metacognitive control strategies. A total of 590 chemistry students at nine secondary schools in Hong Kong participated in the survey. The mediation model provided a good fit to the student data. Efficacy-enhancing teaching had a direct effect on students' chemistry self-efficacy. Efficacy-enhancing teaching also directly affected students' use of deep learning strategies, which in turn affected students' chemistry self-efficacy. The implications of these findings for developing secondary school students' chemistry self-efficacy are discussed.

  5. Item Response Modeling with Sum Scores

    ERIC Educational Resources Information Center

    Johnson, Timothy R.

    2013-01-01

    One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…

  6. Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

    ERIC Educational Resources Information Center

    Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

    2013-01-01

    Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…

  7. A test of the International Personality Item Pool representation of the Revised NEO Personality Inventory and development of a 120-item IPIP-based measure of the five-factor model.

    PubMed

    Maples, Jessica L; Guan, Li; Carter, Nathan T; Miller, Joshua D

    2014-12-01

    There has been a substantial increase in the use of personality assessment measures constructed using items from the International Personality Item Pool (IPIP) such as the 300-item IPIP-NEO (Goldberg, 1999), a representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992). The IPIP-NEO is free to use and can be modified to accommodate its users' needs. Despite the substantial interest in this measure, there is still a dearth of data demonstrating its convergence with the NEO PI-R. The present study represents an investigation of the reliability and validity of scores on the IPIP-NEO. Additionally, we used item response theory (IRT) methodology to create a 120-item version of the IPIP-NEO. Using an undergraduate sample (n = 359), we examined the reliability, as well as the convergent and criterion validity, of scores from the 300-item IPIP-NEO, a previously constructed 120-item version of the IPIP-NEO (Johnson, 2011), and the newly created IRT-based IPIP-120 in comparison to the NEO PI-R across a range of outcomes. Scores from all 3 IPIP measures demonstrated strong reliability and convergence with the NEO PI-R and a high degree of similarity with regard to their correlational profiles across the criterion variables (rICC = .983, .972, and .976, respectively). The replicability of these findings was then tested in a community sample (n = 757), and the results closely mirrored the findings from Sample 1. These results provide support for the use of the IPIP-NEO and both 120-item IPIP-NEO measures as assessment tools for measurement of the five-factor model. (c) 2014 APA, all rights reserved.

  8. Testing for DIF in a Model with Single Peaked Item Characteristic Curves: The PARELLA Model.

    ERIC Educational Resources Information Center

    Hoijtink, Herbert; Molenaar, Ivo W.

    1992-01-01

    The PARallELogram Analysis (PARELLA) model is a probabilistic parallelogram model that can be used for the measurement of latent attitudes or latent preferences. A method is presented for testing for differential item functioning (DIF) for the PARELLA model using the approach of D. Thissen and others (1988). (SLD)

  9. Examination of the PROMIS upper extremity item bank.

    PubMed

    Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R

    Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.

  10. Photoelectron Spectroscopy in Advanced Placement Chemistry

    ERIC Educational Resources Information Center

    Benigna, James

    2014-01-01

    Photoelectron spectroscopy (PES) is a new addition to the Advanced Placement (AP) Chemistry curriculum. This article explains the rationale for its inclusion, an overview of how the PES instrument records data, how the data can be analyzed, and how to include PES data in the course. Sample assessment items and analysis are included, as well as…

  11. Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment.

    PubMed

    Fayers, Peter M

    2007-01-01

    We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs). IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research. Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transfer of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need for guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions. Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified.

  12. The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model

    ERIC Educational Resources Information Center

    Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet

    2012-01-01

    Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance, whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…

  13. Item generation and pilot testing of the Comprehensive Professional Behaviours Development Log.

    PubMed

    Bartlett, Doreen J; Lucy, S Deborah; Bisbee, Leslie

    2006-01-01

    The purpose of this project was to generate and refine criteria for professional behaviors previously identified to be important for physical therapy practice and to develop and pilot test a new instrument, which we have called the Comprehensive Professional Behaviours Development Log (CPBDL). Items were generated from our previous work, the work of Warren May and his colleagues, a competency profile for entry-level physical therapists, our regulatory code of ethics, and an evaluation of clinical performance. A group of eight people, including recent graduates, clinical instructors and professional practice leaders, and faculty members, refined the items in two iterations using the Delphi process. The CPBDL contains nine key professional behaviors with a range of nine to 23 specific behavioral criteria for individuals to reflect on and to indicate the consistency of performance from a selection of "not at all," "sometimes," and "always" response options. Pilot testing with a group of 42 students in the final year of our entry-to-practice curriculum indicated that the criteria were clear, the measure was feasible to complete in a reasonable time frame, and there were no ceiling or floor effects. We believe that others, including health care educators and practicing professionals, might be interested in adapting the CPBDL in their own settings to enhance the professional behaviors of either students in preparation for entry to practice or clinicians wishing to demonstrate continuing competency to professional regulatory bodies.

  14. Latent Class Analysis of Differential Item Functioning on the Peabody Picture Vocabulary Test-III

    ERIC Educational Resources Information Center

    Webb, Mi-young Lee; Cohen, Allan S.; Schwanenflugel, Paula J.

    2008-01-01

    This study investigated the use of latent class analysis for the detection of differences in item functioning on the Peabody Picture Vocabulary Test-Third Edition (PPVT-III). A two-class solution for a latent class model appeared to be defined in part by ability because Class 1 was lower in ability than Class 2 on both the PPVT-III and the…

  15. Chemistry Notes.

    ERIC Educational Resources Information Center

    School Science Review, 1983

    1983-01-01

    Presents background information, laboratory procedures, classroom materials/activities, and chemistry experiments. Topics include sublimation, electronegativity, electrolysis, experimental aspects of strontianite, halide test, evaluation of present and future computer programs in chemistry, formula building, care of glass/saturated calomel…

  16. Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

    ERIC Educational Resources Information Center

    Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

    2017-01-01

    Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…

  17. Developing energy and momentum conceptual survey (EMCS) with four-tier diagnostic test items

    NASA Astrophysics Data System (ADS)

    Afif, Nur Faadhilah; Nugraha, Muhammad Gina; Samsudin, Achmad

    2017-05-01

    Students' conceptions of work and energy are important to support the learning process in the classroom. For that reason, a diagnostic test instrument is needed to diagnose students' conceptions of work and energy. The researchers therefore developed the Energy and Momentum Conceptual Survey (EMCS) into four-tier diagnostic test items. This research constitutes the first step of the development of the four-tier-formatted EMCS as a diagnostic test instrument on work and energy. The research method followed the 4D model (Defining, Designing, Developing and Disseminating). The developed instrument was tested on 39 students at a senior high school. The results showed that the four-tier-formatted EMCS is able to diagnose students' level of conceptual understanding of work and energy. It can be concluded that the four-tier-formatted EMCS is a promising diagnostic instrument for distinguishing students who understand the concepts, students who hold misconceptions, and students who do not understand the concepts of work and energy at all.
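
    A four-tier item pairs an answer tier and a reason tier, each followed by a confidence tier; the pattern of responses is then mapped to a conception category. The sketch below is a simplified version of common four-tier scoring schemes; the rules, labels, and cutoff are illustrative, not the paper's exact coding table.

    ```python
    def classify_four_tier(answer_ok, conf_answer, reason_ok, conf_reason, cutoff=4):
        """Map one four-tier response (answer tier + confidence, reason tier +
        confidence) to a conception category. Rules and labels are a simplified
        version of common four-tier scoring schemes."""
        confident = conf_answer >= cutoff and conf_reason >= cutoff
        if answer_ok and reason_ok:
            return "understands the concept" if confident else "lucky guess / partial"
        if confident:
            return "misconception"        # wrong but confident
        return "does not understand"      # wrong and unsure

    print(classify_four_tier(True, 5, True, 6))    # understands the concept
    print(classify_four_tier(False, 6, False, 5))  # misconception
    ```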

  18. 42 CFR 493.839 - Condition: Chemistry.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    42 CFR Public Health, § 493.839, Condition: Chemistry. The specialty of chemistry includes for the purposes of proficiency testing the subspecialties of routine chemistry, endocrinology, and toxicology.

  19. 42 CFR 493.839 - Condition: Chemistry.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    42 CFR Public Health, § 493.839, Condition: Chemistry. The specialty of chemistry includes for the purposes of proficiency testing the subspecialties of routine chemistry, endocrinology, and toxicology.

  20. 42 CFR 493.839 - Condition: Chemistry.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    42 CFR Public Health, § 493.839, Condition: Chemistry. The specialty of chemistry includes for the purposes of proficiency testing the subspecialties of routine chemistry, endocrinology, and toxicology.

  1. 42 CFR 493.839 - Condition: Chemistry.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    42 CFR Public Health, § 493.839, Condition: Chemistry. The specialty of chemistry includes for the purposes of proficiency testing the subspecialties of routine chemistry, endocrinology, and toxicology.

  2. 42 CFR 493.839 - Condition: Chemistry.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    42 CFR Public Health, § 493.839, Condition: Chemistry. The specialty of chemistry includes for the purposes of proficiency testing the subspecialties of routine chemistry, endocrinology, and toxicology.

  3. Managing a Test Item Bank on a Microcomputer: Can It Help You and Your Students?

    ERIC Educational Resources Information Center

    Peterson, Julian A.; Meister, Lynn L.

    1983-01-01

    Describes a test item bank developed by the Association for Medical School Departments of Biochemistry (Texas). Programs (written in Pascal) allow self-evaluation by interactive student access to questions randomly selected from a chosen category. Potential users of the system (having student, manager, and instructor modes) are invited to contact…
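
    The self-evaluation mode described here (random selection of questions from a chosen category) reduces to very little code in a modern language. A hypothetical sketch; the original system stored real biochemistry questions and ran in Pascal, and all names and IDs below are invented:

    ```python
    import random

    # Hypothetical categories and item IDs.
    item_bank = {
        "amino acids": ["Q01", "Q02", "Q03", "Q04"],
        "enzyme kinetics": ["Q05", "Q06", "Q07"],
    }

    def draw_quiz(category, n, seed=None):
        """Return n distinct items drawn at random from one category."""
        return random.Random(seed).sample(item_bank[category], n)

    print(draw_quiz("enzyme kinetics", 2, seed=1))
    ```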

  4. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmarks, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…
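
    One of the simple methods typically studied in this literature is two-way imputation, which fills a missing item score from the person mean and the item mean relative to the overall mean. A sketch under that assumption (the truncated abstract does not list the five methods actually compared):

    ```python
    import numpy as np

    def two_way_impute(scores):
        """Fill missing item scores (NaN) with person mean + item mean
        - overall mean, a simple two-way imputation."""
        person_mean = np.nanmean(scores, axis=1, keepdims=True)
        item_mean = np.nanmean(scores, axis=0, keepdims=True)
        overall_mean = np.nanmean(scores)
        filled = person_mean + item_mean - overall_mean
        return np.where(np.isnan(scores), filled, scores)

    X = np.array([[3.0, 4.0, np.nan],
                  [2.0, np.nan, 1.0],
                  [5.0, 4.0, 4.0]])
    print(two_way_impute(X))
    ```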

  5. Use of Item Models in a Large-Scale Admissions Test: A Case Study

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Johnson, Matthew S.

    2008-01-01

    "Item models" (LaDuca, Staples, Templeton, & Holzman, 1986) are classes from which it is possible to generate items that are equivalent/isomorphic to other items from the same model (e.g., Bejar, 1996, 2002). They have the potential to produce large numbers of high-quality items at reduced cost. This article introduces data from an…

  6. Evaluating the Wald Test for Item-Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Lee, Young-Sun

    2013-01-01

    This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…
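
    For reference, an item-level Wald statistic of this kind takes the standard form below, where R encodes the restrictions that reduce the saturated model and the beta and Sigma estimates are the item's parameters and their covariance matrix; this is the generic construction, not notation quoted from the article.

    ```latex
    W = (R\hat{\beta})^{\top}\left(R\,\hat{\Sigma}\,R^{\top}\right)^{-1}(R\hat{\beta}),
    \qquad W \overset{H_0}{\sim} \chi^2_{\operatorname{rank}(R)} .
    ```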

  7. The effect of participation in an extended inquiry project on general chemistry student laboratory interactions, confidence, and process skills

    NASA Astrophysics Data System (ADS)

    Krystyniak, Rebecca A.

    2001-12-01

    This study explored the effect of participation by second-semester general chemistry students in an extended open-inquiry laboratory investigation on their use of science process skills and confidence in performing specific aspects of laboratory investigations. In addition, verbal interactions of a student lab team among team members and with their instructor over three open-inquiry laboratory sessions and two non-inquiry sessions were investigated. Instruments included the Test of Integrated Process Skills (TIPS), a 36-item multiple-choice instrument, and the Chemistry Laboratory Survey (CLS), a researcher co-designed 20-item 8-point instrument. Instruments were administered at the beginning and close of the semester to 157 second-semester general chemistry students at two universities; students at only one university participated in the open-inquiry activity. A MANCOVA was performed to investigate relationships among control and experimental students, TIPS, and CLS post-test scores. Covariates were TIPS and CLS pre-test scores and prior high school and college science experience. No significant relationships were found. Wilcoxon analyses indicated both groups showed an increase in confidence; experimental-group students with below-average TIPS pre-test scores showed a significant increase in science process skills. Transcribed audio tapes of all laboratory-based verbal interactions were analyzed. Coding categories, developed using the constant comparison method, led to an inter-rater reliability of .96. During open-inquiry activities, the lab team interacted less often, sought less guidance from their instructor, and talked less about chemistry concepts than during non-inquiry activities. Evidence confirmed that students used science process skills and engaged in higher-order thinking during both types of activities. A four-student focus group shared their experiences with open-inquiry activities, indicating that they enjoyed the experience, viewed it as worthwhile, and believed…
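
    The Wilcoxon analyses reported here (paired pre- vs. post-test comparisons within a group) can be run with standard tools. A minimal sketch with hypothetical paired confidence ratings:

    ```python
    from scipy.stats import wilcoxon

    # Hypothetical paired pre/post confidence ratings for one group of students.
    pre  = [4.0, 5.5, 3.0, 6.0, 4.5, 5.0, 3.5, 6.5]
    post = [5.0, 6.0, 3.5, 6.5, 5.5, 5.0, 4.5, 7.0]

    stat, p = wilcoxon(post, pre)  # nonparametric test of paired differences
    print(f"W = {stat}, p = {p:.3f}")
    ```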

  8. A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Doong, Shing H.

    2009-01-01

    The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…
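
    For context, the Sympson-Hetter procedure referenced above controls exposure by administering the selected item only with probability equal to its item exposure parameter; when an item fails this random check, selection falls through to the next candidate. A simplified sketch, with hypothetical item names and IEP values:

    ```python
    import random

    def sympson_hetter_select(ranked_items, k, rng):
        """Administer the best-ranked item that passes its exposure-control
        probability k[item]; fall back to the last candidate if none passes."""
        for item in ranked_items:
            if rng.random() < k[item]:
                return item
        return ranked_items[-1]

    k = {"i1": 0.30, "i2": 0.75, "i3": 1.00}  # hypothetical IEPs
    picks = [sympson_hetter_select(["i1", "i2", "i3"], k, random.Random(s))
             for s in range(10)]
    print(picks)  # "i1" is now administered in roughly 30% of selections
    ```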

  9. Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores with Item Exposure Control and Content Constraints

    ERIC Educational Resources Information Center

    Yao, Lihua

    2014-01-01

    The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…

  10. An evaluation of computerized adaptive testing for general psychological distress: combining GHQ-12 and Affectometer-2 in an item bank for public mental health research.

    PubMed

    Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J

    2016-05-20

    Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
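
    The simulated adaptive administration follows the usual CAT loop: estimate the trait, administer the most informative remaining item, update the estimate, and stop once a target standard error (equivalently, reliability) is reached. The sketch below is a deliberately crude unidimensional 2PL illustration with a one-step ability update, not the paper's bifactor calibration; all parameters are hypothetical.

    ```python
    import numpy as np

    def info_2pl(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = 1 / (1 + np.exp(-a * (theta - b)))
        return a**2 * p * (1 - p)

    def simulate_cat(true_theta, a, b, se_target=0.40, seed=0):
        rng = np.random.default_rng(seed)
        remaining, used, theta = list(range(len(a))), [], 0.0
        while remaining:
            # Select the most informative remaining item at the current estimate.
            j = max(remaining, key=lambda i: info_2pl(theta, a[i], b[i]))
            remaining.remove(j)
            used.append(j)
            # Administer: simulate a response from the examinee's true trait.
            x = rng.random() < 1 / (1 + np.exp(-a[j] * (true_theta - b[j])))
            # Crude one-step update (real CATs use ML/EAP estimation).
            theta += (x - 1 / (1 + np.exp(-a[j] * (theta - b[j])))) / len(used)
            se = 1 / np.sqrt(sum(info_2pl(theta, a[i], b[i]) for i in used))
            if se <= se_target:
                break
        return round(float(theta), 2), len(used)

    a = np.full(30, 1.2)
    b = np.linspace(-2, 2, 30)
    print(simulate_cat(true_theta=0.5, a=a, b=b))  # (estimate, items used)
    ```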

  11. Consumer Chemistry in the Classroom. Science from the Supermarket.

    ERIC Educational Resources Information Center

    Sumrall, William J.; Brown, Fred W.

    1991-01-01

    Activities that show students a practical use for chemistry using common items such as food products, pharmaceuticals, and household products as sources of chemical compounds are presented. The importance of having adequate resource materials available for students is emphasized. (KR)

  12. The Australian Science Item Bank Project

    ERIC Educational Resources Information Center

    Kings, Clive B.; Cropley, Murray C.

    1974-01-01

    Describes the development of a multiple-choice test item bank for grade ten science by the Australian Council for Educational Research. Other item banks are also being developed at the grade ten level in mathematics and social science. (RH)

  13. Developing Items to Measure Theory of Planned Behavior Constructs for Opioid Administration for Children: Pilot Testing.

    PubMed

    Vincent, Catherine; Riley, Barth B; Wilkie, Diana J

    2015-12-01

    The Theory of Planned Behavior (TpB) is useful to direct nursing research aimed at behavior change. As proposed in the TpB, individuals' attitudes, perceived norms, and perceived behavior control predict their intentions to perform a behavior and subsequently predict their actual performance of the behavior. Our purpose was to apply Fishbein and Ajzen's guidelines to begin development of a valid and reliable instrument for pediatric nurses' attitudes, perceived norms, perceived behavior control, and intentions to administer PRN opioid analgesics when hospitalized children self-report moderate to severe pain. Following Fishbein and Ajzen's directions, we were able to define the behavior of interest and specify the research population, formulate items for direct measures, elicit salient beliefs shared by our target population and formulate items for indirect measures, and prepare and test our questionnaire. For the pilot testing of internal consistency of measurement items, Cronbach alphas were between 0.60 and 0.90 for all constructs. Test-retest reliability correlations ranged from 0.63 to 0.90. Following Fishbein and Ajzen's guidelines was a feasible and organized approach for instrument development. In these early stages, we demonstrated good reliability for most subscales, showing promise for the instrument and its use in pain management research. Better understanding of the TpB constructs will facilitate the development of interventions targeted toward nurses' attitudes, perceived norms, and/or perceived behavior control to ultimately improve their pain behaviors toward reducing pain for vulnerable children.
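
    The internal-consistency figures reported (Cronbach alphas of 0.60 to 0.90) follow the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch with hypothetical ratings:

    ```python
    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    # Hypothetical 5-point ratings from six nurses on a 4-item subscale.
    X = [[4, 5, 4, 5],
         [3, 3, 4, 3],
         [5, 5, 5, 4],
         [2, 3, 2, 3],
         [4, 4, 5, 4],
         [3, 4, 3, 3]]
    print(round(cronbach_alpha(X), 2))
    ```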

  14. Item Review and the Rearrangement Procedure: Its Process and Its Results

    ERIC Educational Resources Information Center

    Papanastasiou, Elena C.

    2005-01-01

    Permitting item review benefits examinees, who typically increase their test scores when allowed to review items. However, testing companies disfavor item review, since it does not follow the logic on which adaptive tests are based and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive…

  15. Severity of Organized Item Theft in Computerized Adaptive Testing: An Empirical Study. Research Report. ETS RR-06-22

    ERIC Educational Resources Information Center

    Yi, Qing; Zhang, Jinming; Chang, Hua-Hua

    2006-01-01

    Chang and Zhang (2002, 2003) proposed several baseline criteria for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria were obtained from theoretical derivations that assumed uniformly randomized item selection. The current study investigated potential damage caused…

  16. Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty

    ERIC Educational Resources Information Center

    Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam

    2014-01-01

    The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test administered within the context of a high school graduation test designed to match the eleventh-grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
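
    The transformed item difficulty (delta plot) method converts each group's proportion correct to Angoff's delta scale, Delta = 13 + 4 * Phi^{-1}(1 - p), plots the paired deltas for the two groups, and flags items lying far from the major axis of the scatter as showing potential DIF. A sketch under those assumptions; the 1.5 threshold and all data are illustrative, and the slope formula is the standard major-axis solution, not anything quoted from this study.

    ```python
    import numpy as np
    from scipy.stats import norm

    def delta_plot(p_ref, p_foc, threshold=1.5):
        """Angoff delta plot: perpendicular distances from the major axis of
        the paired deltas; |distance| > threshold flags potential DIF."""
        x = 13 + 4 * norm.ppf(1 - np.asarray(p_ref))  # reference-group deltas
        y = 13 + 4 * norm.ppf(1 - np.asarray(p_foc))  # focal-group deltas
        sxx, syy = x.var(), y.var()
        sxy = np.cov(x, y, bias=True)[0, 1]
        b = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy**2)) / (2 * sxy)
        a = y.mean() - b * x.mean()
        dist = (b * x - y + a) / np.sqrt(b**2 + 1)
        return dist, np.abs(dist) > threshold

    p_males   = [0.82, 0.64, 0.55, 0.71, 0.40]  # hypothetical proportions correct
    p_females = [0.80, 0.66, 0.38, 0.73, 0.42]
    print(delta_plot(p_males, p_females))
    ```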

  17. Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?

    ERIC Educational Resources Information Center

    Jackson, Evelyn W.; And Others

    1994-01-01

    Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…

  18. Primary care requests for anaemia chemistry tests in Spain: potential iron, transferrin and folate over-requesting.

    PubMed

    Salinas, Maria; López-Garrigós, Maite; Flores, Emilio; Leiva-Salinas, Carlos

    2017-09-01

    To study the regional variability of requests for anaemia chemistry tests in primary care in Spain and the associated economic costs of potential over-requesting. Requests for anaemia tests were examined in a cross-sectional study. Clinical laboratories from different autonomous communities (AACCs) were invited to report on primary care anaemia chemistry tests requested during 2014. Demand for iron, ferritin, vitamin B12 and folate tests per 1000 inhabitants and the ratios of the folate/vitamin B12 and transferrin/ferritin requests were compared between AACCs. We also calculated reagent costs and the number of iron, transferrin and folate tests and the economic saving if every AACC had obtained the results achieved by the AACC with best practice. 110 laboratories participated (59.8% of the Spanish population). More than 12 million tests were requested, resulting in reagent costs exceeding €16.5 million. The serum iron test was the most often requested, and the ferritin test was the most costly (over €7 million). Close to €4.5 million could potentially have been saved if iron, transferrin and folate had been appropriately requested (€6 million when extrapolated to the whole Spanish population). The demand for and expenditure on anaemia chemistry tests in primary care in Spain is high, with significant regional differences between different AACCs.
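
    The potential-saving estimate rests on simple arithmetic: for each region, the excess requests relative to the best-practice rate (per 1000 inhabitants), scaled by population and reagent cost. A sketch with invented figures; the paper's actual rates, costs, and regions are not reproduced here:

    ```python
    # Hypothetical per-region request rates (tests per 1000 inhabitants) for one test.
    rates = {"A": 95.0, "B": 70.0, "C": 120.0}
    population = {"A": 4_000_000, "B": 2_500_000, "C": 6_000_000}
    cost_per_test = 0.85  # hypothetical reagent cost in euros

    best = min(rates.values())  # best-practice (lowest) regional rate
    saving = sum((rates[r] - best) / 1000 * population[r] * cost_per_test
                 for r in rates)
    print(f"Potential saving: EUR {saving:,.0f}")
    ```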

  19. Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items.

    PubMed

    Sinharay, Sandip; Jensen, Jens Ledet

    2018-06-27

    In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3-26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238-254, 2010; Glas & Dagohoy, Psychometrika 72:159-180, 2007; Guo & Drasgow, Int J Sel Assess 18:351-364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193-206, 1990; Sinharay, J Educ Behav Stat 42:46-68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307-322, 1986) and the Lugannani-Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475-490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
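
    For reference, writing r for the signed likelihood ratio statistic and u for the associated Wald-type quantity, the two higher-order tools discussed take the standard forms below (quoted from the general saddlepoint literature, not from the paper's own notation):

    ```latex
    r^{*} = r + \frac{1}{r}\log\frac{u}{r} \;\dot\sim\; N(0,1)
    \quad \text{(modified signed likelihood ratio statistic)},
    \qquad
    \Pr(R \ge r_{\mathrm{obs}}) \approx 1-\Phi(r) + \phi(r)\left(\frac{1}{u}-\frac{1}{r}\right)
    \quad \text{(Lugannani-Rice approximation)}.
    ```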

  20. Threats to Validity When Using Open-Ended Items in International Achievement Studies: Coding Responses to the PISA 2012 Problem-Solving Test in Finland

    ERIC Educational Resources Information Center

    Arffman, Inga

    2016-01-01

    Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…