ERIC Educational Resources Information Center
Downing, Steven M.; Maatsch, Jack L.
To test the effect of clinically relevant multiple-choice item content on the validity of statistical discriminations of physicians' clinical competence, data were collected from a field test of the Emergency Medicine Examination, test items for the certification of specialists in emergency medicine. Two 91-item multiple-choice subscales were…
Samejima Items in Multiple-Choice Tests: Identification and Implications
ERIC Educational Resources Information Center
Rahman, Nazia
2013-01-01
Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…
ERIC Educational Resources Information Center
Hamadneh, Iyad Mohammed
2015-01-01
This study aimed to investigate the impact of changing the position of the escape alternative in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination, and guessing), and on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple-choice achievement test…
ERIC Educational Resources Information Center
Hansen, James D.; Dexter, Lee
1997-01-01
Analysis of test item banks in 10 auditing textbooks found that 75% of questions violated one or more guidelines for multiple-choice items. In comparison, 70% of a certified public accounting exam bank had no violations. (SK)
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
1990-01-01
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Sex Differences in the Tendency to Omit Items on Multiple-Choice Tests: 1980-2000
ERIC Educational Resources Information Center
von Schrader, Sarah; Ansley, Timothy
2006-01-01
Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…
NASA Astrophysics Data System (ADS)
Bhakti, Satria Seto; Samsudin, Achmad; Chandra, Didi Teguh; Siahaan, Parsaoran
2017-05-01
The aim of this research was to develop multiple-choice test items as tools for measuring scientific generic skills on the solar system. To achieve this aim, the researchers used the ADDIE model, consisting of Analysis, Design, Development, Implementation, and Evaluation, as the research method. The scientific generic skills were limited to five indicators: (1) indirect observation, (2) awareness of scale, (3) logical inference, (4) causal relations, and (5) mathematical modeling. The participants were 32 students at a junior high school in Bandung. The results showed that the constructed multiple-choice test items were declared valid by the expert validators and, after testing, proved able to measure scientific generic skills on the solar system.
NASA Astrophysics Data System (ADS)
Haydel, Angela Michelle
The purpose of this dissertation was to advance theoretical understanding about fit between the personal resources of individuals and the characteristics of science achievement tasks. Testing continues to be pervasive in schools, yet we know little about how students perceive tests and what they think and feel while they are actually working on test items. This study focused on both the personal (cognitive and motivational) and situational factors that may contribute to individual differences in achievement-related outcomes. 387 eighth grade students first completed a survey including measures of science achievement goals, capability beliefs, efficacy related to multiple-choice items and performance assessments, validity beliefs about multiple-choice items and performance assessments, and other perceptions of these item formats. Students then completed science achievement tests including multiple-choice items and two performance assessments. A sample of students was asked to verbalize both thoughts and feelings as they worked through the test items. These think-alouds were transcribed and coded for evidence of cognitive, metacognitive and motivational engagement. Following each test, all students completed measures of effort, mood, energy level and strategy use during testing. Students reported that performance assessments were more challenging, authentic, interesting and valid than multiple-choice tests. They also believed that comparisons between students were easier using multiple-choice items. Overall, students tried harder, felt better, had higher levels of energy and used more strategies while working on performance assessments. Findings suggested that performance assessments might be more congruent with a mastery achievement goal orientation, while multiple-choice tests might be more congruent with a performance achievement goal orientation. 
A variable-centered analytic approach including regression analyses provided information about how students, on average, who differed in terms of their teachers' ratings of their science ability, achievement goals, capability beliefs and experiences with science achievement tasks perceived, engaged in, and performed on multiple-choice items and performance assessments. Person-centered analyses provided information about the perceptions, engagement and performance of subgroups of individuals who had different motivational characteristics. Generally, students' personal goals and capability beliefs related more strongly to test perceptions, but not performance, while teacher ratings of ability and test-specific beliefs related to performance.
Difficulty and Discriminability of Introductory Psychology Test Items.
ERIC Educational Resources Information Center
Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis
2001-01-01
Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
A Diagnostic Study of Pre-Service Teachers' Competency in Multiple-Choice Item Development
ERIC Educational Resources Information Center
Asim, Alice E.; Ekuri, Emmanuel E.; Eni, Eni I.
2013-01-01
Large class size is an issue in testing at all levels of education. As a remedy, multiple-choice test formats have become very popular. This case study was designed to diagnose pre-service teachers' competency in constructing questions (IQT), direct questions (DQT), and best answer (BAT) varieties of multiple-choice items. Subjects were 88…
Demand Characteristics of Multiple-Choice Items.
ERIC Educational Resources Information Center
Diamond, James J.; Williams, David V.
Thirteen graduate students were asked to indicate for each of 24 multiple-choice items whether the item tested "recall of specific information," a "higher order skill," or "don't know." The students were also asked to state their general basis for judging the items. The 24 items had been previously classified according to Bloom's cognitive-skills…
ERIC Educational Resources Information Center
Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.
2016-01-01
Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…
The Effect of the Multiple-Choice Item Format on the Measurement of Knowledge of Language Structure
ERIC Educational Resources Information Center
Currie, Michael; Chiramanee, Thanyapa
2010-01-01
Noting the widespread use of multiple-choice items in tests in English language education in Thailand, this study compared their effect against that of constructed-response items. One hundred and fifty-two university undergraduates took a test of English structure first in constructed-response format, and later in three, stem-equivalent…
Cognitive Diagnostic Models for Tests with Multiple-Choice and Constructed-Response Items
ERIC Educational Resources Information Center
Kuo, Bor-Chen; Chen, Chun-Hua; Yang, Chih-Wei; Mok, Magdalena Mo Ching
2016-01-01
Traditionally, teachers evaluate students' abilities via their total test scores. Recently, cognitive diagnostic models (CDMs) have begun to provide information about the presence or absence of students' skills or misconceptions. Nevertheless, CDMs are typically applied to tests with multiple-choice (MC) items, which provide less diagnostic…
Set of Criteria for Efficiency of the Process Forming the Answers to Multiple-Choice Test Items
ERIC Educational Resources Information Center
Rybanov, Alexander Aleksandrovich
2013-01-01
A set of criteria is offered for assessing the efficiency of the process of forming answers to multiple-choice test items. To increase the accuracy of computer-assisted testing results, it is suggested that the dynamics of the process of forming the final answer be assessed using two factors: a loss-of-time factor and a correct-choice factor. The model…
Developing Multiple Choice Tests: Tips & Techniques
ERIC Educational Resources Information Center
McCowan, Richard J.
1999-01-01
Item writing is a major responsibility of trainers. Too often, qualified staff who prepare lessons carefully and teach conscientiously use inadequate tests that do not validly reflect the true level of trainee achievement. This monograph describes techniques for constructing multiple-choice items that measure student performance accurately. It…
The Testing Methods and Gender Differences in Multiple-Choice Assessment
NASA Astrophysics Data System (ADS)
Ng, Annie W. Y.; Chan, Alan H. S.
2009-10-01
This paper provides a comprehensive review of multiple-choice assessment over the past two decades to facilitate effective testing in various subject areas. A variety of multiple-choice test methods is available for assessing subjects' knowledge and decision ability: conventional multiple-choice, liberal multiple-choice, elimination testing, confidence marking, probability testing, and order-of-preference schemes. However, the best multiple-choice test method for use has not yet been identified. The review also indicated that gender differences in multiple-choice task performance might be due to the test area, instruction/scoring condition, and item difficulty.
Measures of Partial Knowledge and Unexpected Responses in Multiple-Choice Tests
ERIC Educational Resources Information Center
Chang, Shao-Hua; Lin, Pei-Chun; Lin, Zih-Chuan
2007-01-01
This study investigates differences in the partial scoring performance of examinees in elimination testing and conventional dichotomous scoring of multiple-choice tests implemented on a computer-based system. Elimination testing that uses the same set of multiple-choice items rewards examinees with partial knowledge over those who are simply…
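Elimination testing, as referenced here, asks examinees to cross out every option they believe is incorrect rather than pick one answer, so partial knowledge earns partial credit. Scoring rules vary across studies; the following is one classic Coombs-style variant, shown only as an illustrative sketch (the function name and the exact penalty are assumptions, not the rule used in this particular study):

```python
def elimination_score(eliminated, keyed_option, n_options):
    """One classic elimination-scoring rule (an illustrative variant;
    details differ across studies): +1 for each distractor correctly
    crossed out, but a penalty of -(n_options - 1) if the keyed
    answer is crossed out."""
    if keyed_option in eliminated:
        return -(n_options - 1)
    return len(set(eliminated))
```

Under this rule, an examinee with full knowledge of a 4-option item eliminates all three distractors and scores 3, one with partial knowledge eliminates fewer and scores less, and eliminating the keyed answer costs the full 3 points, which is what makes the scheme sensitive to partial knowledge.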
The Effects of Item Preview on Video-Based Multiple-Choice Listening Assessments
ERIC Educational Resources Information Center
Koyama, Dennis; Sun, Angela; Ockey, Gary J.
2016-01-01
Multiple-choice formats remain a popular design for assessing listening comprehension, yet no consensus has been reached on how multiple-choice formats should be employed. Some researchers argue that test takers must be provided with a preview of the items prior to the input (Buck, 1995; Sherman, 1997); others argue that a preview may decrease the…
ERIC Educational Resources Information Center
Grunert, Megan L.; Raker, Jeffrey R.; Murphy, Kristen L.; Holme, Thomas A.
2013-01-01
The concept of assigning partial credit on multiple-choice test items is considered for items from ACS Exams. Because the items on these exams, particularly the quantitative items, use common student errors to define incorrect answers, it is possible to assign partial credits to some of these incorrect responses. To do so, however, it becomes…
Application of a Multidimensional Nested Logit Model to Multiple-Choice Test Items
ERIC Educational Resources Information Center
Bolt, Daniel M.; Wollack, James A.; Suh, Youngsuk
2012-01-01
Nested logit models have been presented as an alternative to multinomial logistic models for multiple-choice test items (Suh and Bolt in "Psychometrika" 75:454-473, 2010) and possess a mathematical structure that naturally lends itself to evaluating the incremental information provided by attending to distractor selection in scoring. One potential…
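For context, the nested logit item response model referenced here (Suh and Bolt, 2010) separates the correct/incorrect outcome from the choice among distractors; a simplified two-level sketch (notation mine, not taken from the paper) is:

```latex
% Level 1: probability of a correct response to item i (a 2PL model)
P(y_i = 1 \mid \theta) = \frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}

% Level 2: conditional on an incorrect response, distractor k of K_i
% is selected via a multinomial logit with its own slope and intercept
P(d_i = k \mid y_i = 0, \theta)
  = \frac{\exp(\zeta_{ik} + \lambda_{ik}\theta)}
         {\sum_{k'=1}^{K_i} \exp(\zeta_{ik'} + \lambda_{ik'}\theta)}
```

Attending to distractor selection in scoring then amounts to using the Level 2 probabilities, which carry incremental information about ability whenever the distractor slopes differ.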
English 30, Part B: Reading. Questions Booklet. Grade 12 Diploma Examination, January 1997.
ERIC Educational Resources Information Center
Alberta Dept. of Education, Edmonton. Student Evaluation Branch.
Intended for students taking the Grade 12 Diploma Examinations in English 30, this "questions booklet" presents 70 multiple choice test items based on 8 reading selections in the accompanying readings booklet. After instructions for students, the booklet presents the multiple choice items which test students' comprehension of the poetry,…
ERIC Educational Resources Information Center
Wang, Wei
2013-01-01
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Automatic Scoring of Paper-and-Pencil Figural Responses. Research Report.
ERIC Educational Resources Information Center
Martinez, Michael E.; And Others
Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
Does the Position of Response Options in Multiple-Choice Tests Matter?
ERIC Educational Resources Information Center
Hohensinn, Christine; Baghaei, Purya
2017-01-01
In large-scale multiple-choice (MC) tests, alternate forms of a test may be developed to prevent cheating by changing the order of items or the position of the response options. The assumption is that, since the content of the test forms is the same, the order of items or the positions of the response options do not have any effect on…
Deepak, Kishore K; Al-Umran, Khalid Umran; Al-Sheikh, Mona H; Dkoli, B V; Al-Rubaish, Abdullah
2015-01-01
The functionality of distracters in a multiple choice question plays a very important role. We examined the frequency and impact of functioning and non-functioning distracters on the psychometric properties of 5-option items in clinical disciplines. We analyzed item statistics of 1115 multiple choice questions from 15 summative assessments of undergraduate medical students and classified the items into five groups by their number of non-functioning distracters. We analyzed the effect of varying degrees of non-functionality, ranging from 0 to 4, on test reliability, difficulty index, discrimination index, and point biserial correlation. The non-functionality of distracters inversely affected test reliability and item quality in a predictable manner. Non-functioning distracters made the items easier and significantly lowered the discrimination index. Three non-functional distracters in a 5-option MCQ significantly affected all psychometric properties (p < 0.05). The corrected point biserial correlation revealed that items with 3 functional options were psychometrically as effective as 5-option items. Our study reveals that a multiple choice question with 3 functional options marks the lower limit of the item format that retains adequate psychometric properties. Tests containing items with fewer functioning options have significantly lower reliability. Distracter function analysis and revision of non-functioning distracters can serve as important methods to improve the psychometrics and reliability of assessment.
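The statistics this abstract relies on (difficulty index, discrimination index, point-biserial correlation, and a non-functioning-distracter rule) are standard classical-test-theory quantities. A minimal sketch, assuming the conventional upper/lower 27% split for discrimination and a 5% selection cut-off for a non-functioning distracter (the cut-offs and names are common conventions, not taken from this study):

```python
def item_statistics(responses, total_scores, keyed_option):
    """Classical item statistics for one multiple-choice item.
    responses:    list of the option each examinee chose, e.g. "A".."E"
    total_scores: list of each examinee's total test score
    keyed_option: the correct answer for this item
    """
    n = len(responses)
    correct = [1.0 if r == keyed_option else 0.0 for r in responses]

    # Difficulty index p: proportion of examinees answering correctly.
    p = sum(correct) / n

    # Discrimination index D: proportion correct in the top 27% of
    # total scores minus the proportion correct in the bottom 27%.
    k = max(1, round(0.27 * n))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    low = sum(correct[i] for i in ranked[:k]) / k
    high = sum(correct[i] for i in ranked[-k:]) / k
    d = high - low

    # Point-biserial r_pb: Pearson correlation between the 0/1 item
    # score and the total score.
    mean_c = p
    mean_t = sum(total_scores) / n
    cov = sum((c - mean_c) * (t - mean_t)
              for c, t in zip(correct, total_scores))
    var_c = sum((c - mean_c) ** 2 for c in correct)
    var_t = sum((t - mean_t) ** 2 for t in total_scores)
    r_pb = cov / (var_c ** 0.5 * var_t ** 0.5)

    # Non-functioning distracters: options (other than the key) chosen
    # by fewer than 5% of examinees -- an illustrative, common cut-off.
    options = sorted(set(responses) | {keyed_option})
    nonfunctioning = [o for o in options
                      if o != keyed_option and responses.count(o) / n < 0.05]
    return p, d, r_pb, nonfunctioning
```

On an item answered correctly mostly by high scorers, this yields a moderate p, a D near 1, and a strongly positive point-biserial; distracters that nobody selects fall out as non-functioning.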
ERIC Educational Resources Information Center
Longford, Nicholas T.
This study is a critical evaluation of the roles of coding and scoring of missing responses to multiple-choice items in educational tests. The focus is on tests in which the test-takers have little or no motivation; in such tests, omitting and not reaching (as classified by the currently adopted operational rules) are quite frequent. Data from the…
ERIC Educational Resources Information Center
Suh, Youngsuk; Talley, Anna E.
2015-01-01
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
Reducing the Need for Guesswork in Multiple-Choice Tests
ERIC Educational Resources Information Center
Bush, Martin
2015-01-01
The humble multiple-choice test is very widely used within education at all levels, but its susceptibility to guesswork makes it a suboptimal assessment tool. The reliability of a multiple-choice test is partly governed by the number of items it contains; however, longer tests are more time consuming to take, and for some subject areas, it can be…
Assessment of item-writing flaws in multiple-choice questions.
Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John
2013-01-01
This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.
The Effect of SSM Grading on Reliability When Residual Items Have No Discriminating Power.
ERIC Educational Resources Information Center
Kane, Michael T.; Moloney, James M.
Gilman and Ferry have shown that when the student's score on a multiple choice test is the total number of responses necessary to get all items correct, substantial increases in reliability can occur. In contrast, similar procedures giving partial credit on multiple choice items have resulted in relatively small gains in reliability. The analysis…
ERIC Educational Resources Information Center
Domyancich, John M.
2014-01-01
Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…
On the Equivalence of Constructed-Response and Multiple-Choice Tests.
ERIC Educational Resources Information Center
Traub, Ross E.; Fisher, Charles W.
Two sets of mathematical reasoning items and two sets of verbal comprehension items were cast into each of three formats--constructed response, standard multiple-choice, and Coombs multiple-choice--in order to assess whether tests with identical content but different formats measure the same attribute, except for possible differences in error variance…
Pick-N Multiple Choice-Exams: A Comparison of Scoring Algorithms
ERIC Educational Resources Information Center
Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R.
2011-01-01
The aim was to compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination, and item difficulty. Data from six end-of-term exams in internal medicine taken by 3rd-year medical students at Munich University from 2005 to 2008 were analysed (1,255 students,…
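The Pick-N format asks examinees to select all keyed options of an item. The abstract compares scoring algorithms without listing them; two commonly contrasted rules are an all-or-nothing dichotomous rule and a per-option partial-credit rule. A minimal sketch with hypothetical function names (illustrative conventions, not necessarily the algorithms compared in this study):

```python
def score_dichotomous(selected, key):
    """All-or-nothing: full credit only if the selected set
    exactly equals the keyed set."""
    return 1.0 if set(selected) == set(key) else 0.0

def score_partial(selected, key, n_options):
    """Per-option partial credit: one point for every option
    classified correctly (selected-and-keyed, or
    unselected-and-unkeyed), scaled to [0, 1]."""
    selected, key = set(selected), set(key)
    correct_classifications = sum(
        (opt in selected) == (opt in key)
        for opt in range(n_options))
    return correct_classifications / n_options
```

The partial rule rewards partial knowledge: selecting one of two keyed options on a 5-option item still earns 3 of 5 option classifications, where the dichotomous rule would award zero.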
The memorial consequences of multiple-choice testing.
Marsh, Elizabeth J; Roediger, Henry L; Bjork, Robert A; Bjork, Elizabeth L
2007-04-01
The present article addresses whether multiple-choice tests may change knowledge even as they attempt to measure it. Overall, taking a multiple-choice test boosts performance on later tests, as compared with non-tested control conditions. This benefit is not limited to simple definitional questions, but holds true for SAT II questions and for items designed to tap concepts at a higher level in Bloom's (1956) taxonomy of educational objectives. Students, however, can also learn false facts from multiple-choice tests; testing leads to persistence of some multiple-choice lures on later general knowledge tests. Such persistence appears due to faulty reasoning rather than to an increase in the familiarity of lures. Even though students may learn false facts from multiple-choice tests, the positive effects of testing outweigh this cost.
ERIC Educational Resources Information Center
Kim, Sooyeon; Walker, Michael E.
2011-01-01
This study examines the use of subpopulation invariance indices to evaluate the appropriateness of using a multiple-choice (MC) item anchor in mixed-format tests, which include both MC and constructed-response (CR) items. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using an MC-only anchor set for 4…
Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André
2016-01-01
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Cognitive Validity: Can Multiple-Choice Items Tap Historical Thinking Processes?
ERIC Educational Resources Information Center
Smith, Mark D.
2017-01-01
Cognitive validity examines the relationship between what an assessment aims to measure and what it actually elicits from test takers. The present study examined whether multiple-choice items from the National Assessment of Educational Progress (NAEP) grade 12 U.S. history exam elicited the historical thinking processes they were designed to…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; an answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
2013-01-01
Background: Despite the widespread use of multiple-choice assessments in medical education, current practice and published advice concerning the number of response options remain equivocal. This article describes an empirical study contrasting the quality of three 60-item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below.
Methods: The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms; it contained both three- and four-option items.
Results: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement, and fit to the item response model.
Conclusions: The position taken is that decisions about the number of response options should be made at the item level, with plausible options added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item that provides the best psychometric and educational information. PMID:23453056
Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M
2013-03-04
Item Analysis in Introductory Economics Testing.
ERIC Educational Resources Information Center
Tinari, Frank D.
1979-01-01
Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
The Effect of Images on Item Statistics in Multiple Choice Anatomy Examinations
ERIC Educational Resources Information Center
Notebaert, Andrew J.
2017-01-01
Although multiple choice examinations are often used to test anatomical knowledge, these often forgo the use of images in favor of text-based questions and answers. Because anatomy is reliant on visual resources, examinations using images should be used when appropriate. This study was a retrospective analysis of examination items that were text…
Development of multiple choice pictorial test for measuring the dimensions of knowledge
NASA Astrophysics Data System (ADS)
Nahadi; Siswaningsih, Wiwi; Erna
2017-05-01
This study aims to develop a multiple-choice pictorial test as a tool to measure dimensions of knowledge in the chemical equilibrium subject. The method used was Research and Development with validation, conducted through preliminary studies and model development. The product is a multiple-choice pictorial test. The test comprised 22 items and was administered to 64 twelfth-grade high school students. The quality of the test was determined by its validity, reliability, difficulty index, discrimination power, and distractor effectiveness. Validity was determined by a CVR calculation using 8 validators (4 university teachers and 4 high school teachers), with an average CVR value of 0.89. The reliability of the test was in the very high category, with a value of 0.87. By discrimination power, 32% of items were in the very good category, 59% in the good category, and 20% in the sufficient category. The test has a varying level of difficulty: 23% of items are in the difficult category, 50% in the medium category, and 27% in the easy category. By distractor effectiveness, 1% of items are in the very poor category, 1% poor, 4% medium, 39% good, and 55% very good. The dimensions of knowledge measured consist of factual, conceptual, and procedural knowledge. Based on the questionnaire, students responded quite well to the developed test, and most students preferred this kind of pictorial multiple-choice test, which includes pictures as an evaluation tool, over narration tests dominated by text.
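The CVR figures in this abstract are presumably Lawshe's content validity ratio, CVR = (n_e - N/2)/(N/2), where n_e of N panelists rate an item essential. A minimal sketch (the panel size of 8 matches the abstract; the function name and example ratings are illustrative):

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: ranges from -1 (no panelist rates the item
    essential) through 0 (exactly half do) to +1 (all do)."""
    half = n_panelists / 2
    return (n_essential - half) / half
```

With 8 validators, an item rated essential by all 8 scores 1.0 and by 7 of 8 scores 0.75, so an average CVR of 0.89 implies most items were rated essential by 7 or 8 panelists.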
ERIC Educational Resources Information Center
Yanagawa, Kozo; Green, Anthony
2008-01-01
The purpose of this study is to examine whether the choice between three multiple-choice listening comprehension test formats results in any difference in listening comprehension test performance. The three formats entail (a) allowing test takers to preview both the question stem and answer options prior to listening; (b) allowing test takers to…
Validation and Structural Analysis of the Kinematics Concept Test
ERIC Educational Resources Information Center
Lichtenberger, A.; Wagner, C.; Hofer, S. I.; Stern, E.; Vaterlaus, A.
2017-01-01
The kinematics concept test (KCT) is a multiple-choice test designed to evaluate students' conceptual understanding of kinematics at the high school level. The test comprises 49 multiple-choice items about velocity and acceleration, which are based on seven kinematic concepts and which make use of three different representations. In the first part…
Multiple-Choice Test Bias Due to Answering Strategy Variation.
ERIC Educational Resources Information Center
Frary, Robert B.; Giles, Mary B.
This paper describes the development and investigation of a new approach to determining the existence of bias in multiple-choice test scores. Previous work in this area has concentrated almost exclusively on bias attributable to specific test items or to differences in test score distributions across racial or ethnic groups. In contrast, the…
Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.
ERIC Educational Resources Information Center
Commons, C., Ed.; Martin, P., Ed.
Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…
Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.
ERIC Educational Resources Information Center
Commons, C., Ed.; Martin, P., Ed.
The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…
Electronics. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…
Auto Mechanics. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Tannehill, Dana, Ed.
This document contains 546 criterion-referenced multiple choice and true or false test items for a course in auto mechanics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 35 units covering the…
V-TECS Criterion-Referenced Test Item Bank for Radiologic Technology Occupations.
ERIC Educational Resources Information Center
Reneau, Fred; And Others
This Vocational-Technical Education Consortium of States (V-TECS) criterion-referenced test item bank provides 696 multiple-choice items and 33 matching items for radiologic technology occupations. These job titles are included: radiologic technologist, chief; radiologic technologist; nuclear medicine technologist; radiation therapy technologist;…
Evaluation of five guidelines for option development in multiple-choice item-writing.
Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva
2009-05-01
This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
ERIC Educational Resources Information Center
Hodson, D.
1984-01-01
Investigated the effect on student performance of changes in question structure and sequence on a GCE O-level multiple-choice chemistry test. One finding noted is that there was virtually no change in test reliability when the number of options per test item was reduced from five. (JN)
A Model-Based Method for Content Validation of Automatically Generated Test Items
ERIC Educational Resources Information Center
Zhang, Xinxin; Gierl, Mark
2016-01-01
The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…
ERIC Educational Resources Information Center
Atalmis, Erkan Hasan
2016-01-01
Multiple-choice (MC) items are commonly used in high-stake tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…
ERIC Educational Resources Information Center
Davison, Mark L.; Biancarosa, Gina; Carlson, Sarah E.; Seipel, Ben; Liu, Bowen
2018-01-01
The computer-administered Multiple-Choice Online Causal Comprehension Assessment (MOCCA) for Grades 3 to 5 has an innovative, 40-item multiple-choice structure in which each distractor corresponds to a comprehension process upon which poor comprehenders have been shown to rely. This structure requires revised thinking about measurement issues…
Are Learning Disabled Students "Test-Wise?": An Inquiry into Reading Comprehension Test Items.
ERIC Educational Resources Information Center
Scruggs, Thomas E.; Lifson, Steve
The ability to correctly answer reading comprehension test items, without having read the accompanying reading passage, was compared for third grade learning disabled students and their peers from a regular classroom. In the first experiment, fourteen multiple choice items were selected from the Stanford Achievement Test. No reading passages were…
Tarrant, Marie; Knierim, Aimee; Hayes, Sasha K; Ware, James
2006-12-01
Multiple-choice questions are a common assessment method in nursing examinations. Few nurse educators, however, have formal preparation in constructing multiple-choice questions. Consequently, questions used in baccalaureate nursing assessments often contain item-writing flaws, or violations of accepted item-writing guidelines. In one nursing department, 2770 MCQs were collected from tests and examinations administered over a five-year period from 2001 to 2005. Questions were evaluated for 19 frequently occurring item-writing flaws, for cognitive level, for question source, and for the distribution of correct answers. Results show that almost half (46.2%) of the questions contained violations of item-writing guidelines and over 90% were written at low cognitive levels. Only a small proportion of questions were teacher-generated (14.1%), while 36.2% were taken from test banks and almost half (49.4%) had no source identified. MCQs written at a lower cognitive level were significantly more likely to contain item-writing flaws. While there was no relationship between the source of the question and item-writing flaws, teacher-generated questions were more likely to be written at higher cognitive levels (p<0.001). Correct answers were evenly distributed across all four options and no bias was noted in the placement of correct options. Further training in item-writing is recommended for all faculty members who are responsible for developing tests. Pre-test review and quality assessment are also recommended to reduce the occurrence of item-writing flaws and to improve the quality of test questions.
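The answer-key balance reported above (correct answers evenly distributed across the four options) is conventionally checked with a Pearson chi-square goodness-of-fit test against a uniform distribution; a minimal sketch, my own illustration rather than the study's analysis:

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic for testing whether correct answers
    are evenly distributed across option positions (e.g. A/B/C/D)."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# A perfectly balanced answer key yields a statistic of 0:
print(chi_square_uniform([25, 25, 25, 25]))  # 0.0
```

The statistic would be compared against a chi-square distribution with (number of options - 1) degrees of freedom to decide whether any placement bias is significant.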
NASA Astrophysics Data System (ADS)
Kachchaf, Rachel Rae
The purpose of this study was to compare how English language learners (ELLs) and monolingual English speakers solved multiple-choice items administered with and without a new form of testing accommodation: vignette illustration (VI). By incorporating theories from second language acquisition, bilingualism, and sociolinguistics, this study was able to gain more accurate and comprehensive insight into the ways students interacted with items. This mixed-methods study used verbal protocols to elicit the thinking processes of 36 native Spanish-speaking ELLs and 36 native English-speaking non-ELLs when solving multiple-choice science items. Results from both qualitative and quantitative analyses show that ELLs used a wider variety of actions oriented to making sense of the items than non-ELLs. In contrast, non-ELLs used more problem-solving strategies than ELLs. There were no statistically significant differences in student performance based on the interaction of presence of illustration and linguistic status, or on the main effect of presence of illustration. However, there were significant differences based on the main effect of linguistic status. An interaction between the characteristics of the students, the items, and the illustrations indicates considerable heterogeneity in the ways in which students from both linguistic groups think about and respond to science test items. The results of this study speak to the need for more research involving ELLs in the process of test development, to create test items that do not require ELLs to carry out significantly more actions than monolingual students to make sense of the items.
ERIC Educational Resources Information Center
Gadalla, Tahany M.
The equivalence of multiple-choice (MC) and constructed response (discrete) (CR-D) response formats as applied to mathematics computation at grade levels two to six was tested. The difference between total scores from the two response formats was tested for statistical significance, and the factor structure of items in both response formats was…
Developing Achievement Test: A Research for Assessment of 5th Grade Biology Subject
ERIC Educational Resources Information Center
Sener, Nilay; Tas, Erol
2017-01-01
The purpose of this study is to prepare a multiple-choice achievement test with high reliability and validity for the "Let's Solve the Puzzle of Our Body" unit. For this purpose, a multiple choice achievement test consisting of 46 items was applied to 178 fifth grade students in total. As a result of the test and material analysis…
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Test Design Project: Studies in Test Bias. Annual Report.
ERIC Educational Resources Information Center
McArthur, David
Item bias in a multiple-choice test can be detected by appropriate analyses of the persons x items scoring matrix. This permits comparison of groups of examinees tested with the same instrument. The test may be biased if it is not measuring the same thing in comparable groups, if groups are responding to different aspects of the test items, or if…
An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test
ERIC Educational Resources Information Center
Kahraman, Nilüfer
2014-01-01
Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…
Accommodations for Multiple Choice Tests
ERIC Educational Resources Information Center
Trammell, Jack
2011-01-01
Students with learning or learning-related disabilities frequently struggle with multiple choice assessments due to difficulty discriminating between items, filtering out distracters, and framing a mental best answer. This Practice Brief suggests accommodations and strategies that disability service providers can utilize in conjunction with…
Sirota, Miroslav; Juanchich, Marie
2018-03-27
The Cognitive Reflection Test, measuring intuition inhibition and cognitive reflection, has become extremely popular because it reliably predicts reasoning performance, decision-making, and beliefs. Across studies, the response format of CRT items sometimes differs, based on the assumed construct equivalence of tests with open-ended versus multiple-choice items (the equivalence hypothesis). Evidence and theoretical reasons, however, suggest that the cognitive processes measured by these response formats and their associated performances might differ (the nonequivalence hypothesis). We tested the two hypotheses experimentally by assessing the performance in tests with different response formats and by comparing their predictive and construct validity. In a between-subjects experiment (n = 452), participants answered stem-equivalent CRT items in an open-ended, a two-option, or a four-option response format and then completed tasks on belief bias, denominator neglect, and paranormal beliefs (benchmark indicators of predictive validity), as well as on actively open-minded thinking and numeracy (benchmark indicators of construct validity). We found no significant differences between the three response formats in the numbers of correct responses, the numbers of intuitive responses (with the exception of the two-option version, which had a higher number than the other tests), and the correlational patterns of the indicators of predictive and construct validity. All three test versions were similarly reliable, but the multiple-choice formats were completed more quickly. We speculate that the specific nature of the CRT items helps build construct equivalence among the different response formats. We recommend using the validated multiple-choice version of the CRT presented here, particularly the four-option CRT, for practical and methodological reasons. Supplementary materials and data are available at https://osf.io/mzhyc/.
ERIC Educational Resources Information Center
Papenberg, Martin; Musch, Jochen
2017-01-01
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
The Australian Science Item Bank Project
ERIC Educational Resources Information Center
Kings, Clive B.; Cropley, Murray C.
1974-01-01
Describes the development of multiple-choice test item bank for grade ten science by the Australian Council for Educational Research. Other item banks are also being developed at the grade ten level in mathematics and social science. (RH)
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
The "None of the Above" Option in Multiple-Choice Testing: An Experimental Study
ERIC Educational Resources Information Center
DiBattista, David; Sinnige-Egger, Jo-Anne; Fortuna, Glenda
2014-01-01
The authors assessed the effects of using "none of the above" as an option in a 40-item, general-knowledge multiple-choice test administered to undergraduate students. Examinees who selected "none of the above" were given an incentive to write the correct answer to the question posed. Using "none of the above" as the…
Using Tests as Learning Opportunities.
ERIC Educational Resources Information Center
Foos, Paul W.; Fisher, Ronald P.
1988-01-01
A study involving 105 undergraduates assessed the value of testing as a means of increasing, rather than simply monitoring, learning. Results indicate that fill-in-the-blank and items requiring student inferences were more effective, respectively, than multiple-choice tests and verbatim items in furthering student learning. (TJH)
ERIC Educational Resources Information Center
California State Dept. of Education, Sacramento. Bureau of Publications.
This document contains objective tests for each topic in the Meatcutting Workbook, Part 2, which is designed for apprenticeship meatcutting programs in California. Each of the 30 tests consists of from 5 to 65 multiple-choice items with most tests containing approximately 10 items. The tests are grouped according to the eight units of the…
Handbook for Driving Knowledge Testing.
ERIC Educational Resources Information Center
Pollock, William T.; McDole, Thomas L.
Materials intended for driving knowledge test development for use by operational licensing and education agencies are presented. A pool of 1,313 multiple choice test items is included, consisting of sets of specially developed and tested items covering principles of safe driving, legal regulations, and traffic control device knowledge pertinent to…
ERIC Educational Resources Information Center
Walker, Michael E.; Kim, Sooyeon
2010-01-01
This study examined the use of an all multiple-choice (MC) anchor for linking mixed format tests containing both MC and constructed-response (CR) items, in a nonequivalent groups design. An MC-only anchor could effectively link two such test forms if either (a) the MC and CR portions of the test measured the same construct, so that the MC anchor…
NASA Astrophysics Data System (ADS)
Beggrow, Elizabeth P.; Ha, Minsu; Nehm, Ross H.; Pearl, Dennis; Boone, William J.
2014-02-01
The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices—such as explanation, argumentation, and communication—in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than the computer-scored explanation measures (r = 0.63). 
Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations, and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
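The mean square outfit statistic used as the fit criterion above has a simple closed form in the dichotomous Rasch model; a sketch on synthetic data (not the study's code or data):

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: P(correct) for ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_mean_square(responses, abilities, difficulties):
    """Unweighted (outfit) mean square: the average squared standardized
    residual over all person-by-item observations. Values near 1.0 indicate
    good model fit; the study above flagged values > 1.3 as misfitting."""
    z_sq = []
    for row, theta in zip(responses, abilities):
        for x, b in zip(row, difficulties):
            p = rasch_p(theta, b)
            z_sq.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z_sq) / len(z_sq)
```

For a single observation where ability equals difficulty (p = 0.5), the squared standardized residual is exactly 1.0 whichever way the examinee responds, which is why well-fitting data hover near an outfit of 1.0.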
Effects of Repeated Testing on Short- and Long-Term Memory Performance across Different Test Formats
ERIC Educational Resources Information Center
Stenlund, Tova; Sundström, Anna; Jonsson, Bert
2016-01-01
This study examined whether practice testing with short-answer (SA) items benefits learning over time compared to practice testing with multiple-choice (MC) items, and rereading the material. More specifically, the aim was to test the hypotheses of "retrieval effort" and "transfer appropriate processing" by comparing retention…
ERIC Educational Resources Information Center
Kleinke, David J.
Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?
ERIC Educational Resources Information Center
DeMars, Christine
Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
High time for a change: psychometric analysis of multiple-choice questions in nursing.
Redmond, Sandra P; Hartigan-Rogers, Jackie A; Cobbett, Shelley
2012-11-26
Nurse educators teach students to develop an informed nursing practice, but can educators claim the same grounding in the available evidence when formulating multiple-choice assessment tools to evaluate student learning? Multiple-choice questions are a popular assessment format within nursing education. While widely accepted as a credible format to assess student knowledge across disciplines, debate exists among educators regarding the number of options necessary to adequately test cognitive reasoning and to discriminate optimally between student abilities. The purpose of this quasi-experimental between-groups study was to examine the psychometric properties of three-option multiple-choice questions when compared to the more traditional four-option questions. Data analysis revealed no statistically significant differences in item discrimination, difficulty, or mean examination scores when multiple-choice test questions were administered with three versus four answer options. This study provides additional guidance for nurse educators to assist in improving multiple-choice question writing and test design.
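The two item properties compared in this study, classical difficulty and discrimination, are computed directly from a 0/1 score matrix; a minimal sketch of the standard formulas (my illustration, not the study's implementation):

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Classical difficulty index p: proportion of examinees answering correctly."""
    return mean(item_scores)

def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score:
    r_pb = (M_correct - M_all) / SD_all * sqrt(p / (1 - p))."""
    m_correct = mean(t for s, t in zip(item_scores, total_scores) if s == 1)
    p = mean(item_scores)
    return (m_correct - mean(total_scores)) / pstdev(total_scores) * (p / (1 - p)) ** 0.5

# Five examinees: their 0/1 scores on one item and their total test scores.
item = [1, 1, 1, 0, 0]
total = [9, 8, 7, 4, 2]
print(item_difficulty(item))  # 0.6
```

A positive point-biserial means higher-scoring examinees tended to answer the item correctly; comparisons such as three- versus four-option formats are made on exactly these statistics.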
Multiple choice questions can be designed or revised to challenge learners' critical thinking.
Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A
2013-12-01
Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive psychology (but not physiology) experts and analyzed statistically, in order to test the independence of content expertise and cognitive-complexity ratings of MC items. Integration of higher-order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher-order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their own domain of expertise.
Nested Logit Models for Multiple-Choice Item Response Data
ERIC Educational Resources Information Center
Suh, Youngsuk; Bolt, Daniel M.
2010-01-01
Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all…
Food Service Supervisor. Dietetic Support Personnel Achievement Test.
ERIC Educational Resources Information Center
Oklahoma State Dept. of Vocational and Technical Education, Stillwater.
This guide contains a series of multiple-choice items and guidelines to assist instructors in composing criterion-referenced tests for use in the food service supervisor component of Oklahoma's Dietetic Support Personnel training program. Test items addressing each of the following occupational duty areas are provided: human relations; nutrient…
Food Production Worker. Dietetic Support Personnel Achievement Test.
ERIC Educational Resources Information Center
Oklahoma State Dept. of Vocational and Technical Education, Stillwater.
This guide contains a series of multiple-choice items and guidelines to assist instructors in composing criterion-referenced tests for use in the food production worker component of Oklahoma's Dietetic Support Personnel training program. Test items addressing each of the following occupational duty areas are provided: human relations; hygiene and…
Food Service Worker. Dietetic Support Personnel Achievement Test.
ERIC Educational Resources Information Center
Oklahoma State Dept. of Vocational and Technical Education, Stillwater.
This guide contains a series of multiple-choice items and guidelines to assist instructors in composing criterion-referenced tests for use in the food service worker component of Oklahoma's Dietetic Support Personnel training program. Test items addressing each of the following occupational duty areas are provided: human relations; personal…
The Performance of IRT Model Selection Methods with Mixed-Format Tests
ERIC Educational Resources Information Center
Whittaker, Tiffany A.; Chang, Wanchen; Dodd, Barbara G.
2012-01-01
When tests consist of multiple-choice and constructed-response items, researchers are confronted with the question of which item response theory (IRT) model combination will appropriately represent the data collected from these mixed-format tests. This simulation study examined the performance of six model selection criteria, including the…
Testing to the Top: Everything But the Kitchen Sink?
ERIC Educational Resources Information Center
Dietel, Ron
2011-01-01
Two tests intended to measure student achievement of the Common Core State Standards will face intense scrutiny, but the test makers say they will include performance assessments and other items that are not multiple-choice questions. Incorporating performance items on these tests will raise issues of scoring, costs, and validity.
ERIC Educational Resources Information Center
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-01-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
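The three-parameter logistic (3PL) model fitted to the TUV items has a standard closed form; a minimal sketch of the item response function, following the usual IRT notation (my illustration, not the PARSCALE implementation):

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL item response function:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability sits halfway between the guessing floor and 1:
print(p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.25))  # 0.625
```

The guessing parameter c is what distinguishes the 3PL from simpler IRT models: even very low-ability examinees retain a nonzero chance of answering a multiple-choice item correctly.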
ERIC Educational Resources Information Center
Swygert, Kimberly A.
In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…
High School Students' Concepts of Acids and Bases.
ERIC Educational Resources Information Center
Ross, Bertram H. B.
An investigation of Ontario high school students' understanding of acids and bases, using quantitative and qualitative methods, revealed misconceptions. A concept map based on the objectives of the Chemistry Curriculum Guideline was used to generate multiple-choice items and interview questions. The multiple-choice test was administered to 34 grade 12…
Test-Wiseness Cues in the Options of Mathematics Items.
ERIC Educational Resources Information Center
Kuntz, Patricia
The quality of mathematics multiple choice items and their susceptibility to test wiseness were examined. Test wiseness was defined as "a subject's capacity to utilize the characteristics and formats of the test and/or test taking situation to receive a high score." The study used results of the Graduate Record Examinations Aptitude Test (GRE) and…
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam
2014-01-01
The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test designed to match the eleventh-grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
Vilaro, Melissa J; Zhou, Wenjun; Colby, Sarah E; Byrd-Bredbenner, Carol; Riggsbee, Kristin; Olfert, Melissa D; Barnett, Tracey E; Mathews, Anne E
2017-12-01
Understanding factors that influence food choice may help improve diet quality. Factors that commonly affect adults' food choices have been described, but measures that identify and assess food choice factors specific to college students are lacking. This study developed and tested the Food Choice Priorities Survey (FCPS) among college students. Thirty-seven undergraduates participated in two focus groups (n = 19; 11 in the male-only group, 8 in the female-only group) and interviews (n = 18) regarding typical influences on food choice. Qualitative data informed the development of survey items with a 5-point Likert-type scale (1 = not important, 5 = extremely important). An expert panel rated FCPS items for clarity, relevance, representativeness, and coverage using a content validity form. To establish test-retest reliability, 109 first-year college students completed the 14-item FCPS at two time points, 0-48 days apart (M = 13.99, SD = 7.44). Using Cohen's weighted κ for responses within 20 days, 11 items demonstrated moderate agreement and 3 items had substantial agreement. Factor analysis revealed a three-factor structure (9 items). The FCPS is designed for college students and provides a way to determine the factors of greatest importance regarding food choices among this population. From a public health perspective, practical applications include using the FCPS to tailor health communications and behavior change interventions to factors most salient for food choices of college students.
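Cohen's weighted κ, used above for test-retest agreement, can be computed from two sets of ordinal ratings as sketched below. This is a generic illustration; the category count and the choice of quadratic disagreement weights are assumptions, not details taken from the FCPS study:

```python
import numpy as np

def weighted_kappa(r1, r2, k, weights="quadratic"):
    """Cohen's weighted kappa for two rating occasions on a k-category
    ordinal scale. r1, r2: integer ratings coded 0..k-1."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):          # observed agreement matrix
        obs[a, b] += 1
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))  # chance-expected matrix
    i, j = np.indices((k, k))
    d = np.abs(i - j) / (k - 1)       # normalized category distance
    w = d**2 if weights == "quadratic" else d
    return 1 - (w * obs).sum() / (w * exp).sum()

# One near-miss out of four 5-point ratings:
print(round(weighted_kappa([1, 2, 2, 3], [1, 2, 3, 3], 5), 3))  # → 0.8
```

Off-by-one disagreements are penalized only lightly under quadratic weights, which suits Likert-type items like those on the FCPS.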
Tarrant, Marie; Ware, James; Mohammed, Ahmed M
2009-07-07
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
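The distractor-functioning criteria described above (choice frequency below 5%, or a positive option discrimination statistic) can be sketched roughly as follows. The mean-score-difference form of option discrimination used here is an assumption for illustration; the study does not specify its exact statistic, and options chosen by no one must appear in `responses` to be evaluated:

```python
def flag_nonfunctioning(responses, scores, correct, min_frac=0.05):
    """Flag non-functioning distractors for one MCQ item.

    responses: option label chosen by each examinee
    scores: matching total test scores
    correct: label of the keyed option
    A distractor is flagged if fewer than min_frac of examinees chose it,
    or if its choosers outscore everyone else (positive discrimination).
    """
    n = len(responses)
    flags = {}
    for opt in sorted(set(responses) - {correct}):
        chosen = [s for r, s in zip(responses, scores) if r == opt]
        others = [s for r, s in zip(responses, scores) if r != opt]
        frac = len(chosen) / n
        disc = (sum(chosen) / len(chosen) - sum(others) / len(others)) if chosen else 0.0
        flags[opt] = frac < min_frac or disc > 0
    return flags
```

For example, a distractor picked by only 1 of 25 examinees (4%) would be flagged, while one picked mostly by low scorers would not.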
ERIC Educational Resources Information Center
Lee, Hee-Sun; Liu, Ou Lydia; Linn, Marcia C.
2011-01-01
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item…
Multiple Choice Items: How to Gain the Most out of Them.
ERIC Educational Resources Information Center
Talmir, Pinchas
1991-01-01
Describes how multiple-choice items can be designed and used as an effective diagnostic tool by avoiding their pitfalls and by taking advantage of their potential benefits. The following issues are discussed: 'correct' versus 'best' answers; construction of diagnostic multiple-choice items; the problem of guessing; the use of justifications of…
The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.
ERIC Educational Resources Information Center
Finney, Sara J.; Smith, Russell W.; Wise, Steven L.
Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…
Aligning Items and Achievement Levels: A Study Comparing Expert Judgments
ERIC Educational Resources Information Center
Kaliski, Pamela; Huff, Kristen; Barry, Carol
2011-01-01
For educational achievement tests that employ multiple-choice (MC) items and aim to reliably classify students into performance categories, it is critical to design MC items that are capable of discriminating student performance according to the stated achievement levels. This is accomplished, in part, by clearly understanding how item design…
ERIC Educational Resources Information Center
Cramer, Nicholas; Asmar, Abdo; Gorman, Laurel; Gros, Bernard; Harris, David; Howard, Thomas; Hussain, Mujtaba; Salazar, Sergio; Kibble, Jonathan D.
2016-01-01
Multiple-choice questions are a gold-standard tool in medical school for assessment of knowledge and are the mainstay of licensing examinations. However, multiple-choice items can be criticized for lacking the ability to test higher-order learning or integrative thinking across multiple disciplines. Our objective was to develop a novel…
ERIC Educational Resources Information Center
Drasgow, Fritz; And Others
The test scores of some examinees on a multiple-choice test may not provide adequate measures of their abilities. The goal of appropriateness measurement is to identify such individuals. Earlier theoretical and experimental work considered examinees answering all, or almost all, test items. This article reports research that extends…
The Effects of Item by Item Feedback Given during an Ability Test.
ERIC Educational Resources Information Center
Whetton, C.; Childs, R.
1981-01-01
Answer-until-correct (AUC) is a procedure for providing feedback during a multiple-choice test, giving an increased range of scores. The performance of secondary students on a verbal ability test using AUC procedures was compared with a group using conventional instructions. AUC scores considerably enhanced reliability but not validity.…
Multiple-Choice and Short-Answer Exam Performance in a College Classroom
ERIC Educational Resources Information Center
Funk, Steven C.; Dickson, K. Laurie
2011-01-01
The authors experimentally investigated the effects of multiple-choice and short-answer format exam items on exam performance in a college classroom. They randomly assigned 50 students to take a 10-item short-answer pretest or posttest on two 50-item multiple-choice exams in an introduction to personality course. Students performed significantly…
ERIC Educational Resources Information Center
Parish, Jane A.; Karisch, Brandi B.
2013-01-01
Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
[Development of critical thinking skill evaluation scale for nursing students].
You, So Young; Kim, Nam Cho
2014-04-01
To develop a Critical Thinking Skill Test for Nursing Students. The construct concepts were drawn from a literature review and in-depth interviews with hospital nurses, and surveys were conducted among students (n=607) from nursing colleges. The data were collected from September 13 to November 23, 2012 and analyzed using the SAS program, version 9.2. Reliability was assessed with the KR-20 coefficient, and validity with the difficulty index, discrimination index, item-total correlation, and known-groups technique. Four domains and 27 skills were identified and 35 multiple-choice items were developed. Thirty multiple-choice items which scored higher than .80 on the content validity index were selected for the pretest. From the analysis of the pretest data, a modified set of 30 items was selected for the main test. In the main test, the KR-20 coefficient was .70 and corrected item-total correlations ranged from .11 to .38. There was a statistically significant difference between the two academic systems (p=.001). The developed instrument is the first critical thinking skill test reflecting nursing perspectives in hospital settings and is expected to be utilized as a tool which contributes to improvement of the critical thinking ability of nursing students.
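The KR-20 coefficient reported above is the standard internal-consistency estimate for dichotomously scored (0/1) items; a minimal sketch with toy data, not the study's actual responses:

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20.

    item_scores: one list per examinee, each of length k (1 = correct).
    """
    n = len(item_scores)
    k = len(item_scores[0])
    # item difficulties p_i and Bernoulli item variances p_i * (1 - p_i)
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)
    # population variance of total scores
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq / var)

rows = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(rows))  # → 0.75
```

A KR-20 of .70, as in the main test above, is usually read as acceptable for a newly developed instrument.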
ERIC Educational Resources Information Center
Eleje, Lydia I.; Esomonu, Nkechi P. M.
2018-01-01
A test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up of 20 multiple-choice test items constructed based on quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…
A Study of the Homogeneity of Items Produced From Item Forms Across Different Taxonomic Levels.
ERIC Educational Resources Information Center
Weber, Margaret B.; Argo, Jana K.
This study determined whether item forms (rules for constructing items related to a domain or set of tasks) would enable naive item writers to generate multiple-choice items at three taxonomic levels--knowledge, comprehension, and application. Students wrote 120 multiple-choice items from 20 item forms, corresponding to educational objectives…
Project Physics Tests 1, Concepts of Motion.
ERIC Educational Resources Information Center
Harvard Univ., Cambridge, MA. Harvard Project Physics.
Test items relating to Project Physics Unit 1 are presented in this booklet, consisting of 70 multiple-choice and 20 problem-and-essay questions. Concepts of motion are examined with respect to velocities, acceleration, forces, vectors, Newton's laws, and circular motion. Suggestions are made concerning the time to allow for answering some items. Besides…
Fundamentals of Marketing Core Curriculum. Test Items and Assessment Techniques.
ERIC Educational Resources Information Center
Smith, Clifton L.; And Others
This document contains multiple choice test items and assessment techniques for Missouri's fundamentals of marketing core curriculum. The core curriculum is divided into these nine occupational duties: (1) communications in marketing; (2) economics and marketing; (3) employment and advancement; (4) human relations in marketing; (5) marketing…
NASA Astrophysics Data System (ADS)
Ardiansah; Masykuri, M.; Rahardjo, S. B.
2018-04-01
Students' comprehension of concepts on a three-tier multiple-choice diagnostic test is related to their confidence level, which in turn reflects certainty and self-efficacy. The purpose of this research was to find out students' certainty on a misconception test. This quantitative-qualitative study counted students' confidence levels. The participants were 484 students studying acid-base and solubility-equilibrium topics. Data were collected using a thirty-question three-tier multiple-choice (3TMC) test and a student questionnaire. The findings showed that item #6, on calculating the pH of an ultra-dilute solution, gave the highest misconception percentage together with high student confidence. Other findings were that (1) students' tendency to choose the misconception answer increased across item numbers, (2) students' certainty decreased as they answered the 3TMC, and (3) students' self-efficacy and achievement were related to each other. The findings suggest implications and limitations for further research.
An Instrument to Predict Job Performance of Home Health Aides--Testing the Reliability and Validity.
ERIC Educational Resources Information Center
Sturges, Jack; Quina, Patricia
The development of four paper-and-pencil tests, useful in assessing the effectiveness of inservice training provided to either nurses aides or home health aides, was described. These tests were designed for utilization in employment selection and case assignment. Two tests of 37 multiple-choice items and two tests of 10 matching items were…
Multiple Choice Questions Can Be Designed or Revised to Challenge Learners' Critical Thinking
ERIC Educational Resources Information Center
Tractenberg, Rochelle E.; Gushta, Matthew M.; Mulroney, Susan E.; Weissinger, Peggy A.
2013-01-01
Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to…
Vegada, Bhavisha; Shukla, Apexa; Khilnani, Ajeetkumar; Charan, Jaykaran; Desai, Chetna
2016-01-01
Most academic teachers use four or five options per item on multiple-choice question (MCQ) tests for formative and summative assessment. The optimal number of options per MCQ item is a matter of considerable debate among academic teachers in various educational fields, and there is a scarcity of published literature on the question in medical education. The aim was to compare three-option, four-option, and five-option MCQ tests on the quality parameters of reliability, validity, item analysis, distractor analysis, and time analysis. Participants were 3rd-semester M.B.B.S. students, divided randomly into three groups. Each group was randomly given one version of the MCQ test: three-option, four-option, or five-option. Following the marking of the tests, the participants' option selections were analyzed, and the three formats were compared on mean marks, mean time, validity, reliability, facility value, discrimination index, point-biserial value, and distractor analysis. Students scored higher (P = 0.000) and took less time (P = 0.009) to complete the three-option test than the four-option and five-option tests. Facility value was higher (P = 0.004) in the three-option group than in the four-option and five-option groups. There was no significant difference between the three groups in validity, reliability, or item discrimination. Non-functioning distractors were more common in the four-option and five-option groups than in the three-option group. Assessment based on three-option MCQs can therefore be preferred over four-option and five-option MCQs.
Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.
ERIC Educational Resources Information Center
Smith, Clifton L.; And Others
This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…
ERIC Educational Resources Information Center
Childs, Ruth A.; Oppler, Scott H.
The use of item response theory (IRT) in the Medical College Admission Test (MCAT) testing program has been limited. This study provides a basis for future IRT analyses of the MCAT by exploring the dimensionality of each of the MCAT's three multiple-choice test sections (Verbal Reasoning, Physical Sciences, and Biological Sciences) and the…
An Investigation of the Impact of Guessing on Coefficient α and Reliability
2014-01-01
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
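The general effect described above can be illustrated with a small simulation: dichotomous responses are generated under a 3PL model with and without a guessing floor, with item difficulty and discrimination varying across items, and coefficient α is compared. The parameter ranges, sample size, and test length below are arbitrary choices for illustration, not the study's actual simulation design:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate(n_persons, a, b, c):
    """Simulate 0/1 responses under the 3PL model (c = 0 reduces to 2PL)."""
    theta = rng.standard_normal(n_persons)[:, None]  # abilities ~ N(0, 1)
    p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))
    return (rng.random(p.shape) < p).astype(int)

def alpha(x):
    """Coefficient alpha (KR-20 for 0/1 data) from an n x k score matrix."""
    k = x.shape[1]
    return (k / (k - 1)) * (1 - x.var(axis=0).sum() / x.sum(axis=1).var())

k = 30
a = rng.uniform(0.8, 2.0, k)   # varying item discrimination
b = rng.uniform(-2.0, 2.0, k)  # varying item difficulty
no_guess = alpha(simulate(2000, a, b, np.zeros(k)))
guess = alpha(simulate(2000, a, b, np.full(k, 0.25)))  # 4-option chance level
print(round(no_guess, 2), round(guess, 2))
```

With these settings the α of the guessing condition comes out lower, consistent with the general negative impact of guessing on reliability reported above.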
A Two-Parameter Latent Trait Model. Methodology Project.
ERIC Educational Resources Information Center
Choppin, Bruce
On well-constructed multiple-choice tests, the most serious threat to measurement is not variation in item discrimination, but the guessing behavior that may be adopted by some students. Ways of ameliorating the effects of guessing are discussed, especially for problems in latent trait models. A new item response model, including an item parameter…
Instrument Formatting with Computer Data Entry in Mind.
ERIC Educational Resources Information Center
Boser, Judith A.; And Others
Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response options; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…
ERIC Educational Resources Information Center
Sangwin, Christopher J.; Jones, Ian
2017-01-01
In this paper we report the results of an experiment designed to test the hypothesis that when faced with a question involving the inverse direction of a reversible mathematical process, students solve a multiple-choice version by verifying the answers presented to them by the direct method, not by undertaking the actual inverse calculation.…
ERIC Educational Resources Information Center
Hendrickson, Amy; Patterson, Brian; Ewing, Maureen
2010-01-01
The psychometric considerations and challenges associated with including constructed response items on tests are discussed along with how these issues affect the form assembly specifications for mixed-format exams. Reliability and validity, security and fairness, pretesting, content and skills coverage, test length and timing, weights, statistical…
Effects of Test Format, Self Concept and Anxiety on Item Response Changing Behaviour
ERIC Educational Resources Information Center
Afolabi, E. R. I.
2007-01-01
The study examined the effects of item format, self-concept and anxiety on response changing behaviour. Four hundred undergraduate students who took a counseling psychology course in a Nigerian university participated in the study. Students' answers in multiple-choice and true-false formats of an achievement test were observed for response…
Feedback-related brain activity predicts learning from feedback in multiple-choice testing.
Ernst, Benjamin; Steinhauser, Marco
2012-06-01
Different event-related potentials (ERPs) have been shown to correlate with learning from feedback in decision-making tasks and with learning in explicit memory tasks. In the present study, we investigated which ERPs predict learning from corrective feedback in a multiple-choice test, which combines elements from both paradigms. Participants worked through sets of multiple-choice items of a Swahili-German vocabulary task. Whereas the initial presentation of an item required the participants to guess the answer, corrective feedback could be used to learn the correct response. Initial analyses revealed that corrective feedback elicited components related to reinforcement learning (FRN), as well as to explicit memory processing (P300) and attention (early frontal positivity). However, only the P300 and early frontal positivity were positively correlated with successful learning from corrective feedback, whereas the FRN was even larger when learning failed. These results suggest that learning from corrective feedback crucially relies on explicit memory processing and attentional orienting to corrective feedback, rather than on reinforcement learning.
A Comparison of Alternate-Choice and True-False Item Forms Used in Classroom Examinations.
ERIC Educational Resources Information Center
Maihoff, N. A.; Mehrens, Wm. A.
A comparison is presented of alternate-choice and true-false item forms used in an undergraduate natural science course. The alternate-choice item is a modified two-choice multiple-choice item in which the two responses are included within the question stem. This study (1) compared the difficulty level, discrimination level, reliability, and…
Comedy workshop: an enjoyable way to develop multiple-choice questions.
Droegemueller, William; Gant, Norman; Brekken, Alvin; Webb, Lynn
2005-01-01
To describe an innovative method of developing multiple-choice items for a board certification examination. The development of appropriate multiple-choice items is more of an art than a science. The comedy workshop format for developing questions for a certification examination is similar to the process used by comedy writers composing scripts for television shows. This group format dramatically diminishes the frustrations faced by an individual question writer attempting to create items. The vast majority of our comedy workshop participants enjoy and prefer the comedy workshop format. It provides an ideal environment in which to teach and blend the talents of inexperienced and experienced question writers. This is a descriptive article, in which we suggest an innovative process in the art of creating multiple-choice items for a high-stakes examination.
Physics 300 Provincial Examination.
ERIC Educational Resources Information Center
Manitoba Dept. of Education and Training, Winnipeg.
This document consists of the physics 300 provincial examination (English version), a separate "provincial summary report" on the results of giving the test, and a separate French language version of the examination. This physics examination contains a 53-item multiple-choice section and a 12-item free-response section. Subsections of…
Marine Education Knowledge Inventory.
ERIC Educational Resources Information Center
Hounshell, Paul B.; Hampton, Carolyn
This 35-item, multiple-choice Marine Education Knowledge Inventory was developed for use in upper elementary/middle schools to measure a student's knowledge of marine science. Content of test items is drawn from oceanography, ecology, earth science, navigation, and the biological sciences (focusing on marine animals). Steps in the construction of…
Shen, Linjun; Juul, Dorthea; Faulkner, Larry R
2016-01-01
The development of recertification programs (now referred to as Maintenance of Certification or MOC) by the members of the American Board of Medical Specialties provides the opportunity to study knowledge base across the professional lifespan of physicians. Research results to date are mixed with some studies finding negative associations between age and various measures of competency and others finding no or minimal relationships. Four groups of multiple choice test items that were independently developed for certification and MOC examinations in psychiatry and neurology were administered to certification and MOC examinees within each specialty. Percent correct scores were calculated for each examinee. Differences between certification and MOC examinees were compared using unpaired t tests, and logistic regression was used to compare MOC and certification examinee performance on the common test items. Except for the neurology certification test items that addressed basic neurology concepts, the performance of the certification and MOC examinees was similar. The differences in performance on individual test items did not consistently favor one group or the other and could not be attributed to any distinguishable content or format characteristics of those items. The findings of this study are encouraging in that physicians who had recently completed residency training possessed clinical knowledge that was comparable to that of experienced physicians, and the experienced physicians' clinical knowledge was equivalent to that of recent residency graduates. The role testing can play in enhancing expertise is described.
ERIC Educational Resources Information Center
Pawade, Yogesh R.; Diwase, Dipti S.
2016-01-01
Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
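The difficulty index (p-value) and discrimination index (DI) described above can be sketched as follows. The upper/lower 27% grouping is a common convention assumed here for illustration, not a detail taken from the article:

```python
def item_analysis(correct_flags, totals, group_frac=0.27):
    """Difficulty index and discrimination index for one item.

    correct_flags: 1/0 per examinee on this item
    totals: total test scores, used to form upper/lower groups
    """
    n = len(correct_flags)
    p = sum(correct_flags) / n  # difficulty index: proportion correct
    order = sorted(range(n), key=lambda i: totals[i])
    g = max(1, int(round(n * group_frac)))
    low = sum(correct_flags[i] for i in order[:g])    # lowest-scoring group
    high = sum(correct_flags[i] for i in order[-g:])  # highest-scoring group
    di = (high - low) / g  # discrimination index
    return p, di

# Item answered correctly by the top half of a 10-examinee group:
print(item_analysis([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], list(range(1, 11))))
```

An item with p near .5 and a high DI, as in this toy example, is the kind item analysis is meant to identify as well-functioning.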
Test item linguistic complexity and assessments for deaf students.
Cawthon, Stephanie
2011-01-01
Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
Do large-scale assessments measure students' ability to integrate scientific knowledge?
NASA Astrophysics Data System (ADS)
Lee, Hee-Sun
2010-03-01
Large-scale assessments are used as means to diagnose the current status of student achievement in science and compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as Trends in International Math and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of student open-ended responses, open-ended items become a valid and reliable tool to assess students' knowledge integration ability.
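The Rasch Partial Credit Model used in the analysis above assigns a probability to each score category of a polytomous item from the examinee's ability θ and the item's step difficulties; a minimal sketch with illustrative (not estimated) step values:

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities under the Rasch Partial Credit Model.

    deltas: step difficulties delta_1..delta_m for an item scored 0..m.
    Returns [P(X = 0), ..., P(X = m)] at ability theta.
    """
    # cumulative sums of (theta - delta_j); score 0 has an empty sum (= 0)
    num = [0.0]
    for d in deltas:
        num.append(num[-1] + (theta - d))
    exps = [math.exp(v) for v in num]
    z = sum(exps)  # normalizing constant
    return [e / z for e in exps]

# Dichotomous special case: one step at theta gives a 50/50 split.
print(pcm_probs(0.0, [0.0]))  # → [0.5, 0.5]
```

With multiple steps, partial-credit scoring rubrics like the redesigned one described above map naturally onto the ordered categories of this model.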
ERIC Educational Resources Information Center
Obiekwe, Jerry C.
Palmore's Facts on Aging Quiz (FAQ) (E. Palmore, 1977) is an instrument that is used to educate, to measure learning, to test knowledge, to measure attitudes toward aging, and in research. A comparative analysis was performed between the FAQ I and its multiple choice version and the FAQ II and its multiple choice version in terms of their item…
ERIC Educational Resources Information Center
Herrmann-Abell, Cari F.; DeBoer, George E.
2011-01-01
Distractor-driven multiple-choice assessment items and Rasch modeling were used as diagnostic tools to investigate students' understanding of middle school chemistry ideas. Ninety-one items were developed according to a procedure that ensured content alignment to the targeted standards and construct validity. The items were administered to 13360…
Configural Frequency Analysis as a Statistical Tool for Developmental Research.
ERIC Educational Resources Information Center
Lienert, Gustav A.; Oeveste, Hans Zur
1985-01-01
Configural frequency analysis (CFA) is suggested as a technique for longitudinal research in developmental psychology. Stability and change in answers to multiple choice and yes-no item patterns obtained with repeated measurements are identified by CFA and illustrated by developmental analysis of an item from Gorham's Proverb Test. (Author/DWH)
Investigating High School Students' Understanding of Chemical Equilibrium Concepts
ERIC Educational Resources Information Center
Karpudewan, Mageswary; Treagust, David F.; Mocerino, Mauro; Won, Mihye; Chandrasegaran, A. L.
2015-01-01
This study investigated the year 12 students' (N = 56) understanding of chemical equilibrium concepts after instruction using two conceptual tests, the "Chemical Equilibrium Conceptual Test 1" ("CECT-1") consisting of nine two-tier multiple-choice items and the "Chemical Equilibrium Conceptual Test 2"…
ERIC Educational Resources Information Center
Harvard Univ., Cambridge, MA. Harvard Project Physics.
This document is an evaluation instrument developed as a part of Harvard Project Physics (HPP). It consists of a 36-item, multiple choice (five options) Physics Achievement Test (PAT) designed to measure general knowledge of physics as well as the material emphasized in HPP. (PEB)
NASA Astrophysics Data System (ADS)
Slater, Stephanie
2009-05-01
The Test Of Astronomy STandards (TOAST) assessment instrument is a multiple-choice survey tightly aligned to the consensus learning goals stated by the American Astronomical Society - Chair's Conference on ASTRO 101, the American Association for the Advancement of Science's Project 2061 Benchmarks, and the National Research Council's National Science Education Standards. Researchers from the Cognition in Astronomy, Physics and Earth sciences Research (CAPER) Team at the University of Wyoming's Science and Math Teaching Center (UWYO SMTC) have been conducting a question-by-question distractor analysis procedure to determine the sensitivity and effectiveness of each item. In brief, the frequency of each possible answer choice, known as a foil or distractor on a multiple-choice test, is determined and compared to the existing literature on the teaching and learning of astronomy. In addition to having statistical difficulty and discrimination values, a well-functioning assessment item will show students selecting distractors in the relative proportions in which we expect them to respond based on known misconceptions and reasoning difficulties. Our distractor analysis suggests that all items are functioning as expected. These results add weight to the validity of the TOAST assessment instrument, which is designed to help instructors and researchers measure the impact of course-length instructional strategies for undergraduate science survey courses with learning goals tightly aligned to the consensus goals of the astronomy education community.
Pursuing the Qualities of a "Good" Test
ERIC Educational Resources Information Center
Coniam, David
2014-01-01
This article examines the issue of the quality of teacher-produced tests, limiting itself in the current context to objective, multiple-choice tests. The article investigates a short, two-part 20-item English language test. After a brief overview of the key test qualities of reliability and validity, the article examines the two subtests in terms…
Examining Gender DIF on a Multiple-Choice Test of Mathematics: A Confirmatory Approach.
ERIC Educational Resources Information Center
Ryan, Katherine E.; Fan, Meichu
1996-01-01
Results for 3,244 female and 3,033 male junior high school students from the Second International Mathematics Study show that applied items in algebra, geometry, and computation were easier for males but arithmetic items were differentially easier for females. Implications of these findings for assessment and instruction are discussed. (SLD)
The Impact of Kentucky's Educational Reform Act on Writing throughout the Commonwealth.
ERIC Educational Resources Information Center
Harnack, Andrew; And Others
1994-01-01
The central role of writing in Kentucky's Education Reform Act is most evident in Kentucky's new assessment system, which employs writing on all levels. Even tests that have recently included multiple-choice items may be replaced by response items that require students to apply knowledge, concepts, and skills in a writing format. Writing itself is…
Two systems drive attention to rewards.
Kovach, Christopher K; Sutterer, Matthew J; Rushia, Sara N; Teriakidis, Adrianna; Jenison, Rick L
2014-01-01
How options are framed can dramatically influence choice preference. While salience of information plays a central role in this effect, precisely how it is mediated by attentional processes remains unknown. Current models assume a simple relationship between attention and choice, according to which preference should be uniformly biased towards the attended item over the whole time-course of a decision between similarly valued items. To test this prediction we considered how framing alters the orienting of gaze during a simple choice between two options, using eye movements as a sensitive online measure of attention. In one condition participants selected the less preferred item to discard and in the other, the more preferred item to keep. We found that gaze gravitates towards the item ultimately selected, but did not observe the effect to be uniform over time. Instead, we found evidence for distinct early and late processes that guide attention according to preference in the first case and task demands in the second. We conclude that multiple time-dependent processes govern attention during choice, and that these may contribute to framing effects in different ways. PMID:24550868
ERIC Educational Resources Information Center
Kibble, Jonathan D.; Johnson, Teresa
2011-01-01
The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The…
A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments
ERIC Educational Resources Information Center
Wolkowitz, Amanda A.; Skorupski, William P.
2013-01-01
When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…
ERIC Educational Resources Information Center
Strazicich, Mirko, Ed.
This document contains objective tests for each lesson in the Meatcutting Workbook, Part I (see note), which is designed for apprenticeship programs in meatcutting in California. Each of the 36 tests contains from 10 to 45 multiple-choice items. The tests are grouped according to the eight units of the workbook: the apprentice meatcutter; applied…
ERIC Educational Resources Information Center
Kahraman, Nilufer; Brown, Crystal B.
2015-01-01
Psychometric models based on structural equation modeling framework are commonly used in many multiple-choice test settings to assess measurement invariance of test items across examinee subpopulations. The premise of the current article is that they may also be useful in the context of performance assessment tests to test measurement invariance…
ERIC Educational Resources Information Center
Klinger, Don A.; Rogers, W. Todd
2003-01-01
The estimation accuracy of procedures based on classical test score theory and item response theory (generalized partial credit model) were compared for examinations consisting of multiple-choice and extended-response items. Analysis of British Columbia Scholarship Examination results found an error rate of about 10 percent for both methods, with…
Attainment of Selected Earth Science Concepts by Texas High School Seniors.
ERIC Educational Resources Information Center
Rollins, Mavis M.; And Others
The purpose of this study was to determine whether high school seniors (N=492) had attained each of five selected earth science concepts and if said attainment was influenced by the number of science courses completed. A 72-item, multiple-choice format test (12 items for each concept) was developed and piloted previous to this study to measure…
An Alternative to the 3PL: Using Asymmetric Item Characteristic Curves to Address Guessing Effects
ERIC Educational Resources Information Center
Lee, Sora; Bolt, Daniel M.
2018-01-01
Both the statistical and interpretational shortcomings of the three-parameter logistic (3PL) model in accommodating guessing effects on multiple-choice items are well documented. We consider the use of a residual heteroscedasticity (RH) model as an alternative, and compare its performance to the 3PL with real test data sets and through simulation…
ERIC Educational Resources Information Center
Sadler, Philip M.; Coyle, Harold; Cook Smith, Nancy; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John
2013-01-01
We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K-8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test…
Fayyaz Khan, Humaira; Farooq Danish, Khalid; Saeed Awan, Azra; Anwar, Masood
2013-05-01
The purpose of the study was to identify technical item flaws in the multiple-choice questions submitted for the final exams for the years 2009, 2010 and 2011. This descriptive analytical study was carried out at Islamic International Medical College (IIMC). The data were collected from the MCQs submitted by the faculty for the final exams for 2009, 2010 and 2011, then compiled and evaluated by a three-member assessment committee. The data were analyzed for frequencies and percentages; categorical data were analyzed by chi-square test. The overall percentage of flawed items was 67% for 2009, of which 21% were flaws of testwiseness and 40% were flaws of irrelevant difficulty. In 2010 the total item flaws were 36%, with 11% testwiseness and 22% irrelevant difficulty. The 2011 data showed decreased overall flaws of 21%, with 7% testwiseness and 11% irrelevant difficulty. Technical item flaws are frequently encountered during MCQ construction, and their identification leads to improved quality of single-best-answer MCQs.
Modeling Incorrect Responses to Multiple-Choice Items with Multilinear Formula Score Theory.
ERIC Educational Resources Information Center
Drasgow, Fritz; And Others
This paper addresses the information revealed in incorrect option selection on multiple choice items. Multilinear Formula Scoring (MFS), a theory providing methods for solving psychological measurement problems of long standing, is first used to estimate option characteristic curves for the Armed Services Vocational Aptitude Battery Arithmetic…
ERIC Educational Resources Information Center
Nieminen, Pasi; Savinainen, Antti; Viiri, Jouni
2010-01-01
This study investigates students' ability to interpret multiple representations consistently (i.e., representational consistency) in the context of the force concept. For this purpose we developed the Representational Variant of the Force Concept Inventory (R-FCI), which makes use of nine items from the 1995 version of the Force Concept Inventory…
Revisiting the role of recollection in item versus forced-choice recognition memory.
Cook, Gabriel I; Marsh, Richard L; Hicks, Jason L
2005-08-01
Many memory theorists have assumed that forced-choice recognition tests can rely more on familiarity, whereas item (yes-no) tests must rely more on recollection. In actuality, several studies have found no differences in the contributions of recollection and familiarity underlying the two different test formats. Using word frequency to manipulate stimulus characteristics, the present study demonstrated that the contribution of recollection to item versus forced-choice tests is variable. Low word frequency resulted in significantly more recollection in an item test than did a forced-choice procedure, but high word frequency produced the opposite result. These results clearly constrain any uniform claim about the degree to which recollection supports responding in item versus forced-choice tests.
Bereby-Meyer, Yoella; Meyer, Joachim; Budescu, David V
2003-02-01
This paper assesses framing effects on decision making with internal uncertainty, i.e., partial knowledge, by focusing on examinees' behavior in multiple-choice (MC) tests with different scoring rules. In two experiments participants answered a general-knowledge MC test that consisted of 34 solvable and 6 unsolvable items. Experiment 1 studied two scoring rules involving Positive (only gains) and Negative (only losses) scores. Although answering all items was the dominating strategy for both rules, the results revealed a greater tendency to answer under the Negative scoring rule. These results are in line with the predictions derived from Prospect Theory (PT) [Econometrica 47 (1979) 263]. The second experiment studied two scoring rules, which allowed respondents to exhibit partial knowledge. Under the Inclusion-scoring rule the respondents mark all answers that could be correct, and under the Exclusion-scoring rule they exclude all answers that might be incorrect. As predicted by PT, respondents took more risks under the Inclusion rule than under the Exclusion rule. The results illustrate that the basic process that underlies choice behavior under internal uncertainty and especially the effect of framing is similar to the process of choice under external uncertainty and can be described quite accurately by PT. Copyright 2002 Elsevier Science B.V.
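The claim that answering dominates omitting under both scoring rules can be checked with a one-line expected-value computation. The point values below are illustrative; the paper's exact payoffs may differ:

```python
def expected_guess_score(k, reward, penalty):
    """Expected score from blind guessing on a k-option item, given the
    points awarded for a correct answer and for a wrong answer."""
    return reward / k + penalty * (k - 1) / k

# Positive rule (illustrative): +1 for correct, 0 for wrong; omitting scores 0.
ev_pos = expected_guess_score(4, 1.0, 0.0)   # guessing beats omitting: 0.25 > 0
# Negative rule (illustrative): 0 for correct, -1 for wrong; omitting costs -1.
ev_neg = expected_guess_score(4, 0.0, -1.0)  # guessing beats omitting: -0.75 > -1
```

Under both framings the expected value of guessing exceeds that of omitting, yet Prospect Theory predicts (and the experiments found) more answering under the loss framing.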
Development of a State-Wide Competency Test for Marketing Education. Final Report.
ERIC Educational Resources Information Center
Smith, Clifton L.
A project was conducted to develop a valid, competency-referenced test on the core competencies identified for the Missouri Fundamentals of Marketing curriculum. During the project: (1) multiple-choice test items based on the core competencies in the Fundamentals of Marketing curriculum were developed; (2) instructions for onsite administration of…
Developing Information Skills Test for Malaysian Youth Students Using Rasch Analysis
ERIC Educational Resources Information Center
Karim, Aidah Abdul; Shah, Parilah M.; Din, Rosseni; Ahmad, Mazalah; Lubis, Maimun Aqhsa
2014-01-01
This study explored the psychometric properties of a locally developed information skills test for youth students in Malaysia using Rasch analysis. The test was a combination of 24 structured and multiple choice items with a 4-point grading scale. The test was administered to 72 technical college students and 139 secondary school students. The…
Construction of Valid and Reliable Test for Assessment of Students
ERIC Educational Resources Information Center
Osadebe, P. U.
2015-01-01
The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…
Construction of Economics Achievement Test for Assessment of Students
ERIC Educational Resources Information Center
Osadebe, P. U.
2014-01-01
The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…
American Sign Language Comprehension Test: A Tool for Sign Language Researchers
ERIC Educational Resources Information Center
Hauser, Peter C.; Paludneviciene, Raylene; Riddle, Wanda; Kurz, Kim B.; Emmorey, Karen; Contreras, Jessica
2016-01-01
The American Sign Language Comprehension Test (ASL-CT) is a 30-item multiple-choice test that measures ASL receptive skills and is administered through a website. This article describes the development and psychometric properties of the test based on a sample of 80 college students including deaf native signers, hearing native signers, deaf…
Project Physics Tests 6, The Nucleus.
ERIC Educational Resources Information Center
Harvard Univ., Cambridge, MA. Harvard Project Physics.
Test items relating to Project Physics Unit 6 are presented in this booklet. Included are 70 multiple-choice and 24 problem-and-essay questions. Nuclear physics fundamentals are examined with respect to the shell model, isotopes, neutrons, protons, nuclides, charge-to-mass ratios, alpha particles, Becquerel's discovery, gamma rays, cyclotrons,…
Project Physics Tests 5, Models of the Atom.
ERIC Educational Resources Information Center
Harvard Univ., Cambridge, MA. Harvard Project Physics.
Test items relating to Project Physics Unit 5 are presented in this booklet. Included are 70 multiple-choice and 23 problem-and-essay questions. Concepts of atomic model are examined on aspects of relativistic corrections, electron emission, photoelectric effects, Compton effect, quantum theories, electrolysis experiments, atomic number and mass,…
Project Physics Tests 4, Light and Electromagnetism.
ERIC Educational Resources Information Center
Harvard Univ., Cambridge, MA. Harvard Project Physics.
Test items relating to Project Physics Unit 4 are presented in this booklet. Included are 70 multiple-choice and 22 problem-and-essay questions. Concepts of light and electromagnetism are examined on charges, reflection, electrostatic forces, electric potential, speed of light, electromagnetic waves and radiations, Oersted's and Faraday's work,…
Project Physics Tests 3, The Triumph of Mechanics.
ERIC Educational Resources Information Center
Harvard Univ., Cambridge, MA. Harvard Project Physics.
Test items relating to Project Physics Unit 3 are presented in this booklet. Included are 70 multiple-choice and 20 problem-and-essay questions. Concepts of mechanics are examined on energy, momentum, kinetic theory of gases, pulse analyses, "heat death," water waves, power, conservation laws, normal distribution, thermodynamic laws, and…
Science Library of Test Items. Volume Three. Mastery Testing Programme. Introduction and Manual.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
A set of short tests aimed at measuring student mastery of specific skills in the natural sciences are presented with a description of the mastery program's purposes, development, and methods. Mastery learning, criterion-referenced testing, and the scope of skills to be tested are defined. Each of the multiple choice tests for grades 7 through 10…
ERIC Educational Resources Information Center
Luo, Fenqjen; Lo, Jane-Jane; Leu, Yuh-Chyn
2011-01-01
The purpose of this paper is to show the similarities as well as the differences in fundamental fraction knowledge held by preservice elementary teachers from the United States (N = 89) and Taiwan (N = 85). To this end, we examined and compared their performance on an instrument including 15 multiple-choice test items. The items were categorized…
Grade 9 Pilot Test. Mathematics. June 1988 = 9e Annee Test Pilote. Mathematiques. Juin 1988.
ERIC Educational Resources Information Center
Alberta Dept. of Education, Edmonton.
This pilot test for ninth grade mathematics is written in both French and English. The test consists of 75 multiple-choice items. Students are given 90 minutes to complete the examination and the use of a calculator is highly recommended. The test content covers a wide range of mathematical topics including: decimals; exponents; arithmetic word…
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC), which represent a simplified form of IRT. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test, since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by classical test analysis methods. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
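The three-parameter logistic model used in this analysis has a standard closed form; a minimal sketch with illustrative parameter values (not the fitted TUV estimates):

```python
import math

def irf_3pl(theta, a, b, c):
    """Three-parameter logistic item response function: the probability
    that an examinee of ability theta answers the item correctly, given
    discrimination a, difficulty b, and guessing (lower asymptote) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative parameters: when ability equals item difficulty, the
# probability sits halfway between the guessing floor c and 1.
p_mid = irf_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)   # 0.2 + 0.8 * 0.5 = 0.6
# Very low ability approaches the guessing asymptote c.
p_low = irf_3pl(theta=-10.0, a=1.2, b=0.0, c=0.2)
```

The guessing parameter c is what distinguishes the 3PL from the 2PL: even the weakest examinees succeed at rate c on a multiple-choice item.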
Preference of Students on the Format of Options in a Multiple-Choice Test
ERIC Educational Resources Information Center
Oyzon, Voltaire Q.; Bendulo, Hermabeth O.; Tibus, Erlinda D.; Bande, Rhodora A.; Macalinao, Myrna L.
2016-01-01
Schools in the Philippines, especially those that are offering teacher education programs, are advised to construct examinations that are Licensure Examination for Teachers (LET)-like test items. This is because "if any aspect of a test is unfamiliar to candidates, they are likely to perform less well than they would do otherwise on…
ERIC Educational Resources Information Center
Gillem, Angela R.; Bartoli, Eleonora; Bertsch, Kristin N.; McCarthy, Maureen A.; Constant, Kerra; Marrero-Meisky, Sheila; Robbins, Steven J.; Bellamy, Scarlett
2016-01-01
The Multicultural Counseling and Psychotherapy Test (MCPT), a measure of multicultural counseling competence (MCC), was validated in 2 phases. In Phase 1, the authors administered 451 test items derived from multicultural guidelines in counseling and psychology to 32 multicultural experts and 30 nonexperts. In Phase 2, the authors administered the…
Correction for Guessing in the Framework of the 3PL Item Response Theory
ERIC Educational Resources Information Center
Chiu, Ting-Wei
2010-01-01
Guessing behavior is an important topic with regard to assessing proficiency on multiple choice tests, particularly for examinees at lower levels of proficiency, due to the greater potential for systematic error or bias that inflates observed test scores. Methods that incorporate a correction for guessing on high-stakes tests generally rely…
ERIC Educational Resources Information Center
Friedman, Miriam; And Others
1987-01-01
Test performances of sophomore medical students on a pretest and final exam (under guessing and no-guessing instructions) were compared. Discouraging random guessing produced test information with improved test reliability and less distortion of item difficulty. More able examinees were less compliant than less able examinees. (Author/RH)
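The no-guessing instructions studied above connect to classical formula scoring. As a reference point, the standard correction-for-guessing formula (a textbook formula, not specific to this study) can be sketched as:

```python
def corrected_score(num_right, num_wrong, num_options):
    """Classical correction-for-guessing (formula) score: subtracts the
    number of right answers expected from blind guessing, so that random
    guessing nets zero on average. Omits are neither rewarded nor penalized."""
    return num_right - num_wrong / (num_options - 1)

# Hypothetical examinee: 60 right, 20 wrong, 11 omitted on a 5-option test.
score = corrected_score(60, 20, 5)   # 60 - 20/4 = 55.0
```

Under this rule an examinee with no partial knowledge gains nothing, in expectation, by guessing, which is why such instructions can discourage random guessing.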
A Comparison of Domain-Referenced and Classic Psychometric Test Construction Methods.
ERIC Educational Resources Information Center
Willoughby, Lee; And Others
This study compared a domain referenced approach with a traditional psychometric approach in the construction of a test. Results of the December 1975 Quarterly Profile Exam (QPE) administered to 400 examinees at a university were the source of data. The 400-item QPE is a five-alternative multiple-choice test of information a "safe"…
The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses
ERIC Educational Resources Information Center
Walstad, William B.; Wagner, Jamie
2016-01-01
This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…
NASA Astrophysics Data System (ADS)
Gold, A. U.; Harris, S. E.
2013-12-01
The greenhouse effect comes up in most discussions about climate and is a key concept related to climate change. Existing studies have shown that students and adults alike lack a detailed understanding of this important concept or might hold misconceptions. We studied the effectiveness of different interventions on University-level students' understanding of the greenhouse effect. Introductory level science students were tested for their pre-knowledge of the greenhouse effect using validated multiple-choice questions, short answers and concept sketches. All students participated in a common lesson about the greenhouse effect and were then randomly assigned to one of two lab groups. One group explored an existing simulation about the greenhouse effect (PhET-lesson) and the other group worked with absorption spectra of different greenhouse gases (Data-lesson) to deepen the understanding of the greenhouse effect. All students completed the same assessment including multiple choice, short answers and concept sketches after participation in their lab lesson. 164 students completed all the assessments, 76 completed the PhET lesson and 77 completed the data lesson. 11 students missed the contrasting lesson. In this presentation we show the comparison between the multiple-choice questions, short answer questions and the concept sketches of students. We explore how well each of these assessment types represents students' knowledge. We also identify items that are indicators of the level of understanding of the greenhouse effect, as measured by the correspondence of student answers to an expert mental model and expert responses. Preliminary data analysis shows that students who produce concept sketch drawings that come close to expert drawings also choose correct multiple-choice answers. However, correct multiple-choice answers are not necessarily an indicator that a student will produce correspondingly expert-like concept sketches.
Multiple-choice questions that require detailed knowledge of the greenhouse effect (e.g. direction of re-emission of infrared energy from greenhouse gas) are significantly more likely to be answered correctly by students who also produce expert-like concept sketch items than by students who don't include this aspect in their sketch and don't answer the multiple choice questions correctly. This difference is not as apparent for less technical multiple-choice questions (e.g. type of radiation emitted by Sun). Our findings explore the formation of students' mental models throughout different interventions and how well the different assessment techniques used in this study represent student understanding of the overall concept.
The Vitamin D Endocrine System.
ERIC Educational Resources Information Center
Norman, Anthony W.
1985-01-01
Discusses the physiology and biochemistry of the vitamin D endocrine system, including role of biological calcium and phosphorus, vitamin D metabolism, and related diseases. A 10-item, multiple-choice test which can be used to obtain continuing medical education credit is included. (JN)
ERIC Educational Resources Information Center
Oruç Ertürk, Nesrin; Mumford, Simon E.
2017-01-01
This study, conducted by two researchers who were also multiple-choice question (MCQ) test item writers at a private English-medium university in an English as a foreign language (EFL) context, was designed to shed light on the factors that influence test-takers' perceptions of difficulty in English for academic purposes (EAP) vocabulary, with the…
Free-Response and Multiple-Choice Items: Measures of the Same Ability?
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
This study examined the relationship of multiple-choice and free-response items contained on the College Board's Advanced Placement Computer Science (APCS) examination. Subjects were two samples of 1,000 randomly drawn from the population of 7,372 high school students taking the 1988 examination of the APCS "AB" form. Most were high…
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found to be not uniformly difficult for candidates at different ability levels, possibly because the multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Selected Test Items in American History. Bulletin Number 6, Fifth Edition.
ERIC Educational Resources Information Center
Anderson, Howard R.; Lindquist, E. F.
Designed for high school students, this bulletin provides an extensive file of 1,062 multiple-choice questions in American history. Taken largely from the Iowa Every-Pupil Program and the Cooperative Test Service standardized examinations, the questions are chronologically divided into 16 topic areas. They include exploration and discovery;…
The None-of-the-Above Option: An Empirical Study.
ERIC Educational Resources Information Center
Frary, Robert B.
1991-01-01
The use of the "none-of-the-above" option (NOTA) in 20 college-level multiple-choice tests was evaluated for classes with 100 or more students. Eight academic disciplines were represented, and 295 NOTA and 724 regular test items were used. It appears that the NOTA can be compatible with good classroom measurement. (TJH)
Project Physics Tests 2, Motion in the Heavens.
ERIC Educational Resources Information Center
Harvard Univ., Cambridge, MA. Harvard Project Physics.
Test items relating to Project Physics Unit 2 are presented in this booklet. Included are 70 multiple-choice and 22 problem-and-essay questions. Concepts of motion in the heavens are examined for planetary motions, heliocentric theory, forces exerted on the planets, Kepler's laws, gravitational force, Galileo's work, satellite orbits, Jupiter's…
NASA Astrophysics Data System (ADS)
Nieminen, Pasi; Savinainen, Antti; Viiri, Jouni
2010-07-01
This study investigates students’ ability to interpret multiple representations consistently (i.e., representational consistency) in the context of the force concept. For this purpose we developed the Representational Variant of the Force Concept Inventory (R-FCI), which makes use of nine items from the 1995 version of the Force Concept Inventory (FCI). These original FCI items were redesigned using various representations (such as motion map, vectorial and graphical), yielding 27 multiple-choice items concerning four central concepts underpinning the force concept: Newton’s first, second, and third laws, and gravitation. We provide some evidence for the validity and reliability of the R-FCI; this analysis is limited to the student population of one Finnish high school. The students took the R-FCI at the beginning and at the end of their first high school physics course. We found that students’ (n=168) representational consistency (whether scientifically correct or not) varied considerably depending on the concept. On average, representational consistency and scientifically correct understanding increased during the instruction, although in the post-test only a few students performed consistently both in terms of representations and scientifically correct understanding. We also compared students’ (n=87) results of the R-FCI and the FCI, and found that they correlated quite well.
ERIC Educational Resources Information Center
Shanmugam, S. Kanageswari Suppiah; Lan, Ong Saw
2013-01-01
Purpose: This study aims to investigate the validity of using bilingual test to measure the mathematics achievement of students who have limited English proficiency (LEP). The bilingual test and the English-only test consist of 20 computation and 20 word problem multiple-choice questions (from TIMSS 2003 and 2007 released items). The bilingual test…
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-01-01
Background: The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives: To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants: We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results: Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion: Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
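The Rasch model underlying this analysis has a simple closed form; a minimal sketch of the dichotomous model, with illustrative values rather than the study's item calibrations:

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability of a correct response for a
    person of ability theta on an item of difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When person ability equals item difficulty, the model predicts 50%.
p_equal = rasch_probability(0.5, 0.5)
# A hard item (b = 2 logits) for a person of average ability (theta = 0):
# the high item difficulties reported above mean many such cases.
p_hard = rasch_probability(0.0, 2.0)
```

Rasch fit statistics, like those from RUMM2030, compare observed response patterns against these model-predicted probabilities.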
Writing Multiple Choice Outcome Questions to Assess Knowledge and Competence.
Brady, Erik D
2015-11-01
Few articles contemplate the need for good guidance in question item-writing in the continuing education (CE) space. Although many of the core principles of sound item design translate to the CE health education team, the need exists for specific examples for nurse educators that clearly describe how to measure changes in competence and knowledge using multiple choice items. In this article, some key points and specific examples for nursing CE providers are shared. Copyright 2015, SLACK Incorporated.
NASA Astrophysics Data System (ADS)
Aubrecht, Gordon J.; Aubrecht, Judith D.
1983-07-01
True-false or multiple-choice tests can be useful instruments for evaluating student progress. We examine strategies for planning objective tests which serve to test the material covered in science (physics) courses. We also examine strategies for writing questions for tests within a test blueprint. The statistical basis for judging the quality of test items is discussed. Reliability, difficulty, and discrimination indices are defined and examples presented. Our recommendations are rather easily put into practice.
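The indices defined above can be illustrated with their classical formulas: difficulty as the proportion of examinees answering correctly, and discrimination as the difference in difficulty between upper- and lower-scoring groups. A rough sketch on invented data; the 27% grouping fraction is a common convention, not necessarily the authors' choice:

```python
def item_difficulty(responses):
    """Classical difficulty index: proportion of examinees answering the item correctly."""
    return sum(responses) / len(responses)

def item_discrimination(responses, totals, frac=0.27):
    """Upper-lower discrimination index: item difficulty in the top-scoring group
    minus item difficulty in the bottom-scoring group (groups formed from total scores)."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    n = max(1, round(frac * len(totals)))
    low = [responses[i] for i in order[:n]]
    high = [responses[i] for i in order[-n:]]
    return item_difficulty(high) - item_difficulty(low)

# Invented 0/1 responses to one item, and the examinees' total test scores.
item = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
total = [9, 8, 3, 7, 2, 4, 8, 1, 6, 5]
print(item_difficulty(item))         # 0.5
print(item_discrimination(item, total))
```

Values near 0.5 difficulty and high positive discrimination are generally considered desirable for norm-referenced items.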
ERIC Educational Resources Information Center
Andrich, David; Styles, Irene
2011-01-01
There is a substantial literature on attempts to obtain information on the proficiency of respondents from distractors in multiple choice items. Information in a distractor implies that a person who chooses that distractor has greater proficiency than if the person chose another distractor with no information. A further implication is that the…
PANATTO, D.; ARATA, L.; BEVILACQUA, I.; APPRATO, L.; GASPARINI, R.; AMICIZIA, D.
2015-01-01
Summary Introduction. Health-related knowledge is often assessed through multiple-choice tests. Among the different types of formats, researchers may opt to use multiple-mark items, i.e. with more than one correct answer. Although multiple-mark items have long been used in the academic setting – sometimes with scant or inconclusive results – little is known about the implementation of this format in research on in-field health education and promotion. Methods. A study population of secondary school students completed a survey on nutrition-related knowledge, followed by a single-lecture intervention. Answers were scored by means of eight different scoring algorithms and analyzed from the perspective of classical test theory. The same survey was re-administered to a sample of the students in order to evaluate the short-term change in their knowledge. Results. In all, 286 questionnaires were analyzed. Partial scoring algorithms displayed better psychometric characteristics than the dichotomous rule. In particular, the algorithm proposed by Ripkey and the balanced rule showed greater internal consistency and relative efficiency in scoring multiple-mark items. A penalizing algorithm in which the proportion of marked distractors was subtracted from that of marked correct answers was the only one that highlighted a significant difference in performance between natives and immigrants, probably owing to its slightly better discriminatory ability. This algorithm was also associated with the largest effect size in the pre-/post-intervention score change. Discussion. The choice of an appropriate rule for scoring multiple-mark items in research on health education and promotion should consider not only the psychometric properties of single algorithms but also the study aims and outcomes, since scoring rules differ in terms of bias, reliability, difficulty, sensitivity to guessing and discrimination. PMID:26900331
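Two of the scoring rules compared above can be sketched roughly as follows. The all-or-nothing rule and the subtraction-based penalizing rule follow the abstract's description; the zero floor on the penalized score and all data are assumptions added for illustration:

```python
def dichotomous(marked, correct):
    """All-or-nothing rule: full credit only for marking exactly the correct set."""
    return 1.0 if set(marked) == set(correct) else 0.0

def penalized(marked, correct, options):
    """Penalizing rule described in the abstract: proportion of correct answers marked
    minus proportion of distractors marked (floored at zero, an added assumption)."""
    marked, correct = set(marked), set(correct)
    distractors = set(options) - correct
    hit = len(marked & correct) / len(correct)
    miss = len(marked & distractors) / len(distractors) if distractors else 0.0
    return max(0.0, hit - miss)

opts = ["A", "B", "C", "D", "E"]
key = ["A", "C"]
print(dichotomous(["A", "C"], key))        # 1.0
print(penalized(["A", "C"], key, opts))    # 1.0
print(penalized(["A", "B"], key, opts))    # 0.5 - 1/3 ≈ 0.167
```

The contrast shows why partial-credit rules can be more informative: an examinee who knows one of two correct answers scores 0 under the dichotomous rule but retains partial credit under the penalized rule.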
ERIC Educational Resources Information Center
Barniol, Pablo; Zavala, Genaro
2014-01-01
In this article we compare students' understanding of vector concepts in problems with no physical context, and with three mechanics contexts: force, velocity, and work. Based on our "Test of Understanding of Vectors," a multiple-choice test presented elsewhere, we designed two isomorphic shorter versions of 12 items each: a test with no…
Applied Reading Test--Forms A and B, Interim Manual, and Answer Sheets.
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
Designed for use in the selection of apprentices, trainees, technical and trade personnel, and any other persons who need to read and understand text of a technical nature, this Applied Reading Test specimen set contains six passages and 32 items, has a 30-minute time limit, and is presented in a reusable multiple choice test booklet. The specimen…
ERIC Educational Resources Information Center
Sturges, Persis T.
This experiment was designed to test the effect of immediate and delayed feedback on retention of learning in an educational situation. Four groups of college undergraduates took a multiple-choice computer-managed test. Three of these groups received informative feedback (the entire item with the correct answer identified) either: (1) immediately…
ERIC Educational Resources Information Center
Al-Habashneh, Maher Hussein; Najjar, Nabil Juma
2017-01-01
This study aimed at constructing a criterion-referenced test to measure the research and statistical competencies of graduate students at the Jordanian governmental universities. The test, in its first form, consisted of (50) multiple-choice items; it was then introduced to (5) arbitrators with competence in measurement and evaluation to…
Rahn, Anne C; Backhus, Imke; Fuest, Franz; Riemann-Lorenz, Karin; Köpke, Sascha; van de Roemer, Adrianus; Mühlhauser, Ingrid; Heesen, Christoph
2016-09-20
Presentation of confidence intervals alongside information about treatment effects can support informed treatment choices in people with multiple sclerosis. We aimed to develop and pilot-test different written patient information materials explaining confidence intervals in people with relapsing-remitting multiple sclerosis. Further, a questionnaire on comprehension of confidence intervals was developed and piloted. We developed different patient information versions aiming to explain confidence intervals. We used an illustrative example to test three different approaches: (1) a short version, (2) an "average weight" version and (3) a "worm prophylaxis" version. Interviews were conducted using think-aloud and teach-back approaches to test feasibility and analysed using qualitative content analysis. To assess comprehension of confidence intervals, a six-item multiple-choice questionnaire was developed and tested in a pilot randomised controlled trial using the online survey software UNIPARK. Here, the average weight version (intervention group) was tested against a standard patient information version on confidence intervals (control group). People with multiple sclerosis were invited to take part using existing mailing lists of people with multiple sclerosis in Germany and were randomised using the UNIPARK algorithm. Participants were blinded towards group allocation. The primary endpoint was comprehension of confidence intervals, assessed with the six-item multiple-choice questionnaire, with six points representing perfect knowledge. Feasibility of the patient information versions was tested with 16 people with multiple sclerosis. For the pilot randomised controlled trial, 64 people with multiple sclerosis were randomised (intervention group: n = 36; control group: n = 28). More questions were answered correctly in the intervention group compared to the control group (mean 4.8 vs 3.8, mean difference 1.1 (95% CI 0.42-1.69), p = 0.002). 
The questionnaire's internal consistency was moderate (Cronbach's alpha = 0.56). The pilot-phase shows promising results concerning acceptability and feasibility. Pilot randomised controlled trial results indicate that the patient information is well understood and that knowledge gain on confidence intervals can be assessed with a set of six questions. German Clinical Trials Register: DRKS00008561 . Registered 8th of June 2015.
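The kind of interval reported above (a mean difference with its 95% CI) can be reproduced in form, though not in exact numbers, with the standard normal-approximation interval for a difference in means; the abstract does not report group standard deviations, so the SDs below are invented for illustration:

```python
import math

def mean_diff_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Approximate 95% CI for a difference in means (normal approximation):
    (m1 - m2) +/- z * sqrt(sd1^2/n1 + sd2^2/n2)."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff - z * se, diff + z * se

# Group means and sizes from the abstract; the SDs (1.2 and 1.6) are invented.
low, high = mean_diff_ci(4.8, 1.2, 36, 3.8, 1.6, 28)
print(f"difference 1.0, 95% CI ({low:.2f}, {high:.2f})")
```

An interval that excludes zero, as here, corresponds to a difference that is statistically significant at roughly the 5% level.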
ERIC Educational Resources Information Center
New York State Education Dept., Albany.
This booklet is designed to assist teachers in developing examinations for classroom use. It is a collection of 955 objective test questions, mostly multiple choice, for industrial arts students in the three areas of graphics technology, power technology, and production technology. Scoring keys are provided. There are no copyright restrictions,…
Comparison of Difficulties and Reliabilities of Math-Completion and Multiple-Choice Item Formats.
ERIC Educational Resources Information Center
Oosterhof, Albert C.; Coats, Pamela K.
Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…
ERIC Educational Resources Information Center
Chu, Hye-Eun; Treagust, David F.; Chandrasegaran, A. L.
2009-01-01
A large scale study involving 1786 year 7-10 Korean students from three school districts in Seoul was undertaken to evaluate their understanding of basic optics concepts using a two-tier multiple-choice diagnostic instrument consisting of four pairs of items, each of which evaluated the same concept in two different contexts. The instrument, which…
ERIC Educational Resources Information Center
Schultz, Madeleine
2011-01-01
This paper reports on the development of a tool that generates randomised, non-multiple choice assessment within the BlackBoard Learning Management System interface. An accepted weakness of multiple-choice assessment is that it cannot elicit learning outcomes from upper levels of Biggs' SOLO taxonomy. However, written assessment items require…
Constructing Multiple-Choice Items to Measure Higher-Order Thinking
ERIC Educational Resources Information Center
Scully, Darina
2017-01-01
Across education, certification and licensure, there are repeated calls for the development of assessments that target "higher-order thinking," as opposed to mere recall of facts. A common assumption is that this necessitates the use of constructed response or essay-style test questions; however, empirical evidence suggests that this may…
Nuclear Energy Assessment Battery. Form C.
ERIC Educational Resources Information Center
Showers, Dennis Edward
This publication consists of a nuclear energy assessment battery for secondary level students. The test contains 44 multiple choice items and is organized into four major sections. Parts include: (1) a knowledge scale; (2) attitudes toward nuclear energy; (3) a behaviors and intentions scale; and (4) an anxiety scale. Directions are provided for…
An Empirical Comparison of Five Linear Equating Methods for the NEAT Design
ERIC Educational Resources Information Center
Suh, Youngsuk; Mroch, Andrew A.; Kane, Michael T.; Ripkey, Douglas R.
2009-01-01
In this study, a database containing the responses of 40,000 candidates to 90 multiple-choice questions was used to mimic data sets for 50-item tests under the "nonequivalent groups with anchor test" (NEAT) design. Using these smaller data sets, we evaluated the performance of five linear equating methods for the NEAT design with five levels of…
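The linear equating methods compared above all build on the same core transformation: mapping scores on one form to the scale of another by matching means and standard deviations. A minimal sketch of that core step only, on invented score samples; a real NEAT-design method would additionally use the anchor-test scores to adjust for group nonequivalence:

```python
import statistics

def linear_equate(x, scores_x, scores_y):
    """Basic linear equating: map score x on form X to the form-Y scale via
    y = mu_Y + (sd_Y / sd_X) * (x - mu_X), matching means and SDs."""
    mu_x, mu_y = statistics.mean(scores_x), statistics.mean(scores_y)
    sd_x, sd_y = statistics.pstdev(scores_x), statistics.pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Invented samples of total scores from two 50-item forms.
form_x = [30, 35, 40, 25, 45]
form_y = [28, 33, 38, 23, 43]
print(linear_equate(35, form_x, form_y))  # 33.0
```

Here form Y is uniformly two points harder, so a 35 on form X equates to a 33 on the form-Y scale.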
ERIC Educational Resources Information Center
Bayrak, Beyza Karadeniz
2013-01-01
The purpose of this study was to identify primary students' conceptual understanding and alternative conceptions in acid-base chemistry. To this end, a 15-item two-tier multiple-choice test was administered to 56 eighth-grade students in the spring semester of 2009-2010. Data for this study were collected using a conceptual understanding scale prepared to include…
ERIC Educational Resources Information Center
Kalender, Ilker
2012-01-01
catcher is a software program designed to compute the [omega] index, a common statistical index for the identification of collusions (cheating) among examinees taking an educational or psychological test. It requires (a) responses and (b) ability estimations of individuals, and (c) item parameters to make computations and outputs the results of…
ERIC Educational Resources Information Center
Wang, Wen-Chung; Huang, Sheng-Yun
2011-01-01
The one-parameter logistic model with ability-based guessing (1PL-AG) has been recently developed to account for effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…
ERIC Educational Resources Information Center
Gugel, John F.
A new method for estimating the parameters of the normal ogive three-parameter model for multiple-choice test items--the normalized direct (NDIR) procedure--is examined. The procedure is compared to a more commonly used estimation procedure, Lord's LOGIST, using computer simulations. The NDIR procedure uses the normalized (mid-percentile)…
ERIC Educational Resources Information Center
Treagust, David F.; Chandrasegaran, A. L.; Zain, Ahmad N. M.; Ong, Eng Tek; Karpudewan, Mageswary; Halim, Lilia
2011-01-01
The efficacy of an intervention instructional program was evaluated to facilitate understanding of particle theory concepts among students (N = 190) using a diagnostic instrument consisting of eleven two-tier multiple-choice items in a pre-test--post-test design. The students involved were high school students, undergraduates and postgraduates…
Odukoya, Jonathan A; Adekeye, Olajide; Igbinoba, Angie O; Afolabi, A
2018-01-01
Teachers and students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done rightly. One of the tools for 'doing it rightly' is item analysis. The core objective of this study, therefore, was to ascertain the item difficulty and distractive indices of the university-wide courses. A range of 112-1956 undergraduate students participated in this study. With the use of secondary data, the ex-post-facto design was adopted for this project. In virtually all cases, the majority of the items (between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standards in terms of difficulty and distractive indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analysis when developing these tests was emphasized.
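A distractive-index check of the kind applied above can be sketched as a simple tabulation of option choices per item; the 5% flagging threshold and the response counts below are assumptions for illustration, not the study's criteria:

```python
from collections import Counter

def distractor_analysis(choices, key, options, threshold=0.05):
    """Tabulate the proportion of examinees choosing each option and flag
    distractors chosen by fewer than `threshold` of examinees as non-functioning."""
    counts = Counter(choices)
    n = len(choices)
    report = {}
    for opt in options:
        p = counts.get(opt, 0) / n
        flag = "key" if opt == key else ("weak" if p < threshold else "ok")
        report[opt] = (round(p, 3), flag)
    return report

# Invented responses from 100 examinees to one four-option item keyed "A".
choices = ["A"] * 55 + ["B"] * 25 + ["C"] * 18 + ["D"] * 2
print(distractor_analysis(choices, key="A", options=["A", "B", "C", "D"]))
```

A distractor flagged "weak" attracts almost no one and is a candidate for replacement or deletion, which is the kind of moderation the study recommends.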
De Champlain, Andre F; Boulais, Andre-Philippe; Dallas, Andrew
2016-01-01
The aim of this research was to compare different methods of calibrating multiple choice question (MCQ) and clinical decision making (CDM) components for the Medical Council of Canada's Qualifying Examination Part I (MCCQEI) based on item response theory. Our data consisted of test results from 8,213 first-time applicants to MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed item format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4. The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all 3 polytomous models, whether the MCQs were anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate = 85.21%). Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.
An Examination of the Perceived Importance of Technical Competence in Acquisition Project Management
1991-09-01
[OCR residue from a questionnaire-development flow diagram (first draft, instructions, critique/revision, answerability, pilot test of second draft, analysis, response mode, revision, usability, preparation) omitted.] …appropriate questionnaire items. Initially, the set of questions developed for the study reflected a few shortcomings. A pilot test of the first draft among…resulted. First, feedback from the pilot test indicated a need to reduce the completion time. Because the multiple choice format required several
NASA Astrophysics Data System (ADS)
Soros, P.; Ponkham, K.; Ekkapim, S.
2018-01-01
This research aimed to: 1) compare critical thinking and problem-solving skills before and after learning using a STEM Education plan, 2) compare student achievement before and after learning about force and the laws of motion using a STEM Education plan, and 3) assess satisfaction with learning using STEM Education. The sample consisted of 37 students from grade 10 at Borabu School, Borabu District, Mahasarakham Province, semester 2, academic year 2016. The tools used in this study consisted of: 1) a STEM Education plan on force and the laws of motion for grade 10 students (one scheme, 14 hours in total), 2) a test of critical thinking and problem-solving skills with 30 multiple-choice items (five-option and two-option formats), 3) an achievement test on force and the laws of motion with 30 four-option multiple-choice items, and 4) a learning-satisfaction questionnaire with 20 five-point rating-scale items. The statistics used in data analysis were percentage, mean, standard deviation, and the dependent-samples t-test. The results showed that: 1) students taught using the STEM Education plan had higher post-test than pre-test scores on critical thinking and problem-solving skills, statistically significant at the .01 level; 2) students taught using the STEM Education plan had higher post-test than pre-test achievement scores, statistically significant at the .01 level; and 3) students' satisfaction with learning using the STEM Education plan was at a high level (x̄ = 4.51, S.D. = 0.56).
Testing primary-school children's understanding of the nature of science.
Koerber, Susanne; Osterhaus, Christopher; Sodian, Beate
2015-03-01
Understanding the nature of science (NOS) is a critical aspect of scientific reasoning, yet few studies have investigated its developmental beginnings and initial structure. One contributing reason is the lack of an adequate instrument. Two studies assessed NOS understanding among third graders using a multiple-select (MS) paper-and-pencil test. Study 1 investigated the validity of the MS test by presenting the items to 68 third graders (9-year-olds) and subsequently interviewing them on their underlying NOS conception of the items. All items were significantly related between formats, indicating that the test was valid. Study 2 applied the same instrument to a larger sample of 243 third graders, and their performance was compared to a multiple-choice (MC) version of the test. Although the MC format inflated the guessing probability, there was a significant relation between the two formats. In summary, the MS format was a valid method revealing third graders' NOS understanding, thereby representing an economical test instrument. A latent class analysis identified three groups of children with expertise in qualitatively different aspects of NOS, suggesting that there is not a single common starting point for the development of NOS understanding; instead, multiple developmental pathways may exist. © 2014 The British Psychological Society.
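The observation above that the MC format inflates the guessing probability relative to the MS format can be made concrete under a simple blind-guessing assumption: an MC guesser picks one of k options at random, while an MS guesser must make an independent mark/no-mark decision on each of k options:

```python
def p_guess_mc(n_options):
    """Blind-guessing probability on a single-best-answer multiple-choice item:
    one of n_options response patterns is correct."""
    return 1.0 / n_options

def p_guess_ms(n_options):
    """Blind-guessing probability on a multiple-select item where each option is
    independently marked or left blank: one of 2**n_options patterns is correct."""
    return 1.0 / (2 ** n_options)

for k in (2, 4, 5):
    print(f"{k} options: MC guess = {p_guess_mc(k):.4f}, MS guess = {p_guess_ms(k):.4f}")
```

The gap widens quickly with the number of options, which is one reason the MS format yielded less guessing-inflated scores.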
ERIC Educational Resources Information Center
Andrich, David; Marais, Ida; Humphry, Stephen Mark
2016-01-01
Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…
ERIC Educational Resources Information Center
Ting, Mu Yu
2017-01-01
Using the capabilities of expert knowledge structures, the researcher prepared test questions on the university calculus topic of "finding the area by integration." The quiz is divided into two types of multiple choice items (one out of four and one out of many). After the calculus course was taught and tested, the results revealed that…
A Study of Students' Readiness to Learn Calculus
ERIC Educational Resources Information Center
Carlson, Marilyn P.; Madison, Bernard; West, Richard D.
2015-01-01
The Calculus Concept Readiness (CCR) instrument assesses foundational understandings and reasoning abilities that have been documented to be essential for learning calculus. The CCR Taxonomy describes the understandings and reasoning abilities assessed by CCR. The CCR is a 25-item multiple-choice instrument that can be used as a placement test for…
The Influence of Distractor Strength and Response Order on MCQ Responding
ERIC Educational Resources Information Center
Kiat, John Emmanuel; Ong, Ai Rene; Ganesan, Asha
2018-01-01
Multiple-choice questions (MCQs) play a key role in standardised testing and in-class assessment. Research into the influence of within-item response order on MCQ characteristics has been mixed. While some researchers have shown preferential selection of response options presented earlier in the answer list, others have failed to replicate these…
Applying Item Response Theory Methods to Examine the Impact of Different Response Formats
ERIC Educational Resources Information Center
Hohensinn, Christine; Kubinger, Klaus D.
2011-01-01
In aptitude and achievement tests, different response formats are usually used. A fundamental distinction must be made between the class of multiple-choice formats and the constructed response formats. Previous studies have examined the impact of different response formats applying traditional statistical approaches, but these influences can also…
An Introduction to Multilinear Formula Score Theory. Measurement Series 84-4.
ERIC Educational Resources Information Center
Levine, Michael V.
Formula score theory (FST) associates each multiple choice test with a linear operator and expresses all of the real functions of item response theory as linear combinations of the operator's eigenfunctions. Hard measurement problems can then often be reformulated as easier, standard mathematical problems. For example, the problem of estimating…
Progress Monitoring in Grade 5 Science for Low Achievers
ERIC Educational Resources Information Center
Vannest, Kimberly J.; Parker, Richard; Dyer, Nicole
2011-01-01
This article presents procedures and results from a 2-year project developing science key vocabulary (KV) short tests suitable for progress monitoring Grade 5 science in Texas public schools using computer-generated, -administered, and -scored assessments. KV items included KV definitions and important usages in a multiple-choice cloze format. A…
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.
Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R
2018-05-01
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
Cognitive task analysis for teaching technical skills in an inanimate surgical skills laboratory.
Velmahos, George C; Toutouzas, Konstantinos G; Sillin, Lelan F; Chan, Linda; Clark, Richard E; Theodorou, Demetrios; Maupin, Fredric
2004-01-01
The teaching of surgical skills is based mostly on the traditional "see one, do one, teach one" resident-to-resident method. Surgical skills laboratories provide a new environment for teaching skills but their effectiveness has not been adequately tested. Cognitive task analysis is an innovative method to teach skills, used successfully in nonmedical fields. The objective of this study is to evaluate the effectiveness of a 3-hour surgical skills laboratory course on central venous catheterization (CVC), taught by the principles of cognitive task analysis to surgical interns. Upon arrival to the Department of Surgery, 26 new interns were randomized to either receive a surgical skills laboratory course on CVC ("course" group, n = 12) or not ("traditional" group, n = 14). The course consisted mostly of hands-on training on inanimate CVC models. All interns took a 15-item multiple-choice question test on CVC at the beginning of the study. Within two and a half months all interns performed CVC on critically ill patients. The outcome measures were cognitive knowledge and technical-skill competence on CVC. These outcomes were assessed by a 14-item checklist evaluating the interns while performing CVC on a patient and by the 15-item multiple-choice-question test, which was repeated at that time. There were no differences between the two groups in the background characteristics of the interns or the patients having CVC. The scores at the initial multiple-choice test were similar (course: 7.33 +/- 1.07, traditional: 8 +/- 2.15, P = 0.944). However, the course interns scored significantly higher in the repeat test compared with the traditional interns (11 +/- 1.86 versus 8.64 +/- 1.82, P = 0.03). Also, the course interns achieved a higher score on the 14-item checklist (12.6 +/- 1.1 versus 7.5 +/- 2.2, P <0.001). 
They required fewer attempts to find the vein (3.3 +/- 2.2 versus 6.4 +/- 4.2, P = 0.046) and showed a trend toward less time to complete the procedure (15.4 +/- 9.5 versus 20.6 +/- 9.1 minutes, P = 0.149). A surgical skills laboratory course on CVC, taught by the principles of cognitive task analysis and using inanimate models, improves the knowledge and technical skills of new surgical interns on this task.
Evaluation of the flipped classroom approach in a veterinary professional skills course
Moffett, Jenny; Mill, Aileen C
2014-01-01
Background The flipped classroom is an educational approach that has had much recent coverage in the literature. Relatively few studies, however, use objective assessment of student performance to measure the impact of the flipped classroom on learning. The purpose of this study was to evaluate the use of a flipped classroom approach within a medical education setting to the first two levels of Kirkpatrick and Kirkpatrick’s effectiveness of training framework. Methods This study examined the use of a flipped classroom approach within a professional skills course offered to postgraduate veterinary students. A questionnaire was administered to two cohorts of students: those who had completed a traditional, lecture-based version of the course (Introduction to Veterinary Medicine [IVM]) and those who had completed a flipped classroom version (Veterinary Professional Foundations I [VPF I]). The academic performance of students within both cohorts was assessed using a set of multiple-choice items (n=24) nested within a written examination. Data obtained from the questionnaire were analyzed using Cronbach’s alpha, Kruskal–Wallis tests, and factor analysis. Data obtained from student performance in the written examination were analyzed using the nonparametric Wilcoxon rank sum test. Results A total of 133 IVM students and 64 VPF I students (n=197) agreed to take part in the study. Overall, study participants favored the flipped classroom approach over the traditional classroom approach. With respect to student academic performance, the traditional classroom students outperformed the flipped classroom students on a series of multiple-choice items (IVM mean =21.4±1.48 standard deviation; VPF I mean =20.25±2.20 standard deviation; Wilcoxon test, w=7,578; P<0.001). Conclusion This study demonstrates that learners seem to prefer a flipped classroom approach. The flipped classroom was rated more positively than the traditional classroom on many different characteristics. 
This preference, however, did not translate into improved student performance, as assessed by a series of multiple-choice items delivered during a written examination. PMID:25419164
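The Wilcoxon rank-sum statistic used in the comparison above is computed by pooling both groups, ranking all scores (ties receive their average rank), and summing the ranks that fall in one group. A minimal stdlib sketch on invented exam scores:

```python
def rank_sum_w(group_a, group_b):
    """Wilcoxon rank-sum statistic: pool both groups, assign average ranks to
    tied values, and return the sum of ranks belonging to group A."""
    pooled = sorted(group_a + group_b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return sum(ranks[v] for v in group_a)

# Invented exam scores for two small cohorts.
a = [21, 22, 23, 20]
b = [19, 20, 18, 21]
print(rank_sum_w(a, b))  # 24.0
```

In practice the statistic is referred to its null distribution (or a normal approximation) to obtain the p-value reported in the study.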
Evaluation of the flipped classroom approach in a veterinary professional skills course.
Moffett, Jenny; Mill, Aileen C
2014-01-01
The flipped classroom is an educational approach that has had much recent coverage in the literature. Relatively few studies, however, use objective assessment of student performance to measure the impact of the flipped classroom on learning. The purpose of this study was to evaluate the use of a flipped classroom approach within a medical education setting to the first two levels of Kirkpatrick and Kirkpatrick's effectiveness of training framework. This study examined the use of a flipped classroom approach within a professional skills course offered to postgraduate veterinary students. A questionnaire was administered to two cohorts of students: those who had completed a traditional, lecture-based version of the course (Introduction to Veterinary Medicine [IVM]) and those who had completed a flipped classroom version (Veterinary Professional Foundations I [VPF I]). The academic performance of students within both cohorts was assessed using a set of multiple-choice items (n=24) nested within a written examination. Data obtained from the questionnaire were analyzed using Cronbach's alpha, Kruskal-Wallis tests, and factor analysis. Data obtained from student performance in the written examination were analyzed using the nonparametric Wilcoxon rank sum test. A total of 133 IVM students and 64 VPF I students (n=197) agreed to take part in the study. Overall, study participants favored the flipped classroom approach over the traditional classroom approach. With respect to student academic performance, the traditional classroom students outperformed the flipped classroom students on a series of multiple-choice items (IVM mean =21.4±1.48 standard deviation; VPF I mean =20.25±2.20 standard deviation; Wilcoxon test, w=7,578; P<0.001). This study demonstrates that learners seem to prefer a flipped classroom approach. The flipped classroom was rated more positively than the traditional classroom on many different characteristics. 
This preference, however, did not translate into improved student performance, as assessed by a series of multiple-choice items delivered during a written examination.
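The cohort comparison above rests on the Wilcoxon rank-sum test (equivalently, the Mann-Whitney U test). A minimal sketch of how such a comparison is run follows; the score arrays are synthetic stand-ins generated from the reported means and standard deviations, not the study's actual data.

```python
# Sketch of a Wilcoxon rank-sum (Mann-Whitney U) comparison of two cohorts'
# exam scores. The data below are synthetic, seeded draws shaped like the
# reported summary statistics (24-item exam, so scores are clipped to 0-24).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
ivm = rng.normal(21.4, 1.48, 133).clip(0, 24)   # traditional cohort (n=133)
vpf = rng.normal(20.25, 2.20, 64).clip(0, 24)   # flipped cohort (n=64)

stat, p = mannwhitneyu(ivm, vpf, alternative="two-sided")
print(f"U = {stat:.0f}, p = {p:.4f}")
```

The rank-based test is a reasonable choice here because item-total scores are bounded counts and unlikely to be normally distributed.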
A New Family of Models for the Multiple-Choice Item.
1979-12-19
analysis of the verbal scholastic aptitude test using Birnbaum's three-parameter logistic model. Educational and Psychological Measurement, 28, 989-1020. [8] McBride, J. R. Some properties of a Bayesian adaptive ability testing strategy. Applied Psychological Measurement, 1, 121-140, 1977.
ERIC Educational Resources Information Center
Haro, Elizabeth K.; Haro, Luis S.
2014-01-01
The multiple-choice question (MCQ) is the foundation of knowledge assessment in K-12, higher education, and standardized entrance exams (including the GRE, MCAT, and DAT). However, standard MCQ exams are limited with respect to the types of questions that can be asked when there are only five choices. MCQs offering additional choices more…
An item response curves analysis of the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Morris, Gary A.; Harshman, Nathan; Branum-Martin, Lee; Mazur, Eric; Mzoughi, Taha; Baker, Stephen D.
2012-09-01
Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields similar conclusions in the analysis of the performance of items on the FCI to the more sophisticated and complex IRT analyses but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers to improve their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels.
Trends in computer applications in science assessment
NASA Astrophysics Data System (ADS)
Kumar, David D.; Helgeson, Stanley L.
1995-03-01
Seven computer applications to science assessment are reviewed. Conventional test administration includes record keeping, grading, and managing test banks. Multiple-choice testing involves forced selection of an answer from a menu, whereas constructed-response testing asks students to construct their own answers rather than select them. Adaptive testing attempts to individualize the test to minimize the number of items and time needed to assess a student's knowledge. Figural response testing assesses science proficiency in pictorial or graphic mode and requires the student to construct a mental image rather than selecting a response from a multiple-choice menu. Simulations have been found useful for performance assessment on a large-scale basis in part because they make it possible to independently specify different aspects of a real experiment. An emerging approach to performance assessment is solution pathway analysis, which permits the analysis of the steps a student takes in solving a problem. Virtually all computer-based testing systems improve the quality and efficiency of record keeping and data analysis.
ERIC Educational Resources Information Center
Bliss, Leonard B.
The aim of this study was to show that the superiority of corrected-for-guessing scores over number right scores as true score estimates depends on the ability of examinees to recognize situations where they can eliminate one or more alternatives as incorrect and to omit items where they would only be guessing randomly. Previous investigations…
Attentional priority determines working memory precision.
Klyszejko, Zuzanna; Rahmati, Masih; Curtis, Clayton E
2014-12-01
Visual working memory is a system used to hold information actively in mind for a limited time. The number of items and the precision with which we can store information has limits that define its capacity. How much control do we have over the precision with which we store information when faced with these severe capacity limitations? Here, we tested the hypothesis that rank-ordered attentional priority determines the precision of multiple working memory representations. We conducted two psychophysical experiments that manipulated the priority of multiple items in a two-alternative forced choice task (2AFC) with distance discrimination. In Experiment 1, we varied the probabilities with which memorized items were likely to be tested. To generalize the effects of priority beyond simple cueing, in Experiment 2, we manipulated priority by varying monetary incentives contingent upon successful memory for items tested. Moreover, we illustrate our hypothesis using a simple model that distributed attentional resources across items with rank-ordered priorities. Indeed, we found evidence in both experiments that priority affects the precision of working memory in a monotonic fashion. Our results demonstrate that representations of priority may provide a mechanism by which resources can be allocated to increase the precision with which we encode and briefly store information. Copyright © 2014 Elsevier Ltd. All rights reserved.
Test of understanding of vectors: A reliable multiple-choice vector concept test
NASA Astrophysics Data System (ADS)
Barniol, Pablo; Zavala, Genaro
2014-06-01
In this article we discuss the findings of our research on students' understanding of vector concepts in problems without physical context. First, we develop a complete taxonomy of the most frequent errors made by university students when learning vector concepts. This study is based on the results of several test administrations of open-ended problems in which a total of 2067 students participated. Using this taxonomy, we then designed a 20-item multiple-choice test [Test of understanding of vectors (TUV)] and administered it in English to 423 students who were completing the required sequence of introductory physics courses at a large private Mexican university. We evaluated the test's content validity, reliability, and discriminatory power. The results indicate that the TUV is a reliable assessment tool. We also conducted a detailed analysis of the students' understanding of the vector concepts evaluated in the test. The TUV is included in the Supplemental Material as a resource for other researchers studying vector learning, as well as instructors teaching the material.
A Critical Analysis of the Body of Work Method for Setting Cut-Scores
ERIC Educational Resources Information Center
Radwan, Nizam; Rogers, W. Todd
2006-01-01
The recent increase in the use of constructed-response items in educational assessment and the dissatisfaction with the nature of the decision that the judges must make using traditional standard-setting methods created a need to develop new and effective standard-setting procedures for tests that include both multiple-choice and…
ERIC Educational Resources Information Center
AlHarbi, Nawaf N. S.; Treagust, David F.; Chandrasegaran, A. L.; Won, Mihye
2015-01-01
This study investigated the understanding of diffusion, osmosis and particle theory of matter concepts among 192 pre-service science teachers in Saudi Arabia using a 17-item two-tier multiple-choice diagnostic test. The data analysis showed that the pre-service teachers' understanding of osmosis and diffusion concepts was mildly correlated with…
ERIC Educational Resources Information Center
Emmanouilidou, Kyriaki; Derri, Vassiliki; Aggelousis, Nicolaos; Vassiliadou, Olga
2012-01-01
The purpose of this pilot study was to develop and evaluate an instrument for measuring Greek elementary physical educators' knowledge of student assessment. A multiple-choice questionnaire comprised of items about concepts, methods, tools, and types of student assessment in physical education was designed and tested. The initial 35-item…
ERIC Educational Resources Information Center
Ozmen, Haluk
2008-01-01
This study aims to determine prospective science student teachers' alternative conceptions of the chemical equilibrium concept. A 13-item pencil and paper, two-tier multiple choice diagnostic instrument, the Test to Identify Students' Alternative Conceptions (TISAC), was developed and administered to 90 second-semester science student teachers…
High confidence in falsely recognizing prototypical faces.
Sampaio, Cristina; Reinke, Victoria; Mathews, Jeffrey; Swart, Alexandra; Wallinger, Stephen
2018-06-01
We applied a metacognitive approach to investigate confidence in recognition of prototypical faces. Participants were presented with sets of faces constructed digitally as deviations from prototype/base faces. Participants were then tested with a simple recognition task (Experiment 1) or a multiple-choice task (Experiment 2) for old and new items plus new prototypes, and they showed a high rate of confident false alarms to the prototypes. The relationship between confidence and accuracy in this face recognition paradigm was found to be positive for standard items but negative for the prototypes; thus, it was contingent on the nature of the items used. The data have implications for lineups that employ match-to-suspect strategies.
ERIC Educational Resources Information Center
Olson, Lynn
2005-01-01
Twenty-three states are expanding their testing programs to additional grades this school year to comply with the federal No Child Left Behind Act. In devising the new tests, most states have defied predictions and chosen to go beyond multiple-choice items, by including questions that ask students to construct their own responses. But many state…
Hift, Richard J
2014-11-28
Written assessments fall into two classes: constructed-response or open-ended questions, such as the essay and a number of variants of the short-answer question, and selected-response or closed-ended questions, typically in the form of multiple-choice. It is widely believed that constructed-response written questions test higher-order cognitive processes in a manner that multiple-choice questions cannot, and consequently have higher validity. An extensive review of the literature suggests that in summative assessment neither premise is evidence-based. Well-structured open-ended and multiple-choice questions appear equivalent in their ability to assess higher cognitive functions, and performance in multiple-choice assessments may correlate more highly than the open-ended format with competence demonstrated in clinical practice following graduation. Studies of construct validity suggest that both formats measure essentially the same dimension, at least in mathematics, the physical sciences, biology and medicine. The persistence of the open-ended format in summative assessment may be due to the intuitive appeal of the belief that synthesising an answer to an open-ended question must be both more cognitively taxing and more similar to actual experience than is selecting a correct response. I suggest that cognitive-constructivist learning theory would predict that a well-constructed context-rich multiple-choice item represents a complex problem-solving exercise which activates a sequence of cognitive processes which closely parallel those required in clinical practice, hence explaining the high validity of the multiple-choice format. The evidence does not support the proposition that the open-ended assessment format is superior to the multiple-choice format, at least in exit-level summative assessment, in terms of either its ability to test higher-order cognitive functioning or its validity.
This is explicable using a theory of mental models, which might predict that the multiple-choice format will have higher validity, a statement for which some empirical support exists. Given the superior reliability and cost-effectiveness of the multiple-choice format, consideration should be given to phasing out open-ended format questions in summative assessment. Whether the same applies to non-exit-level assessment and formative assessment is a question which remains to be answered, particularly in terms of the educational effect of testing, an area which deserves intensive study.
Dellinges, Mark A; Curtis, Donald A
2017-08-01
Faculty members are expected to write high-quality multiple-choice questions (MCQs) in order to accurately assess dental students' achievement. However, most dental school faculty members are not trained to write MCQs. Extensive faculty development programs have been used to help educators write better test items. The aim of this pilot study was to determine if a short workshop would result in improved MCQ item-writing by dental school faculty at one U.S. dental school. A total of 24 dental school faculty members who had previously written MCQs were randomized into a no-intervention group and an intervention group in 2015. Six previously written MCQs were randomly selected from each of the faculty members and given an item quality score. The intervention group participated in a training session of one-hour duration that focused on reviewing standard item-writing guidelines to improve in-house MCQs. The no-intervention group did not receive any training but did receive encouragement and an explanation of why good MCQ writing was important. The faculty members were then asked to revise their previously written questions, and these were given an item quality score. The item quality scores for each faculty member were averaged, and the difference from pre-training to post-training scores was evaluated. The results showed a significant difference between pre-training and post-training MCQ difference scores for the intervention group (p=0.04). This pilot study provides evidence that the training session of short duration was effective in improving the quality of in-house MCQs.
Influence of Context on Item Parameters in Forced-Choice Personality Assessments
ERIC Educational Resources Information Center
Lin, Yin; Brown, Anna
2017-01-01
A fundamental assumption in computerized adaptive testing is that item parameters are invariant with respect to context--items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the…
Improving Measures via Examining the Behavior of Distractors in Multiple-Choice Tests
Sideridis, Georgios; Tsaousis, Ioannis; Al Harbi, Khaleel
2017-01-01
The purpose of the present article was to illustrate, using an example from a national assessment, the value of analyzing the behavior of distractors in measures that use the multiple-choice format. A secondary purpose of the present article was to illustrate four remedial actions that can potentially improve the measurement of the construct(s) under study. Participants were 2,248 individuals who took a national examination of chemistry. The behavior of the distractors was analyzed by modeling their behavior within the Rasch model. Potentially informative distractors were (a) further modeled using the partial credit model, (b) split onto separate items and retested for model fit and parsimony, (c) combined to form a "super" item or testlet, and (d) reexamined after deleting low-ability individuals who likely guessed on those informative, albeit erroneous, distractors. Results indicated that all but the item-split strategy were associated with better model fit compared with the original model. The best-fitting model, however, involved modeling and crediting informative distractors via the partial credit model or eliminating the responses of low-ability individuals who likely guessed on informative distractors. The implications, advantages, and disadvantages of modeling informative distractors for measurement purposes are discussed. PMID:29795904
ERIC Educational Resources Information Center
Sadoski, Mark C.
A study investigated the role of visual imagery in the comprehension and retention of prose. Subjects were 48 fifth grade students who orally read a story and then completed three comprehension tasks directly related to the story: a retelling, an oral reading cloze test, and a multiple choice question test comprised of items demonstrated to be…
Guide to Developing High-Quality, Reliable, and Valid Multiple-Choice Assessments
ERIC Educational Resources Information Center
Towns, Marcy H.
2014-01-01
Chemistry faculty members are highly skilled in obtaining, analyzing, and interpreting physical measurements, but often they are less skilled in measuring student learning. This work provides guidance for chemistry faculty from the research literature on multiple-choice item development in chemistry. Areas covered include content, stem, and…
ERIC Educational Resources Information Center
Cana, Mercy B.; Cueto, Quiza Lynn Grace G.; De Guzman, Allan B.; Fuchigami, Kaori B.; Manalo, Leona Rica T.; Yu, Jake Cathleen U.
2005-01-01
Using a 30-item multiple-choice type test, this investigation focused on the ability of college students to recognise terms and concepts used by librarians. A total of 447 respondents representing the fields of Education, Nutrition, Food Technology, Tourism and Hotel and Restaurant Management took part in this investigation. Data were gathered…
1982-06-30
treatments, and cure (or kill) a patient. Administratively, the items were in a multiple-choice format and the simulation proceeded by branching...
Getting Lucky: How Guessing Threatens the Validity of Performance Classifications
ERIC Educational Resources Information Center
Foley, Brett P.
2016-01-01
There is always a chance that examinees will answer multiple choice (MC) items correctly by guessing. Design choices in some modern exams have created situations where guessing at random through the full exam--rather than only for a subset of items where the examinee does not know the answer--can be an effective strategy to pass the exam. This…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aydin, Süleyman, E-mail: yupul@hotmail.com; Haşiloğlu, M. Akif, E-mail: mehmet.hasiloglu@hotmail.com; Kunduraci, Ayşe, E-mail: ayse-kndrc@hotmail.com
This study aimed to develop an academic achievement test to establish students' knowledge about earthquakes and ways of protection from earthquakes. The method followed the steps that Webb (1994) set out for developing an academic achievement test for a unit. In the development process, a 25-question multiple-choice test was prepared to measure pre-service teachers' knowledge levels about earthquakes and ways of protection from them. The multiple-choice test was submitted for review to six academics (one from the field of geography and five science educators) and two expert science teachers. The prepared test was administered to 93 pre-service teachers studying in the elementary education department in the 2014-2015 academic year. As a result of the validity and reliability analyses, the test was reduced to 20 items. Based on these applications, the Pearson product-moment split-half reliability coefficient was found to be 0.94; when this value was adjusted with the Spearman-Brown formula, the reliability coefficient was 0.97.
The development of a computer assisted instruction and assessment system in pharmacology.
Madsen, B W; Bell, R C
1977-01-01
We describe the construction of a computer-based system for instruction and assessment in pharmacology, utilizing a large bank of multiple-choice questions. Items were collected from many sources, edited, and coded for student suitability, topic, taxonomy, difficulty, and text references. Students reserve a time during the day and specify the type of test desired, and questions are presented randomly from the subset satisfying their criteria. Answers are scored after each question and a summary is given at the end of every test; details on item performance are recorded automatically. The biggest hurdle in implementation was the assembly, review, classification, and editing of items, while the programming was relatively straightforward. A number of modifications had to be made to the initial plans, and changes will undoubtedly continue with further experience. When fully operational the system will possess a number of advantages, including elimination of test preparation, editing, and marking; facilitated item review opportunities; increased objectivity, feedback, and flexibility; and decreased anxiety in students.
The Applicability of Interactive Item Templates in Varied Knowledge Types
ERIC Educational Resources Information Center
Koong, Chorng-Shiuh; Wu, Chi-Ying
2011-01-01
A well-edited assessment can enhance students' motivation to learn. The applicability of items, which includes item content and template, plays a crucial role in authoring a good assessment. The templates discussed include not only conventional true/false, multiple-choice, completion, and short-answer items but also interactive ones. Methods…
ERIC Educational Resources Information Center
Wind, Stefanie A.; Gale, Jessica D.
2015-01-01
Multiple-choice (MC) items that are constructed such that distractors target known misconceptions for a particular domain provide useful diagnostic information about student misconceptions (Herrmann-Abell & DeBoer, 2011, 2014; Sadler, 1998). Item response theory models can be used to examine misconceptions distractor-driven multiple-choice…
Preference index supported by motivation tests in Nile tilapia
Maia, Caroline Marques; Volpato, Gilson Luiz
2017-01-01
The identification of animal preferences is assumed to provide better rearing environments for the animals in question. Preference tests focus on the frequency of approaches or the time an animal spends in proximity to each item of the investigated resource during a multiple-choice trial. Recently, a preference index (PI) was proposed to differentiate animal preferences from momentary responses (Sci Rep, 2016, 6:28328, DOI: 10.1038/srep28328). This index also quantifies the degree of preference for each item. Each choice response is also weighted, with the most recent responses weighted more heavily, but the index includes the entire bank of tests, and thus represents a history-based approach. In this study, we compared this PI to motivation tests, which consider how much effort is expended to access a resource. We performed choice tests over 7 consecutive days for 34 Nile tilapia fish that presented with different colored compartments in each test. We first detected the preferred and non-preferred colors of each fish using the PI and then tested their motivation to reach these compartments. We found that fish preferences varied individually, but the results were consistent with the motivation profiles, as individual fish were more motivated (the number of touches made on transparent, hinged doors that prevented access to the resource) to access their preferred items. On average, most of the 34 fish avoided the color yellow and showed less motivation to reach yellow and red colors. The fish also exhibited greater motivation to access blue and green colors (the most preferred colors). These results corroborate the PI as a reliable tool for the identification of animal preferences. We recommend this index to animal keepers and researchers to identify an animal's preferred conditions. PMID:28426689
The role of unconscious memory errors in judgments of confidence for sentence recognition.
Sampaio, Cristina; Brewer, William F
2009-03-01
The present experiment tested the hypothesis that unconscious reconstructive memory processing can lead to the breakdown of the relationship between memory confidence and memory accuracy. Participants heard deceptive schema-inference sentences and nondeceptive sentences and were tested with either simple or forced-choice recognition. The nondeceptive items showed a positive relation between confidence and accuracy in both simple and forced-choice recognition. However, the deceptive items showed a strong negative confidence/accuracy relationship in simple recognition and a low positive relationship in forced choice. The mean levels of confidence for erroneous responses for deceptive items were inappropriately high in simple recognition but lower in forced choice. These results suggest that unconscious reconstructive memory processes involved in memory for the deceptive schema-inference items led to inaccurate confidence judgments and that, when participants were made aware of the deceptive nature of the schema-inference items through the use of a forced-choice procedure, they adjusted their confidence accordingly.
Measuring sexual orientation in adolescent health surveys: evaluation of eight school-based surveys.
Saewyc, Elizabeth M; Bauer, Greta R; Skay, Carol L; Bearinger, Linda H; Resnick, Michael D; Reis, Elizabeth; Murphy, Aileen
2004-10-01
To examine the performance of various items measuring sexual orientation within 8 school-based adolescent health surveys in the United States and Canada from 1986 through 1999. Analyses examined nonresponse and unsure responses to sexual orientation items compared with other survey items, demographic differences in responses, tests for response set bias, and congruence of responses to multiple orientation items; analytical methods included frequencies, contingency tables with Chi-square, and ANOVA with least significant differences (LSD) post hoc tests; all analyses were conducted separately by gender. In all surveys, nonresponse rates for orientation questions were similar to other sexual questions, but not higher; younger students, immigrants, and students with learning disabilities were more likely to skip items or select "unsure." Sexual behavior items had the lowest nonresponse, but fewer than half of all students reported sexual behavior, limiting its usefulness for indicating orientation. Item placement in the survey, wording, and response set bias all appeared to influence nonresponse and unsure rates. Specific recommendations include standardizing wording across future surveys, and pilot testing items with diverse ages and ethnic groups of teens before use. All three dimensions of orientation should be assessed where possible; when limited to single items, sexual attraction may be the best choice. Specific wording suggestions are offered for future surveys.
Item Reliabilities for a Family of Answer-Until-Correct (AUC) Scoring Rules.
ERIC Educational Resources Information Center
Kane, Michael T.; Moloney, James M.
The Answer-Until-Correct (AUC) procedure has been proposed in order to increase the reliability of multiple-choice items. A model for examinees' behavior when they must respond to each item until they answer it correctly is presented. An expression for the reliability of AUC items, as a function of the characteristics of the item and the scoring…
ERIC Educational Resources Information Center
Wilson, Damian Vergara
2012-01-01
This paper illustrates a method of item analysis used to identify discriminating multiple-choice items in placement data. The data come from two rounds of pilots given to both SHL students and Spanish as a Second Language (SSL) students. In the first round, 104 items were administered to 507 students. After discarding poor items, the second round…
Student Questionnaire. [Harvard Project Physics
ERIC Educational Resources Information Center
Welch, Wayne W.; Ahlgren, Andrew
This 60-item questionnaire was designed to gather general background information from students who had used the Harvard Project Physics curriculum. The instrument includes three 20-item subscales: (1) attitude toward physics, (2) career interest, and (3) student characteristics. Items are multiple choice (5 options), and the introductory material…
ERIC Educational Resources Information Center
Eick, Charles J.; Dias, Michael; Smith, Nancy R. Cook
2009-01-01
A new National Science Foundation supported curriculum, Interactions in Physical Science[TM], was evaluated on students' conceptual change in the twelve concept areas of the national physical science content standard (B) for grades 5-8. Eighth grade students (N = 66) were evaluated pre and post on a 31-item multiple-choice test of conceptual…
Designing Adaptive Instructional Environments: Insights from Empirical Evidence
2011-10-01
theorems. Cohen's f effect size for pretest-to-posttest gain, averaged across different problems, = 0.46. Basis for adaptation: ability of...problems and took a posttest. Measures of learning: 26-item multiple-choice pretest and posttest; effect size measured on posttest scores...solving algebraic equations. Measures of learning: pretest and posttest using a rapid diagnostic testing procedure in which students had to provide their
ERIC Educational Resources Information Center
Keating, Xiaofen D.; Castro-Pinero, Jose; Centeio, Erin; Harrison, Louis, Jr.; Ramirez, Tere; Chen, Li
2010-01-01
This study examined student health-related fitness (HRF) knowledge and its relationship to physical activity (PA). The participants were undergraduate students from a large U.S. state university. HRF knowledge was assessed using a test consisting of 150 multiple choice items. Differences in HRF knowledge scores by sex, ethnicity, and years in…
Model Choice and Sample Size in Item Response Theory Analysis of Aphasia Tests
ERIC Educational Resources Information Center
Hula, William D.; Fergadiotis, Gerasimos; Martin, Nadine
2012-01-01
Purpose: The purpose of this study was to identify the most appropriate item response theory (IRT) measurement model for aphasia tests requiring 2-choice responses and to determine whether small samples are adequate for estimating such models. Method: Pyramids and Palm Trees (Howard & Patterson, 1992) test data that had been collected from…
Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John
2013-01-01
We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402
Bazarova, Saodat I; Engelhard, George
2004-01-01
Using the Mantel-Haenszel (MH) procedure, we analyzed data for 7,087 American and 4,022 Russian Grade 8 students from the Third International Mathematics and Science Study (TIMSS) to compare mathematics achievement in the two countries on each of the 124 multiple-choice items. The results of the analyses indicate that the performance of the students on individual multiple-choice mathematics items varies by country. The results also suggest that the relationship between country and item performance differs as a function of content area. A total score of a country's achievement does not provide the whole picture of achievement dynamics; it averages out potentially important information on student achievement and the causes of their performance relative to other countries. The dynamics of achievement across countries will not be revealed unless the analyses are done at the item level.
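The Mantel-Haenszel procedure referenced above compares, within matched total-score strata, the odds of a correct response for two groups. A minimal sketch of the MH common odds ratio and the ETS delta-scale DIF index (the transformation constant -2.35 is the standard ETS convention; the function name and data layout are illustrative, not code from the study):

```python
import math

def mantel_haenszel_dif(strata):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    `strata` is a list of 2x2 tables, one per total-score level:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    Returns (alpha_MH, delta_MH), where delta_MH = -2.35 * ln(alpha_MH)
    is the ETS delta-scale DIF index.
    """
    num = den = 0.0
    for a, b, c, d in strata:          # a, b = reference group; c, d = focal group
        n = a + b + c + d
        if n == 0:
            continue                   # skip empty strata
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

With identical correct/incorrect proportions in both groups at every score level, the odds ratio is 1 and the DIF index is 0, i.e. no differential item functioning is flagged.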
The Effect of Position and Format on the Difficulty of Assessment Exercises.
ERIC Educational Resources Information Center
Burton, Nancy W.; And Others
Assessment exercises (items) in three different formats--multiple-choice with an "I don't know" (IDK) option, multiple-choice without the IDK, and open-ended--were placed at the beginning, middle and end of 45-minute assessment packages (instruments). A balanced incomplete blocks analysis of variance was computed to determine the biasing…
Developing a prelicensure exam for Canada: an international collaboration.
Hobbins, Bonnie; Bradley, Pat
2013-01-01
Nine previously conducted studies indicate that Elsevier's HESI Exit Exam (E²) is 96.36%-99.16% accurate in predicting success on the National Council Licensure Examination for Registered Nurses. No similar standardized exam is available in Canada to predict Canadian Registered Nurse Examination (CRNE) success. Like the E², such an exam could be used to evaluate Canadian nursing students' preparedness for the CRNE, and scores on the numerous subject matter categories could be used to guide students' remediation efforts so that, ultimately, they are successful on their first attempt at taking the CRNE. The international collaboration between a HESI test construction expert and a nursing faculty member from Canada, who served as the content expert, resulted in the development of a 180-item, multiple-choice/single-answer prelicensure exam (PLE) that was pilot tested with Canadian nursing students (N = 175). Item analysis data obtained from this pilot testing were used to develop a 160-item PLE, which includes an additional 20 pilot test items. The estimated reliability of this exam is 0.91, and it exhibits congruent validity with the CRNE because the PLE test blueprint mimics the CRNE test blueprint. Copyright © 2013 Elsevier Inc. All rights reserved.
Assessment of representational competence in kinematics
NASA Astrophysics Data System (ADS)
Klein, P.; Müller, A.; Kuhn, J.
2017-06-01
A two-tier instrument for representational competence in the field of kinematics (KiRC) is presented, designed for a standard (1st year) calculus-based introductory mechanics course. It comprises 11 multiple-choice (MC) and 7 multiple true-false (MTF) questions involving multiple representational formats, such as graphs, pictures, and formal (mathematical) expressions (1st tier). Furthermore, students express their answer confidence for selected items, providing additional information (2nd tier). Measurement characteristics of KiRC were assessed in a validation sample (pre- and post-test, N = 83 and N = 46, respectively), including usefulness for measuring learning gain. Validity is checked by interviews and by benchmarking KiRC against related measures. Values for item difficulty, discrimination, and consistency are in the desired ranges; in particular, good reliability was obtained (KR-20 = 0.86). Confidence intervals were computed, and a replication study yielded values within those intervals. For practical and research purposes, KiRC as a diagnostic tool goes beyond related extant instruments both in the representational formats covered (e.g., mathematical expressions) and in the scope of content (e.g., choice of coordinate systems). Together with its satisfactory psychometric properties, it appears to be a versatile and reliable tool for assessing students' representational competency in kinematics (and its potential change). Confidence judgments add further information to the diagnostic potential of the test, in particular for representational misconceptions. Moreover, we present an analytic result for the question (arising from guessing correction or educational considerations) of how the total effect size (Cohen's d) varies upon combination of two test components with known individual effect sizes, and then discuss the results in the case of KiRC (MC and MTF combination).
The introduced method of test combination analysis can be applied to any test comprising two components for the purpose of finding effect-size ranges.
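The KR-20 reliability reported for KiRC can be computed directly from dichotomously scored response data. A minimal sketch of the standard Kuder-Richardson formula 20 (the function name and data layout are illustrative):

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    `responses` is a list of examinee rows, each a list of 0/1 item scores.
    KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / var(total)),
    where p_j is the proportion correct on item j and q_j = 1 - p_j.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n   # population variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n          # item difficulty
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)
```

Perfectly consistent response patterns (every examinee answers all items correctly or all incorrectly) yield the maximum value of 1.0.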
Automatically Scoring Short Essays for Content. CRESST Report 836
ERIC Educational Resources Information Center
Kerr, Deirdre; Mousavi, Hamid; Iseli, Markus R.
2013-01-01
The Common Core assessments emphasize short essay constructed response items over multiple choice items because they are more precise measures of understanding. However, such items are too costly and time consuming to be used in national assessments unless a way is found to score them automatically. Current automatic essay scoring techniques are…
Investigating the Stability of Four Methods for Estimating Item Bias.
ERIC Educational Resources Information Center
Perlman, Carole L.; And Others
The reliability of item bias estimates was studied for four methods: (1) the transformed delta method; (2) Shepard's modified delta method; (3) Rasch's one-parameter residual analysis; and (4) the Mantel-Haenszel procedure. Bias statistics were computed for each sample using all methods. Data were from administration of multiple-choice items from…
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011
NASA Astrophysics Data System (ADS)
Liou, Pey-Yan; Bulut, Okan
2017-12-01
The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components: item format and cognitive domain. The Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. An item difficulty analysis was initially applied to show the proportion of correct responses per item. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that reasoning-domain items were more difficult than items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors' effects on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirically based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
The development and validation of a test of science critical thinking for fifth graders.
Mapeala, Ruslan; Siew, Nyet Moi
2015-01-01
The paper described the development and validation of the Test of Science Critical Thinking (TSCT) to measure three critical thinking skill constructs: comparing and contrasting, sequencing, and identifying cause and effect. The initial TSCT consisted of 55 multiple-choice test items, each of which required participants to select a correct response and a correct choice of the critical thinking skill used for their response. Data were obtained from a purposive sample of 30 fifth graders in a pilot study carried out in a primary school in Sabah, Malaysia. Students underwent teaching and learning activities for 9 weeks using the Thinking Maps-aided Problem-Based Learning Module before they answered the TSCT. Analyses were conducted to check the difficulty index (p), discrimination index (d), internal consistency reliability, content validity, and face validity. Analysis of the test-retest reliability data was conducted separately for a group of fifth graders with similar ability. Findings of the pilot study showed that of the 55 initially administered items, only the 30 items with a relatively good difficulty index (p) ranging from 0.40 to 0.60 and a good discrimination index (d) ranging from 0.20 to 1.00 were retained. The Kuder-Richardson reliability values were found to be appropriate and relatively high at 0.70, 0.73, and 0.92 for identifying cause and effect, sequencing, and comparing and contrasting, respectively. The content validity index obtained from three expert judgments equalled or exceeded 0.95. In addition, test-retest reliability showed good, statistically significant correlations ([Formula: see text]). From the above results, the selected 30-item TSCT was found to have sufficient reliability and validity and would therefore represent a useful tool for measuring critical thinking ability among fifth graders in primary science.
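The difficulty index (p) and discrimination index (d) used to screen items like the TSCT's are classical item statistics. A hedged sketch, assuming an upper/lower-group definition of d (the paper's exact grouping fraction is not stated; 27% is a common convention):

```python
def item_statistics(responses, frac=0.27):
    """Classical difficulty (p) and discrimination (d) indices.

    `responses`: examinee rows of 0/1 item scores.  For each item,
    p = overall proportion correct, and d = p(upper group) - p(lower group),
    where the groups are the top and bottom `frac` of examinees by total score.
    Returns a list of (p, d) pairs, one per item.
    """
    n = len(responses)
    k = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, int(round(n * frac)))       # group size, at least one examinee
    upper, lower = ranked[:g], ranked[-g:]
    stats = []
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        d = (sum(row[j] for row in upper) - sum(row[j] for row in lower)) / g
        stats.append((p, d))
    return stats
```

An item answered correctly by high scorers and missed by low scorers gets d near 1, matching the 0.20-1.00 retention band described above.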
The positive and negative consequences of multiple-choice testing.
Roediger, Henry L; Marsh, Elizabeth J
2005-09-01
Multiple-choice tests are commonly used in educational settings but with unknown effects on students' knowledge. The authors examined the consequences of taking a multiple-choice test on a later general knowledge test in which students were warned not to guess. A large positive testing effect was obtained: Prior testing of facts aided final cued-recall performance. However, prior testing also had negative consequences. Prior reading of a greater number of multiple-choice lures decreased the positive testing effect and increased production of multiple-choice lures as incorrect answers on the final test. Multiple-choice testing may inadvertently lead to the creation of false knowledge.
Analyzing Multiple-Choice Questions by Model Analysis and Item Response Curves
NASA Astrophysics Data System (ADS)
Wattanakasiwich, P.; Ananta, S.
2010-07-01
In physics education research, the main goal is to improve physics teaching so that most students understand physics conceptually and are able to apply concepts in solving problems. Many multiple-choice instruments have therefore been developed to probe students' conceptual understanding of various topics. Two techniques, model analysis and item response curves, were used to analyze students' responses to the Force and Motion Conceptual Evaluation (FMCE). For this study, FMCE data from more than 1000 students at Chiang Mai University were collected over the past three years. With model analysis, we can obtain students' alternative knowledge and the probabilities that students use such knowledge in a range of equivalent contexts. The model analysis consists of two algorithms: concentration factor and model estimation. This paper presents only results from using the model estimation algorithm to obtain a model plot. The plot helps to identify whether a class model state lies in the misconception region. An item response curve (IRC), derived from item response theory, is a plot of the percentage of students selecting a particular choice versus their total score. Pros and cons of both techniques are compared and discussed.
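An empirical item response curve of the kind described can be tabulated by grouping students on total score and computing the fraction choosing each option at each score level. A minimal sketch (the data layout is illustrative, not the authors' code):

```python
from collections import defaultdict

def item_response_curves(choices, totals):
    """Empirical item response curves for one multiple-choice item.

    `choices[i]` is the option chosen by student i (e.g. 'A'..'E');
    `totals[i]` is that student's total test score.  Returns
    {option: {score: fraction of students at that score choosing option}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    n_at = defaultdict(int)
    for opt, t in zip(choices, totals):
        counts[opt][t] += 1
        n_at[t] += 1
    return {opt: {t: c / n_at[t] for t, c in by_score.items()}
            for opt, by_score in counts.items()}
```

Plotting each option's fraction against total score then shows which distractors attract students in particular ability ranges.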
Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing.
Butler, Andrew C; Roediger, Henry L
2008-04-01
Multiple-choice tests are used frequently in higher education without much consideration of the impact this form of assessment has on learning. Multiple-choice testing enhances retention of the material tested (the testing effect); however, unlike other tests, multiple-choice can also be detrimental because it exposes students to misinformation in the form of lures. The selection of lures can lead students to acquire false knowledge (Roediger & Marsh, 2005). The present research investigated whether feedback could be used to boost the positive effects and reduce the negative effects of multiple-choice testing. Subjects studied passages and then received a multiple-choice test with immediate feedback, delayed feedback, or no feedback. In comparison with the no-feedback condition, both immediate and delayed feedback increased the proportion of correct responses and reduced the proportion of intrusions (i.e., lure responses from the initial multiple-choice test) on a delayed cued recall test. Educators should provide feedback when using multiple-choice tests.
The role of source memory in gambling task decision making.
Whitney, Paul; Hinson, John M
2012-01-01
The role of memory in the Iowa Gambling Task (IGT) was tested in two experiments that dissociated item memory (memory for losses obtained) from source memory (the deck that produced a given loss). In Experiment 1, participants observed 75 choices that had been made by controls or patients in previous research, followed by memory tests, and then 25 active choices from the participant. In Experiment 2, participants made choices for 75 trials, performed the memory tests, and then made 25 final choices. The data show that item and source memory can diverge within the IGT, and that source memory makes a significant contribution to IGT performance.
Gettig, Jacob P
2006-04-01
To determine the prevalence of established multiple-choice test-taking correct and incorrect answer cues in the American College of Clinical Pharmacy's Updates in Therapeutics: The Pharmacotherapy Preparatory Course, 2005 Edition, as an equal or lesser surrogate indication of the prevalence of such cues in the Pharmacotherapy board certification examination. All self-assessment and patient case question-and-answer sets were assessed individually to determine if they were subject to selected correct and incorrect answer cues commonly seen in multiple-choice question writing. If a question was considered evaluable, correct answer cues (longest answer, mid-range number, one of two similar choices, and one of two opposite choices) were tallied. In addition, incorrect answer cues (inclusionary language and grammatical mismatch) were also tallied. Each cue was counted if it did what was expected or did the opposite of what was expected. Multiple cues could be identified in each question. A total of 237 (47.7%) of 497 questions in the manual were deemed evaluable. A total of 325 correct answer cues and 35 incorrect answer cues were identified in the 237 evaluable questions. Most evaluable questions contained one or two correct and/or incorrect answer cues. Longest answer was the most frequently identified correct answer cue; however, it was the least likely to identify the correct answer. Inclusionary language was the most frequently identified incorrect answer cue. Incorrect answer cues were considerably more likely to identify incorrect answer choices than correct answer cues were to identify correct answers. The use of established multiple-choice test-taking cues is unlikely to be of significant help when taking the Pharmacotherapy board certification examination, primarily because of the lack of questions subject to such cues and the inability of correct answer cues to accurately identify correct answers.
Incorrect answer cues, especially the use of inclusionary language, almost always will accurately identify an incorrect answer choice. Assuming that questions in the preparatory course manual were equal or lesser surrogates of those in the board certification examination, it is unlikely that intuition alone can replace adequate preparation and studying as the sole determinant of examination success.
Kirschstein, Timo; Wolters, Alexander; Lenz, Jan-Hendrik; Fröhlich, Susanne; Hakenberg, Oliver; Kundt, Günther; Darmüntzel, Martin; Hecker, Michael; Altiner, Attila; Müller-Hilke, Brigitte
2016-01-01
The amendment of the Medical Licensing Act (ÄAppO) in Germany in 2002 led to the introduction of graded assessments in the clinical part of medical studies. This, in turn, lent new weight to the importance of written tests, even though the minimum requirements for exam quality are sometimes difficult to reach. Introducing exam quality as a criterion for the award of performance-based allocation of funds is expected to steer the attention of faculty members towards more quality and perpetuate higher standards. However, at present there is a lack of suitable algorithms for calculating exam quality. In the spring of 2014, the students' dean commissioned the "core group" for curricular improvement at the University Medical Center in Rostock to revise the criteria for the allocation of performance-based funds for teaching. In a first approach, we developed an algorithm that was based on the results of the most common type of exam in medical education, multiple-choice tests. It included item difficulty and discrimination, reliability, and the distribution of grades achieved. This algorithm quantitatively describes exam quality of multiple-choice exams. However, it can also be applied to exams involving short essay questions and the OSCE. It thus allows for the quantitation of exam quality in the various subjects and, in analogy to impact factors and third-party grants, a ranking among faculty. Our algorithm can be applied to all test formats in which item difficulty, the discriminatory power of the individual items, reliability of the exam, and the distribution of grades are measured. Even though the content validity of an exam is not considered here, we believe that our algorithm is suitable as a general basis for performance-based allocation of funds.
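The abstract does not give the algorithm's exact weighting, so the following is only an illustrative composite, not the Rostock formula: the thresholds, weights, and function name are all assumptions. It scores the fraction of items with acceptable difficulty and discrimination and combines them with exam reliability:

```python
def exam_quality_score(difficulties, discriminations, reliability,
                       weights=(1 / 3, 1 / 3, 1 / 3)):
    """Illustrative composite exam-quality index.

    `difficulties`: per-item proportions correct; `discriminations`:
    per-item discrimination coefficients; `reliability`: exam reliability
    (e.g. Cronbach's alpha).  Scores the fraction of items with difficulty
    in [0.4, 0.85] and discrimination >= 0.2 (common rules of thumb),
    then returns a weighted combination in [0, 1].
    """
    k = len(difficulties)
    diff_ok = sum(0.4 <= p <= 0.85 for p in difficulties) / k
    disc_ok = sum(r >= 0.2 for r in discriminations) / k
    w1, w2, w3 = weights
    return w1 * diff_ok + w2 * disc_ok + w3 * reliability
```

A single number of this kind is what makes a cross-subject ranking, as proposed above, possible in the first place.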
Australian Item Bank Program: Science Item Bank. Book 3: Biology.
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
The Australian Science Item Bank consists of three volumes of multiple-choice questions. Book 3 contains questions on the biological sciences. The questions are designed to be suitable for high school students (year 8 to year 12 in Australian schools). The questions are classified by the subject content of the question, the cognitive skills…
Guide to an Assessment of Consumer Skills.
ERIC Educational Resources Information Center
Education Commission of the States, Denver, CO.
This guide is intended to assist those interested in developing and/or assessing consumer skills. It is an accompanyment to a separate collection of survey items (mostly in a multiple choice format) designed to assess seventeen-year-olds' consumer skills. It is suggested that the items can be used as part of an item pool, as an instructional tool,…
1980-01-01
Foils for the multiple-choice vocabulary items were produced in two ways: semantically (e.g., silverfish, canines, and cicadas) and algorithmically (e.g., silverfish, females, individuals, and wasps). This process resulted in 160 multiple-choice items. Rare, singleton, and keyword nouns and adjectives (e.g., instars, insect, plant-feeding, immature, cicadas) were classified by their text frequency.
NASA Astrophysics Data System (ADS)
Guffey, S. K.; Slater, T. F.; Slater, S. J.
2017-12-01
Discipline-based geoscience education researchers have considerable need for criterion-referenced, easy-to-administer and easy-to-score, conceptual diagnostic surveys for undergraduates taking introductory science survey courses in order for faculty to better be able to monitor the learning impacts of various interactive teaching approaches. To support ongoing discipline-based science education research to improve teaching and learning across the geosciences, this study establishes the reliability and validity of a 28-item, multiple-choice, pre- and post- Exam of GeoloGy Standards, hereafter simply called EGGS. The content knowledge EGGS addresses is based on 11 consensus concepts derived from a systematic, thematic analysis of the overlapping ideas presented in national science education reform documents including the Next Generation Science Standards, the AAAS Benchmarks for Science Literacy, the Earth Science Literacy Principles, and the NRC National Science Education Standards. Using community agreed upon best-practices for creating, field-testing, and iteratively revising modern multiple-choice test items using classical item analysis techniques, EGGS emphasizes natural student language over technical scientific vocabulary, leverages illustrations over students' reading ability, specifically targets students' misconceptions identified in the scholarly literature, and covers the range of topics most geology educators expect general education students to know at the end of their formal science learning experiences. The current version of EGGS is judged to be valid and reliable with college-level, introductory science survey students based on both standard quantitative and qualitative measures, including extensive clinical interviews with targeted students and systematic expert review.
Seafarers Knowledge Inventory.
ERIC Educational Resources Information Center
Hounshell, Paul B.
This 60-item, multiple-choice Seafarers Knowledge Inventory was developed for use in marine vocational classes (grades 9-12) to measure a student's knowledge of information that "seafarers" should know. Items measure knowledge of various aspects of boating operation, weather, safety, winds, and oceanography. Steps in the construction of…
A hybrid heuristic for the multiple choice multidimensional knapsack problem
NASA Astrophysics Data System (ADS)
Mansi, Raïd; Alves, Cláudio; Valério de Carvalho, J. M.; Hanafi, Saïd
2013-08-01
In this article, a new solution approach for the multiple choice multidimensional knapsack problem is described. The problem is a variant of the multidimensional knapsack problem where items are divided into classes, and exactly one item per class has to be chosen. Both problems are NP-hard. However, the multiple choice multidimensional knapsack problem appears to be more difficult to solve in part because of its choice constraints. Many real applications lead to very large scale multiple choice multidimensional knapsack problems that can hardly be addressed using exact algorithms. A new hybrid heuristic is proposed that embeds several new procedures for this problem. The approach is based on the resolution of linear programming relaxations of the problem and reduced problems that are obtained by fixing some variables of the problem. The solutions of these problems are used to update the global lower and upper bounds for the optimal solution value. A new strategy for defining the reduced problems is explored, together with a new family of cuts and a reformulation procedure that is used at each iteration to improve the performance of the heuristic. An extensive set of computational experiments is reported for benchmark instances from the literature and for a large set of hard instances generated randomly. The results show that the approach outperforms other state-of-the-art methods described so far, providing the best known solution for a significant number of benchmark instances.
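The choice constraint that makes the MCMKP hard can be seen in its one-dimensional special case, the multiple-choice knapsack problem, which admits a simple exact dynamic program. The sketch below handles small instances of that special case only; it is not the hybrid heuristic of the article:

```python
def multiple_choice_knapsack(classes, capacity):
    """Exact DP for the single-constraint multiple-choice knapsack problem.

    `classes` is a list of classes; each class is a list of (profit, weight)
    items, and exactly one item per class must be chosen.  Returns the best
    total profit within `capacity`, or None if no feasible selection exists.
    Runs in O(len(classes) * capacity * items_per_class) time.
    """
    NEG = float('-inf')
    dp = [NEG] * (capacity + 1)
    dp[0] = 0.0
    for cls in classes:
        nxt = [NEG] * (capacity + 1)
        for w in range(capacity + 1):
            if dp[w] == NEG:
                continue
            for profit, weight in cls:   # pick exactly one item from this class
                if w + weight <= capacity:
                    nxt[w + weight] = max(nxt[w + weight], dp[w] + profit)
        dp = nxt                          # every class must contribute one item
    best = max(dp)
    return best if best != NEG else None
```

With several resource dimensions, the state space of this DP grows multiplicatively, which is why large MCMKP instances call for LP-relaxation-based heuristics like the one described above.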
NASA Astrophysics Data System (ADS)
Barniol, Pablo; Zavala, Genaro
2014-12-01
In this article we compare students' understanding of vector concepts in problems with no physical context, and with three mechanics contexts: force, velocity, and work. Based on our "Test of Understanding of Vectors," a multiple-choice test presented elsewhere, we designed two isomorphic shorter versions of 12 items each: a test with no physical context, and a test with mechanics contexts. For this study, we administered the items twice to students who were finishing an introductory mechanics course at a large private university in Mexico. The first time, we administered the two 12-item tests to 608 students. In the second, we only tested the items for which we had found differences in students' performances that were difficult to explain, and in this case, we asked them to show their reasoning in written form. In the first administration, we detected no significant difference between the medians obtained in the tests; however, we did identify significant differences in some of the items. For each item we analyze the type of difference found between the tests in the selection of the correct answer, the most common error on each of the tests, and the differences in the selection of incorrect answers. We also investigate the causes of the different context effects. Based on these analyses, we establish specific recommendations for the instruction of vector concepts in an introductory mechanics course. In the Supplemental Material we include both tests for other researchers studying vector learning, and for physics teachers who teach this material.
The Geoscience Concept Test: A New Assessment Tool Based on Student Misconceptions
NASA Astrophysics Data System (ADS)
Libarkin, J.; Anderson, S. W.; Boone, W. J.; Beilfuss, M.; Dahl, J.
2002-12-01
We developed and began pilot testing of an earth science assessment tool called the geoscience concept test (GCT). The GCT uses student misconceptions as distractors in a 30-item multiple-choice instrument. Student misconceptions were first assessed through the analysis of nearly 300 questionnaires administered in introductory geology courses at three institutions. Results from the questionnaires guided the development of an interview protocol that was used by four interviewers at four different institutions. Over 100 in-depth student interviews lasting from 0.5 to 1 hour probed topics related to the Earth's interior, geologic time, and the formation of Earth surface features such as mountains and volcanoes to better define misconceptions. Thematic content analysis of the interviews identified a number of widely held misconceptions, which were then incorporated into the GCT as multiple-choice distractors (wrong answers). For content validity, the initial GCT was reviewed by seven experts (3 geoscientists and 4 science educators) and revised before pilot testing. Approximately 100 introductory and non-science major college students from four institutions were assessed with the GCT pilot in the spring of 2002. Rasch model analysis of these data showed that students found the pilot test difficult, and the level of difficulty was consistent between the four institutions. Analysis of individual items showed that students had fewer misconceptions regarding the locations of earthquakes, and many misconceptions regarding the locations of volcanoes on the Earth's surface, suggesting a disconnect in their understanding of the role of plate tectonics in these phenomena. Analysis of the misfit statistic for each item showed that none of the questions misfit, although we dropped one question and modified the wording of another for clarity in the next round of piloting.
A second round of piloting scheduled for the fall of 2002 includes nearly 3000 students from 34 institutions in 19 states.
Development of knowledge tests for multi-disciplinary emergency training: a review and an example.
Sørensen, J L; Thellesen, L; Strandbygaard, J; Svendsen, K D; Christensen, K B; Johansen, M; Langhoff-Roos, P; Ekelund, K; Ottesen, B; Van Der Vleuten, C
2015-01-01
The literature is sparse on written test development in a post-graduate multi-disciplinary setting. Developing and evaluating knowledge tests for use in multi-disciplinary post-graduate training is challenging. The objective of this study was to describe the process of developing and evaluating a multiple-choice question (MCQ) test for use in a multi-disciplinary training program in obstetric-anesthesia emergencies. A multi-disciplinary working committee with 12 members representing six professional healthcare groups and another 28 participants were involved. Recurrent revisions of the MCQ items were undertaken followed by a statistical analysis. The MCQ items were developed stepwise, including decisions on aims and content, followed by testing for face and content validity, construct validity, item-total correlation, and reliability. To obtain acceptable content validity, 40 out of originally 50 items were included in the final MCQ test. The MCQ test was able to distinguish between levels of competence, and good construct validity was indicated by a significant difference in the mean score between consultants and first-year trainees, as well as between first-year trainees and medical and midwifery students. Evaluation of the item-total correlation analysis in the 40 items set revealed that 11 items needed re-evaluation, four of which addressed content issues in local clinical guidelines. A Cronbach's alpha of 0.83 for reliability was found, which is acceptable. Content and construct validity and reliability were acceptable. The presented template for the development of this MCQ test could be useful to others when developing knowledge tests and may enhance the overall quality of test development. © 2014 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
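The item-total correlation analysis used above to flag MCQ items for re-evaluation is typically computed as each item's correlation with the rest-of-test score. A minimal sketch of the corrected item-total correlation (the function name and data layout are illustrative):

```python
import math

def corrected_item_total_correlations(responses):
    """Pearson correlation of each item with the rest-of-test score
    (total minus the item itself), a standard screen for weak MCQ items.

    `responses`: examinee rows of numeric item scores.  Returns one
    correlation per item; values near zero flag items to re-evaluate.
    """
    n = len(responses)
    k = len(responses[0])

    def corr(xs, ys):
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / math.sqrt(sxx * syy)

    out = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]   # exclude item j
        out.append(corr(item, rest))
    return out
```

Excluding the item from its own total (the "corrected" part) avoids inflating the correlation, which matters most on short tests.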
There's more to food store choice than proximity: a questionnaire development study.
Krukowski, Rebecca A; Sparks, Carla; DiCarlo, Marisha; McSweeney, Jean; West, Delia Smith
2013-06-17
Background Proximity of food stores is associated with dietary intake and obesity; however, individuals frequently shop at stores that are not the most proximal. Little is known about other factors that influence food store choice. The current research describes the development of the Food Store Selection Questionnaire (FSSQ) and describes preliminary results of field testing the questionnaire. Methods Development of the FSSQ involved a multidisciplinary literature review, qualitative analysis of focus group transcripts, and expert and community reviews. Field testing consisted of 100 primary household food shoppers (93% female, 64% African American), in rural and urban Arkansas communities, rating FSSQ items as to their importance in store choice and indicating their top two reasons. After eliminating 14 items due to low mean importance scores and high correlations with other items, the final FSSQ questionnaire consists of 49 items. Results Items rated highest in importance were: meat freshness; store maintenance; store cleanliness; meat varieties; and store safety. Items most commonly rated as top reasons were: low prices; proximity to home; fruit/vegetable freshness; fruit/vegetable variety; and store cleanliness. Conclusions The FSSQ is a comprehensive questionnaire for detailing key reasons in food store choice. Although proximity to home was a consideration for participants, there were clearly other key factors in their choice of a food store. Understanding the relative importance of these different dimensions driving food store choice in specific communities may be beneficial in informing policies and programs designed to support healthy dietary intake and obesity prevention. PMID:23773428
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection
ERIC Educational Resources Information Center
Kim, Jihye; Oshima, T. C.
2013-01-01
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
ERIC Educational Resources Information Center
Jang, Yoonhee; Pashler, Hal; Huber, David E.
2014-01-01
We performed 4 experiments assessing the learning that occurs when taking a test. Our experiments used multiple-choice tests because the processes deployed during testing can be manipulated by varying the nature of the choice alternatives. Previous research revealed that a multiple-choice test that includes "none of the above" (NOTA)…
ERIC Educational Resources Information Center
Fazeli, Seyed Hossein
2010-01-01
The purpose of the research described in the current study is to examine psychological reliability, its importance, and its application, and to investigate the impact of the psychological reliability of a pilot-study population on the selection of a reliable multiple-choice item test in foreign language research. The population for subject…
NASA Astrophysics Data System (ADS)
Wren, David A.
The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12) to provide expert feedback on item content, format, and consensus on the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used for evaluation of testing data at the test and item level.
The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).
New York Community Environment Study Questionnaire.
ERIC Educational Resources Information Center
Glaser, Daniel; Snow, Mary
This questionnaire assesses neighborhood drug problem concern, drug use practices, knowledge of drugs and agencies dealing with drugs, and views on drug education in persons aged 13 or older. The questionnaire has 31 items (multiple-choice or free response), most with several parts. The items deal with demographic and personal data, problems in…
Gabriel, Adel; Violato, Claudio
2009-01-01
Background To develop and psychometrically assess a multiple-choice question (MCQ) instrument to test knowledge of depression and its treatments in patients suffering from depression. Methods A total of 63 depressed patients and twelve psychiatric experts participated. Based on empirical evidence from an extensive review, theoretical knowledge, and consultations with experts, a 27-item MCQ test of knowledge of depression and its treatments was constructed. Data collected from the psychiatry experts were used to assess evidence of content validity for the instrument. Results Cronbach's alpha for the instrument was 0.68, and there was an overall 87.8% agreement among experts that the items were highly relevant for testing patient knowledge of depression and its treatments. Patients' overall performance on the MCQs was satisfactory, with 78.7% correct answers. Results of an item analysis indicated that most items had adequate difficulty and discrimination. Conclusion There was adequate reliability and evidence of content and convergent validity for the instrument. Future research should employ a larger and more heterogeneous sample drawn from both psychiatric and community populations than did the present study. Meanwhile, the present study has produced a psychometrically tested instrument for measuring depressed patients' knowledge of depression and its treatment. PMID:19754944
The development of a science process assessment for fourth-grade students
NASA Astrophysics Data System (ADS)
Smith, Kathleen A.; Welliver, Paul W.
In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. 
The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
ERIC Educational Resources Information Center
Kerr, Deirdre; Mousavi, Hamid; Iseli, Markus R.
2013-01-01
The Common Core assessments emphasize short essay constructed-response items over multiple-choice items because they are more precise measures of understanding. However, such items are too costly and time consuming to be used in national assessments unless a way to score them automatically can be found. Current automatic essay-scoring techniques…
Assessing learning in small sized physics courses
NASA Astrophysics Data System (ADS)
Ene, Emanuela; Ackerson, Bruce J.
2018-01-01
We describe the construction, validation, and testing of a concept inventory for an Introduction to Physics of Semiconductors course offered by the department of physics to undergraduate engineering students. By design, this inventory addresses both content knowledge and the ability to interpret content via different cognitive processes outlined in Bloom's revised taxonomy. The primary challenge comes from the low number of test takers. We describe the Rasch modeling analysis for this concept inventory, and the results of the calibration on a small sample size, with the intention of providing a useful blueprint to other instructors. Our study involved 101 students from Oklahoma State University and fourteen faculty teaching or doing research in the field of semiconductors at seven universities. The items were written in four-option multiple-choice format. It was possible to calibrate a 30-item unidimensional scale precisely enough to characterize the student population enrolled each semester and, therefore, to allow the tailoring of the learning activities of each class. We show that this scale can be employed as an item bank from which instructors could extract short testlets and where we can add new items fitting the existing calibration.
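The Rasch model used in the calibration above gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty. A minimal sketch; the ability and difficulty values below are invented, not the paper's estimates:

```python
# Minimal sketch of the Rasch item response function; parameter values
# here are invented for illustration, not taken from any calibration.
import math

def rasch_p(theta, b):
    """P(correct) for ability theta and item difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability is exactly 0.5:
print(rasch_p(0.7, 0.7))  # -> 0.5

# Expected raw score of one examinee over a calibrated item bank:
bank = [-1.2, -0.4, 0.0, 0.6, 1.5]  # hypothetical item difficulties
print(round(sum(rasch_p(0.5, b) for b in bank), 2))
```

Because difficulties and abilities sit on the same logit scale, a calibrated bank like the one described above lets instructors assemble short testlets while keeping scores comparable across forms.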
Effects of Test Expectation on Multiple-Choice Performance and Subjective Ratings
ERIC Educational Resources Information Center
Balch, William R.
2007-01-01
Undergraduates studied the definitions of 16 psychology terms, expecting either a multiple-choice (n = 132) or short-answer (n = 122) test. All students then received the same multiple-choice test, requiring them to recognize the definitions as well as novel examples of the terms. Compared to students expecting a multiple-choice test, those…
An Analysis of Saudi Arabian High School Students' Misconceptions about Physics Concepts.
NASA Astrophysics Data System (ADS)
Al-Rubayea, Abdullah A. M.
This study was conducted to explore Saudi high school students' misconceptions of selected physics concepts. It also examined the effects of gender, grade level, and school location on these misconceptions. In addition, students' misconceptions on each question were analyzed further, and correlations between students' responses, their confidence in their answers, and the sensibleness of their answers were examined. The sources of students' answers were also investigated. Finally, this study included an analysis of students' selection of reasons in the instrument. The instrument used to detect the students' misconceptions was a modified form of the Misconception Identification in Science Questionnaire (MISQ). This instrument was developed by Franklin (1992) to detect students' misconceptions in selected physics concepts. It is a two-tier multiple-choice test that examines four areas of physics: force and motion, heat and temperature, light and color, and electricity and magnetism. This study included a sample of 1080 Saudi high school students who were randomly selected from six Saudi educational districts, covering both genders, the three grade levels of Saudi high schools, and a city and a town in each district. The sample was equally divided between genders, grade levels, and educational districts. The results of this study revealed that Saudi Arabian high school students hold numerous misconceptions about selected physics concepts. Tenth-grade students differed significantly from the other grades, and different misconceptions were held by the students for each concept on the MISQ. A positive correlation between students' responses, confidence in answers, and sensibleness was shown for many questions. In addition, guessing was the most dominant source of misconceptions.
The results also revealed that gender and grade level had an effect on students' decisions on the MISQ items. A positive change in the means across gender and grade levels on the multiple-choice test, and gender differences in the selection of reasons, may be associated with specific concepts. No significant difference in the frequencies of the reasons chosen by students to justify their answers was found for most of the items (10 items).
A Quantum Chemistry Concept Inventory for Physical Chemistry Classes
ERIC Educational Resources Information Center
Dick-Perez, Marilu; Luxford, Cynthia J.; Windus, Theresa L.; Holme, Thomas
2016-01-01
A 14-item, multiple-choice diagnostic assessment tool, the quantum chemistry concept inventory or QCCI, is presented. Items were developed based on published student misconceptions and content coverage and then piloted and used in advanced physical chemistry undergraduate courses. In addition to the instrument itself, data from both a pretest,…
Final Sampling Bias in Haptic Judgments: How Final Touch Affects Decision-Making.
Mitsuda, Takashi; Yoshioka, Yuichi
2018-01-01
When people make a choice between multiple items, they usually evaluate each item one after the other, repeatedly. The effect of the order and number of these evaluations on one's choices is essential to understanding the decision-making process. Previous studies have shown that when people choose the more favorable of two items, they tend to choose the item that they evaluated last. This tendency has been observed regardless of sensory modality. This study investigated the origin of this bias through three experiments involving two-alternative forced-choice tasks with handkerchiefs. First, the bias appeared in a smoothness discrimination task, which indicates that the bias was not based on judgments of preference. Second, the handkerchief that was touched more often tended to be chosen more frequently in the preference task, but not in the smoothness discrimination task, indicating that a mere exposure effect enhanced the bias. Third, in the condition where the number of touches did not differ between handkerchiefs, the bias appeared when people touched a handkerchief they wanted to touch last, but not when people touched the handkerchief that was predetermined. This finding suggests a direct coupling between final voluntary touching and judgment.
Validation and structural analysis of the kinematics concept test
NASA Astrophysics Data System (ADS)
Lichtenberger, A.; Wagner, C.; Hofer, S. I.; Stern, E.; Vaterlaus, A.
2017-06-01
The kinematics concept test (KCT) is a multiple-choice test designed to evaluate students' conceptual understanding of kinematics at the high school level. The test comprises 49 multiple-choice items about velocity and acceleration, which are based on seven kinematic concepts and which make use of three different representations. In the first part of this article we describe the development and the validation process of the KCT. We applied the KCT to 338 Swiss high school students who attended traditional teaching in kinematics. We analyzed the response data to provide the psychometric properties of the test. In the second part we present the results of a structural analysis of the test. An exploratory factor analysis of 664 student answers finally uncovered the seven kinematics concepts as factors. However, the analysis revealed a hierarchical structure of concepts. At the higher level, mathematical concepts group together, and then split up into physics concepts at the lower level. Furthermore, students who seem to understand a concept in one representation have difficulties transferring the concept to similar problems in another representation. Both results have implications for teaching kinematics. First, teaching mathematical concepts beforehand might be beneficial for learning kinematics. Second, instructions have to be designed to teach students the change between different representations.
NASA Astrophysics Data System (ADS)
Witzig, Stephen B.; Rebello, Carina M.; Siegel, Marcelle A.; Freyermuth, Sharyn K.; Izci, Kemal; McClure, Bruce
2014-10-01
Identifying students' conceptual scientific understanding is difficult if the appropriate tools are not available for educators. Concept inventories have become a popular tool to assess student understanding; however, traditionally, they are multiple choice tests. International science education standard documents advocate that assessments should be reform based, contain diverse question types, and should align with instructional approaches. To date, no instrument of this type targeting student conceptions in biotechnology has been developed. We report here the development, testing, and validation of a 35-item Biotechnology Instrument for Knowledge Elicitation (BIKE) that includes a mix of question types. The BIKE was designed to elicit student thinking and a variety of conceptual understandings, as opposed to testing closed-ended responses. The design phase contained nine steps including a literature search for content, student interviews, a pilot test, as well as expert review. Data from 175 students over two semesters, including 16 student interviews and six expert reviewers (professors from six different institutions), were used to validate the instrument. Cronbach's alpha on the pre/posttest was 0.664 and 0.668, respectively, indicating the BIKE has internal consistency. Cohen's kappa for inter-rater reliability among the 6,525 total items was 0.684 indicating substantial agreement among scorers. Item analysis demonstrated that the items were challenging, there was discrimination among the individual items, and there was alignment with research-based design principles for construct validity. This study provides a reliable and valid conceptual understanding instrument in the understudied area of biotechnology.
NASA Astrophysics Data System (ADS)
Greenberg, Ariela Caren
Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach, the Mantel-Haenszel log-odds ratio (MH-LOR), to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard, 1994). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
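The Mantel-Haenszel log-odds ratio (MH-LOR) described in the record above can be illustrated with a small sketch. The 2x2 tables below are invented; in practice each stratum corresponds to one matched total-score level, with reference- and focal-group counts of correct and incorrect responses:

```python
# Invented illustration of the Mantel-Haenszel log-odds ratio (MH-LOR).
# Each stratum is a 2x2 table (a, b, c, d): reference-group correct/incorrect
# and focal-group correct/incorrect at one matched total-score level.
import math

def mh_log_odds(tables):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return math.log(num / den)  # 0 = no DIF; the sign shows the favored group

strata = [
    (30, 10, 20, 20),  # low-score stratum
    (40, 10, 30, 20),  # middle stratum
    (45, 5, 40, 10),   # high-score stratum
]
print(round(mh_log_odds(strata), 3))
```

Penfield's DDF extension mentioned above applies the same log-odds machinery distractor by distractor rather than to the overall correct/incorrect dichotomy.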
Sim, Si-Mui; Rasiah, Raja Isaiah
2006-02-01
This paper reports the relationship between the difficulty level and the discrimination power of true/false-type multiple-choice questions (MCQs) in a multidisciplinary paper for the para-clinical year of an undergraduate medical programme. MCQ items in papers taken from Year II Parts A, B and C examinations for Sessions 2001/02, and Part B examinations for 2002/03 and 2003/04, were analysed to obtain their difficulty indices and discrimination indices. Each paper consisted of 250 true/false items (50 questions of 5 items each) on topics drawn from different disciplines. The questions were first constructed and vetted by the individual departments before being submitted to a central committee, where the final selection of the MCQs was made, based purely on the academic judgement of the committee. There was a wide distribution of item difficulty indices in all the MCQ papers analysed. Furthermore, the relationship between the difficulty index (P) and discrimination index (D) of the MCQ items in a paper was not linear, but more dome-shaped. Maximal discrimination (D = 51% to 71%) occurred with moderately easy/difficult items (P = 40% to 74%). On average, about 38% of the MCQ items in each paper were "very easy" (P > or =75%), while about 9% were "very difficult" (P <25%). About two-thirds of these very easy/difficult items had "very poor" or even negative discrimination (D < or =20%). MCQ items that demonstrate good discriminating potential tend to be moderately difficult items, and the moderately-to-very difficult items are more likely to show negative discrimination. There is a need to evaluate the effectiveness of our MCQ items.
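The difficulty index (P) and discrimination index (D) analyzed in the record above can be computed directly from a scored response matrix. This sketch uses invented responses and an upper/lower-third split, a common convention; the paper does not state its exact grouping rule:

```python
# Hypothetical sketch (invented responses) of the classical indices reported
# above: difficulty index P (% correct) and discrimination index D, the
# difference in % correct between top and bottom scorer groups.

def difficulty(scores):
    """P: percentage of examinees answering the item correctly."""
    return 100.0 * sum(scores) / len(scores)

def discrimination(matrix, item, fraction=1 / 3):
    """D: P(upper group) - P(lower group), groups formed by total score."""
    ranked = sorted(matrix, key=sum, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    upper = [row[item] for row in ranked[:n]]
    lower = [row[item] for row in ranked[-n:]]
    return difficulty(upper) - difficulty(lower)

responses = [  # rows = examinees, columns = true/false items (1 = correct)
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
for i in range(4):
    col = [row[i] for row in responses]
    print(f"item {i}: P = {difficulty(col):.1f}%, D = {discrimination(responses, i):.1f}")
```

Under the paper's cutoffs, an item with P >= 75% would count as "very easy" and one with D <= 20% as poorly discriminating.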
Palmer, Edward J; Devitt, Peter G
2007-01-01
Background Reliable and valid written tests of higher cognitive function are difficult to produce, particularly for the assessment of clinical problem solving. Modified Essay Questions (MEQs) are often used to assess these higher order abilities in preference to other forms of assessment, including multiple-choice questions (MCQs). MEQs often form a vital component of end-of-course assessments in higher education. It is not clear how effectively these questions assess higher order cognitive skills. This study was designed to assess the effectiveness of the MEQ in measuring higher-order cognitive skills in an undergraduate institution. Methods An analysis of multiple-choice questions and modified essay questions (MEQs) used for summative assessment in a clinical undergraduate curriculum was undertaken. A total of 50 MCQs and 139 stages of MEQs were examined, which came from three exams run over two years. The effectiveness of the questions was determined by two assessors and was defined by the questions' ability to measure higher cognitive skills, as determined by a modification of Bloom's taxonomy, and their quality as determined by the presence of item writing flaws. Results Over 50% of all of the MEQs tested factual recall. This was similar to the percentage of MCQs testing factual recall. The modified essay question failed in its role of consistently assessing higher cognitive skills whereas the MCQ frequently tested more than mere recall of knowledge. Conclusion Construction of MEQs that assess higher order cognitive skills cannot be assumed to be a simple task. Well-constructed MCQs should be considered a satisfactory replacement for MEQs if the MEQs cannot be designed to adequately test higher order skills. Such MCQs are capable of withstanding the intellectual and statistical scrutiny imposed by a high stakes exit examination. PMID:18045500
Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models
ERIC Educational Resources Information Center
Woods, Carol M.; Grimm, Kevin J.
2011-01-01
In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…
Item Response Models for Examinee-Selected Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei
2012-01-01
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Multiple Choice Testing and the Retrieval Hypothesis of the Testing Effect
ERIC Educational Resources Information Center
Sensenig, Amanda E.
2010-01-01
Taking a test often leads to enhanced later memory for the tested information, a phenomenon known as the "testing effect". This memory advantage has been reliably demonstrated with recall tests but not multiple choice tests. One potential explanation for this finding is that multiple choice tests do not rely on retrieval processes to the same…
NASA Astrophysics Data System (ADS)
Marulcu, Ismail; Barnett, Mike
2013-10-01
This study is part of a 5-year National Science Foundation-funded project, Transforming Elementary Science Learning Through LEGO™ Engineering Design. In this study, we report on the successes and challenges of implementing an engineering design-based and LEGO™-oriented unit in an urban classroom setting and we focus on the impact of the unit on students' content understanding of simple machines. The LEGO™ engineering-based simple machines module, which was developed for fifth graders by our research team, was implemented in an urban school in a large city in the Northeastern region of the USA. Thirty-three fifth grade students participated in the study, and they showed significant growth in content understanding. We measured students' content knowledge by using identical paper tests and semistructured interviews before and after instruction. Our paired t test analysis results showed that students significantly improved their test and interview scores (t = -3.62, p < 0.001 for multiple-choice items and t = -9.06, p < 0.000 for the open-ended items in the test and t = -12.11, p < 0.000 for the items in interviews). We also identified several alternative conceptions that are held by students on simple machines.
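The paired t statistic reported above compares each student's pre- and post-instruction scores. A minimal sketch with invented scores (not the study's data); note the sign of t depends on the direction of differencing:

```python
# Minimal sketch (invented scores) of the paired t statistic used in
# pre/post comparisons: t = mean(difference) / standard error of difference.
import math

def paired_t(pre, post):
    d = [b - a for a, b in zip(pre, post)]  # post minus pre
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

pre  = [3, 4, 2, 5, 3, 4, 2, 3]  # hypothetical pre-instruction scores
post = [5, 6, 4, 7, 4, 6, 5, 5]  # hypothetical post-instruction scores
print(round(paired_t(pre, post), 2))
```

The resulting t would then be compared against the t distribution with n - 1 degrees of freedom to obtain the p value.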
Khan, Moeen-uz-Zafar; Aljarallah, Badr Muhammad
2011-01-01
Objectives: Developing and testing the cognitive skills and abstract thinking of undergraduate medical students are the main objectives of problem based learning. Modified Essay Questions (MEQ) and Multiple Choice Questions (MCQ) may both be designed to test these skills. The objectives of this study were to assess the effectiveness of both forms of questions in testing the different levels of the cognitive skills of undergraduate medical students and to detect any item writing flaws in the questions. Methods: A total of 50 MEQs and 50 MCQs were evaluated. These questions were chosen randomly from various examinations given to different batches of undergraduate medical students taking course MED 411–412 at the Department of Medicine, Qassim University from the years 2005 to 2009. The effectiveness of the questions was determined by two assessors and was defined by the question’s ability to measure higher cognitive skills, as determined by modified Bloom’s taxonomy, and its quality as determined by the presence of item writing flaws. ‘SPSS15’ and ‘Medcalc’ programs were used to tabulate and analyze the data. Results: The percentage of questions testing the level III (problem solving) cognitive skills of the students was 40% for MEQs and 60% for the MCQs; the remaining questions merely assessed recall and comprehension. No significant difference was found between MEQ and MCQ in relation to the type of questions (recall, comprehension, or problem solving; χ2 = 5.3, p = 0.07). The agreement between the two assessors was quite high for MCQs (kappa = 0.609; SE 0.093; 95% CI 0.426–0.792) but lower for MEQs (kappa = 0.195; SE 0.073; 95% CI 0.052–0.338). 16% of the MEQs and 12% of the MCQs had item writing flaws. Conclusion: A well-constructed MCQ is superior to MEQ in testing the higher cognitive skills of undergraduate medical students in a problem based learning setup.
Constructing an MEQ for assessing the cognitive skills of a student is not a simple task and is more frequently associated with item writing flaws. PMID:22489228
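The inter-assessor agreement in the record above is reported as Cohen's kappa. As a hedged sketch (the ratings below are invented, not the study's data), kappa corrects the observed agreement between two raters for the agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in c1) / (n * n)     # chance agreement from marginals
    return (po - pe) / (1 - pe)

# Hypothetical classification of four questions as Recall (R) or Problem solving (P)
rater_a = ["R", "R", "P", "P"]
rater_b = ["R", "R", "P", "R"]
kappa = cohens_kappa(rater_a, rater_b)
```

Raters agree on 3 of 4 items here, but after the chance correction kappa is only 0.5, which is why the abstract's kappa values are lower than the raw agreement rates would suggest.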
ERIC Educational Resources Information Center
Nevid, Jeffrey S.; Cheney, Brianna; Thompson, Clarissa
2015-01-01
Students in an introductory psychology class rated their level of confidence in their answers to exam questions on four multiple-choice exams through the course of a semester. Correlations between confidence judgments and accuracy (correct vs. incorrect) at the individual item level showed modest but significant relationships for item sets scaled…
Music lessons are associated with increased verbal memory in individuals with Williams syndrome.
Dunning, Brittany A; Martens, Marilee A; Jungers, Melissa K
2014-11-16
Williams syndrome (WS) is a genetic disorder characterized by intellectual delay and an affinity for music. It has been previously shown that familiar music can enhance verbal memory in individuals with WS who have had music training. There is also evidence that unfamiliar, or novel, music may improve cognitive recall. This study was designed to examine whether a novel melody could likewise enhance verbal memory in individuals with WS, and to more fully characterize music training in this population. We presented spoken or sung sentences that described an animal and its group name to 44 individuals with WS, and then tested their immediate and delayed memory using both recall and multiple choice formats. Those with formal music training (average training duration 4½ years) scored significantly higher on both the spoken and sung recall items, as well as on the spoken multiple choice items, than those with no music training. Music therapy, music enjoyment, age, and Verbal IQ did not impact performance on the memory tasks. These findings provide further evidence that formal music lessons may impact the neurological pathways associated with verbal memory in individuals with WS, consistent with findings in typically developing individuals. Copyright © 2014 Elsevier Ltd. All rights reserved.
The Positive and Negative Consequences of Multiple-Choice Testing
ERIC Educational Resources Information Center
Roediger, Henry L.; Marsh, Elizabeth J.
2005-01-01
Multiple-choice tests are commonly used in educational settings but with unknown effects on students' knowledge. The authors examined the consequences of taking a multiple-choice test on a later general knowledge test in which students were warned not to guess. A large positive testing effect was obtained: Prior testing of facts aided final…
ERIC Educational Resources Information Center
Kon, Jane Heckley; Martin-Kniep, Giselle O.
1992-01-01
Describes a case study to determine whether performance tests are a feasible alternative to multiple-choice tests. Examines the difficulties of administering and scoring performance assessments. Explains that the study employed three performance tests and one multiple-choice test. Concludes that performance test administration and scoring was no…
Graham, Bruce S; Knight, G William; Graham, Linda
2016-01-01
Cheating incidents in 2006-07 led U.S. dental schools to heighten their efforts to enhance the environment of academic integrity in their institutions. The aims of this study were to document the measures being used by U.S. dental schools to discourage student cheating, determine the current incidence of reported cheating, and make recommendations for enhancing a culture of integrity in dental education. In late 2014-early 2015, an online survey was distributed to academic deans of all 61 accredited U.S. dental schools that had four classes of dental students enrolled; 50 (82%) responded. Among measures used, 98% of respondents reported having policy statements regarding student academic integrity, 92% had an Honor Code, 96% provided student orientation to integrity policies, and most used proctoring of final exams (91%) and tests (93%). Regarding disciplinary processes, 27% reported their faculty members only rarely reported suspected cheating (though required in 76% of the schools), and 40% disseminated anonymous results of disciplinary hearings. A smaller number of schools (n=36) responded to the question about student cheating than to other questions; those results suggested that reported cheating had increased almost threefold since 1998. The authors recommend that schools add cheating case scenarios to professional ethics curricula; disseminate outcomes of cheating enforcement actions; have students sign a statement attesting to compliance with academic integrity policies at every testing activity; add curricular content on correct writing techniques to avoid plagiarism; require faculty to distribute retired test items; acquire examination-authoring software programs to enable faculty to generate new multiple-choice items and different versions of the same multiple-choice tests; avoid take-home exams when assessing independent student knowledge; and utilize student assessment methods directly relevant to clinical practice.
Integrating personalized medical test contents with XML and XSL-FO.
Toddenroth, Dennis; Dugas, Martin; Frankewitsch, Thomas
2011-03-01
In 2004 the adoption of a modular curriculum at the medical faculty in Muenster led to the introduction of centralized examinations based on multiple-choice questions (MCQs). We report on how the organizational challenges of realizing faculty-wide personalized tests were addressed by implementing a specialized software module that automatically generates test sheets from individual test registrations and MCQ contents. The key steps of the presented method for preparing personalized test sheets are (1) the compilation of relevant item contents and graphical media from a relational database with database queries, (2) the creation of Extensible Markup Language (XML) intermediates, and (3) the transformation into paginated documents. Using an open source print formatter, the software module consistently produced high-quality test sheets, while the blending of vectorized textual contents and pixel graphics resulted in efficient output file sizes. The module also permitted individual randomization of item sequences to prevent illicit collusion. The automatic generation of personalized MCQ test sheets is feasible using freely available open source software libraries and can be efficiently deployed on a faculty-wide scale.
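Step (2) of the pipeline described above, building a per-student XML intermediate, might look roughly like the sketch below. The element names and structure are my assumptions for illustration, not the Muenster module's actual schema; an XSL-FO stylesheet would then paginate such a document.

```python
import xml.etree.ElementTree as ET

def build_test_sheet(student_id, items):
    """Build a per-student XML intermediate from (stem, options) pairs."""
    root = ET.Element("testsheet", attrib={"student": student_id})
    for nr, (stem, options) in enumerate(items, start=1):
        q = ET.SubElement(root, "question", attrib={"nr": str(nr)})
        ET.SubElement(q, "stem").text = stem
        for label, text in zip("ABCDE", options):  # label options A, B, C, ...
            ET.SubElement(q, "option", attrib={"label": label}).text = text
    return ET.tostring(root, encoding="unicode")

xml_sheet = build_test_sheet("12345", [("2 + 2 = ?", ["3", "4", "5"])])
```

Shuffling the `items` list per student before serialization would give the individual item-sequence randomization the abstract mentions.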
Rodríguez-Díez, María Cristina; Alegre, Manuel; Díez, Nieves; Arbea, Leire; Ferrer, Marta
2016-02-03
The main factor that determines the selection of a medical specialty in Spain after obtaining a medical degree is the MIR ("médico interno residente", internal medical resident) exam. This exam consists of 235 multiple-choice questions with five options, some of which include images provided in a separate booklet. The aim of this study was to analyze the technical quality of the multiple-choice questions included in the MIR exam over the last five years. All the questions included in the exams from 2009 to 2013 were analyzed. We studied the proportion of questions including clinical vignettes, the number of items related to an image and the presence of technical flaws in the questions. For the analysis of technical flaws, we adapted the National Board of Medical Examiners (NBME) guidelines. We looked for 18 different issues included in the manual, grouped into two categories: issues related to testwiseness and issues related to irrelevant difficulties. The final number of questions analyzed was 1,143. The percentage of items based on clinical vignettes increased from 50% in 2009 to 56-58% in the following years (2010-2013). The percentage of items based on an image increased progressively from 10% in 2009 to 15% in 2012 and 2013. The percentage of items with at least one technical flaw varied between 68 and 72%. We observed a decrease in the percentage of items with flaws related to testwiseness, from 30% in 2009 to 20% in 2012 and 2013. While most of these issues decreased dramatically or even disappeared (such as the imbalance in the correct option numbers), the presence of non-plausible options remained frequent. With regard to technical flaws related to irrelevant difficulties, no improvement was observed; this is especially true with respect to negative stem questions and "hinged" questions. The formal quality of the MIR exam items has improved over the last five years with regard to testwiseness. 
A more detailed revision of the items submitted, checking systematically for the presence of technical flaws, could improve the validity and discriminatory power of the exam, without increasing its difficulty.
Sass, Rachelle; Frick, Susanne; Reips, Ulf-Dietrich; Wetzel, Eunike
2018-03-01
The multidimensional forced-choice (MFC) format has been proposed as an alternative to the rating scale (RS) response format. However, it is unclear how changing the response format may affect the response process and test motivation of participants. In Study 1, we investigated the MFC response process using the think-aloud technique. In Study 2, we compared test motivation between the RS format and different versions of the MFC format (presenting 2, 3, 4, and 5 items simultaneously). The response process to MFC item blocks was similar to the RS response process but involved an additional step of weighing the items within a block against each other. The RS and MFC response format groups did not differ in their test motivation. Thus, from the test taker's perspective, the MFC format is somewhat more demanding to respond to, but this does not appear to decrease test motivation.
Development and Validation of the Homeostasis Concept Inventory
McFarland, Jenny L.; Price, Rebecca M.; Wenderoth, Mary Pat; Martinková, Patrícia; Cliff, William; Michael, Joel; Modell, Harold; Wright, Ann
2017-01-01
We present the Homeostasis Concept Inventory (HCI), a 20-item multiple-choice instrument that assesses how well undergraduates understand this critical physiological concept. We used an iterative process to develop a set of questions based on elements in the Homeostasis Concept Framework. This process involved faculty experts and undergraduate students from associate’s colleges, primarily undergraduate institutions, regional and research-intensive universities, and professional schools. Statistical results provided strong evidence for the validity and reliability of the HCI. We found that graduate students performed better than undergraduates, biology majors performed better than nonmajors, and students performed better after receiving instruction about homeostasis. We used differential item functioning (DIF) analysis to assess whether students of different genders, races/ethnicities, and English language statuses performed differently on individual items of the HCI. We found no evidence of differential item functioning, suggesting that the items do not incorporate cultural or gender biases that would impact students’ performance on the test. Instructors can use the HCI to guide their teaching and student learning of homeostasis, a core concept of physiology. PMID:28572177
Odegard, Timothy N; Koen, Joshua D
2007-11-01
Both positive and negative testing effects have been demonstrated with a variety of materials and paradigms (Roediger & Karpicke, 2006b). The present series of experiments replicate and extend the research of Roediger and Marsh (2005) with the addition of a "none-of-the-above" response option. Participants (n=32 in both experiments) read a set of passages, took an initial multiple-choice test, completed a filler task, and then completed a final cued-recall test (Experiment 1) or multiple-choice test (Experiment 2). Questions were manipulated on the initial multiple-choice test by adding a "none-of-the-above" response alternative (choice "E") that was incorrect ("E" Incorrect) or correct ("E" Correct). The results from both experiments demonstrated that the positive testing effect was negated when the "none-of-the-above" alternative was the correct response on the initial multiple-choice test, but was still present when the "none-of-the-above" alternative was an incorrect response.
[Continuing medical education: how to write multiple choice questions].
Soler Fernández, R; Méndez Díaz, C; Rodríguez García, E
2013-06-01
Evaluating professional competence in medicine is a difficult but indispensable task because it makes it possible to evaluate, at different times and from different perspectives, the extent to which the knowledge, skills, and values required for exercising the profession have been acquired. Tests based on multiple choice questions have been and continue to be among the most useful tools for objectively evaluating learning in medicine. When these tests are well designed and correctly used, they can stimulate learning and even measure higher cognitive skills. Designing a multiple choice test is a difficult task that requires knowledge of the material to be tested and of the methodology of test preparation as well as time to prepare the test. The aim of this article is to review what can be evaluated through multiple choice tests, the rules and guidelines that should be taken into account when writing multiple choice questions, the different formats that can be used, the most common errors in elaborating multiple choice tests, and how to analyze the results of the test to verify its quality. Copyright © 2012 SERAM. Published by Elsevier Espana. All rights reserved.
Adhi, Mohammad Idrees; Aly, Syed Moyn
2018-04-01
To find differences between One-Correct and One-Best multiple-choice questions in relation to student scores, post-exam item analysis results, and student perception. This comparative cross-sectional study was conducted at the Dow University of Health Sciences, Karachi, from November 2010 to April 2011, and comprised medical students. Data were analysed using SPSS 18. Of the 207 participants, 16 (7.7%) were male and 191 (92.3%) were female. The mean score in Paper I was 18.62±4.7, while in Paper II it was 19.58±6.1. One-Best multiple-choice questions performed better than One-Correct ones. There was no statistically significant difference in the mean scores of the two papers or in the difficulty indices. Difficulty and discrimination indices correlated well in both papers. Cronbach's alpha was 0.584 for Paper I and 0.696 for Paper II. Point-biserial values were better for Paper II than for Paper I. Most students expressed dissatisfaction with Paper II. One-Best multiple-choice questions showed better scores, higher reliability, better item performance, and better correlation values.
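The reliability and item statistics named above, Cronbach's alpha and the point-biserial correlation, can be sketched as follows. This is a generic illustration on made-up 0/1 response data, not the study's computation (and the point-biserial here is uncorrected, i.e., the item is included in the total score):

```python
import math
from statistics import mean, pvariance

# Rows = students, columns = dichotomously scored items (hypothetical data)
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
totals = [sum(row) for row in scores]

def cronbach_alpha(scores, totals):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    return k / (k - 1) * (1 - sum(item_vars) / pvariance(totals))

def point_biserial(j, scores, totals):
    """r_pb = (M1 - M0) / s_total * sqrt(p * (1 - p)) for dichotomous item j."""
    ones = [t for row, t in zip(scores, totals) if row[j] == 1]
    zeros = [t for row, t in zip(scores, totals) if row[j] == 0]
    p = len(ones) / len(totals)
    s_total = math.sqrt(pvariance(totals))
    return (mean(ones) - mean(zeros)) / s_total * math.sqrt(p * (1 - p))

alpha = cronbach_alpha(scores, totals)
r_pb = point_biserial(1, scores, totals)  # item statistic for the second item
```

Higher alpha and point-biserial values, as the abstract reports for Paper II, indicate greater internal consistency and items that better separate high from low scorers.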
Translation of P = kT into a Pictorial External Representation by High School Seniors
ERIC Educational Resources Information Center
Matijaševic, Igor; Korolija, Jasminka N.; Mandic, Ljuba M.
2016-01-01
This paper describes the results achieved by high school seniors on an item which involves translation of the equation P = kT into a corresponding pictorial external representation. The majority of students (the classes of 2011, 2012 and 2013) did not give the correct answer to the multiple choice part of the translation item. They chose pictorial…
Comparing Two Types of Diagnostic Items to Evaluate Understanding of Heat and Temperature Concepts
ERIC Educational Resources Information Center
Chu, Hye-Eun; Chandrasegaran, A. L.; Treagust, David F.
2018-01-01
The purpose of this research was to investigate an efficient method to assess year 8 (age 13-14) students' conceptual understanding of heat and temperature concepts. Two different types of instruments were used in this study: Type 1, consisting of multiple-choice items with open-ended justifications; and Type 2, consisting of two-tier…
ERIC Educational Resources Information Center
Clemens, Nathan H.; Davis, John L.; Simmons, Leslie E.; Oslund, Eric L.; Simmons, Deborah C.
2015-01-01
Standardized measures are often used as an index of students' reading comprehension and scores have important implications, particularly for students who perform below expectations. This study examined secondary-level students' patterns of responding and the prevalence and impact of non-attempted items on a timed, group-administered,…
Andrich, David; Marais, Ida; Humphry, Stephen Mark
2015-01-01
Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The consequence is that the proficiencies of the more proficient students are increased relative to those of the less proficient. Not controlling the guessing bias underestimates the progress of students across 7 years of schooling with important educational implications. PMID:29795871
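To illustrate the guessing mechanism behind the bias discussed above (a generic sketch, not the authors' correction method): under the Rasch model, success probability depends only on ability minus difficulty, whereas a guessing floor (as in a 3PL-style model with lower asymptote c) inflates low-ability success rates far more than high-ability ones, which distorts Rasch difficulty estimates fitted to such data.

```python
import math

def rasch_p(theta, b):
    """Rasch success probability: 1 / (1 + exp(-(theta - b)))."""
    return 1 / (1 + math.exp(-(theta - b)))

def guessing_p(theta, b, c):
    """Success probability with guessing floor c: c + (1 - c) * Rasch."""
    return c + (1 - c) * rasch_p(theta, b)

low, high, b, c = -2.0, 2.0, 0.0, 0.25  # hypothetical abilities; 4-option guessing floor
infl_low = guessing_p(low, b, c) - rasch_p(low, b)    # inflation for a weak student
infl_high = guessing_p(high, b, c) - rasch_p(high, b)  # inflation for a strong student
```

Because the inflation is concentrated at the low end, ignoring it compresses the scale there, consistent with the nonlinear change of unit the abstract describes.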
Informed choice: understanding knowledge in the context of screening uptake.
Michie, Susan; Dormandy, Elizabeth; Marteau, Theresa M
2003-07-01
This study evaluates a scale measuring knowledge about a screening test and investigates the association between knowledge, uptake and attitudes towards screening. One thousand four hundred ninety-nine pregnant women completed the knowledge scale of the multidimensional measure of informed choice (MMIC). Three hundred forty-five of these women and 152 professionals providing antenatal care also rated the importance of the knowledge items. Item characteristic curves show that, with one exception, the knowledge items reflect a spread of difficulty and are able to discriminate between people. All items were seen as essential or helpful by both women and health professionals, with two items seen as particularly important and one as unimportant. There were some differences between health professionals, women with low risk results and women with high risk results. Knowledge was not associated with uptake, attitude, or the extent to which uptake was consistent with women's attitudes towards undergoing the test.
Development of a representational conceptual evaluation in the first law of thermodynamics
NASA Astrophysics Data System (ADS)
Sriyansyah, S. P.; Suhandi, A.
2016-08-01
As part of ongoing research investigating student consistency in understanding the first law of thermodynamics, a representational conceptual evaluation (RCET) has been developed to assess students' conceptual understanding, representational consistency, and scientific consistency in the introductory physics course. Previous physics education research findings were used to develop the test. The RCET comprises 30 items designed as an isomorphic multiple-choice test with three different representations, covering the concepts of work, heat, and the first law of thermodynamics and its application to thermodynamic processes. Here, we present preliminary measures of the validity and reliability of the instrument, including classical test statistics. The instrument can be used to measure the intended concepts in the first law of thermodynamics and gives consistent results, with the ability to differentiate between high-achieving and low-achieving students and between students at different levels, as well as to measure the effectiveness of the learning process on the concept of the first law of thermodynamics.
Analyzing Test-Taking Behavior: Decision Theory Meets Psychometric Theory.
Budescu, David V; Bo, Yuanchao
2015-12-01
We investigate the implications of penalizing incorrect answers to multiple-choice tests, from the perspective of both test-takers and test-makers. To do so, we use a model that combines a well-known item response theory model with prospect theory (Kahneman and Tversky, Prospect theory: An analysis of decision under risk, Econometrica 47:263-91, 1979). Our results reveal that when test-takers are fully informed of the scoring rule, the use of any penalty has detrimental effects for both test-takers (they are always penalized in excess, particularly those who are risk averse and loss averse) and test-makers (the bias of the estimated scores, as well as the variance and skewness of their distribution, increase as a function of the severity of the penalty).
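A minimal sketch of the kind of model described above: a prospect theory value function applied to the guess-versus-omit decision under formula scoring. The parameter values are the conventional Kahneman-Tversky estimates and the decision rule is my simplification, not the authors' full IRT-based model.

```python
def pt_value(x, alpha=0.88, lam=2.25):
    """Prospect theory value: x^alpha for gains, -lam * (-x)^alpha for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def guess_value(p_correct, penalty):
    """Subjective value of guessing: gain +1 with prob p, lose `penalty` otherwise."""
    return p_correct * pt_value(1.0) + (1 - p_correct) * pt_value(-penalty)

penalty = 0.25   # formula scoring for a 5-option item: 1/(k-1)
p = 0.20         # risk-neutral indifference point: penalty / (1 + penalty)

expected_score = p * 1 - (1 - p) * penalty  # exactly 0: raw score is indifferent
subjective = guess_value(p, penalty)        # negative: loss-averse taker omits
```

At the point where the expected raw score is exactly zero, the loss-averse test-taker's subjective value of guessing is already negative, so they omit; this is the excess-penalization effect the abstract reports.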
Lower-fat menu items in restaurants satisfy customers.
Fitzpatrick, M P; Chapman, G E; Barr, S I
1997-05-01
To evaluate a restaurant-based nutrition program by measuring customer satisfaction with lower-fat menu items and assessing patrons' reactions to the program. Questionnaires to assess satisfaction with menu items were administered to patrons in eight of the nine restaurants that volunteered to participate in the nutrition program. One patron from each participating restaurant was randomly selected for a semistructured interview about nutrition programming in restaurants. Persons dining in eight participating restaurants over a 1-week period (n = 686). Independent samples t tests were used to compare respondents' satisfaction with lower-fat and regular menu items. Two-way analysis of variance tests were completed using overall satisfaction as the dependent variable and menu-item classification (i.e., lower fat or regular) and one of eight other menu item and respondent characteristics as independent variables. Qualitative methods were used to analyze interview transcripts. Of 1,127 menu items rated for satisfaction, 205 were lower fat, 878 were regular, and 44 were of unknown classification. Customers were significantly more satisfied with lower-fat than with regular menu items (P < .001). Overall satisfaction did not vary by any of the other independent variables. Interview results indicate the importance of restaurant dining as an indulgent experience. High satisfaction with lower-fat menu items suggests that customers will support restaurants providing such choices. Dietitians can use these findings to encourage restaurateurs to include lower-fat choices on their menus, and to assure clients that their expectations of being indulged are not incompatible with these choices.
Multi-step routes of capuchin monkeys in a laser pointer traveling salesman task.
Howard, Allison M; Fragaszy, Dorothy M
2014-09-01
Prior studies have claimed that nonhuman primates plan their routes multiple steps in advance. However, a recent reexamination of multi-step route planning in nonhuman primates indicated that there is no evidence for planning more than one step ahead. We tested multi-step route planning in capuchin monkeys using a pointing device to "travel" to distal targets while stationary. This device enabled us to determine whether capuchins distinguish the spatial relationship between goals and themselves and spatial relationships between goals and the laser dot, allocentrically. In Experiment 1, two subjects were presented with identical food items in Near-Far (one item nearer to subject) and Equidistant (both items equidistant from subject) conditions with a laser dot visible between the items. Subjects moved the laser dot to the items using a joystick. In the Near-Far condition, one subject demonstrated a bias for items closest to self but the other subject chose efficiently. In the second experiment, subjects retrieved three food items in similar Near-Far and Equidistant arrangements. Both subjects preferred food items nearest the laser dot and showed no evidence of multi-step route planning. We conclude that these capuchins do not make choices on the basis of multi-step look ahead strategies. © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Rusyati, Lilit; Firman, Harry
2017-05-01
This research was motivated by the importance of multiple-choice questions that address the elements and sub-elements of critical thinking, and by the move toward computer-based testing. The method used was descriptive research, profiling the validation of a science virtual test to measure students' critical thinking in junior high school. The participants were 8th grade junior high school students (14 years old), with science teachers and experts serving as validators. The instruments used to capture the necessary data were an expert judgment sheet, a legibility test sheet, and the science virtual test package itself, in multiple-choice form with four possible answers. Four steps were followed to validate the science virtual test on the theme "Living Things and Environmental Sustainability" for 7th grade junior high school: analysis of core and basic competences based on the 2013 curriculum, expert judgment, a legibility test, and trial tests (limited and large-scale). Based on the trial tests, items were classified as accepted, accepted but needing revision, or rejected. The reliability of the test is α = 0.747, categorized as 'high', meaning the instrument is reliable and highly consistent. The validity coefficient Rxy = 0.63 is likewise categorized as 'high' according to the interpretation scale for Rxy (correlation).
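Accept/revise/reject decisions from a trial test like the one above typically rest on classical item statistics: the difficulty index (proportion correct) and the discrimination index (upper-group minus lower-group proportion correct). A generic sketch with invented data, not the study's actual criteria:

```python
def item_analysis(correct_flags, totals):
    """Classical item statistics for one item.

    Difficulty p = proportion correct overall.
    Discrimination D = p(upper group) - p(lower group), using thirds here
    (27% groups are also common in practice).
    """
    paired = sorted(zip(totals, correct_flags), reverse=True)  # rank by total score
    n = len(paired)
    k = n // 3
    upper = [c for _, c in paired[:k]]
    lower = [c for _, c in paired[-k:]]
    p = sum(correct_flags) / n
    d = sum(upper) / k - sum(lower) / k
    return p, d

# Hypothetical: six examinees' total scores and correctness on this one item
totals = [28, 25, 22, 15, 12, 9]
flags = [1, 1, 1, 1, 0, 0]
p, d = item_analysis(flags, totals)
```

An item with moderate p and high positive D would be "accepted"; near-zero or negative D usually triggers revision or rejection.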
Multiple Imputation of Multilevel Missing Data-Rigor versus Simplicity
ERIC Educational Resources Information Center
Drechsler, Jörg
2015-01-01
Multiple imputation is widely accepted as the method of choice to address item-nonresponse in surveys. However, research on imputation strategies for the hierarchical structures that are typically found in the data in educational contexts is still limited. While a multilevel imputation model should be preferred from a theoretical point of view if…
Aggregating Polytomous DIF Results over Multiple Test Administrations
ERIC Educational Resources Information Center
Zwick, Rebecca; Ye, Lei; Isham, Steven
2018-01-01
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
Test blueprints for psychiatry residency in-training written examinations in Riyadh, Saudi Arabia
Gaffas, Eisha M; Sequeira, Reginald P; Namla, Riyadh A Al; Al-Harbi, Khalid S
2012-01-01
Background The postgraduate training program in psychiatry in Saudi Arabia, which was established in 1997, is a 4-year residency program. Written exams comprising of multiple choice questions (MCQs) are used as a summative assessment of residents in order to determine their eligibility for promotion from one year to the next. Test blueprints are not used in preparing examinations. Objective To develop test blueprints for the written examinations used in the psychiatry residency program. Methods Based on the guidelines of four professional bodies, documentary analysis was used to develop global and detailed test blueprints for each year of the residency program. An expert panel participated during piloting and final modification of the test blueprints. Their opinion about the content, weightage for each content domain, and proportion of test items to be sampled in each cognitive category as defined by modified Bloom’s taxonomy were elicited. Results Eight global and detailed test blueprints, two for each year of the psychiatry residency program, were developed. The global test blueprints were reviewed by experts and piloted. Six experts participated in the final modification of test blueprints. Based on expert consensus, the content, total weightage for each content domain, and proportion of test items to be included in each cognitive category were determined for each global test blueprint. Experts also suggested progressively decreasing the weightage for recall test items and increasing problem solving test items in examinations, from year 1 to year 4 of the psychiatry residence program. Conclusion A systematic approach using a documentary and content analysis technique was used to develop test blueprints with additional input from an expert panel as appropriate. Test blueprinting is an important step to ensure the test validity in all residency programs. PMID:23762000
ERIC Educational Resources Information Center
Othman, Jazilah; Treagust, David F.; Chandrasegaran, A. L.
2008-01-01
A thorough understanding of chemical bonding requires familiarity with the particulate nature of matter. In this study, a two-tier multiple-choice diagnostic instrument consisting of ten items (five items involving each of the two concepts) was developed to assess students' understanding of the particulate nature of matter and chemical bonding so…
Influence of multiple categories on the prediction of unknown properties
Verde, Michael F.; Murphy, Gregory L.; Ross, Brian H.
2006-01-01
Knowing an item's category helps us predict its unknown properties. Previous studies suggest that when asked to evaluate the probability of an unknown property, people tend to consider only an item's most likely category, ignoring alternative categories. In the present study, property prediction took the form of either a probability rating or a speeded, binary-choice judgment. Consistent with past findings, subjects ignored alternative categories in their probability ratings. However, their binary-choice judgments were influenced by alternative categories. This novel finding suggests that the way category knowledge is used in prediction depends critically on the form of the prediction. PMID:16156183
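The normative benchmark in such category-based prediction studies is the law of total probability across all candidate categories, versus the single most-likely-category shortcut the subjects' probability ratings reflected. A small illustrative sketch (the numbers are invented):

```python
# P(category) for an ambiguous item, and P(property | category)
p_category = {"beverage": 0.6, "cleaning product": 0.4}
p_property_given = {"beverage": 0.2, "cleaning product": 0.9}

# Normative prediction: sum over all candidate categories (total probability)
full = sum(p_category[c] * p_property_given[c] for c in p_category)

# Single-category shortcut: condition only on the most likely category
best = max(p_category, key=p_category.get)
shortcut = p_property_given[best]
```

When the less likely category strongly predicts the property, the two strategies diverge sharply, which is what makes the rating-versus-binary-choice contrast diagnostic.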
Optimizing multiple-choice tests as tools for learning.
Little, Jeri L; Bjork, Elizabeth Ligon
2015-01-01
Answering multiple-choice questions with competitive alternatives can enhance performance on a later test, not only on questions about the information previously tested, but also on questions about related information not previously tested; in particular, on questions about information pertaining to the previously incorrect alternatives. In the present research, we assessed a possible explanation for this pattern: when multiple-choice questions contain competitive incorrect alternatives, test-takers are led to retrieve previously studied information pertaining to all of the alternatives in order to discriminate among them and select an answer, with such processing strengthening later access to information associated with both the correct and incorrect alternatives. Supporting this hypothesis, we found enhanced performance on a later cued-recall test for previously nontested questions when their answers had previously appeared as competitive incorrect alternatives on the initial multiple-choice test, but not when they had appeared as noncompetitive alternatives. Importantly, however, competitive alternatives were not more likely than noncompetitive alternatives to be intruded as incorrect responses, indicating that a general increase in the accessibility of previously presented incorrect alternatives cannot explain these results. The present findings, replicated across two experiments (one in which corrective feedback was provided during the initial multiple-choice testing, and one in which it was not), thus strongly suggest that competitive multiple-choice questions can trigger beneficial retrieval processes for both tested and related information, and the results have implications for the effective use of multiple-choice tests as tools for learning.
Diagnostic grand rounds: a new teaching concept to train diagnostic reasoning.
Stieger, Stefan; Praschinger, Andrea; Kletter, Kurt; Kainberger, Franz
2011-06-01
Diagnostic reasoning is a core skill in teaching and learning in undergraduate curricula. Diagnostic grand rounds (DGRs) as a subform of grand rounds are intended to train the students' skills in the selection of appropriate tests and in the interpretation of test results. The aim of this study was to test DGRs for their ability to improve diagnostic reasoning by using a pre-post-test design. During one winter term, all 398 fifth-year students (36.1% male, 63.9% female) solved 23 clinical cases presented in 8 DGRs. In an online questionnaire, a Diagnostic Thinking Inventory (DTI) with 41 items was evaluated for flexibility in thinking and structure of knowledge in memory. Results were correlated with those from a summative multiple-choice knowledge test and with the learning objectives documented in a logbook. The students' DTI scores in the post-test were significantly higher than those reported in the pre-test. DTI scores at either testing time did not correlate with medical knowledge as assessed by a multiple-choice knowledge test. Abilities acquired during clinical clerkships as documented in a logbook could only account for a small proportion of the increase in the flexibility subscale score. This effect still remained significant after accounting for potential confounders. Establishing DGRs proved to be an effective way of successfully improving both students' diagnostic reasoning and the ability to select the appropriate test method in routine clinical practice. Copyright © 2009 Elsevier Ireland Ltd. All rights reserved.
Applying Item Response Theory methods to design a learning progression-based science assessment
NASA Astrophysics Data System (ADS)
Chen, Jing
Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment included items in several formats: Constructed Response (CR), Ordered Multiple Choice (OMC), and Multiple True or False (MTF). The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items.
Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary appreciably across items, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2, and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Future data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC and MTF options. Item writers can follow these recommendations to write better learning progression-based items.
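The boundary-setting step described above can be sketched in a few lines. The threshold values below are hypothetical, not parameters from this study; the boundary between adjacent levels is taken as the mean of the corresponding threshold across a set of good items.

```python
from statistics import mean

# Hypothetical item threshold parameters (d1, d2, d3) on the IRT scale
# for a set of "good" learning-progression items; not data from the study.
item_thresholds = [
    (-1.2, 0.1, 1.4),
    (-0.9, 0.3, 1.6),
    (-1.1, 0.0, 1.2),
]

def level_boundaries(thresholds):
    """Boundary between level k and k+1 = mean of the k-th threshold across items."""
    return [round(mean(col), 3) for col in zip(*thresholds)]

print(level_boundaries(item_thresholds))  # three boundaries separating four levels
```

With three thresholds per item this yields three cut points, partitioning the IRT scale into four progression levels.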
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.
Bernhofer, Esther I; St Marie, Barbara; Bena, James F
2017-08-01
All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been in use since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time and part time, August 2015 to April 2016; mean age, 43.25 years; mean time as an RN, 16.13 years. Prospective survey design using e-mail to invite nurses to take an electronic multiple-choice pain knowledge test. Content validity of initial 36-item test "very good" (95.1% agreement). Completed tests that met analysis criteria, N = 747. Mean initial test score, 69.4% correct (range 27.8-97.2). After revision/removal of 13 unacceptable questions, mean test score was 50.4% correct (range 8.7-82.6). Initial test item percent difficulty range was 15.2%-98.1%; discrimination values range, 0.03-0.50; final test item percent difficulty range, 17.6%-91.1%; discrimination values range, -0.04 to 1.04. Split-half reliability of the final test was 0.66. A high decision consistency reliability was identified, with a test cut-score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
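The item statistics reported in abstracts like the one above (difficulty as proportion correct, discrimination, split-half reliability) follow standard classical-test-theory formulas. A minimal sketch, using a hypothetical 0/1 response matrix rather than the study's data:

```python
from statistics import mean, pstdev

# Hypothetical 0/1 response matrix (rows = examinees, columns = items).
# Illustrative numbers only; not data from the study above.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

def pearson(x, y):
    """Population Pearson correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def difficulty(matrix, j):
    """Classical difficulty index: proportion of examinees answering item j correctly."""
    col = [row[j] for row in matrix]
    return sum(col) / len(col)

def discrimination(matrix, j):
    """Corrected item-total (point-biserial) correlation for item j."""
    item = [row[j] for row in matrix]
    rest = [sum(row) - row[j] for row in matrix]
    return pearson(item, rest)

def split_half(matrix):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in matrix]
    even = [sum(row[1::2]) for row in matrix]
    r = pearson(odd, even)
    return 2 * r / (1 + r)

print(difficulty(responses, 0))  # 0.6
print(round(discrimination(responses, 0), 3))
print(round(split_half(responses), 3))
```

The "corrected" item total (total minus the item itself) avoids inflating discrimination by correlating an item with a sum that contains it.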
ERIC Educational Resources Information Center
Victor, Jack
This study is concerned with an examination of tendencies among individuals or groups to vary in their selection of certain types of responses when the same choice is presented in some other form, the tendencies being termed "response sets." Positional response sets (PRS), to which multiple-choice type items are prone, have reportedly…
Measuring student learning using initial and final concept test in an STEM course
NASA Astrophysics Data System (ADS)
Kaw, Autar; Yalcin, Ali
2012-06-01
Effective assessment is a cornerstone in measuring student learning in higher education. For a course in Numerical Methods, a concept test was used as an assessment tool to measure student learning and its improvement during the course. The concept test comprised 16 multiple-choice questions and was given at the beginning and end of the class for three semesters. Values of Hake's gain index, a measure of learning gains from pre- to post-test, of 0.36 to 0.41 were recorded. The validity and reliability of the concept test were checked via standard measures such as Cronbach's alpha, content and criterion-related validity, item characteristic curves, and difficulty and discrimination indices. The performance of various subgroups, defined by prerequisite grades, transfer status, gender, and age, was also studied.
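Hake's gain index mentioned above has a simple closed form: the class's actual gain divided by the maximum possible gain, g = (post - pre) / (100 - pre) for percent scores. A one-function sketch with illustrative values (not the course data):

```python
def hake_gain(pre_pct, post_pct):
    """Hake's normalized gain: fraction of the possible improvement actually achieved."""
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Illustrative class-average percent scores, not the reported course data:
print(round(hake_gain(40.0, 62.0), 2))  # 0.37, "medium" by Hake's 0.3-0.7 convention
```

Because the denominator shrinks as the pretest score rises, the index rewards classes that close most of their remaining gap, not just those with large raw gains.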
Are Multiple Choice Tests Fair to Medical Students with Specific Learning Disabilities?
ERIC Educational Resources Information Center
Ricketts, Chris; Brice, Julie; Coombes, Lee
2010-01-01
The purpose of multiple choice tests of medical knowledge is to estimate as accurately as possible a candidate's level of knowledge. However, concern is sometimes expressed that multiple choice tests may also discriminate in undesirable and irrelevant ways, such as between minority ethnic groups or by sex of candidates. There is little literature…
Do Streaks Matter in Multiple-Choice Tests?
ERIC Educational Resources Information Center
Kiss, Hubert János; Selei, Adrienn
2018-01-01
Success in life is determined to a large extent by school performance, which in turn depends heavily on grades obtained in exams. In this study, we investigate a particular type of exam: multiple-choice tests. More concretely, we study if patterns of correct answers in multiple-choice tests affect performance. We design an experiment to study if…
ERIC Educational Resources Information Center
Gierl, Mark J.; Bulut, Okan; Guo, Qi; Zhang, Xinxin
2017-01-01
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment that remains in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education focused, specifically, on the development, analysis, and use of the incorrect options, which are also…
Sunaric-Mégevand, Gordana; Aclimandos, Wagih
2016-01-01
The comprehensive European Board of Ophthalmology Diploma (EBOD) examination is one of 38 European medical specialty examinations. This review aims at disclosing the specific procedures and content of the EBOD examination. It is a descriptive study summarizing the present organization of the EBOD examination. It is the 3rd largest European postgraduate medical assessment after anaesthesiology and cardiology. The master language is English for the Part 1 written test (knowledge test with 52 modified type X multiple-choice questions) (in the past the written test was also available in French and German). Ophthalmology training of a minimum of 4 years in a full or associated European Union of Medical Specialists (UEMS) member state is a prerequisite. Problem-solving skills are tested in the Part 2 oral assessment, which is a viva of 4 subjects conducted in English with support for native language whenever feasible. The comprehensive EBOD examination is one of the leading examinations organized by UEMS European Boards or Specialist Sections in terms of number of examinees, item banking, and item content. PMID:27464640
Students’ understanding of forces: Force diagrams on horizontal and inclined plane
NASA Astrophysics Data System (ADS)
Sirait, J.; Hamdani; Mursyid, S.
2018-03-01
This study aims to analyse students’ difficulties in understanding force diagrams on horizontal surfaces and inclined planes. Physics education students (pre-service physics teachers) of Tanjungpura University, who had completed a Basic Physics course, took a force concept test with six questions covering three concepts: an object at rest, an object moving at constant speed, and an object moving at constant acceleration, both on a horizontal surface and on an inclined plane. The test is in a multiple-choice format. It examines the ability of students to select appropriate force diagrams depending on the context. The results show that 44% of students had difficulties in solving the test (these students could only solve one or two items out of six). About 50% of students had difficulty finding the correct diagram for an object moving at constant speed or constant acceleration in both contexts. In general, students could only correctly identify 48% of the force diagrams on the test. The most difficult task for the students was identifying the force diagram representing the forces exerted on an object on an inclined plane.
The Development of the Planet Formation Concept Inventory: A Preliminary Analysis of Version 1
NASA Astrophysics Data System (ADS)
Simon, Molly; Impey, Chris David; Buxner, Sanlyn
2018-01-01
The topic of planet formation is poorly represented in the educational literature, especially at the college level. As recently as 2014, when developing the Test of Astronomy Standards (TOAST), Slater (2014) noted that for two topics (formation of the Solar System and cosmology), “high quality test items that reflect our current understanding of students’ conceptions were not available [in the literature]” (Slater, 2014, p. 8). Furthermore, nearly half of ASTR 101 enrollments are at 2 year/community colleges where both instructors and students have little access to current research and models of planet formation. In response, we administered six student-supplied response (SSR) short-answer questions on the topic of planet formation to n = 1,050 students enrolled in introductory astronomy and planetary science courses at The University of Arizona in the Fall 2016 and Spring 2017 semesters. After analyzing and coding the data from the SSR questions, we developed a preliminary version of the Planet Formation Concept Inventory (PFCI). The PFCI is a multiple-choice instrument with 20 planet formation-related questions and 4 demographic questions. We administered version 1 of the PFCI to six introductory astronomy and planetary science courses (n ~ 700 students) during the Fall 2017 semester. We provided students with 7-8 multiple-choice with explanation of reasoning (MCER) questions from the PFCI. Students selected an answer (similar to a traditional multiple-choice test), and then briefly explained why they chose the answer they did. We also conducted interviews with ~15 students to receive feedback on the quality of the questions and clarity of the instrument. We will present an analysis of the MCER responses and student interviews, and discuss any modifications that will be made to the instrument as a result.
ERIC Educational Resources Information Center
Wothke, Werner; Burket, George; Chen, Li-Sue; Gao, Furong; Shu, Lianghua; Chia, Mike
2011-01-01
It has been known for some time that item response theory (IRT) models may exhibit a likelihood function of a respondent's ability which may have multiple modes, flat modes, or both. These conditions, often associated with guessing of multiple-choice (MC) questions, can introduce uncertainty and bias to ability estimation by maximum likelihood…
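The multimodality described above is easy to reproduce. Under the three-parameter logistic (3PL) model, the lower-asymptote guessing parameter lets an unusual response pattern (missing a moderate item while answering hard items correctly) be explained either by low ability plus lucky guesses or by high ability, producing more than one local maximum in the likelihood. The item parameters and response pattern below are contrived for illustration, not taken from the article:

```python
from math import exp, log

def p3pl(theta, a, b, c):
    """3PL probability of a correct response (c = guessing lower asymptote)."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

def loglik(theta, items, resp):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    return sum(log(p3pl(theta, *it)) if r else log(1 - p3pl(theta, *it))
               for it, r in zip(items, resp))

# Hypothetical MC items (a, b, c) and a response pattern chosen to make
# the likelihood non-unimodal: correct on an easy item, wrong on a
# moderate item, correct on two hard, highly discriminating items.
items = [(2.0, -2.0, 0.20), (1.5, 0.0, 0.20), (3.0, 2.0, 0.25), (3.0, 2.2, 0.25)]
resp = [1, 0, 1, 1]

grid = [i / 10 for i in range(-40, 41)]          # theta from -4.0 to 4.0
ll = [loglik(t, items, resp) for t in grid]
# count strict local maxima of the log-likelihood on the grid
modes = sum(1 for i in range(1, len(ll) - 1) if ll[i - 1] < ll[i] > ll[i + 1])
print(modes)
```

A maximum-likelihood ability estimator started from different points can converge to different modes here, which is the bias and uncertainty the abstract refers to.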
Do item-writing flaws reduce examinations' psychometric quality?
Pais, João; Silva, Artur; Guimarães, Bruno; Povo, Ana; Coelho, Elisabete; Silva-Pereira, Fernanda; Lourinho, Isabel; Ferreira, Maria Amélia; Severo, Milton
2016-08-11
The psychometric characteristics of multiple-choice questions (MCQ) change when their anatomical sites and the presence of item-writing flaws (IWF) are taken into account. The aim is to understand the impact of the anatomical sites and the presence of IWF on the psychometric qualities of the MCQ. 800 Clinical Anatomy MCQ from eight examinations were classified as standard or flawed items and according to one of eight anatomical sites. An item was classified as flawed if it violated at least one of the principles of item writing. The difficulty and discrimination indices of each item were obtained. 55.8% of the MCQ were flawed items. The anatomical site of the items explained 6.2 and 3.2% of the difficulty and discrimination parameters, respectively, and the IWF explained 2.8 and 0.8%. The impact of the IWF was heterogeneous: the Writing the Stem and Writing the Choices categories had a negative impact (higher difficulty and lower discrimination), while the other categories had no impact. The anatomical site effect was larger than the IWF effect on the psychometric characteristics of the examination. When constructing MCQ, the focus should be on the topic/area of the items and only afterwards on the presence of IWF.
Solari, A; Mattarozzi, K; Vignatelli, L; Giordano, A; Russo, P M; Uccelli, M Messmer; D'Alessandro, R
2010-10-01
We describe the development and clinical validation of a patient self-administered tool assessing the quality of multiple sclerosis diagnosis disclosure. A multiple sclerosis expert panel generated questionnaire items from the Doctor's Interpersonal Skills Questionnaire, literature review, and interviews with neurology inpatients. The resulting 19-item Comunicazione medico-paziente nella Sclerosi Multipla (COSM) was pilot tested/debriefed on seven patients with multiple sclerosis and administered to 80 patients newly diagnosed with multiple sclerosis. The resulting revised 20-item version (COSM-R) was debriefed on five patients with multiple sclerosis, field tested/debriefed on multiple sclerosis patients, and field tested on 105 patients newly diagnosed with multiple sclerosis participating in a clinical trial on an information aid. The hypothesized monofactorial structure of COSM-R section 2 was tested on the latter two groups. The questionnaire was well accepted. Scaling assumptions were satisfactory in terms of score distributions, item-total correlations and internal consistency. Factor analysis confirmed section 2's monofactorial structure, which was also test-retest reliable (intraclass correlation coefficient [ICC] 0.73; 95% CI 0.54-0.85). Section 1 had only fair test-retest reliability (ICC 0.45; 95% CI 0.12-0.69), and three items had 8-21% missed responses. COSM-R is a brief, easy-to-interpret MS-specific questionnaire for use as a health care indicator.
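The test-retest statistic quoted above, an intraclass correlation coefficient, can be computed from two-way ANOVA mean squares. A sketch of the single-measure, two-way random-effects form ICC(2,1), using hypothetical scores rather than the COSM-R data:

```python
def icc_2_1(scores):
    """ICC(2,1) for an n-subjects x k-occasions score table (two-way random, single measure)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # between subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # between occasions
    sse = sum((scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))                                # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Five hypothetical patients scored on two occasions (test, retest):
ratings = [[4, 5], [3, 3], [5, 5], [2, 3], [4, 4]]
print(round(icc_2_1(ratings), 3))
```

Unlike a plain Pearson correlation between the two occasions, ICC(2,1) also penalizes systematic shifts between test and retest via the occasion mean square.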
Bjork, Elizabeth Ligon; Soderstrom, Nicholas C; Little, Jeri L
2015-01-01
The term desirable difficulties (Bjork, 1994) refers to conditions of learning that, though often appearing to cause difficulties for the learner and to slow down the process of acquisition, actually improve long-term retention and transfer. One known desirable difficulty is testing (as compared with restudy), although typically it is tests that clearly involve retrieval--such as free and cued recall tests--that are thought to induce these learning benefits and not multiple-choice tests. Nonetheless, multiple-choice testing is ubiquitous in educational settings and many other high-stakes situations. In this article, we discuss research, in both the laboratory and the classroom, exploring whether multiple-choice testing can also be fashioned to promote the type of retrieval processes known to improve learning, and we speculate about the necessary properties that multiple-choice questions must possess, as well as the metacognitive strategy students need to use in answering such questions, to achieve this goal.
ERIC Educational Resources Information Center
Matlock, Ki Lynn; Turner, Ronna
2016-01-01
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Shawahna, Ramzi; Al-Rjoub, Mohammed; Al-Horoub, Mohammed M; Al-Hroub, Wasif; Al-Rjoub, Bisan; Al-Nabi, Bashaaer Abd
2016-01-01
This study aimed to investigate community pharmacists' knowledge and certainty of adverse effects and contraindications of pharmaceutical products to estimate the risk of error. Factors influencing their knowledge and certainty were also investigated. The knowledge of community pharmacists was assessed in a cross-sectional design using a multiple-choice questions test on the adverse effects and contraindications of active pharmaceutical ingredients and excipients from May 2014 to March 2015. Self-rated certainty scores were also recorded for each question. Knowledge and certainty scores were combined to estimate the risk of error. Out of 315 subjects, 129 community pharmacists (41.0%) completed the 30 multiple-choice questions test on active ingredients and excipients. Knowledge on active ingredients was associated with the year of graduation and obtaining a licence to practice pharmacy. Knowledge on excipients was associated with the degree obtained. There was higher risk of error in items on excipients than those on ingredients (P<0.01). The knowledge of community pharmacists in Palestine was insufficient with high risk of errors. Knowledge of community pharmacists on the safety issues of active ingredients and excipients need to be improved.
All of the above: When multiple correct response options enhance the testing effect.
Bishara, Anthony J; Lanzo, Lauren A
2015-01-01
Previous research has shown that multiple choice tests often improve memory retention. However, the presence of incorrect lures often attenuates this memory benefit. The current research examined the effects of "all of the above" (AOTA) options. When such options are correct, no incorrect lures are present. In the first three experiments, a correct AOTA option on an initial test led to a larger memory benefit than no test and standard multiple choice test conditions. The benefits of a correct AOTA option occurred even without feedback on the initial test; for both 5-minute and 48-hour retention delays; and for both cued recall and multiple choice final test formats. In the final experiment, an AOTA question led to better memory retention than did a control condition that had identical timing and exposure to response options. However, the benefits relative to this control condition were similar regardless of the type of multiple choice test (AOTA or not). Results suggest that retrieval contributes to multiple choice testing effects. However, the extra testing effect from a correct AOTA option, rather than being due to more retrieval, might be due simply to more exposure to correct information.
Measurement of ethical food choice motives.
Lindeman, M; Väänänen, M
2000-02-01
The two studies describe the development of three complementary scales to the Food Choice Questionnaire developed by Steptoe, Pollard & Wardle (1995). The new items address various ethical food choice motives and were derived from previous studies on vegetarianism and ethical food choice. The items were factor analysed in Study 1 (N=281) and the factor solution was confirmed in Study 2 (N=125), in which simple validity criteria were also included. Furthermore, test-retest reliability was assessed with a separate sample of subjects (N=36). The results indicated that the three new scales, Ecological Welfare (including subscales for Animal Welfare and Environment Protection), Political Values and Religion, are reliable and valid instruments for a brief screening of ethical food choice reasons. Copyright 2000 Academic Press.
Prueba de Ciencia Primer Grado (Science Test for the First Grade). [In Spanish
ERIC Educational Resources Information Center
Puerto Rico State Dept. of Education, Hato Rey.
This document consists of three parts: (1) a manual for administering the science test to first graders (in Spanish), (2) a copy of the test itself (pictorial), and (3) a list of expected competencies in science for the first three grades (in English). The test consists of 25, four-choice items. For each item, the administrator reads a statement…
ERIC Educational Resources Information Center
Mayfield, Linda Riggs
2010-01-01
This study examined the effects of being taught the Mayfield's Four Questions multiple-choice test-taking strategy on the perceived self-efficacy and multiple-choice test scores of nursing students in a two-year associate degree program. Experimental and control groups were chosen by stratified random sampling. Subjects completed the 10-statement…
Attention! Can choices for low value food over high value food be trained?
Zoltak, Michael J; Veling, Harm; Chen, Zhang; Holland, Rob W
2018-05-01
People choose high value food items over low value food items, because food choices are guided by the comparison of values placed upon choice alternatives. This value comparison process is also influenced by the amount of attention people allocate to different items. Recent research shows that choices for food items can be increased by training attention toward these items, with a paradigm named cued-approach training (CAT). However, previous work has so far only examined the influence of CAT on choices between two equally valued items. It has remained unclear whether CAT can increase choices for low value items when people choose between a low and high value food item. To address this question, participants in the current study were cued to make rapid responses in CAT to certain low and high value items. Next, they made binary choices between low and high value items, where we systematically varied whether the low and high value items were cued or uncued. In two experiments, we found that participants overall preferred high over low value food items for real consumption. More importantly, their choices for low value items increased when only the low value item had been cued in CAT compared to when both low and high value items had not been cued. Exploratory analyses revealed that this effect was more pronounced for participants with a relatively small value difference between low and high value items. The present research thus suggests that CAT may be used to boost the choice and consumption of low value items via enhanced attention toward these items, as long as the value difference is not too large. Implications for facilitating choices for healthy food are discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Criterion-Referenced Test Items for Welding.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…
NASA Astrophysics Data System (ADS)
McNeal, K.; Libarkin, J. C.; Ledley, T. S.; Gold, A. U.; Lynds, S. E.; Haddad, N.; Ellins, K.; Dunlap, C.; Bardar, E. W.; Youngman, E.
2015-12-01
Instructors must have on hand appropriate assessments that align with their teaching and learning goals in order to provide evidence of student learning. We have worked with curriculum developers and scientists to develop the Climate Concept Inventory (CCI), which meets goals of the EarthLabs Climate on-line curriculum. The developed concept inventory includes 19 content-driven multiple-choice questions, six affective-based multiple-choice questions, one confidence question, three open-ended questions, and eight demographic questions. Our analysis of the instrument applies item response theory and uses item characteristic curves. We have assessed over 500 students in nearly twenty high school classrooms in Mississippi and Texas that have engaged in the implementation of the EarthLabs curriculum and completed the CCI. Results indicate that students had pre-post gains on 9 out of 10 of the content-based multiple-choice questions, with positive gains in answer choice selection ranging from 1.72% to 42%. Students reported significantly increased confidence, with 15% more students reporting that they were either very or fairly confident in their answers. Of the six affective questions posed, 5 out of 6 showed significant shifts towards gains in knowledge, awareness, and information about Earth's climate system. The research has resulted in a robust and validated climate concept inventory for use with advanced high school students, where we have been able to apply its use within the EarthLabs project.
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
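The optimality criteria compared above reduce to simple matrix functionals of the information about a field-test item's parameters: D-optimality maximizes the determinant of the information matrix, while A-optimality minimizes the trace of its inverse. A toy sketch with hypothetical 2x2 information matrices (for an item's discrimination and difficulty) under two candidate examinee assignments; the numbers are not from the paper:

```python
# Hypothetical Fisher information matrices for a field-test item's (a, b)
# parameters under two candidate examinee assignments; illustrative only.

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace_inv2(m):
    """Trace of the inverse of a 2x2 matrix: (m00 + m11) / det(m)."""
    return (m[0][0] + m[1][1]) / det2(m)

candidates = {
    "examinee_A": [[4.0, 0.0], [0.0, 0.5]],   # much info on a, little on b
    "examinee_B": [[1.4, 0.0], [0.0, 1.4]],   # balanced information
}

d_best = max(candidates, key=lambda k: det2(candidates[k]))        # maximize det(M)
a_best = min(candidates, key=lambda k: trace_inv2(candidates[k]))  # minimize tr(M^-1)
print(d_best, a_best)
```

The two criteria can disagree, as here: D-optimality favours the larger information "volume" even when one parameter is poorly estimated, while A-optimality penalizes a large variance on any single parameter.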
NASA Astrophysics Data System (ADS)
Liliawati, W.; Purwanto; Zulfikar, A.; Kamal, R. N.
2018-05-01
This study aims to examine the effectiveness of teaching materials based on multiple intelligences for improving high school students' understanding of material on the theme of global warming. The research method used is a static-group pretest-posttest design. Participants were 60 high school students of class XI in one of the high schools in Bandung, divided into two classes of 30 students each: an experimental class and a control class. The experimental class used the multiple intelligences-based teaching materials, while the control class did not. The instrument used is a test of understanding of the concept of global warming, consisting of 15 multiple-choice questions and 5 essay items. The test was given before and after instruction in both classes. Data were analysed using N-gain and effect size. The results show that the N-gain for both classes is in the medium category, and the effect size for the use of the teaching materials is in the high category.
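The effect-size step in analyses like the one above is commonly Cohen's d: the difference in group means divided by the pooled standard deviation, with d around 0.8 or larger conventionally labelled large ("high"). A sketch with hypothetical gain scores, not the study's data:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(treat, control):
    """Cohen's d with the pooled sample standard deviation."""
    n1, n2 = len(treat), len(control)
    pooled_sd = sqrt(((n1 - 1) * variance(treat) + (n2 - 1) * variance(control))
                     / (n1 + n2 - 2))
    return (mean(treat) - mean(control)) / pooled_sd

# Hypothetical post-test gain scores for two small groups (illustrative only):
experimental = [12, 15, 14, 10, 13]
control_grp = [8, 9, 11, 7, 10]
print(round(cohens_d(experimental, control_grp), 2))
```

Because d is scaled by the pooled spread, it lets gains from tests with different score ranges (here, a 20-point instrument) be compared on a common footing.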
Confidence in Forced-Choice Recognition: What Underlies the Ratings?
ERIC Educational Resources Information Center
Zawadzka, Katarzyna; Higham, Philip A.; Hanczakowski, Maciej
2017-01-01
Two-alternative forced-choice recognition tests are commonly used to assess recognition accuracy that is uncontaminated by changes in bias. In such tests, participants are asked to endorse the studied item out of 2 presented alternatives. Participants may be further asked to provide confidence judgments for their recognition decisions. It is often…
ERIC Educational Resources Information Center
Toro, Maritsa
2011-01-01
The statistical assessment of dimensionality provides evidence of the underlying constructs measured by a survey or test instrument. This study focuses on educational measurement, specifically tests comprised of items described as multidimensional. That is, items that require examinee proficiency in multiple content areas and/or multiple cognitive…
ERIC Educational Resources Information Center
Shih, Ching-Lin; Wang, Wen-Chung
2009-01-01
The multiple indicators, multiple causes (MIMIC) method with a pure short anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when such tests contained as many as 40% DIF items. In general,…
Expanding the basic science debate: the role of physics knowledge in interpreting clinical findings.
Goldszmidt, Mark; Minda, John Paul; Devantier, Sarah L; Skye, Aimee L; Woods, Nicole N
2012-10-01
Current research suggests a role for biomedical knowledge in learning and retaining concepts related to medical diagnosis. However, learning may be influenced by other, non-biomedical knowledge. We explored this idea using an experimental design and examined the effects of causal knowledge on the learning, retention, and interpretation of medical information. Participants studied a handout about several respiratory disorders and how to interpret respiratory exam findings. The control group received the information in standard "textbook" format, and the experimental group was presented with the same information as well as a causal explanation of how sound travels through lungs in both the normal and disease states. Comprehension and memory of the information were evaluated with a multiple-choice exam. Several questions that were not related to the causal knowledge served as control items. Questions related to the interpretation of physical exam findings served as the critical test items. The experimental group outperformed the control group on the critical test items, and our study shows that a causal explanation can improve a student's memory for interpreting clinical details. We suggest an expansion of which basic sciences are considered fundamental to medical education.
NASA Astrophysics Data System (ADS)
Aaron Price, C.; Chiu, A.
2018-06-01
We present results of an experimental study of an urban, museum-based science teacher PD programme. A total of 125 teachers and 1676 of their students in grades 4-8 were tested at the beginning and end of the school year in which the PD programme took place. Teachers and students were assessed on subject content knowledge and attitudes towards science, along with teacher classroom behaviour. Subject content questions were mostly taken from standardised state tests and literature, with an 'Explain:' prompt added to some items. Teachers in the treatment group showed a 7% gain in subject content knowledge over the control group. Students of teachers in the treatment group showed a 4% gain in subject content knowledge over the control group on multiple-choice items and an 11% gain on the constructed response items. There was no overall change in science attitudes of teachers or students over the control groups but we did find differences in teachers' reported self-efficacy and teaching anxiety levels, plus PD teachers reported doing more student-centered science teaching activities than the control group. All teachers came into the PD with high initial excitement, perhaps reflecting its context within an informal learning environment.
ERIC Educational Resources Information Center
Banerjee, Jayanti; Papageorgiou, Spiros
2016-01-01
The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Endogenous Formation of Preferences: Choices Systematically Change Willingness-to-Pay for Goods
ERIC Educational Resources Information Center
Voigt, Katharina; Murawski, Carsten; Bode, Stefan
2017-01-01
Standard decision theory assumes that choices result from stable preferences. This position has been challenged by claims that the act of choosing between goods may alter preferences. To test this claim, we investigated in three experiments whether choices between equally valued snack food items can systematically shape preferences. We directly…
Weinberg, W A; McLean, A; Snider, R L; Rintelmann, J W; Brumback, R A
1989-12-01
Eight groups of learning disabled children (N = 100), categorized by the clinical Lexical Paradigm as good readers or poor readers, were individually administered the Gilmore Oral Reading Test, Form D, by one of four input/retrieval methods: (1) the standardized method of administration in which the child reads each paragraph aloud and then answers five questions relating to the paragraph [read/recall method]; (2) the child reads each paragraph aloud and then for each question selects the correct answer from among three choices read by the examiner [read/choice method]; (3) the examiner reads each paragraph aloud and reads each of the five questions to the child to answer [listen/recall method]; and (4) the examiner reads each paragraph aloud and then for each question reads three multiple-choice answers from which the child selects the correct answer [listen/choice method]. The major difference in scores was between the groups tested by the recall versus the orally read multiple-choice methods. This study indicated that poor readers who listened to the material and were tested by orally read multiple-choice format could perform as well as good readers. The performance of good readers was not affected by listening or by the method of testing. The multiple-choice testing improved the performance of poor readers independent of the input method. This supports the arguments made previously that a "bypass approach" to education of poor readers in which testing is accomplished using an orally read multiple-choice format can enhance the child's school performance on reading-related tasks. Using a listening while reading input method may further enhance performance.
Multicategorical Spline Model for Item Response Theory.
ERIC Educational Resources Information Center
Abrahamowicz, Michal; Ramsay, James O.
1992-01-01
A nonparametric multicategorical model for multiple-choice data is proposed as an extension of the binary spline model of J. O. Ramsay and M. Abrahamowicz (1989). Results of two Monte Carlo studies illustrate the model, which approximates probability functions by rational splines. (SLD)
Development of the Newtonian Gravity Concept Inventory
ERIC Educational Resources Information Center
Williamson, Kathryn E.; Willoughby, Shannon; Prather, Edward E.
2013-01-01
We introduce the Newtonian Gravity Concept Inventory (NGCI), a 26-item multiple-choice instrument to assess introductory general education college astronomy ("Astro 101") student understanding of Newtonian gravity. This paper describes the development of the NGCI through four phases: Planning, Construction, Quantitative Analysis, and…
Testing Collective Memory: Representing the Soviet Union on Multiple-Choice Questions
ERIC Educational Resources Information Center
Reich, Gabriel A.
2011-01-01
This article tests the assumption that state-mandated multiple-choice history exams are a cultural tool for disseminating an "official" collective memory. Findings from a qualitative study of a collection of multiple-choice questions that relate to the history of the Soviet Union are presented. The 263 questions all come from New York…
ERIC Educational Resources Information Center
Yonker, Julie E.
2011-01-01
With the advent of online test banks and large introductory classes, instructors have often turned to textbook publisher-generated multiple-choice question (MCQ) exams in their courses. Multiple-choice questions are often divided into categories of factual or applied, thereby implicating levels of cognitive processing. This investigation examined…
DIF Detection Using Multiple-Group Categorical CFA with Minimum Free Baseline Approach
ERIC Educational Resources Information Center
Chang, Yu-Wei; Huang, Wei-Kang; Tsai, Rung-Ching
2015-01-01
The aim of this study is to assess the efficiency of using the multiple-group categorical confirmatory factor analysis (MCCFA) and the robust chi-square difference test in differential item functioning (DIF) detection for polytomous items under the minimum free baseline strategy. While testing for DIF items, despite the strong assumption that all…
Derakhshandeh, Zahra; Amini, Mitra; Kojuri, Javad; Dehbozorgian, Marziyeh
2018-01-01
Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of reasoning skills in a medical school program is important to direct students' learning. One test for measuring clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study was to measure the psychometric qualities of CRPs and to define the correlation between this test and the routine MCQ in the cardiology department of Shiraz medical school. This was a descriptive study conducted on all cardiology residents of Shiraz Medical School; the study population consisted of 40 residents in 2014. The CRPs and MCQ tests were designed based on similar objectives and were carried out simultaneously. Reliability, item difficulty, item discrimination, and the correlation between each item and the total score of the CRPs were measured in Excel and SPSS to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences in CRPs test score across residents' academic years (second, third, and fourth year) were also evaluated by analysis of variance (one-way ANOVA) using SPSS software (version 20) (α = 0.05). The mean and standard deviation of the score on the CRPs was 10.19 ± 3.39 out of 20; on the MCQ, it was 13.15 ± 3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total score of the CRPs was 0.26-0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p < 0.001). The reliability of the CRPs was 0.72, as calculated using Cronbach's alpha. The mean score of the CRPs differed among residents based on their academic year, and this difference was statistically significant (p < 0.001).
The results of this investigation revealed that CRPs can be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs.
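The classical item statistics this abstract reports (item difficulty, item-total behavior, and Cronbach's alpha reliability) follow standard formulas, sketched below. The small 0/1 response matrix is invented for illustration; it is not the Shiraz data.

```python
# Classical test theory item statistics; the response matrix is made up.

def difficulty(scores, i):
    """Proportion of examinees answering item i correctly (p-value)."""
    return sum(row[i] for row in scores) / len(scores)

def cronbach_alpha(scores):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum(item vars)/var(totals))."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

responses = [  # rows = examinees, columns = items (1 = correct)
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]
print(difficulty(responses, 0))            # 0.8
print(round(cronbach_alpha(responses), 2)) # 0.31 (toy data; low is expected)
```

Real psychometric work would use many more examinees and a dedicated package, but the formulas are exactly these.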
Brown, Corina E; Hyslop, Richard M; Barbera, Jack
2015-01-01
The General, Organic, and Biological Chemistry Knowledge Assessment (GOB-CKA) is a multiple-choice instrument designed to assess students' understanding of the chemistry topics deemed important to clinical nursing practice. This manuscript describes the development process of the individual items along with a psychometric evaluation of the final version of the items and instrument. In developing items for the GOB-CKA, essential topics were identified through a series of expert interviews (with practicing nurses, nurse educators, and GOB chemistry instructors) and confirmed through a national survey. Individual items were tested in qualitative studies with students from the target population for clarity and wording. Data from pilot and beta studies were used to evaluate each item and narrow the total item count to 45. A psychometric analysis performed on data from the 45-item final version was used to provide evidence of validity and reliability. The final version of the instrument has a Cronbach's alpha value of 0.76. Feedback from an expert panel provided evidence of face and content validity. Convergent validity was estimated by comparing the results from the GOB-CKA with the General-Organic-Biochemistry Exam (Form 2007) of the American Chemical Society. Instructors who wish to use the GOB-CKA for teaching and research may contact the corresponding author for a copy of the instrument. © 2014 Wiley Periodicals, Inc.
ERIC Educational Resources Information Center
Yang, Yang; He, Peng; Liu, Xiufeng
2018-01-01
So far, not enough effort has been invested in developing reliable, valid, and engaging assessments in school science, especially assessment of interdisciplinary science based on the new Next Generation Science Standards (NGSS). Furthermore, previous tools rely mostly on multiple-choice items and evaluation of student outcome is linked only to…
Get it while it's hot: a peak-first bias in self-generated choice order in rhesus macaques.
Jung, Kanghoon; Kralik, Jerald D
2013-01-01
Animals typically must make a number of successive choices to achieve a goal: e.g., eating multiple food items before becoming satiated. However, it is unclear whether choosing the best first or saving the best for last is the better strategy for maximizing overall reward. Specifically, since outcomes can be evaluated prospectively (with future rewards discounted and more immediate rewards preferred) or retrospectively (with prior rewards discounted and more recent rewards preferred), the conditions under which each is used remain unclear. On the one hand, humans and non-human animals clearly discount future reward, preferring immediate rewards to delayed ones, suggesting prospective evaluation; on the other hand, it has also been shown that a sequence that ends well, i.e., with the best event or item last, is often preferred, suggesting retrospective evaluation. Here we hypothesized that when individuals are allowed to build the sequence themselves, they are more likely to evaluate each item individually and therefore build a sequence using prospective evaluation. We examined the relationship between self-generated choice order and preference in rhesus monkeys in two experiments in which the distinctiveness of the options was relatively high and low, respectively. We observed a positive linear relationship between choice order and preference among highly distinct options, indicating that the rhesus monkeys chose their preferred food first: i.e., a peak-first order preference. Overall, choice order depended on the degree of relative preference among alternatives and a peak-first bias, providing evidence for prospective evaluation when choice order is self-generated.
ERIC Educational Resources Information Center
Veldkamp, Bernard P.; Verschoor, Angela J.; Eggen, Theo J. H. M.
2010-01-01
Overexposure and underexposure of items in the bank are serious problems in operational computerized adaptive testing (CAT) systems. These exposure problems might result in item compromise, or point at a waste of investments. The exposure control problem can be viewed as a test assembly problem with multiple objectives. Information in the test has…
Multiple-choice pretesting potentiates learning of related information.
Little, Jeri L; Bjork, Elizabeth Ligon
2016-10-01
Although the testing effect has received a substantial amount of empirical attention, such research has largely focused on the effects of tests given after study. The present research examines the effect of using tests prior to study (i.e., as pretests), focusing particularly on how pretesting influences the subsequent learning of information that is not itself pretested but that is related to the pretested information. In Experiment 1, we found that multiple-choice pretesting was better for the learning of such related information than was cued-recall pretesting or a pre-fact-study control condition. In Experiment 2, we found that the increased learning of non-pretested related information following multiple-choice testing could not be attributed to increased time allocated to that information during subsequent study. Last, in Experiment 3, we showed that the benefits of multiple-choice pretesting over cued-recall pretesting for the learning of related information persist over 48 hours, thus demonstrating the promise of multiple-choice pretesting to potentiate learning in educational contexts. A possible explanation for the observed benefits of multiple-choice pretesting for enhancing the effectiveness with which related nontested information is learned during subsequent study is discussed.
Developing a Better Understanding of the Process of Fat Absorption.
ERIC Educational Resources Information Center
Yip, Din Yan
2001-01-01
Performance on a multiple-choice item in a public examination indicates that most students do not understand how fat is absorbed through villi. A teaching strategy is suggested to overcome this problem by helping students review their own ideas critically. (Author/MM)
Wesleyan University Student Questionnaire.
ERIC Educational Resources Information Center
Haagen, C. Hess
This questionnaire assesses marijuana use practices in college students. The 30 items (multiple choice or free response) are concerned with personal and demographic data, marijuana smoking practices, use history, effects from smoking marijuana, present attitude toward the substance, and use of other drugs. The Questionnaire is untimed and…
Full Inclusion: The Least Restrictive Environment
ERIC Educational Resources Information Center
Mullings, Shirley E.
2011-01-01
The purpose of the phenomenological study was to examine elementary educators' perceptions of full inclusion as the least restrictive environment for students with disabilities. Thirty-six teachers and administrators participated in interviews and responded to multiple-choice survey items. The recorded data from the interviews were…
Royal, Kenneth; Dorman, David
2018-06-09
The number of answer options is an important element of multiple-choice questions (MCQs). Many MCQs contain four or more options despite the limited literature suggesting that there is little to no benefit beyond three options. The purpose of this study was to evaluate item performance on 3-option versus 4-option MCQs used in a core curriculum course in veterinary toxicology at a large veterinary medical school in the United States. A quasi-experimental, crossover design was used in which students in each class were randomly assigned to take one of two versions (A or B) of two major exams. Both the 3-option and 4-option MCQs resulted in similar psychometric properties. The findings of our study support earlier research in other medical disciplines and settings that likewise concluded there was no significant change in the psychometric properties of three option MCQs when compared to the traditional MCQs with four or more options.
Assessing Understanding of the Learning Cycle: The ULC
NASA Astrophysics Data System (ADS)
Marek, Edmund A.; Maier, Steven J.; McCann, Florence
2008-08-01
An 18-item, multiple-choice, 2-tiered instrument designed to measure understanding of the learning cycle (ULC) was developed and field-tested from the learning cycle test (LCT) of Odom and Settlage (Journal of Science Teacher Education, 7, 123-142, 1996). All question sets of the LCT were modified to some degree and 5 new sets were added, resulting in the ULC. The ULC measures (a) understandings and misunderstandings of the learning cycle, (b) the learning cycle's association with Piaget's (Biology and knowledge: An essay on the relations between organic regulations and cognitive processes, 1975) theory of mental functioning, and (c) applications of the learning cycle. The resulting ULC instrument was evaluated for internal consistency with Cronbach's alpha, yielding a coefficient of .791.
NASA Astrophysics Data System (ADS)
Lynch, P. P.; Chipman, H. H.; Pachaury, A. C.
Sixteen concept words (mass, length, area, volume, solid, liquid, gas, element, compound, mixture, electron, proton, neutron, atom, molecule, and ion) associated with the theme "the nature of matter" were described as simple textbook definitions after examination of classroom notes and school texts from the last three decades. Sixteen multiple-choice items, all of the same form, were constructed, one for each of the concept definitions. The English version of the sixteen-item test was given to 1635 high school students in Tasmania (where the language of instruction and the home language is English), and the Hindi version was given to 826 students from the Bhopal/Barwani region of India, where the medium of instruction is Hindi. The English- and Hindi-speaking data are compared from the point of view of development, performance on individual items, and overall performance at grade 10. A number of linguistic hypotheses are examined and reported upon. Although the overall score at grade 10 was identical (10.8/16) for both groups, there are differences in development, overall and for individual items, which are of interest. Overall, the science specificity of the Hindi words does not appear to confer any clearly defined advantage or disadvantage, though again there are some interesting individual anomalies.
NASA Astrophysics Data System (ADS)
Kartikasari, A.; Widjajanti, D. B.
2017-02-01
The aim of this study is to explore the effectiveness of a learning approach using problem-based learning based on multiple intelligences in developing students' achievement, mathematical connection ability, and self-esteem. This experimental study sampled 30 Grade X students of class MIA III at MAN Yogyakarta III. The learning materials implemented consisted of trigonometry and geometry. For the purposes of this study, the researchers designed an achievement test of 44 multiple-choice questions (24 on trigonometry and 20 on geometry), a mathematical connection test of 7 essay questions, and a 30-item self-esteem questionnaire. The learning approach was said to be effective if the proportion of students who achieved the KKM on the achievement test, and the proportions of students who achieved at least a high-category score on the mathematical connection test and on the self-esteem questionnaire, were each greater than or equal to 70%. Based on hypothesis testing at the 5% significance level, it can be concluded that the learning approach using problem-based learning based on multiple intelligences was effective in terms of students' achievement, mathematical connection ability, and self-esteem.
ERIC Educational Resources Information Center
Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua
2015-01-01
In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Schneid, Stephen D; Armour, Chris; Park, Yoon Soo; Yudkowsky, Rachel; Bordage, Georges
2014-10-01
Despite significant evidence supporting the use of three-option multiple-choice questions (MCQs), these are rarely used in written examinations for health professions students. The purpose of this study was to examine the effects of reducing four- and five-option MCQs to three-option MCQs on response times, psychometric characteristics, and absolute standard setting judgements in a pharmacology examination administered to health professions students. We administered two versions of a computerised examination containing 98 MCQs to 38 Year 2 medical students and 39 Year 3 pharmacy students. Four- and five-option MCQs were converted into three-option MCQs to create two versions of the examination. Differences in response time, item difficulty and discrimination, and reliability were evaluated. Medical and pharmacy faculty judges provided three-level Angoff (TLA) ratings for all MCQs for both versions of the examination to allow the assessment of differences in cut scores. Students answered three-option MCQs an average of 5 seconds faster than they answered four- and five-option MCQs (36 seconds versus 41 seconds; p = 0.008). There were no significant differences in item difficulty and discrimination, or test reliability. Overall, the cut scores generated for three-option MCQs using the TLA ratings were 8 percentage points higher (p = 0.04). The use of three-option MCQs in a health professions examination resulted in a time saving equivalent to the completion of 16% more MCQs per 1-hour testing period, which may increase content validity and test score reliability, and minimise construct under-representation. The higher cut scores may result in higher failure rates if an absolute standard setting method, such as the TLA method, is used. 
The results from this study provide a cautious indication to health professions educators that using three-option MCQs does not threaten validity and may strengthen it by allowing additional MCQs to be tested in a fixed amount of testing time with no deleterious effect on the reliability of the test scores. © 2014 John Wiley & Sons Ltd.
Mathematics Assessment Sampler 3-5
ERIC Educational Resources Information Center
National Council of Teachers of Mathematics, 2005
2005-01-01
The sample assessment items in this volume are sorted according to the strands of number and operations, algebra, geometry, measurement, and data analysis and probability. Because one goal of assessment is to determine students' abilities to communicate mathematically, the writing team suggests ways to extend or modify multiple-choice and…
ERIC Educational Resources Information Center
Raykov, Tenko; Dimitrov, Dimiter M.; Marcoulides, George A.; Li, Tatyana; Menold, Natalja
2018-01-01
A latent variable modeling method for studying measurement invariance when evaluating latent constructs with multiple binary or binary scored items with no guessing is outlined. The approach extends the continuous indicator procedure described by Raykov and colleagues, utilizes similarly the false discovery rate approach to multiple testing, and…
ERIC Educational Resources Information Center
He, Yong
2013-01-01
Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
Wrong Answers on Multiple-Choice Achievement Tests: Blind Guesses or Systematic Choices?.
ERIC Educational Resources Information Center
Powell, J. C.
A multi-faceted model for the selection of answers for multiple-choice tests was developed from the findings of a series of exploratory studies. This model implies that answer selection should be curvilinear. A series of models were tested for fit using the chi square procedure. Data were collected from 359 elementary school students ages 9-12.…
The Impact of Disclosure of Nutrition Information on Consumers' Behavioral Intention in Korea.
Choi, Jinkyung
2015-01-01
To investigate the effect of nutritional information disclosure on consumers' nutritional perception, attitude, and behavioral intention to purchase a food item, questionnaires were distributed measuring nutritional perception, attitude, and behavioral intention under different levels of nutritional information about the food (no information, calories only, and six nutritional content items: food weight (g), calories (kcal), protein (g), sugar (g), sodium (g), and saturated fat (g)). The food items shown to respondents were hamburgers and bibimbap. Descriptive analysis, analysis of variance, and multiple regression were used to examine the effects of nutritional information levels and different food items on consumers' behavioral intentions. Nutritional perception, food attitude, and food choice intention were all affected by the level of nutritional information and by the food item. Food attitude was also a predictor of food choice behavioral intention and was likewise affected by the food item. However, the study found that individuals' objective and subjective knowledge is not related to their nutritional perception, attitude, or behavioral intention. These results would help restaurant managers prepare for consumer demand for disclosure of nutritional information, adjust menu ingredients for health-conscious consumers, and ensure consumer satisfaction with the perceived nutritional value of food.
Towal, R Blythe; Mormann, Milica; Koch, Christof
2013-10-01
Many decisions we make require visually identifying and evaluating numerous alternatives quickly. These usually vary in reward, or value, and in low-level visual properties, such as saliency. Both saliency and value influence the final decision. In particular, saliency affects fixation locations and durations, which are predictive of choices. However, it is unknown how saliency propagates to the final decision. Moreover, the relative influence of saliency and value is unclear. Here we address these questions with an integrated model that combines a perceptual decision process about where and when to look with an economic decision process about what to choose. The perceptual decision process is modeled as a drift-diffusion model (DDM) process for each alternative. Using psychophysical data from a multiple-alternative, forced-choice task, in which subjects have to pick one food item from a crowded display via eye movements, we test four models where each DDM process is driven by (i) saliency or (ii) value alone or (iii) an additive or (iv) a multiplicative combination of both. We find that models including both saliency and value weighted in a one-third to two-thirds ratio (saliency-to-value) significantly outperform models based on either quantity alone. These eye fixation patterns modulate an economic decision process, also described as a DDM process driven by value. Our combined model quantitatively explains fixation patterns and choices with similar or better accuracy than previous models, suggesting that visual saliency has a smaller, but significant, influence than value and that saliency affects choices indirectly through perceptual decisions that modulate economic decisions.
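The additive saliency-plus-value race described in the abstract above can be sketched as independent drift-diffusion accumulators, one per alternative, whose drift weights saliency and value in a one-third to two-thirds ratio. This is a toy simulation under assumed parameter values (noise, threshold, step size), not the authors' fitted model.

```python
import random

# Toy drift-diffusion race: each alternative accumulates noisy evidence
# whose drift is an additive mix of saliency and value (1/3 : 2/3).
# All parameter values are illustrative assumptions.

def ddm_race(saliencies, values, w_sal=1/3, w_val=2/3,
             noise=0.1, threshold=1.0, dt=0.01, seed=0):
    """Return the index of the first alternative to cross the threshold."""
    rng = random.Random(seed)
    evidence = [0.0] * len(values)
    while True:
        for i, (s, v) in enumerate(zip(saliencies, values)):
            drift = w_sal * s + w_val * v
            # Euler step of the diffusion: drift plus scaled Gaussian noise.
            evidence[i] += drift * dt + rng.gauss(0, noise) * dt ** 0.5
            if evidence[i] >= threshold:
                return i

# A high-value but low-saliency item tends to win the race,
# reflecting value's larger (two-thirds) weight on the drift.
choice = ddm_race(saliencies=[0.9, 0.1], values=[0.2, 0.9])
```

In the paper this perceptual race governs fixations, which then feed a second, value-driven DDM over the economic choice; the sketch covers only the first stage.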
Changing value through cued approach: An automatic mechanism of behavior change
Schonberg, Tom; Bakkour, Akram; Hover, Ashleigh M.; Mumford, Jeanette A.; Nagar, Lakshya; Perez, Jacob; Poldrack, Russell A.
2014-01-01
It is believed that choice behavior reveals the underlying value of goods. The subjective values of stimuli can be changed through reward-based learning mechanisms as well as by modifying the description of the decision problem, but it has yet to be shown that preferences can be manipulated by perturbing intrinsic values of individual items. Here we show that the value of food items can be modulated by the concurrent presentation of an irrelevant auditory cue to which subjects must make a simple motor response (i.e. cue-approach training). Follow-up tests show that the effects of this pairing on choice lasted at least two months after prolonged training. Eye-tracking during choice confirmed that cue-approach training increased attention to the cued items. Neuroimaging revealed the neural signature of a value change in the form of amplified preference-related activity in ventromedial prefrontal cortex. PMID:24609465
Optimizing Multiple-Choice Tests as Learning Events
ERIC Educational Resources Information Center
Little, Jeri Lynn
2011-01-01
Although generally used for assessment, tests can also serve as tools for learning--but different test formats may not be equally beneficial. Specifically, research has shown multiple-choice tests to be less effective than cued-recall tests in improving the later retention of the tested information (e.g., see meta-analysis by Hamaker, 1986),…
Food Choices of Minority and Low-Income Employees
Levy, Douglas E.; Riis, Jason; Sonnenberg, Lillian M.; Barraclough, Susan J.; Thorndike, Anne N.
2012-01-01
Background Effective strategies are needed to address obesity, particularly among minority and low-income individuals. Purpose To test whether a two-phase point-of-purchase intervention improved food choices across racial and socioeconomic (job type) groups. Design A 9-month longitudinal study from 2009 to 2010 assessing person-level changes in purchases of healthy and unhealthy foods following sequentially introduced interventions. Data were analyzed in 2011. Setting/participants Participants were 4642 employees of a large hospital in Boston MA who were regular cafeteria patrons. Interventions The first intervention was a traffic light–style color-coded labeling system encouraging patrons to purchase healthy items (labeled green) and avoid unhealthy items (labeled red). The second intervention manipulated “choice architecture” by physically rearranging certain cafeteria items, making green-labeled items more accessible and red-labeled items less accessible. Main outcome measures Proportion of green- (or red-) labeled items purchased by an employee. Subanalyses tracked beverage purchases, including calories and price per beverage. Results Employees self-identified as white (73%), black (10%), Latino (7%), and Asian (10%). Compared to white employees, Latino and black employees purchased a higher proportion of red items at baseline (18%, 28%, and 33%, respectively, p<0.001) and a lower proportion of green (48%, 38%, and 33%, p<0.001). Labeling decreased all employees’ red item purchases (−11.2% [95% CI= −13.6%, −8.9%]) and increased green purchases (6.6% [95% CI=5.2%, 7.9%]). Red beverage purchases decreased most (−23.8% [95% CI= −28.1%, −19.6%]). The choice architecture intervention further decreased red purchases after the labeling. Intervention effects were similar across all race/ethnicity and job types (p>0.05 for interaction between race or job type and intervention).
Mean calories per beverage decreased similarly over the study period for all racial groups and job types, with no increase in per-beverage spending. Conclusions Despite baseline differences in healthy food purchases, a simple color-coded labeling and choice architecture intervention improved food and beverage choices among employees from all racial and socioeconomic backgrounds. PMID:22898116
Food choices of minority and low-income employees: a cafeteria intervention.
Levy, Douglas E; Riis, Jason; Sonnenberg, Lillian M; Barraclough, Susan J; Thorndike, Anne N
2012-09-01
Effective strategies are needed to address obesity, particularly among minority and low-income individuals. To test whether a two-phase point-of-purchase intervention improved food choices across racial and socioeconomic (job type) groups. A 9-month longitudinal study from 2009 to 2010 assessing person-level changes in purchases of healthy and unhealthy foods following sequentially introduced interventions. Data were analyzed in 2011. Participants were 4642 employees of a large hospital in Boston MA who were regular cafeteria patrons. The first intervention was a traffic light-style color-coded labeling system encouraging patrons to purchase healthy items (labeled green) and avoid unhealthy items (labeled red). The second intervention manipulated "choice architecture" by physically rearranging certain cafeteria items, making green-labeled items more accessible and red-labeled items less accessible. Proportion of green- (or red-) labeled items purchased by an employee. Subanalyses tracked beverage purchases, including calories and price per beverage. Employees self-identified as white (73%); black (10%); Latino (7%); and Asian (10%). Compared to white employees, Latino and black employees purchased a higher percentage of red items at baseline (18%, 28%, and 33%, respectively, p<0.001) and a lower percentage of green (48%, 38%, and 33%, p<0.001). Labeling decreased all employees' red item purchases (-11.2%, 95% CI= -13.6%, -8.9%) and increased green purchases (6.6%, 95% CI=5.2%, 7.9%). Red beverage purchases decreased most (-23.8%, 95% CI= -28.1%, -19.6%). The choice architecture intervention further decreased red purchases after the labeling. Intervention effects were similar across all race/ethnicity and job types (p>0.05 for interaction between race or job type and intervention). Mean calories per beverage decreased similarly over the study period for all racial groups and job types, with no increase in per-beverage spending.
Despite baseline differences in healthy food purchases, a simple color-coded labeling and choice architecture intervention improved food and beverage choices among employees from all racial and socioeconomic backgrounds. Copyright © 2012 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
Item Readability and Science Achievement in TIMSS 2003 in South Africa
ERIC Educational Resources Information Center
Dempster, Edith R.; Reddy, Vijay
2007-01-01
This study investigated the relationship between readability of 73 text-only multiple-choice questions from Trends in International Mathematics and Science Study (TIMSS) 2003 and performance of two groups of South African learners: those with limited English-language proficiency (learners attending African schools) and those with better…
ERIC Educational Resources Information Center
Governor's Citizen Advisory Committee on Drugs, Salt Lake City, UT.
This questionnaire assesses drug use practices in junior and senior high school students. The 21 multiple choice items pertain to drug use practices, use history, availability of drugs, main reason for drug use, and demographic data. The questionnaire is untimed, group administered, and may be given by the classroom teacher in about 10 minutes. Item…
Utah Drop-Out Drug Use Questionnaire.
ERIC Educational Resources Information Center
Governor's Citizen Advisory Committee on Drugs, Salt Lake City, UT.
This questionnaire assesses drug use practices in high school drop-outs. The 79 items (multiple choice or apply/not apply) are concerned with demographic data and use, use history, reasons for use/nonuse, attitudes toward drugs, availability of drugs, and drug information with respect to narcotics, amphetamines, LSD, marijuana, and barbiturates.…
Heubach Smoking Habits and Attitudes Questionnaire.
ERIC Educational Resources Information Center
Heubach, Philip Gilbert
This questionnaire, consisting of 74 yes/no, multiple choice, and completion items, is designed to assess smoking practices and attitudes toward smoking in high school students. Questions pertain to personal data, family smoking practices and attitudes, personal smoking habits, reasons for smoking or not smoking, and opinions on smoking. Detailed…
Beta-Blockers and the Kidney: Implications for Renal Function and Renin Release.
ERIC Educational Resources Information Center
Epstein, Murray; And Others
1985-01-01
Reviews and discusses current information on the human renal response as related to beta-blockers (antihypertension agents). Topic areas considered include cardioselectivity, renal hemodynamics, systemic hemodynamics, changes with acute and chronic administration, influence of dose, and others. Implications and an 11-item multiple-choice self-quiz…
Using Web-Based Practice to Enhance Mathematics Learning and Achievement
ERIC Educational Resources Information Center
Nguyen, Diem M.; Kulm, Gerald
2005-01-01
This article describes 1) the special features and accessibility of an innovative web-based practice instrument (WebMA) designed with randomized short-answer, matching and multiple choice items incorporated with automatically adapted feedback for middle school students; and 2) an exploratory study that compares the effects and contributions of…
Michigan High School Student Drug Attitudes and Behavior Questionnaire.
ERIC Educational Resources Information Center
Bogg, Richard A.; And Others
This questionnaire assesses drug use practices and attitudes toward drugs in high school students. The instrument has 59 items (multiple choice or completion), some with several parts. The questions pertain to aspirations for the future, general attitudes and opinions, biographic and demographic data, family background and relationships, alcohol…
Next-Generation Environments for Assessing and Promoting Complex Science Learning
ERIC Educational Resources Information Center
Quellmalz, Edys S.; Davenport, Jodi L.; Timms, Michael J.; DeBoer, George E.; Jordan, Kevin A.; Huang, Chun-Wei; Buckley, Barbara C.
2013-01-01
How can assessments measure complex science learning? Although traditional, multiple-choice items can effectively measure declarative knowledge such as scientific facts or definitions, they are considered less well suited for providing evidence of science inquiry practices such as making observations or designing and conducting investigations.…
Latent Image Processing Can Bolster the Value of Quizzes.
ERIC Educational Resources Information Center
Singer, David
1985-01-01
Latent image processing is a method which reveals hidden ink when marked with a special pen. Using multiple-choice items with commercially available latent image transfers can provide immediate feedback on take-home quizzes. Students benefitted from formative evaluation and were challenged to search for alternative solutions and explain unexpected…
Science Competencies That Go Unassessed
ERIC Educational Resources Information Center
Gilmer, Penny J.; Sherdan, Danielle M.; Oosterhof, Albert; Rohani, Faranak; Rouby, Aaron
2011-01-01
Present large-scale assessments require the use of item formats, such as multiple choice, that can be administered and scored efficiently. This limits competencies that can be measured by these assessments. An alternative approach to large-scale assessments is being investigated that would include the use of complex performance assessments. As…
Investigating Urban Eighth-Grade Students' Knowledge of Energy Resources
ERIC Educational Resources Information Center
Bodzin, Alec
2012-01-01
This study investigated urban eighth-grade students' knowledge of energy resources and associated issues including energy acquisition, energy generation, storage and transport, and energy consumption and conservation. A 39 multiple-choice-item energy resources knowledge assessment was completed by 1043 eighth-grade students in urban schools in two…
Development and Validation of the Homeostasis Concept Inventory
ERIC Educational Resources Information Center
McFarland, Jenny L.; Price, Rebecca M.; Wenderoth, Mary Pat; Martinková, Patrícia; Cliff, William; Michael, Joel; Modell, Harold; Wright, Ann
2017-01-01
We present the Homeostasis Concept Inventory (HCI), a 20-item multiple-choice instrument that assesses how well undergraduates understand this critical physiological concept. We used an iterative process to develop a set of questions based on elements in the Homeostasis Concept Framework. This process involved faculty experts and undergraduate…
Criterion-Referenced Test Items for Auto Body.
ERIC Educational Resources Information Center
Tannehill, Dana, Ed.
This test item bank on auto body repair contains criterion-referenced test questions based upon competencies found in the Missouri Auto Body Competency Profile. Some test items are keyed for multiple competencies. The tests cover the following 26 competency areas in the auto body curriculum: auto body careers; measuring and mixing; tools and…
Welch Science Process Inventory, Form D. Revised.
ERIC Educational Resources Information Center
Welch, Wayne W.
This inventory, developed for use with the Harvard Project Physics curriculum, consists of 135 two-choice (agree-disagree) items. Items cover perceptions of the role of scientists, the nature and functions of theories, underlying assumptions made by scientists, and other aspects of the scientific process. The test is suitable for high school…
A Novel Teaching Tool Combined With Active-Learning to Teach Antimicrobial Spectrum Activity.
MacDougall, Conan
2017-03-25
Objective. To design instructional methods that would promote long-term retention of knowledge of antimicrobial pharmacology, particularly the spectrum of activity for antimicrobial agents, in pharmacy students. Design. An active-learning approach was used to teach selected sessions in a required antimicrobial pharmacology course. Students were expected to review key concepts from the course reader prior to the in-class sessions. During class, brief concept reviews were followed by active-learning exercises, including a novel schematic method for learning antimicrobial spectrum of activity ("flower diagrams"). Assessment. At the beginning of the next quarter (approximately 10 weeks after the in-class sessions), 360 students (three yearly cohorts) completed a low-stakes multiple-choice examination on the concepts in antimicrobial spectrum of activity. When data for students was pooled across years, the mean number of correct items was 75.3% for the items that tested content delivered with the active-learning method vs 70.4% for items that tested content delivered via traditional lecture (mean difference 4.9%). Instructor ratings on student evaluations of the active-learning approach were high (mean scores 4.5-4.8 on a 5-point scale) and student comments were positive about the active-learning approach and flower diagrams. Conclusion. An active-learning approach led to modestly higher scores in a test of long-term retention of pharmacology knowledge and was well-received by students.
A Novel Teaching Tool Combined With Active-Learning to Teach Antimicrobial Spectrum Activity
2017-01-01
Objective. To design instructional methods that would promote long-term retention of knowledge of antimicrobial pharmacology, particularly the spectrum of activity for antimicrobial agents, in pharmacy students. Design. An active-learning approach was used to teach selected sessions in a required antimicrobial pharmacology course. Students were expected to review key concepts from the course reader prior to the in-class sessions. During class, brief concept reviews were followed by active-learning exercises, including a novel schematic method for learning antimicrobial spectrum of activity (“flower diagrams”). Assessment. At the beginning of the next quarter (approximately 10 weeks after the in-class sessions), 360 students (three yearly cohorts) completed a low-stakes multiple-choice examination on the concepts in antimicrobial spectrum of activity. When data for students was pooled across years, the mean number of correct items was 75.3% for the items that tested content delivered with the active-learning method vs 70.4% for items that tested content delivered via traditional lecture (mean difference 4.9%). Instructor ratings on student evaluations of the active-learning approach were high (mean scores 4.5-4.8 on a 5-point scale) and student comments were positive about the active-learning approach and flower diagrams. Conclusion. An active-learning approach led to modestly higher scores in a test of long-term retention of pharmacology knowledge and was well-received by students. PMID:28381885
O'Mara, Deborah A; Canny, Ben J; Rothnie, Imogene P; Wilson, Ian G; Barnard, John; Davies, Llewelyn
2015-02-02
To report the level of participation of medical schools in the Australian Medical Schools Assessment Collaboration (AMSAC); and to measure differences in student performance related to medical school characteristics and implementation methods. Retrospective analysis of data using the Rasch statistical model to correct for missing data and variability in item difficulty. Linear model analysis of variance was used to assess differences in student performance. 6401 preclinical students from 13 medical schools that participated in AMSAC from 2011 to 2013. Rasch estimates of preclinical basic and clinical science knowledge. Representation of Australian medical schools and students in AMSAC more than doubled between 2009 and 2013. In 2013 it included 12 of 19 medical schools and 68% of medical students. Graduate-entry students scored higher than students entering straight from school. Students at large schools scored higher than students at small schools. Although the significance level was high (P < 0.001), the main effect sizes were small (4.5% and 2.3%, respectively). The time allowed per multiple choice question was not significantly associated with student performance. The effect on performance of multiple assessments compared with the test items as part of a single end-of-year examination was negligible. The variables investigated explain only 12% of the total variation in student performance. An increasing number of medical schools are participating in AMSAC to monitor student performance in preclinical sciences against an external benchmark. Medical school characteristics account for only a small part of overall variation in student performance. Student performance was not affected by the different methods of administering test items.
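The AMSAC analysis above uses the Rasch model precisely because it can place students on a common ability scale despite missing data and varying item difficulty. A minimal sketch of the idea follows: the Rasch response probability, and a simple Newton-Raphson maximum-likelihood ability estimate that naturally omits unanswered items. The iteration count and example difficulties are illustrative assumptions, not AMSAC's actual calibration.

```python
import math

def rasch_prob(theta, b):
    """Rasch model: probability that a person with ability `theta`
    answers an item of difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iters=50):
    """Newton-Raphson ML estimate of ability from scored responses
    (1 = correct, 0 = incorrect) and known item difficulties.

    Only items a student actually answered need appear in the two
    lists, which is how Rasch scaling copes with missing data when
    different schools administer different item subsets.
    """
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_prob(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, ps))  # score residual
        info = sum(p * (1 - p) for p in ps)               # test information
        theta += grad / info
    return theta
```

At the ML estimate the expected number-correct equals the observed score, which is a handy convergence check.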
Intertemporal choice in lemurs.
Stevens, Jeffrey R; Mühlhoff, Nelly
2012-02-01
Different species vary in their ability to wait for delayed rewards in intertemporal choice tasks. Models of rate maximization account for part of this variation, but other factors such as social structure and feeding ecology seem to underlie some species differences. Though studies have evaluated intertemporal choice in several primate species, including Old World monkeys, New World monkeys, and apes, prosimians have not been tested. This study investigated intertemporal choices in three species of lemur (black-and-white ruffed lemurs, Varecia variegata, red ruffed lemurs, Varecia rubra, and black lemurs, Eulemur macaco) to assess how they compare to other primate species and whether their choices are consistent with rate maximization. We offered lemurs a choice between two food items available immediately and six food items available after a delay. We found that by adjusting the delay to the larger reward, the lemurs were indifferent between the two options at a mean delay of 17 s, ranging from 9 to 25 s. These data are comparable to data collected from common marmosets (Callithrix jacchus). The lemur data were not consistent with models of rate maximization. The addition of lemurs to the list of species tested in these tasks will help uncover the role of life history and socio-ecological factors influencing intertemporal choices. Copyright © 2011 Elsevier B.V. All rights reserved.
Recall in older cancer patients: measuring memory for medical information.
Jansen, Jesse; van Weert, Julia; van der Meulen, Nienke; van Dulmen, Sandra; Heeren, Thea; Bensing, Jozien
2008-04-01
Remembering medical treatment information may be particularly taxing for older cancer patients, but to our knowledge this ability has never been assessed in this specific age group only. Our purpose in this study was to investigate older cancer patients' recall of information after patient education preceding chemotherapy. We constructed a recall questionnaire consisting of multiple-choice questions, completion items, and open-ended questions related to information about treatment and recommendations on how to handle side effects. Immediately after a nursing consultation preceding chemotherapy treatment, 69 older patients (M = 71.8 years, SD = 4.1) completed the questionnaire. We checked recall against the actual communication in video recordings of the consultations. On average, 82.2 items were discussed during the consultations. The mean percentage of information recalled correctly was 23.2% for open-ended questions, 68.0% for completion items, and 80.2% for multiple-choice questions. Older cancer patients are confronted with a lot of information. Recall of information strongly depended on question format; especially active reproduction appeared to be poor. To improve treatment outcomes, it is important that cancer patients are able to actively retrieve knowledge about how to prevent and recognize adverse side effects and that this is checked by the health professional. We make suggestions on how to make information more memorable for older cancer patients.
Evaluation of MIMIC-Model Methods for DIF Testing with Comparison to Two-Group Analysis
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for 1 group of people versus another, irrespective of mean differences on the construct. This study focuses on the use of multiple-indicator multiple-cause (MIMIC) structural equation models for DIF testing, parameterized as item…
Introducing Standardized EFL/ESL Exams
ERIC Educational Resources Information Center
Laborda, Jesus Garcia
2007-01-01
This article presents the features, and a brief comparison, of some of the most well-known high-stakes exams. They are classified in the following fashion: tests that only include multiple-choice questions, tests that include writing and multiple-choice questions, and tests that include speaking questions. The tests reviewed are: BULATS, IELTS,…
Optimal foraging by birds: feeder-based experiments for secondary and post-secondary students
USDA-ARS?s Scientific Manuscript database
Optimal foraging theory attempts to explain the foraging patterns observed in animals, including their choice of particular food items and foraging locations. Here, we describe three exercises designed to test hypotheses about food choice and foraging habitat preference using bird feeders. These e...
Exploring undergraduates' understanding of photosynthesis using diagnostic question clusters.
Parker, Joyce M; Anderson, Charles W; Heidemann, Merle; Merrill, John; Merritt, Brett; Richmond, Gail; Urban-Lurain, Mark
2012-01-01
We present a diagnostic question cluster (DQC) that assesses undergraduates' thinking about photosynthesis. This assessment tool is not designed to identify individual misconceptions. Rather, it is focused on students' abilities to apply basic concepts about photosynthesis by reasoning with a coordinated set of practices based on a few scientific principles: conservation of matter, conservation of energy, and the hierarchical nature of biological systems. Data on students' responses to the cluster items and uses of some of the questions in multiple-choice, multiple-true/false, and essay formats are compared. A cross-over study indicates that the multiple-true/false format shows promise as a machine-gradable format that identifies students who have a mixture of accurate and inaccurate ideas. In addition, interviews with students about their choices on three multiple-choice questions reveal the fragility of students' understanding. Collectively, the data show that many undergraduates lack both a basic understanding of the role of photosynthesis in plant metabolism and the ability to reason with scientific principles when learning new content. Implications for instruction are discussed.
Exploring Undergraduates' Understanding of Photosynthesis Using Diagnostic Question Clusters
Parker, Joyce M.; Anderson, Charles W.; Heidemann, Merle; Merrill, John; Merritt, Brett; Richmond, Gail; Urban-Lurain, Mark
2012-01-01
We present a diagnostic question cluster (DQC) that assesses undergraduates' thinking about photosynthesis. This assessment tool is not designed to identify individual misconceptions. Rather, it is focused on students' abilities to apply basic concepts about photosynthesis by reasoning with a coordinated set of practices based on a few scientific principles: conservation of matter, conservation of energy, and the hierarchical nature of biological systems. Data on students' responses to the cluster items and uses of some of the questions in multiple-choice, multiple-true/false, and essay formats are compared. A cross-over study indicates that the multiple-true/false format shows promise as a machine-gradable format that identifies students who have a mixture of accurate and inaccurate ideas. In addition, interviews with students about their choices on three multiple-choice questions reveal the fragility of students' understanding. Collectively, the data show that many undergraduates lack both a basic understanding of the role of photosynthesis in plant metabolism and the ability to reason with scientific principles when learning new content. Implications for instruction are discussed. PMID:22383617
Improving nurse practitioners' competence with genetics: Effectiveness of an online course.
Whitt, Karen J; Macri, Charles; O'Brien, Travis J; Wright, Stephanie
2016-03-01
The purpose of this study was to assess the effectiveness of an online genetics course for improving nurse practitioners' knowledge, competence, and comfort with genetic principles and their application to clinical practice. A genetics knowledge test and survey were administered to 232 nurse practitioner students, between 2011 and 2013, before and after completing a 15-week online genetics course taught by a multidisciplinary team of instructors at a private east coast U.S. university. The 65-item survey allowed participants to rate competence regarding genetic principles, diseases, and terminology, as well as comfort performing various clinical tasks related to genetics. The 21-item knowledge test contained multiple choice questions regarding core competencies in genetics. Paired t-tests were used to compare mean pre- and postscores. Participants significantly increased postcourse knowledge (p < .001) and comfort with genetic core competencies and clinical skills related to genetics (p < .001). This study demonstrates the effectiveness of an online genetics course for increasing nurse practitioners' knowledge, competence, and confidence with genetics and identifies specific topics educators should consider when designing curricula for nurse practitioners. Findings from this study can improve genetics education for nurse practitioners, which will in turn improve patient health. ©2015 American Association of Nurse Practitioners.
Development and Validation of the Conceptual Assessment of Natural Selection (CANS)
Kalinowski, Steven T.; Leonard, Mary J.; Taper, Mark L.
2016-01-01
We developed and validated the Conceptual Assessment of Natural Selection (CANS), a multiple-choice test designed to assess how well college students understand the central principles of natural selection. The expert panel that reviewed the CANS concluded its questions were relevant to natural selection and generally did a good job sampling the specific concepts they were intended to assess. Student interviews confirmed questions on the CANS provided accurate reflections of how students think about natural selection. And, finally, statistical analysis of student responses using item response theory showed that the CANS did a very good job of estimating how well students understood natural selection. The empirical reliability of the CANS was substantially higher than the Force Concept Inventory, a highly regarded test in physics that has a similar purpose. PMID:27856552
Development of a research ethics knowledge and analytical skills assessment tool.
Taylor, Holly A; Kass, Nancy E; Ali, Joseph; Sisson, Stephen; Bertram, Amanda; Bhan, Anant
2012-04-01
The goal of this project was to develop and validate a new tool to evaluate learners' knowledge and skills related to research ethics. A core set of 50 questions from existing computer-based online teaching modules were identified, refined and supplemented to create a set of 74 multiple-choice, true/false and short answer questions. The questions were pilot-tested and item discrimination was calculated for each question. Poorly performing items were eliminated or refined. Two comparable assessment tools were created. These assessment tools were administered as a pre-test and post-test to a cohort of 58 Indian junior health research investigators before and after exposure to a new course on research ethics. Half of the investigators were exposed to the course online, the other half in person. Item discrimination was calculated for each question and Cronbach's α for each assessment tool. A final version of the assessment tool that incorporated the best questions from the pre-/post-test phase was used to assess retention of research ethics knowledge and skills 3 months after course delivery. The final version of the REKASA includes 41 items and had a Cronbach's α of 0.837. The results illustrate, in one sample of learners, the successful, systematic development and use of a knowledge and skills assessment tool in research ethics capable of not only measuring basic knowledge in research ethics and oversight but also assessing learners' ability to apply ethics knowledge to the analytical task of reasoning through research ethics cases, without reliance on essay or discussion-based examination. These promising preliminary findings should be confirmed with additional groups of learners.
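The assessment-tool abstract above reports Cronbach's α (0.837 for the final 41-item REKASA) as its internal-consistency measure. A minimal sketch of the standard computation from a persons-by-items score matrix is below; the example data are invented for illustration.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of item scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))

    Population variances (divide by n) are used throughout; sample
    variances give the same alpha because the n/(n-1) factor cancels.
    """
    k = len(scores[0])        # number of items

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

When every item ranks test takers identically, item variances sum to less than the total-score variance and α approaches 1; uncorrelated items drive it toward 0.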
Webster, Joseph B
2009-03-01
To determine the performance and change over time when incorporating questions in the core competency domains of practice-based learning and improvement (PBLI), systems-based practice (SBP), and professionalism (PROF) into the national PM&R Self-Assessment Examination for Residents (SAER). Prospective, longitudinal analysis. The national Self-Assessment Examination for Residents (SAER) in Physical Medicine and Rehabilitation, which is administered annually. Approximately 1100 PM&R residents who take the examination annually. Inclusion of progressively more challenging questions in the core competency domains of PBLI, SBP, and PROF. Individual test item level of difficulty (P value) and discrimination (point biserial index). Compared with the overall test, questions in the subtopic areas of PBLI, SBP, and PROF were relatively easier and less discriminating (correlation of resident performance on these domains compared with that on the total test). These differences became smaller during the 3-year time period. The difficulty level of the questions in each of the subtopic domains was raised during the 3 year period to a level close to the overall exam. Discrimination of the test items improved or remained stable. This study demonstrates that, with careful item writing and review, multiple-choice items in the PBLI, SBP, and PROF domains can be successfully incorporated into an annual, national self-assessment examination for residents. The addition of these questions had value in assessing competency while not compromising the overall validity and reliability of the exam. It is yet to be determined if resident performance on these questions corresponds to performance on other measures of competency in the areas of PBLI, SBP, and PROF.
DERAKHSHANDEH, ZAHRA; AMINI, MITRA; KOJURI, JAVAD; DEHBOZORGIAN, MARZIYEH
2018-01-01
Introduction: Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of reasoning skills in a medical school program is important to direct students' learning. One of the tests for measuring clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study was to measure the psychometric qualities of CRPs and to define the correlation between this test and the routine MCQ test in the cardiology department of Shiraz medical school. Methods: This descriptive study was conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The routine CRPs and MCQ tests were designed based on similar objectives and were carried out simultaneously. Reliability, item difficulty, item discrimination, and the correlation between each item and the total score of the CRPs were measured using Excel and SPSS software to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences in CRPs test scores between residents' academic years (second, third and fourth year) were also evaluated by one-way analysis of variance (ANOVA) using SPSS software (version 20) (α = 0.05). Results: The mean and standard deviation of the CRPs score was 10.19 ± 3.39 out of 20; for the MCQ, it was 13.15 ± 3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total score of the CRPs was 0.26-0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p<0.001). The reliability of the CRPs was 0.72, as calculated using Cronbach's alpha. The mean score of the CRPs differed among residents based on their academic year, and this difference was statistically significant (p<0.001).
Conclusion: The results of the present investigation revealed that CRPs can be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs. PMID:29344528
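The group comparison behind that conclusion rests on a one-way ANOVA F statistic, which can be computed from between- and within-group sums of squares; a hedged sketch with invented scores (not the study's SPSS output):

```python
# Sketch: one-way ANOVA F statistic across groups (e.g. residency years).
from statistics import mean

def one_way_anova_f(groups):
    """groups: list of lists of scores; returns (F, df_between, df_within)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    # Between-group sum of squares: group sizes times squared mean offsets
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around each group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = len(all_scores) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

A large F relative to the F(df_between, df_within) distribution yields the small p-value (p<0.001) reported above.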
ERIC Educational Resources Information Center
Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.
2015-01-01
Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
Student Opinion Inventory. Instructions for Use. Part A. Part B.
ERIC Educational Resources Information Center
National Study of School Evaluation, Arlington, VA.
An important part of any school's self-evaluation is student input or feedback. This inventory was developed in order to accomplish two goals: assessing student attitudes toward many facets of the school, and providing an opportunity for students to make recommendations for improvement. Thirty-four multiple choice items collect information on…
Opinions of Female Juvenile Delinquents on Language-Based Literacy Activities
ERIC Educational Resources Information Center
Sanger, Dixie; Ritzman, Mitzi; Stremlau, Aliza; Fairchild, Lindsey; Brunken, Cindy
2009-01-01
A mixed methods study was conducted to examine female juvenile delinquents' opinions and reactions on nine language-based literacy activities. Forty-one participants ranging in age from 13 to 18 years responded to a survey consisting of nine multiple-choice items and one open-ended question concerning the usefulness of activities. Quantitative and…
ERIC Educational Resources Information Center
Carlson, Marilyn; Oehrtman, Michael; Engelke, Nicole
2010-01-01
This article describes the development of the Precalculus Concept Assessment (PCA) instrument, a 25-item multiple-choice exam. The reasoning abilities and understandings central to precalculus and foundational for beginning calculus were identified and characterized in a series of research studies and are articulated in the PCA Taxonomy. These…
Modeling Incorrect Responses to Multiple-Choice Items with Multilinear Formula Score Theory.
1987-08-01
Toward the Development of a Model to Estimate the Readability of Credentialing-Examination Materials
ERIC Educational Resources Information Center
Badgett, Barbara A.
2010-01-01
The purpose of this study was to develop a set of procedures to establish readability, including an equation, that accommodates the multiple-choice item format and occupational-specific language related to credentialing examinations. The procedures and equation should be appropriate for learning materials, examination materials, and occupational…
University of Michigan Drug Education Questionnaire.
ERIC Educational Resources Information Center
Francis, John Bruce; Patch, David J.
This questionnaire assesses attitudes toward potential drug education programs and drug use practices in college students. The 87 items (multiple choice or free response) pertain to the history and extent of usage of 27 different drugs, including two non-existent drugs which may be utilized as a validity check; attitude toward the content, format,…
The Instructional Effects of Matching or Mismatching Lesson and Posttest Screen Color
ERIC Educational Resources Information Center
Clariana, Roy B.
2004-01-01
This investigation considers the instructional effects of color as an over-arching context variable when learning from computer displays. The purpose of this investigation is to examine the posttest retrieval effects of color as a local, extra-item non-verbal lesson context variable for constructed-response versus multiple-choice posttest…
ERIC Educational Resources Information Center
Schmidt, Hans-Jurgen; Baumgartner, Tim; Eybe, Holger
2003-01-01
Investigates secondary school students' concepts of isotopes and allotropes and how the concepts are linked to the Periodic Table of Elements (PTE). Questions senior high school students with multiple choice items and interviews. Shows that students actively tried to make sense of what they had experienced. (KHR)
Research in the Automation of Teaching. Technical Report.
ERIC Educational Resources Information Center
Zuckerman, Carl B.; And Others
An experiment was designed to compare the value of the Skinner Teaching Machine with more traditional teaching methods and to compare various means of presenting material via the teaching machine. Material from the United States Navy Basic Electricity course was programed into three series of items: one completion, one multiple choice, and one…
Understanding Misconceptions: Teaching and Learning in Middle School Physical Science
ERIC Educational Resources Information Center
Sadler, Philip M.; Sonnert, Gerhard
2016-01-01
In this study the authors set out to better understand the relationship between teacher knowledge of science and student learning. The authors administered identical multiple-choice assessment items both to teachers of middle school physical science and to their students throughout the school year. The authors found that teachers who have strong…
USDA-ARS?s Scientific Manuscript database
Background: Faced with tens of thousands of food choices, consumers frequently turn to promotional advertising, such as Sunday sales circulars, to make purchasing decisions. To date, little research has examined the content of sales circulars over multiple seasons. Methods: Food items from 12 months...
SCHOOL ANXIETY AND THE FACILITATION OF PERFORMANCE.
ERIC Educational Resources Information Center
DUNN, JAMES A.; SCHELKUN, RUTH F.
The relationships between school-generated anxiety and various indices of school achievement, creativity, age, and IQ are investigated. A 160-item, multiple-choice, multi-scale school anxiety questionnaire was administered to 56 fourth-, fifth-, and sixth-grade children with a mean Stanford-Binet IQ of 126 from an upper-middle-class community.…
ERIC Educational Resources Information Center
Rahayu, Sri; Treagust, David F.; Chandrasegaran, A. L.; Kita, Masakazu; Ibnu, Suhadi
2011-01-01
Background and purpose: This study investigated Indonesian and Japanese senior high-school students' understanding of electrochemistry concepts. Sample: The questionnaire was administered to 244 Indonesian and 189 Japanese public senior high-school students. Design and methods: An 18-item multiple-choice questionnaire relating to five conceptual…
How Much Detail Needs to Be Elucidated in Self-Harm Research?
ERIC Educational Resources Information Center
Stanford, Sarah; Jones, Michael P.
2010-01-01
Assessing self-harm through brief multiple choice items is simple and less invasive than more detailed methods of assessment. However, there is currently little validation for brief methods of self-harm assessment. This study evaluates the extent to which adolescents' perceptions of self-harm agree with definitions in the literature, and what…
Technical Adequacy of the easyCBM Grade 2 Reading Measures. Technical Report #1004
ERIC Educational Resources Information Center
Jamgochian, Elisa; Park, Bitnara Jasmine; Nese, Joseph F. T.; Lai, Cheng-Fei; Saez, Leilani; Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
2010-01-01
In this technical report, we provide reliability and validity evidence for the easyCBM[R] Reading measures for grade 2 (word and passage reading fluency and multiple choice reading comprehension). Evidence for reliability includes internal consistency and item invariance. Evidence for validity includes concurrent, predictive, and construct…
Shiozawa, Thomas; Butz, Benjamin; Herlan, Stephan; Kramer, Andreas; Hirt, Bernhard
2017-01-01
Tuebingen's Sectio Chirurgica (TSC) is an innovative, interactive, multimedia, and transdisciplinary teaching method designed to complement dissection courses. The TSC allows clinical anatomy to be taught via interactive live-stream surgeries moderated by an anatomist. This method aims to provide an application-oriented approach to teaching anatomy that offers students a deeper learning experience. A cohort study was devised to determine whether students who participated in the TSC were better able to solve clinical application questions than students who did not participate. A total of 365 students participated in the dissection course during the winter term of the 2012/2013 academic year. The final examination contained 40 standard multiple-choice (S-MC) and 20 clinically applied multiple-choice (CA-MC) items. The CA-MC items referred to clinical cases but could be answered solely using anatomical knowledge. Students who regularly participated in the TSC answered the CA-MC questions significantly better than the control group (75% and 65%, respectively; P < 0.05, Mann-Whitney U test). The groups exhibited no differences on the S-MC questions (85% and 82.5%, respectively; P > 0.05). The CA-MC questions had a slightly higher level of difficulty than the S-MC questions (0.725 and 0.801, respectively; P = 0.083). The discriminatory power of the items was comparable (S-MC median Pearson correlation: 0.321; CA-MC: 0.283). The TSC successfully teaches the clinical application of anatomical knowledge. Students who attended the TSC in addition to the dissection course were able to answer CA-MC questions significantly better than students who did not attend the TSC. Thus, attending the TSC in addition to the dissection course supported students' clinical learning goals. Anat Sci Educ 10: 46-52. © 2016 American Association of Anatomists.
Latent class analysis of diagnostic science assessment data using Bayesian networks
NASA Astrophysics Data System (ADS)
Steedle, Jeffrey Thomas
2008-10-01
Diagnostic science assessments seek to draw inferences about student understanding by eliciting evidence about the mental models that underlie students' reasoning about physical systems. Measurement techniques for analyzing data from such assessments embody one of two contrasting assessment programs: learning progressions and facet-based assessments. Learning progressions assume that students have coherent theories that they apply systematically across different problem contexts. In contrast, the facet approach makes no such assumption, so students should not be expected to reason systematically across different problem contexts. A systematic comparison of these two approaches is of great practical value to assessment programs such as the National Assessment of Educational Progress as they seek to incorporate small clusters of related items in their tests for the purpose of measuring depth of understanding. This dissertation describes an investigation comparing learning progression and facet models. Data comprised student responses to small clusters of multiple-choice diagnostic science items focusing on narrow aspects of understanding of Newtonian mechanics. Latent class analysis was employed using Bayesian networks in order to model the relationship between students' science understanding and item responses. Separate models reflecting the assumptions of the learning progression and facet approaches were fit to the data. The technical qualities of inferences about student understanding resulting from the two models were compared in order to determine if either modeling approach was more appropriate. Specifically, models were compared on model-data fit, diagnostic reliability, diagnostic certainty, and predictive accuracy. In addition, the effects of test length were evaluated for both models in order to inform the number of items required to obtain adequately reliable latent class diagnoses. 
Lastly, changes in student understanding over time were studied with a longitudinal model in order to provide educators and curriculum developers with a sense of how students advance in understanding over the course of instruction. Results indicated that expected student response patterns rarely reflected the assumptions of the learning progression approach. That is, students tended not to systematically apply a coherent set of ideas across different problem contexts. Even those students expected to express scientifically-accurate understanding had substantial probabilities of reporting certain problematic ideas. The learning progression models failed to make as many substantively-meaningful distinctions among students as the facet models. In statistical comparisons, model-data fit was better for the facet model, but the models were quite comparable on all other statistical criteria. Studying the effects of test length revealed that approximately 8 items are needed to obtain adequate diagnostic certainty, but more items are needed to obtain adequate diagnostic reliability. The longitudinal analysis demonstrated that students either advance in their understanding (i.e., switch to the more advanced latent class) over a short period of instruction or stay at the same level. There was no significant relationship between the probability of changing latent classes and time between testing occasions. In all, this study is valuable because it provides evidence informing decisions about modeling and reporting on student understanding, it assesses the quality of measurement available from short clusters of diagnostic multiple-choice items, and it provides educators with knowledge of the paths that students may take as they advance from novice to expert understanding over the course of instruction.
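As a rough illustration of the latent class machinery described above (not the dissertation's Bayesian-network models), a two-class latent class model for binary item scores can be fit by expectation-maximization; the asymmetric starting values below are an arbitrary choice that lets the two classes separate:

```python
# Illustrative sketch: two-class latent class analysis for 0/1 item
# responses, fit by EM.
def fit_lca(responses, n_iter=100):
    """responses: list of 0/1 item-score rows; returns (class sizes, item probs)."""
    n, k = len(responses), len(responses[0])
    pi = [0.5, 0.5]                # latent class proportions
    p = [[0.6] * k, [0.4] * k]     # P(correct | class, item), asymmetric start
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each examinee
        post = []
        for row in responses:
            lik = []
            for c in range(2):
                l = pi[c]
                for x, pc in zip(row, p[c]):
                    l *= pc if x else 1 - pc
                lik.append(l)
            z = sum(lik)
            post.append([l / z for l in lik])
        # M-step: re-estimate class sizes and item-success probabilities
        # (clamped away from 0 and 1 to avoid zero likelihoods)
        for c in range(2):
            w = sum(r[c] for r in post)
            pi[c] = w / n
            p[c] = [min(max(sum(r[c] * row[i]
                                for r, row in zip(post, responses)) / w,
                            1e-6), 1 - 1e-6)
                    for i in range(k)]
    return pi, p
```

On clearly bimodal data (e.g. half the examinees answering everything correctly, half nothing), the fit recovers a high-ability and a low-ability class of equal size.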
Profile of science process skills of Preservice Biology Teacher in General Biology Course
NASA Astrophysics Data System (ADS)
Susanti, R.; Anwar, Y.; Ermayanti
2018-04-01
This study aims to obtain a portrayal of science process skills among preservice biology teachers. The research took place at Sriwijaya University and involved 41 participants. To collect the data, this study used a multiple-choice test comprising 40 items to measure mastery of science process skills. The data were then analyzed in a descriptive manner. The results showed that the communication aspect outperformed the other skills at 81%, while the lowest was identifying variables and predicting (59%). In addition, mastery of basic science process skills was 72%, whereas integrated skills were somewhat lower at 67%. In general, the ability to perform science process skills varies among preservice biology teachers.
Optimal Foraging by Birds: Experiments for Secondary & Postsecondary Students
ERIC Educational Resources Information Center
Pecor, Keith W.; Lake, Ellen C.; Wund, Matthew A.
2015-01-01
Optimal foraging theory attempts to explain the foraging patterns observed in animals, including their choice of particular food items and foraging locations. We describe three experiments designed to test hypotheses about food choice and foraging habitat preference using bird feeders. These experiments can be used alone or in combination and can…
Relationship between affective determinants and achievement in science for seventeen-year-olds
NASA Astrophysics Data System (ADS)
Napier, John D.; Riley, Joseph P.
Data collected in the 1976-1977 NAEP survey of seventeen-year-olds was used to reanalyze the hypothesis that there are affective determinates of science achievement. Factor and item analysis procedures were used to examine affective and cognitive items from Booklet 4. Eight affective scales and one cognitive achievement scale were identified. Using stepwise multiple regression procedures, the four affective scales of Motivation, Anxiety, Student Choice, and Teacher Support were found to account for the majority of the correlation between the affective determinants and achievement.
König, Laura M.; Giese, Helge; Schupp, Harald T.; Renner, Britta
2016-01-01
Studies show that implicit and explicit attitudes influence food choice. However, precursors of food choice often are investigated using tasks offering a very limited number of options despite the comparably complex environment surrounding real life food choice. In the present study, we investigated how the assortment impacts the relationship between implicit and explicit attitudes and food choice (confectionery and fruit), assuming that a more complex choice architecture is more taxing on cognitive resources. Specifically, a binary and a multiple option choice task based on the same stimulus set (fake food items) were presented to ninety-seven participants. Path modeling revealed that both explicit and implicit attitudes were associated with relative food choice (confectionery vs. fruit) in both tasks. In the binary option choice task, both explicit and implicit attitudes were significant precursors of food choice, with explicit attitudes having a greater impact. Conversely, in the multiple option choice task, the additive impact of explicit and implicit attitudes was qualified by an interaction indicating that, even if explicit and implicit attitudes toward confectionery were inconsistent, more confectionery was chosen than fruit if either was positive. This compensatory ‘one is sufficient’-effect indicates that the structure of the choice environment modulates the relationship between attitudes and choice. The study highlights that environmental constraints, such as the number of choice options, are an important boundary condition that need to be included when investigating the relationship between psychological precursors and behavior. PMID:27621719
Making the Most of Multiple Choice
ERIC Educational Resources Information Center
Brookhart, Susan M.
2015-01-01
Multiple-choice questions draw criticism because many people perceive they test only recall or atomistic, surface-level objectives and do not require students to think. Although this can be the case, it does not have to be that way. Susan M. Brookhart suggests that multiple-choice questions are a useful part of any teacher's questioning repertoire…
Using Multiple-Choice Questions to Evaluate In-Depth Learning of Economics
ERIC Educational Resources Information Center
Buckles, Stephen; Siegfried, John J.
2006-01-01
Multiple-choice questions are the basis of a significant portion of assessment in introductory economics courses. However, these questions, as found in course assessments, test banks, and textbooks, often fail to evaluate students' abilities to use and apply economic analysis. The authors conclude that multiple-choice questions can be used to…
Format of Options in Multiple Choice Test vis-a-vis Test Performance
ERIC Educational Resources Information Center
Bendulo, Hermabeth O.; Tibus, Erlinda D.; Bande, Rhodora A.; Oyzon, Voltaire Q.; Milla, Norberto E.; Macalinao, Myrna L.
2017-01-01
Testing or evaluation in an educational context is primarily used to measure or evaluate and authenticate the academic readiness, learning advancement, acquisition of skills, or instructional needs of learners. This study tried to determine whether the varied combinations of arrangements of options and letter cases in a Multiple-Choice Test (MCT)…
Equal Opportunity in the Classroom: Test Construction in a Diversity-Sensitive Environment.
ERIC Educational Resources Information Center
Ghorpade, Jai; Lackritz, James R.
1998-01-01
Two multiple-choice tests and one essay test were taken by 231 students (50/50 male/female, 192 White, 39 East Asian, Black, Mexican American, or Middle Eastern). Multiple-choice tests showed no significant differences in equal employment opportunity terms; women and men scored about the same on essays, but minority students had significantly…
ERIC Educational Resources Information Center
Sahai, Vic; Demeyere, Petra; Poirier, Sheila; Piro, Felice
1998-01-01
The recall of information about Hepatitis B demonstrated by 180 seventh graders was tested with three test types: (1) short-answer; (2) true/false; and (3) multiple-choice. Short answer testing was the most reliable. Suggestions are made for the use of short-answer tests in evaluating student knowledge. (SLD)
Training impulsive choices for healthy and sustainable food.
Veling, Harm; Chen, Zhang; Tombrock, Merel C; Verpaalen, Iris A M; Schmitz, Laura I; Dijksterhuis, Ap; Holland, Rob W
2017-06-01
Many people find it hard to change their dietary choices. Food choice often occurs impulsively, without deliberation, and it has been unclear whether impulsive food choice can be experimentally created. Across 3 exploratory and 2 confirmatory preregistered experiments we examined whether impulsive food choice can be trained. Participants were cued to make motor responses upon the presentation of, among others, healthy and sustainable food items. They subsequently selected these food items more often for actual consumption when they needed to make their choices impulsively as a result of time pressure. This effect disappeared when participants were asked to think about their choices, merely received more time to make their choices, or when choosing required attention to alternatives. Participants preferred high to low valued food items under time pressure and without time pressure, suggesting that the impulsive choices reflect valid preferences. These findings demonstrate that it is possible to train impulsive choices for food items while leaving deliberative choices for these items unaffected, and connect research on attention training to dual-process theories of decision making. The present research suggests that attention training may lead to behavioral change only when people behave impulsively. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Orthorexia nervosa: validation of a diagnosis questionnaire.
Donini, L M; Marsili, D; Graziani, M P; Imbriale, M; Cannella, C
2005-06-01
To validate a questionnaire for the diagnosis of orthorexia nervosa, an eating disorder defined as a "maniacal obsession for healthy food", 525 subjects were enrolled. They were randomized into two samples (a sample of 404 subjects for the construction of the ORTO-15 test for the diagnosis of orthorexia, and a sample of 121 subjects for the validation of the test). The ORTO-15 questionnaire, validated for the diagnosis of orthorexia, is made up of 15 multiple-choice items. The test we proposed for the diagnosis of orthorexia (ORTO-15) showed good predictive capability at a threshold value of 40 (efficacy 73.8%, sensitivity 55.6% and specificity 75.8%), also on verification with a control sample. However, it has a limitation in identifying the obsessive component of the disorder. For this reason we maintain that further investigation is necessary and that new questions useful for the evaluation of obsessive-compulsive behavior should be added to the ORTO-15 questionnaire.
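The efficacy, sensitivity and specificity reported at a cut-off come from a 2x2 classification table; a sketch with invented data, assuming (as for ORTO-15) that a score below the threshold counts as a positive screen:

```python
# Sketch: screening accuracy of a questionnaire score at a cut-off.
def screening_stats(scores, truth, threshold):
    """truth[i] is True if subject i has the disorder per the reference standard."""
    tp = sum(s < threshold and t for s, t in zip(scores, truth))
    fp = sum(s < threshold and not t for s, t in zip(scores, truth))
    tn = sum(s >= threshold and not t for s, t in zip(scores, truth))
    fn = sum(s >= threshold and t for s, t in zip(scores, truth))
    sensitivity = tp / (tp + fn)          # positives correctly flagged
    specificity = tn / (tn + fp)          # negatives correctly cleared
    efficacy = (tp + tn) / len(scores)    # overall agreement
    return sensitivity, specificity, efficacy
```

Sweeping the threshold trades sensitivity against specificity, which is how a value like 40 would be chosen on the construction sample and then checked on the validation sample.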
How do STEM-interested students pursue multiple interests in their higher educational choice?
NASA Astrophysics Data System (ADS)
Vulperhorst, Jonne Pieter; Wessels, Koen Rens; Bakker, Arthur; Akkerman, Sanne Floor
2018-05-01
Interest in science, technology, engineering and mathematics (STEM) has lately received attention in research due to a gap between the number of STEM students and the needs of the labour market. As interest seems to be one of the most important factors in deciding what to study, we focus in the present study on how STEM-interested students weigh multiple interests in making educational choices. A questionnaire with both open-ended and closed-ended items was administered to 91 STEM-interested students enrolled in a STEM programme of a Dutch University for secondary school students. Results indicate that students find it important that a study programme allows them to pursue multiple interests. Some students pursued multiple interests by choosing to enrol in two programmes at the same time. Most students chose one programme that enabled them to combine multiple interests. Combinations of pursued interests were dependent on the disciplinary range of interests of students. Students who were interested in diverse domains combined interests in an educational programme across academic and non-academic domains, whilst students who were mainly interested in STEM combined only STEM-focused interests. Together these findings stress the importance of taking a multiple interest perspective on interest development and educational choice.
Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses
NASA Astrophysics Data System (ADS)
Sukkarieh, Jana Z.
2011-03-01
Criticisms of multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community—one of which is that they are expensive to score by humans. At the same time, there has been widespread movement towards computer-based assessment and hence, assessment organizations are competing to develop automatic content scoring engines for such item types—which we view as a textual entailment task. This paper describes how MaxEnt Modeling is used to help solve the task. MaxEnt has been used in many natural language tasks, but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
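A minimal stand-in for the kind of MaxEnt model described (here, binary logistic regression over bag-of-words features, trained by gradient ascent; toy data, not the paper's scoring engine):

```python
# Sketch: a tiny MaxEnt-style classifier for scoring short responses.
import math
from collections import defaultdict

def featurize(text):
    """Bag-of-words feature counts."""
    f = defaultdict(int)
    for w in text.lower().split():
        f[w] += 1
    return f

def train_maxent(examples, epochs=200, lr=0.5):
    """examples: list of (text, label) pairs with label 0/1."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, y in examples:
            f = featurize(text)
            z = sum(w[k] * v for k, v in f.items())
            p = 1 / (1 + math.exp(-z))
            for k, v in f.items():
                w[k] += lr * (y - p) * v   # log-likelihood gradient step
    return w

def predict(w, text):
    """Probability that the response merits credit."""
    z = sum(w[k] * v for k, v in featurize(text).items())
    return 1 / (1 + math.exp(-z))
```

A production engine would use richer entailment features than raw words, but the parameter estimation has this same shape.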
Assessing Multiple Choice Question (MCQ) Tests--A Mathematical Perspective
ERIC Educational Resources Information Center
Scharf, Eric M.; Baldwin, Lynne P.
2007-01-01
The reasoning behind popular methods for analysing the raw data generated by multiple choice question (MCQ) tests is not always appreciated, occasionally with disastrous results. This article discusses and analyses three options for processing the raw data produced by MCQ tests. The article shows that one extreme option is not to penalize a…
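One scoring option such analyses weigh is the classical correction for guessing, S = R - W/(k-1), under which blind guessing has an expected score of zero; a one-line sketch (the standard formula, not necessarily the article's exact notation):

```python
# Sketch: formula scoring ("correction for guessing") for k-option MCQs.
def formula_score(right, wrong, options):
    """Omitted items count neither for nor against the examinee."""
    return right - wrong / (options - 1)
```

For a 4-option test, 20 right and 12 wrong score 16; guessing randomly on 100 items (about 25 right, 75 wrong) scores 0 in expectation.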
Piloting a Polychotomous Partial-Credit Scoring Procedure in a Multiple-Choice Test
ERIC Educational Resources Information Center
Tsopanoglou, Antonios; Ypsilandis, George S.; Mouti, Anna
2014-01-01
Multiple-choice (MC) tests are frequently used to measure language competence because they are quick, economical and straightforward to score. While degrees of correctness have been investigated for partially correct responses in combined-response MC tests, degrees of incorrectness in distractors and the role they play in determining the…
ERIC Educational Resources Information Center
van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas
2007-01-01
The performance of five simple multiple imputation methods for dealing with missing data were compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…
Durning, Steven J; Dong, Ting; Artino, Anthony R; van der Vleuten, Cees; Holmboe, Eric; Schuwirth, Lambert
2015-08-01
An ongoing debate exists in the medical education literature regarding the potential benefits of pattern recognition (non-analytic reasoning), actively comparing and contrasting diagnostic options (analytic reasoning), or a combination of the two. Studies have not, however, explicitly explored faculty's thought processes while tackling clinical problems through the lens of dual process theory to inform this debate. Further, these thought processes have not been studied in relation to the difficulty of the task or other potential mediating influences such as personal factors and fatigue, which could also be influenced by personal factors such as sleep deprivation. We therefore sought to determine which reasoning process(es) were used when answering clinically oriented multiple-choice questions (MCQs) and whether these processes differed based on the dual process theory characteristics: accuracy, reading time and answering time, as well as psychometrically determined item difficulty and sleep deprivation. We performed a think-aloud procedure to explore faculty's thought processes while taking these MCQs, coding think-aloud data based on reasoning process (analytic, non-analytic, guessing or a combination of processes) as well as word count, number of stated concepts, reading time, answering time, and accuracy. We also included questions regarding the amount of work in the recent past. We then conducted statistical analyses to examine the associations between these measures, such as correlations between the frequencies of reasoning processes and item accuracy and difficulty. We also observed the total frequencies of different reasoning processes in situations where answers were correct and incorrect. Regardless of whether the questions were classified as 'hard' or 'easy', non-analytic reasoning led to the correct answer more often than to an incorrect answer.
Significant correlations were found between the self-reported recent number of hours worked and both think-aloud word count and the number of concepts used in reasoning, but not item accuracy. When all MCQs were included, 19% of the variance in correctness could be explained by the frequency of expression of these three think-aloud processes (analytic, non-analytic, or combined). We found evidence to support the notion that the difficulty of a test item is not a systematic feature of the item itself but is always a result of the interaction between the item and the candidate. Use of analytic reasoning did not appear to improve accuracy. Our data suggest that individuals do not apply either System 1 or System 2 exclusively but instead fall along a continuum between the two.
Crows spontaneously exhibit analogical reasoning.
Smirnova, Anna; Zorina, Zoya; Obozova, Tanya; Wasserman, Edward
2015-01-19
Analogical reasoning is vital to advanced cognition and behavioral adaptation. Many theorists deem analogical thinking to be uniquely human and to be foundational to categorization, creative problem solving, and scientific discovery. Comparative psychologists have long been interested in the species generality of analogical reasoning, but they initially found it difficult to obtain empirical support for such thinking in nonhuman animals (for pioneering efforts, see [2, 3]). Researchers have since mustered considerable evidence and argument that relational matching-to-sample (RMTS) effectively captures the essence of analogy, in which the relevant logical arguments are presented visually. In RMTS, choice of test pair BB would be correct if the sample pair were AA, whereas choice of test pair EF would be correct if the sample pair were CD. Critically, no items in the correct test pair physically match items in the sample pair, thus demanding that only relational sameness or differentness is available to support accurate choice responding. Initial evidence suggested that only humans and apes can successfully learn RMTS with pairs of sample and test items; however, monkeys have subsequently done so. Here, we report that crows too exhibit relational matching behavior. Even more importantly, crows spontaneously display relational responding without ever having been trained on RMTS; they had only been trained on identity matching-to-sample (IMTS). Such robust and uninstructed relational matching behavior represents the most convincing evidence yet of analogical reasoning in a nonprimate species, as apes alone have spontaneously exhibited RMTS behavior after only IMTS training.
Cognitive dissonance resolution is related to episodic memory.
Salti, Moti; El Karoui, Imen; Maillet, Mathurin; Naccache, Lionel
2014-01-01
The notion that our past choices affect our future behavior is certainly one of the most influential concepts of social psychology since its first experimental report in the 1950s, and its initial theorization by Festinger within the "cognitive dissonance" framework. Using the free choice paradigm (FCP), it was shown that choosing between two similarly rated items made subjects reevaluate the chosen items as more attractive and the rejected items as less attractive. However, in 2010 a major work by Chen and Risen revealed a severe statistical flaw casting doubt on most previous studies. Izuma and colleagues (2010) supplemented the traditional FCP with original control conditions and concluded that the observed effect could not be solely attributed to this methodological flaw. In the present work we aimed at establishing the existence of genuine choice-induced preference change and at characterizing this effect. To do so, we replicated Izuma et al.'s study and added an important new control condition that was absent from the original study. Moreover, we added a memory test in order to measure the possible relation between episodic memory of choices and the observed behavioral effects. In two experiments we provide experimental evidence supporting genuine choice-induced preference change obtained with the FCP. We also contribute to the understanding of the phenomenon by showing that choice-induced preference change effects are strongly correlated with episodic memory.
ERIC Educational Resources Information Center
Ladyshewsky, Richard K.
2015-01-01
This research explores differences in multiple choice test (MCT) scores in a cohort of post-graduate students enrolled in a management and leadership course. A total of 250 students completed the MCT in either a supervised in-class paper and pencil test or an unsupervised online test. The only statistically significant difference between the nine…
Michel, Pierre; Auquier, Pascal; Baumstarck, Karine; Pelletier, Jean; Loundou, Anderson; Ghattas, Badih; Boyer, Laurent
2015-09-01
Quality of life (QoL) measurements are considered important outcome measures both for research on multiple sclerosis (MS) and in clinical practice. Computerized adaptive testing (CAT) can improve the precision of measurements made using QoL instruments while reducing the burden of testing on patients. Moreover, a cross-cultural approach is also necessary to guarantee the wide applicability of CAT. The aim of this preliminary study was to develop a calibrated item bank that is available in multiple languages and measures QoL related to mental health by combining one generic (SF-36) and one disease-specific questionnaire (MusiQoL). Patients with MS were enrolled in this international, multicenter, cross-sectional study. The psychometric properties of the item bank were evaluated using classical test theory and item response theory approaches, including the evaluation of unidimensionality, item response theory model fitting, and analyses of differential item functioning (DIF). Convergent and discriminant validities of the item bank were examined according to socio-demographic, clinical, and QoL features. A total of 1992 patients with MS from 15 countries were enrolled to calibrate the 22-item bank developed in this study. The strict monotonicity of the Cronbach's alpha curve, the high eigenvalue ratio estimator (5.50), and the adequate CFA model fit (RMSEA = 0.07 and CFI = 0.95) indicated that a strong assumption of unidimensionality was warranted. The infit mean square statistic ranged from 0.76 to 1.27, indicating a satisfactory item fit. DIF analyses revealed no item biases across geographical areas, confirming the cross-cultural equivalence of the item bank. External validity testing revealed that the item bank scores correlated significantly with QoL scores but also showed discriminant validity for socio-demographic and clinical characteristics.
This work demonstrated satisfactory psychometric characteristics for a QoL item bank for MS in multiple languages. This work may offer a common measure for the assessment of QoL in different cultural contexts and for international studies conducted on MS.
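The internal-consistency side of the item-bank evaluation described above can be illustrated with a minimal sketch. This is not the authors' analysis code; it computes Cronbach's alpha from a hypothetical respondents-by-items score matrix using only the Python standard library:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Toy data: 4 respondents x 3 dichotomously scored items (illustrative only)
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(cronbach_alpha(data))  # 0.75 for this toy matrix
```

On a real item bank, alpha would be tracked as items are added or removed; the paper's "strict monotonicity of the Cronbach's alpha curve" refers to alpha computed over successively larger item subsets.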
ERIC Educational Resources Information Center
Willing, Sonja; Ostapczuk, Martin; Musch, Jochen
2015-01-01
Testwiseness--that is, the ability to find subtle cues towards the solution by the simultaneous comparison of the available answer options--threatens the validity of multiple-choice (MC) tests. Discrete-option multiple-choice (DOMC) has recently been proposed as a computerized alternative testing format for MC tests, and presumably allows for a…
Backwash Effects of Language-Testing in Primary and Secondary Education.
ERIC Educational Resources Information Center
Wesdorp, H.
A debate has been carried on in Dutch educational circles about the widespread use of multiple-choice tests, and a number of objections have been raised against the use of such tests. This paper reports on research into the validity of the objections, in particular with respect to the possible effect of multiple-choice tests on the teaching of…
ERIC Educational Resources Information Center
Lee, Jia-Ying
2011-01-01
The main purpose of this study was to compare the strategies used by Chinese-speaking students when confronted with familiar versus unfamiliar topics in a multiple-choice format reading comprehension test. The focus was on describing what students do when they are taking reading comprehension tests by asking students to verbalize their thoughts.…
NASA Astrophysics Data System (ADS)
Qian, Xiaoyu
Science is an area where large achievement gaps have been observed between White and minority students, and between male and female students. The minority achievement gap in science has persisted, as indicated by the National Assessment of Educational Progress and the Trends in International Mathematics and Science Study (TIMSS). TIMSS also shows a gender gap favoring males emerging at the eighth grade. Both gaps continue to be wider in the number of doctoral degrees and full professorships awarded (NSF, 2008). The current study investigated both minority and gender achievement gaps in science utilizing a multi-level differential item functioning (DIF) methodology (Kamata, 2001) within a fully Bayesian framework. All dichotomously coded items from the TIMSS 2007 eighth-grade science assessment were analyzed. Both gender DIF and minority DIF were studied. Multi-level models were employed to identify DIF items and sources of DIF at both the student and teacher levels. The study found that several student variables were potential sources of achievement gaps. It was also found that gender DIF favoring male students was more noticeable in the content areas of physics and earth science than in biology and chemistry. In terms of item type, the majority of these gender DIF items were multiple-choice rather than constructed-response items. Female students also performed less well on items requiring visual-spatial ability. Minority students performed significantly worse on physics and earth science items as well. A higher percentage of minority DIF items in earth science and biology were constructed-response rather than multiple-choice items, indicating that literacy may be a cause of minority DIF. Three-level model results suggested that some teacher variables may be the cause of DIF variations from teacher to teacher.
It is essential for both middle school science teachers and science educators to find instructional methods that work more effectively to improve science achievement of both female and minority students. Physics and earth science are two areas to be improved for both groups. Curriculum and instruction need to enhance female students' learning interests and give them opportunities to improve their visual perception skills. Science instruction should address improving minority students' literacy skills while teaching science.
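The study above detects DIF with a multi-level Bayesian model; a much simpler classical analogue is the Mantel-Haenszel common odds ratio, which compares two groups' odds of answering an item correctly within matched ability strata. The sketch below uses hypothetical counts, not TIMSS data, and is named as a substitute technique rather than the study's own method:

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across ability strata.

    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong).
    Values near 1.0 suggest no DIF for the item; values far from 1.0
    suggest one group is favored after matching on ability."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two toy ability strata (hypothetical counts)
strata = [(30, 10, 20, 20), (40, 5, 35, 10)]
print(mantel_haenszel_or(strata))  # ~2.69: item favors the reference group
```

In operational DIF screening, examinees are typically stratified by total test score, and the log of this odds ratio is tested against zero.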
Root Kustritz, Margaret V
2014-01-01
Third-year veterinary students in a required theriogenology diagnostics course were allowed to self-select attendance at a lecture in either the evening or the next morning. One group was presented with PowerPoint slides in a traditional format (T group), and the other group was presented with PowerPoint slides in the assertion-evidence format (A-E group), which uses a single sentence and a highly relevant graphic on each slide to ensure attention is drawn to the most important points in the presentation. Students took a multiple-choice pre-test, attended lecture, and then completed a take-home assignment. All students then completed an online multiple-choice post-test and, one month later, a different online multiple-choice test to evaluate retention. Groups did not differ on pre-test, assignment, or post-test scores, and both groups showed significant gains from pre-test to post-test and from pre-test to retention test. However, the T group showed significant decline from post-test to retention test, while the A-E group did not. Short-term differences between slide designs were most likely obscured by required coursework completed immediately after the lecture, but retention of material was superior with the assertion-evidence slide design.
The Impact of Television on Public Environmental Knowledge Concerning the Great Lakes.
ERIC Educational Resources Information Center
Brothers, Christine C.
The purpose of this study was to collect baseline information about public knowledge of and opinions toward the Great Lakes and to measure the impact of a television news program in educating adults about the Great Lakes. Survey questionnaires containing multiple-choice knowledge items and Likert scale opinion statements were completed by 570…
An Odds Ratio Approach for Detecting DDF under the Nested Logit Modeling Framework
ERIC Educational Resources Information Center
Terzi, Ragip; Suh, Youngsuk
2015-01-01
An odds ratio approach (ORA) under the framework of a nested logit model was proposed for evaluating differential distractor functioning (DDF) in multiple-choice items and was compared with an existing ORA developed under the nominal response model. The performances of the two ORAs for detecting DDF were investigated through an extensive…
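The core quantity in an odds ratio approach to differential distractor functioning can be sketched with raw counts: among examinees who miss the item, compare the odds that reference- versus focal-group members select a particular distractor. This is illustrative only; the paper's ORA operates under nested logit and nominal response models, not on raw counts, and all numbers below are hypothetical:

```python
def distractor_odds_ratio(ref_d, ref_other, foc_d, foc_other):
    """Odds ratio for choosing one distractor (vs. any other wrong option),
    reference group relative to focal group, among incorrect responders.
    Values far from 1.0 flag potential DDF for that distractor."""
    return (ref_d * foc_other) / (ref_other * foc_d)

# Hypothetical counts among examinees who answered the item incorrectly:
# reference group: 25 chose distractor B, 75 chose other wrong options
# focal group:     50 chose distractor B, 50 chose other wrong options
print(distractor_odds_ratio(25, 75, 50, 50))  # ~0.33: B attracts the focal group more
```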
ERIC Educational Resources Information Center
Mbella, Kinge Keka
2012-01-01
Mixed-format assessments are increasingly being used in large scale standardized assessments to measure a continuum of skills ranging from basic recall to higher order thinking skills. These assessments are usually comprised of a combination of (a) multiple-choice items which can be efficiently scored, have stable psychometric properties, and…
Youth Risk Behavior Survey Results, 1995. Executive Summary.
ERIC Educational Resources Information Center
New Hampshire State Dept. of Education, Concord.
An 84-item multiple choice Youth Risk Behavior Survey was administered to 2,092 students in 62 public high schools in New Hampshire during the spring of 1995. The survey covered behaviors in six categories: (1) behaviors that result in unintentional or intentional injuries; (2) tobacco use; (3) alcohol and other drug use; (4) sexual behaviors that…
ERIC Educational Resources Information Center
Brown, Corina E.; Hyslop, Richard M.; Barbera, Jack
2015-01-01
The General, Organic, and Biological Chemistry Knowledge Assessment (GOB-CKA) is a multiple-choice instrument designed to assess students' understanding of the chemistry topics deemed important to clinical nursing practice. This manuscript describes the development process of the individual items along with a psychometric evaluation of the…
Drawing and Using Free Body Diagrams: Why It May Be Better Not to Decompose Forces
ERIC Educational Resources Information Center
Aviani, Ivica; Erceg, Nataša; Mešic, Vanes
2015-01-01
In this study we investigated how two different approaches to drawing free body diagrams influence the development of students' understanding of Newton's laws, including their ability to identify real forces. For this purpose we developed a 12-item two-tier multiple choice survey and conducted a quasi-experiment. This experiment included two groups…
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Lai, Cheng-Fei; Alonzo, Julie; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the fifth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Lai, Cheng-Fei; Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the second-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the fourth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Lai, Cheng-Fei; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the sixth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Park, Bitnara Jasmine; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the seventh-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Lai, Cheng-Fei; Irvin, P. Shawn; Park, Bitnara Jasmine; Alonzo, Julie; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the third-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
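The split-half reliability reported across these easyCBM technical reports can be illustrated with a minimal sketch: split the items into odd and even halves, correlate students' half-test scores, and step the correlation up to full test length with the Spearman-Brown formula. The data below are toy values, not easyCBM results:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(scores):
    """Odd/even split-half correlation, stepped up to full test length
    with the Spearman-Brown prophecy formula."""
    odd = [sum(row[0::2]) for row in scores]    # score on odd-numbered items
    even = [sum(row[1::2]) for row in scores]   # score on even-numbered items
    r = pearson(odd, even)
    return 2 * r / (1 + r)

# Toy data: 4 students x 4 dichotomous items
data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(split_half_reliability(data))  # ~0.9
```

Odd/even splitting is one common convention; operational reports may instead average over many random splits or report alternate-form and Rasch-based person/item reliability, as the easyCBM reports do.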
Uncovering Students' Incorrect Ideas about Foundational Concepts for Biochemistry
ERIC Educational Resources Information Center
Villafane, Sachel M.; Loertscher, Jennifer; Minderhout, Vicky; Lewis, Jennifer E.
2011-01-01
This paper presents preliminary data on how an assessment instrument with a unique structure can be used to identify common incorrect ideas from prior coursework at the beginning of a biochemistry course, and to determine whether these ideas have changed by the end of the course. The twenty-one multiple-choice items address seven different…