Science.gov

Sample records for assessment item format

  1. Formative Assessment in High School Chemistry Teaching: Investigating the Alignment of Teachers' Goals with Their Items

    ERIC Educational Resources Information Center

    Sandlin, Benjamin; Harshman, Jordan; Yezierski, Ellen

    2015-01-01

    A 2011 report by the Department of Education states that understanding how teachers use results from formative assessments to guide their practice is necessary to improve instruction. Chemistry teachers have goals for items in their formative assessments, but the degree of alignment between what is assessed by these items and the teachers' goals…

  3. An Evaluation of Forced-Choice and True-False Item Formats in Personality Assessment.

    ERIC Educational Resources Information Center

    Jackson, Douglas N.; And Others

    In a comparative evaluation of a standard true-false format for personality assessment and a forced-choice format, subjects from college residential units were assigned randomly to respond either to the forced-choice or standard true-false form of the Personality Research Form (PRF). All subjects also rated themselves and the members of their…

  4. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

    Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…

  5. Assessment of differential item functioning.

    PubMed

    Wang, Wen-Chung

    2008-01-01

    This study addresses several important issues in assessment of differential item functioning (DIF). It starts with the definition of DIF, effectiveness of using item fit statistics to detect DIF, and linear modeling of DIF in dichotomous items, polytomous items, facets, and testlet-based items. Because a common metric over groups of test-takers is a prerequisite in DIF assessment, this study reviews three such methods of establishing a common metric: the equal-mean-difficulty method, the all-other-item method, and the constant-item (CI) method. A small simulation demonstrates the superiority of the CI method over the others. As the CI method relies on a correct specification of DIF-free items to serve as anchors, a method of identifying such items is recommended and its effectiveness is illustrated through a simulation. Finally, this study discusses how to assess practical significance of DIF at both item and test levels.
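The constant-item (CI) linking step described in this abstract can be sketched numerically. Assuming one item is known to be DIF-free, separately estimated Rasch difficulties for the two groups are placed on a common metric by shifting the focal group's scale so that the anchor item's difficulty agrees; the leftover differences then estimate DIF. The item names and difficulty values below are invented for illustration, not taken from the study.

```python
# Sketch of the constant-item (CI) method for DIF assessment: anchor on one
# item assumed DIF-free, align the focal group's difficulty scale to the
# reference group's, then read off DIF as the remaining difficulty
# differences. All values are illustrative.

def constant_item_dif(b_ref, b_foc, anchor):
    """Per-item DIF estimates (focal minus reference difficulty) after
    linking the focal metric to the reference metric via `anchor`."""
    shift = b_foc[anchor] - b_ref[anchor]   # metric offset between groups
    return {item: (b_foc[item] - shift) - b_ref[item] for item in b_ref}

b_ref = {"item1": 0.0, "item2": 0.5, "item3": -0.3}   # reference group
b_foc = {"item1": 0.2, "item2": 0.7, "item3": 0.4}    # focal group

dif = constant_item_dif(b_ref, b_foc, anchor="item1")
# After linking, item2 shows no DIF; item3 is 0.5 logits harder
# for the focal group.
print(dif)
```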

  6. Caries Risk Assessment Item Importance

    PubMed Central

    Chaffee, B.W.; Featherstone, J.D.B.; Gansky, S.A.; Cheng, J.; Zhan, L.

    2016-01-01

    Caries risk assessment (CRA) is widely recommended for dental caries management. Little is known regarding how practitioners use individual CRA items to determine risk and which individual items independently predict clinical outcomes in children younger than 6 y. The objective of this study was to assess the relative importance of pediatric CRA items in dental providers’ decision making regarding patient risk and in association with clinically evident caries, cross-sectionally and longitudinally. CRA information was abstracted retrospectively from electronic patient records of children initially aged 6 to 72 mo at a university pediatric dentistry clinic (n = 3,810 baseline; n = 1,315 with follow-up). The 17-item CRA form included caries risk indicators, caries protective items, and clinical indicators. Conditional random forests classification trees were implemented to identify and assign variable importance to CRA items independently associated with baseline high-risk designation, baseline evident tooth decay, and follow-up evident decay. Thirteen individual CRA items, including all clinical indicators and all but 1 risk indicator, were independently and statistically significantly associated with student/resident providers’ caries risk designation. Provider-assigned baseline risk category was strongly associated with follow-up decay, which increased from low (20.4%) to moderate (30.6%) to high/extreme risk patients (68.7%). Of baseline CRA items, before adjustment, 12 were associated with baseline decay and 7 with decay at follow-up; however, in the conditional random forests models, only the clinical indicators (evident decay, dental plaque, and recent restoration placement) and 1 risk indicator (frequent snacking) were independently and statistically significantly associated with future disease, for which baseline evident decay was the strongest predictor. In this predominantly high-risk population under caries-preventive care, more individual CRA items

  7. MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin

    2010-01-01

    Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…

  8. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    PubMed

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
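The comparison underlying this kind of residual analysis can be illustrated with the classic standardized residual of Hambleton et al., which the paper uses as a benchmark: within a group of examinees of similar ability, compare the observed proportion correct with the probability predicted by the fitted item characteristic curve. A minimal sketch with invented numbers (the authors' proposed residual, based on a ratio estimate of the curve, is more involved):

```python
import math

# Standardized residual for IRT item fit: within an ability group, compare
# the observed proportion correct with the model-implied probability from
# the item characteristic curve. Numbers are illustrative.

def standardized_residual(n_correct, n, p_model):
    """z-statistic comparing the observed proportion correct with the
    model-predicted probability for a group of n examinees."""
    p_obs = n_correct / n
    return (p_obs - p_model) / math.sqrt(p_model * (1 - p_model) / n)

# 60 of 100 examinees in this ability group answered correctly,
# while the fitted ICC predicts 0.5:
z = standardized_residual(60, 100, 0.5)
print(round(z, 2))   # 2.0: a notable misfit at this ability level
```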

  9. Assessing the acquisition of requesting a variety of preferred items using different speech generating device formats for children with autism spectrum disorder.

    PubMed

    Gevarter, Cindy; O'Reilly, Mark F; Kuhn, Michelle; Watkins, Laci; Ferguson, Raechal; Sammarco, Nicolette; Rojeski, Laura; Sigafoos, Jeff

    2016-07-22

    Five children with autism spectrum disorder (ASD) were taught to request preferred items using four different augmentative and alternative communication (AAC) displays on an iPad®-based speech-generating device (SGD). Acquisition was compared using multi-element designs. Displays included a symbol-based grid, a photo image with embedded hotspots, a hybrid (photo image with embedded hotspots and symbols), and a pop-up symbol grid. Three participants mastered requesting items from a field of four with at least three displays, and one mastered requesting items in a field of two. The fifth participant did not acquire requests in a field of preferred items. Individualized display effects were present, and the photo image appeared to have provided the most consistent advantages for three participants. Some errors were more or less common with specific displays and/or participants. The results have important implications for AAC assessment and implementation protocols.

  10. Determining the Representation of Constructed Response Items in Mixed-Item Format Exams.

    ERIC Educational Resources Information Center

    Sykes, Robert C.; Truskosky, Denise; White, Hillory

    The purpose of this research was to study the effect of three different ways of increasing the number of points contributed by constructed response (CR) items on the reliability of test scores from mixed-item-format tests. The assumption of unidimensionality that underlies the accuracy of item response theory model-based standard error…

  11. Item Response Theory for Peer Assessment

    ERIC Educational Resources Information Center

    Uto, Masaki; Ueno, Maomi

    2016-01-01

    As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…

  12. Estimating the Reliability of a Test Containing Multiple Item Formats.

    ERIC Educational Resources Information Center

    Qualls, Audrey L.

    1995-01-01

    Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
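One widely used reliability estimate for a composite of part tests in different formats is stratified coefficient alpha; whether it coincides with the estimate Qualls presents is not stated in the abstract, so the sketch below should be read as the generic formula with invented values:

```python
# Stratified-alpha reliability for a test whose parts use different item
# formats (e.g., multiple-choice and constructed-response). Each format
# stratum contributes its score variance and its own reliability; the
# values below are illustrative, not from the paper.

def stratified_alpha(part_variances, part_reliabilities, total_variance):
    """Composite reliability from format-based part tests:
    1 - sum_k var_k * (1 - rel_k) / var_total."""
    error = sum(v * (1 - r) for v, r in zip(part_variances, part_reliabilities))
    return 1 - error / total_variance

# MC part: variance 4, alpha .80; CR part: variance 9, alpha .70;
# total-score variance 20 (the parts correlate positively):
rel = stratified_alpha([4, 9], [0.80, 0.70], 20)
print(round(rel, 3))   # 0.825
```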

  13. A Multilevel Assessment of Differential Item Functioning.

    ERIC Educational Resources Information Center

    Shen, Linjun

    A multilevel approach was proposed for the assessment of differential item functioning and compared with the traditional logistic regression approach. Data from the Comprehensive Osteopathic Medical Licensing Examination for 2,300 freshman osteopathic medical students were analyzed. The multilevel approach used three-level hierarchical generalized…

  14. The Fantastic Four of Mathematics Assessment Items

    ERIC Educational Resources Information Center

    Greenlees, Jane

    2011-01-01

    In this article, the author makes reference to four comic book characters to make the point that together they are a formidable team, but on their own they are vulnerable. She examines the four components of mathematics assessment items and the need for implicit instruction within the classroom for student success. Just like the "Fantastic Four"…

  15. Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

    ERIC Educational Resources Information Center

    Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

    2013-01-01

    We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…

  17. Systematic item selection process applied to developing item pools for assessing multiple mental health problems.

    PubMed

    Batterham, Philip J; Brewer, Jacqueline L; Tjhin, Angeline; Sunderland, Matthew; Carragher, Natacha; Calear, Alison L

    2015-08-01

    Given high rates of comorbidity among mental disorders, better methods to rapidly screen across multiple mental disorders are needed. Building on existing Patient Reported Outcomes Measurement Information System (PROMIS) item banks, the present study aimed to select items to assess panic disorder, social anxiety disorder, obsessive-compulsive disorder, posttraumatic stress disorder, adult attention-deficit hyperactivity disorder, substance use disorder, suicidal thoughts and behaviors, and psychosis. A four-stage process to select items involved systematic literature searches, item refinement and standardization, obtaining feedback from consumers and experts, and reduction of item pools in preparation for calibration in a population-based sample. From 6,900 items collected across the eight mental health conditions, 2,002 were standardized and rated by small groups of consumers and experts. Expert ratings of item relevance tended to correlate moderately with consumer ratings, with variation across conditions. An algorithm was used to generate final item pools ranging from 45 to 75 items. The study successfully applied a systematic process to select items for assessing a range of mental disorders. This process for item selection may be applied to additional mental and physical health conditions. The calibration of the present item pools into final item banks will enable the development of flexible measures to assess risk of mental health problems while more effectively accounting for comorbidity.

  18. Descriptive and Inferential Procedures for Assessing Differential Item Functioning in Polytomous Items.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Thayer, Dorothy T.; Mazzeo, John

    1997-01-01

    Differential item functioning (DIF) assessment procedures for items with more than two ordered score categories, referred to as polytomous items, were evaluated. Three descriptive statistics (standardized mean difference and two procedures based on the SIBTEST computer program) and five inferential procedures were used. Conditions under which the…

  20. Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd

    The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…

  1. Using Mutual Information for Adaptive Item Comparison and Student Assessment

    ERIC Educational Resources Information Center

    Liu, Chao-Lin

    2005-01-01

    The author analyzes properties of mutual information between dichotomous concepts and test items. The properties generalize some common intuitions about item comparison, and provide principled foundations for designing item-selection heuristics for student assessment in computer-assisted educational systems. The proposed item-selection strategies…

  2. Effects of Question Formats on Student and Item Performance

    PubMed Central

    Pate, Adam N.

    2013-01-01

    Objective. To determine the effect of 3 variations in test item format on item statistics and student performance. Methods. Fifteen pairs of directly comparable test questions were written to adhere to (standard scale) or deviate from (nonstandard scale) 3 specific item-writing guidelines. Differences in item difficulty and discrimination were measured between the 2 scales as a whole and for each guideline individually. Student performance was also compared between the 2 scales. Results. The nonstandard scale was 12.7 points more difficult than the standard scale (p=0.03). The guideline to avoid “none of the above” was the only 1 of the 3 guidelines to demonstrate significance. Students scored 53.6% and 41.3% (p<0.001) of total points on the standard and nonstandard scales, respectively. Conclusions. Nonstandard test items were more difficult for students to answer correctly than the standard test items, provided no enhanced ability to discriminate between higher- and lower-performing students, and resulted in poorer student performance. Item-writing guidelines should be considered during test construction. PMID:23716739
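The item statistics compared in studies like this one are classical: difficulty is the proportion of examinees answering correctly, and discrimination is the correlation between the item score and the rest of the test. A minimal sketch with an invented response matrix (not the study's data):

```python
from statistics import mean, pstdev

# Classical item statistics: difficulty (proportion correct) and
# discrimination (item-rest point-biserial correlation). The response
# matrix is invented for illustration.

def item_stats(responses, item):
    """responses: list of per-examinee 0/1 vectors.
    Returns (difficulty, discrimination) for the given item index."""
    scores = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]   # rest-score
    ms, mr = mean(scores), mean(rest)
    cov = mean((x - ms) * (y - mr) for x, y in zip(scores, rest))
    return ms, cov / (pstdev(scores) * pstdev(rest))

responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # 4 examinees, 3 items
p, r_pb = item_stats(responses, 0)
print(p)        # 0.75: a relatively easy item
print(r_pb > 0) # positively discriminating
```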

  3. Primary Science Assessment Item Setters' Misconceptions Concerning Biological Science Concepts

    ERIC Educational Resources Information Center

    Boo, Hong Kwen

    2007-01-01

    Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…

  5. Item Response Methods for Educational Assessment.

    ERIC Educational Resources Information Center

    Mislevy, Robert J.; Rieser, Mark R.

    Multiple matrix sampling (MMS) theory indicates how data may be gathered to most efficiently convey information about levels of attainment in a population, but standard analyses of these data require random sampling of items from a fixed pool of items. This assumption proscribes the retirement of flawed or obsolete items from the pool as well as…

  7. Assessing the Psychometric Properties of Alternative Items for Certification.

    PubMed

    Krogh, Mary Anne; Muckle, Timothy

    Alternative items were added as scored items to the National Certification Examination for Nurse Anesthetists (NCE) in 2010. A common concern related to the new items has been their measurement attributes. This study was undertaken to evaluate the psychometric impact of adding these items to the examination. Candidates had a significantly higher ability estimate on alternative items than on multiple choice questions, and 6.7 percent of test candidates performed significantly differently in alternative item formats. Ability estimates from the alternative items and the multiple choice questions correlated at r = .58. The alternative items took significantly longer to answer than standard multiple choice questions and discriminated to a higher degree than MCQs. The alternative items exhibited unidimensionality to the same degree as MCQs, and the BIC confirmed the Rasch model as acceptable for scoring. The new item types were found to have acceptable attributes for inclusion in the certification program.

  8. Classification Accuracy of Mixed Format Tests: A Bi-Factor Item Response Theory Approach

    PubMed Central

    Wang, Wei; Drasgow, Fritz; Liu, Liwen

    2016-01-01

    Mixed format tests (e.g., a test consisting of multiple-choice [MC] items and constructed response [CR] items) have become increasingly popular. However, the latent structure of item pools consisting of the two formats is still equivocal. Moreover, the implications of this latent structure are unclear: For example, do constructed response items tap reasoning skills that cannot be assessed with multiple choice items? This study explored the dimensionality of mixed format tests by applying bi-factor models to 10 tests of various subjects from the College Board's Advanced Placement (AP) Program and compared the accuracy of scores based on the bi-factor analysis with scores derived from a unidimensional analysis. More importantly, this study focused on a practical and important question—classification accuracy of the overall grade on a mixed format test. Our findings revealed that the degree of multidimensionality resulting from the mixed item format varied from subject to subject, depending on the disattenuated correlation between scores from MC and CR subtests. Moreover, remarkably small decrements in classification accuracy were found for the unidimensional analysis when the disattenuated correlations exceeded 0.90. PMID:26973568

  9. Test item linguistic complexity and assessments for deaf students.

    PubMed

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.

  10. Analysis of Differential Item Functioning in the NAEP History Assessment.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Ercikan, Kadriye

    The Mantel-Haenszel approach for investigating differential item functioning (DIF) was applied to U.S. history items that were administered as part of the National Assessment of Educational Progress (NAEP). DIF analyses were based on the responses of 7,743 students in grade 11. On some items, Blacks, Hispanics, and females performed more poorly…
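The Mantel-Haenszel approach named here stratifies examinees by total score and pools a 2x2 table (group by item score) from each stratum into a common odds ratio; a value near 1 suggests no uniform DIF. The counts below are invented for illustration:

```python
# Mantel-Haenszel common odds ratio for DIF: each score stratum contributes
# a 2x2 table of group (reference/focal) by item score (correct/incorrect).
# Counts are illustrative, not NAEP data.

def mh_odds_ratio(strata):
    """strata: list of (ref_correct, ref_wrong, foc_correct, foc_wrong).
    Returns the Mantel-Haenszel common odds ratio; values near 1 suggest
    no uniform DIF."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two score strata; reference examinees answer correctly more often than
# focal examinees with comparable total scores:
strata = [(10, 10, 5, 15), (20, 5, 10, 10)]
alpha_mh = mh_odds_ratio(strata)
print(alpha_mh > 1)   # True: the item favors the reference group here
```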

  11. Assessing Scientific Reasoning: A Comprehensive Evaluation of Item Features That Affect Item Difficulty

    ERIC Educational Resources Information Center

    Stiller, Jurik; Hartmann, Stefan; Mathesius, Sabrina; Straube, Philipp; Tiemann, Rüdiger; Nordmeier, Volkhard; Krüger, Dirk; Upmeier zu Belzen, Annette

    2016-01-01

    The aim of this study was to improve the criterion-related test score interpretation of a text-based assessment of scientific reasoning competencies in higher education by evaluating factors which systematically affect item difficulty. To provide evidence about the specific demands which test items of various difficulty make on pre-service…

  13. Applying Item Response Theory methods to design a learning progression-based science assessment

    NASA Astrophysics Data System (ADS)

    Chen, Jing

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct, and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats, such as Constructed Response (CR) items, Ordered Multiple Choice (OMC) items, and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance, so additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all…

  14. Development and assessment of floor and ceiling items for the PROMIS physical function item bank

    PubMed Central

    2013-01-01

    Introduction Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. They will help improve instrument performance in populations with broad functional ranges and those concentrated at

  15. Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…

  16. Assessing the quality of multiple-choice test items.

    PubMed

    Clifton, Sandra L; Schriner, Cheryl L

    2010-01-01

    With the focus of nursing education geared toward teaching students to think critically, faculty need to assure that test items require students to use a high level of cognitive processing. To evaluate their examinations, the authors assessed multiple-choice test items on final nursing examinations. The assessment included determining cognitive learning levels and frequency of items among 3 adult health courses, comparing difficulty values with cognitive learning levels, and examining discrimination values and the relationship to distracter performance.

  17. A generalized item response tree model for psychological assessments.

    PubMed

    Jeon, Minjeong; De Boeck, Paul

    2016-09-01

    A new item response theory (IRT) model with a tree structure has been introduced for modeling item response processes with a tree structure. In this paper, we present a generalized item response tree model with a flexible parametric form, dimensionality, and choice of covariates. The utilities of the model are demonstrated with two applications in psychological assessments for investigating Likert scale item responses and for modeling omitted item responses. The proposed model is estimated with the freely available R package flirt (Jeon et al., 2014b).
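The tree idea can be sketched with the simplest two-node case, assuming Rasch-type node models: a first node for whether the examinee responds at all, and a second for whether an attempted response is correct (the paper's generalized model allows richer parametric forms, dimensionality, and covariates). The parameters below are illustrative:

```python
import math

# Minimal item response tree: the response process is decomposed into
# sequential binary nodes, here "attempt vs. omit" followed by "correct
# vs. incorrect given an attempt". Each node follows a Rasch-type model;
# parameters are illustrative, not from the paper.

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def irtree_probs(theta, b_attempt, b_correct):
    """Return probabilities of (omit, incorrect, correct) at ability theta."""
    p_attempt = sigmoid(theta - b_attempt)   # node 1: respond at all?
    p_correct = sigmoid(theta - b_correct)   # node 2: correct, given attempt
    return (1 - p_attempt,
            p_attempt * (1 - p_correct),
            p_attempt * p_correct)

probs = irtree_probs(theta=0.0, b_attempt=0.0, b_correct=0.0)
print(probs)   # (0.5, 0.25, 0.25); the three probabilities sum to 1
```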

  18. Scaling Performance Assessments: Strategies for Managing Local Item Dependence.

    ERIC Educational Resources Information Center

    Yen, Wendy M.

    1993-01-01

    Results from the Maryland School Performance Assessment Program for 5,392 elementary school students and from the Comprehensive Tests of Basic Skills (multiple choice) for a national sample are used to explore local item dependence (LID) of test items. Some strategies are suggested for measuring LID in performance assessments. (SLD)
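A standard way to quantify local item dependence, and one plausible reading of the "strategies for measuring LID" mentioned here, is Yen's Q3 statistic: the correlation between two items' model residuals across examinees. A sketch with invented data (the abstract does not say which statistic the paper uses):

```python
from statistics import mean, pstdev

# Yen's Q3 for local item dependence (LID): correlate two items' residuals
# (observed 0/1 score minus model-predicted probability) across examinees.
# Large |Q3| suggests the items remain associated after conditioning on
# the model. Data are invented.

def q3(obs_i, p_i, obs_j, p_j):
    ri = [o - p for o, p in zip(obs_i, p_i)]   # residuals for item i
    rj = [o - p for o, p in zip(obs_j, p_j)]   # residuals for item j
    mi, mj = mean(ri), mean(rj)
    cov = mean((a - mi) * (b - mj) for a, b in zip(ri, rj))
    return cov / (pstdev(ri) * pstdev(rj))

# Two items answered identically by four examinees -> strongly dependent:
stat = q3(obs_i=[1, 0, 1, 0], p_i=[0.5] * 4,
          obs_j=[1, 0, 1, 0], p_j=[0.8, 0.2, 0.8, 0.2])
print(round(stat, 3))   # 1.0
```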

  19. Item Feature Effects in Evolution Assessment

    ERIC Educational Resources Information Center

    Nehm, Ross H.; Ha, Minsu

    2011-01-01

    Despite concerted efforts by science educators to understand patterns of evolutionary reasoning in science students and teachers, the vast majority of evolution education studies have failed to carefully consider or control for item feature effects in knowledge measurement. Our study explores whether robust contextualization patterns emerge within…

  1. The Relationship between State High School Exit Exams and Mathematical Proficiency: Analyses of the Complexity, Content, and Format of Items and Assessment Protocols

    ERIC Educational Resources Information Center

    Regan, Blake B.

    2012-01-01

    This study examined the relationship between high school exit exams and mathematical proficiency. With the No Child Left Behind (NCLB) Act requiring all students to be proficient in mathematics by 2014, it is imperative that high-stakes assessments accurately evaluate all aspects of student achievement, appropriately set the yardstick by which…

  2. Using cognitive interviewing for test items to assess physical function in children with cerebral palsy.

    PubMed

    Dumas, Helene M; Watson, Kyle; Fragala-Pinkham, Maria A; Haley, Stephen M; Bilodeau, Nathalie; Montpetit, Kathleen; Gorton, George E; Mulcahey, M J; Tucker, Carole A

    2008-01-01

    The purpose of this study was to assess the content, format, and comprehension of test items and responses developed for use in a computer adaptive test (CAT) of physical function for children with cerebral palsy (CP). After training in cognitive interviewing techniques, investigators defined item intent and developed questions for each item. Parents of children with CP (n = 27) participated in interviews probing item meaning, item wording, and response choice adequacy and appropriateness. Qualitative analysis identified 3 themes: item clarity; relevance, context, and attribution; and problems with wording or tone. Parents reported the importance of delineating task components, assistance amount, and environmental context. Cognitive interviewing provided valuable information about the validity of new items and insight to improve relevance and context. We believe that the development of CATs in pediatric rehabilitation may ultimately reduce the impact of the issues identified.

  3. Multi-item direct behavior ratings: Dependability of two levels of assessment specificity.

    PubMed

    Volpe, Robert J; Briesch, Amy M

    2015-09-01

    Direct Behavior Rating-Multi-Item Scales (DBR-MIS) have been developed as formative measures of behavioral assessment for use in school-based problem-solving models. Initial research has examined the dependability of composite scores generated by summing all items comprising the scales. However, it has been argued that DBR-MIS may offer assessment at 2 levels of behavioral specificity (i.e., item level, global composite level) and that scales can be individualized for each student to improve efficiency without sacrificing technical characteristics. The current study examines the dependability of 5 items comprising a DBR-MIS designed to measure classroom disruptive behavior. Seven graduate students rated the behavior of 9 middle-school students on each item (calls out, noisy, clowns around, talks to classmates, and out of seat) over 3 occasions, based on 10-min video clips of the students during mathematics instruction. Separate generalizability theory and decision studies were conducted for each item and for a 3-item composite that was individualized for each student based on the highest rated items on the first rating occasion. Findings indicate favorable dependability estimates for 3 of the 5 items and exceptional dependability estimates for the individualized composite.

  4. Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

    ERIC Educational Resources Information Center

    Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

    2013-01-01

    Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…

  6. Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

    ERIC Educational Resources Information Center

    Wang, Wei

    2013-01-01

    Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…

  8. Analysis of Differential Item Functioning in the NAEP History Assessment.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Ercikan, Kadriye

    1989-01-01

    The Mantel-Haenszel approach for investigating differential item functioning (DIF) was applied to United States history items within the 1986 National Assessment of Educational Progress administered to 7,812 11th graders. DIF analyses were based on responses of 7,743 11th graders. Results concerning sex and racial differences and ethnicity are…

  9. Assessment of Differential Item Functioning for Performance Tasks.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; And Others

    1993-01-01

    Two extensions of the Mantel-Haenszel procedure that may be useful in assessing differential item functioning (DIF) are explored. Simulation results showed that, for both inferential procedures, the studied item should be included in the matching variable, as in the dichotomous case. (SLD)
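For context, the dichotomous Mantel-Haenszel procedure that these extensions generalize pools a 2×2 table from each stratum of the matching variable into a common odds ratio; a minimal sketch, with an illustrative function name:

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score strata.

    Each stratum is a 2x2 table (a, b, c, d):
      a = reference-group correct, b = reference-group incorrect,
      c = focal-group correct,     d = focal-group incorrect.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den
```

A value near 1 indicates no DIF favoring either group; ETS practice re-expresses the estimate on the delta scale as D = -2.35 ln(α̂_MH).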

  10. Cooperative Industrial/Vocational Education. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; Elias, Julie Whitaker

    This document contains multiple-choice test items and assessment techniques in the form of instructional management plans for Missouri's cooperative industrial-vocational education core curriculum. The test items and techniques are relevant to these 15 occupational duties: (1) career research and planning; (2) computer awareness; (3) employment…

  11. Factor Analytic Procedures for Assessing Social Desirability in Binary Items

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2005-01-01

    This article proposes and describes factor-analytic procedures for assessing and controlling socially desirable responding in binary personality items. The basic procedures are applications of the restricted (confirmatory) item factor analysis model for ordered-categorical variables. Orthogonal and oblique solutions based on marker variables are…

  12. Assessing Existing Item Bank Depth for Computer Adaptive Testing.

    ERIC Educational Resources Information Center

    Bergstrom, Betty A.; Stahl, John A.

    This paper reports a method for assessing the adequacy of existing item banks for computer adaptive testing. The method takes into account content specifications, test length, and stopping rules, and can be used to determine if an existing item bank is adequate to administer a computer adaptive test efficiently across differing levels of examinee…

  13. Item Response Theory Models for Wording Effects in Mixed-Format Scales

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu

    2015-01-01

    Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…

  15. Assessing Differential Item Functioning in Performance Tests.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; And Others

    Although the belief has been expressed that performance assessments are intrinsically more fair than multiple-choice measures, some forms of performance assessment may in fact be more likely than conventional tests to tap construct-irrelevant factors. As performance assessment grows in popularity, it will be increasingly important to monitor the…

  16. The Effects of Item Preview on Video-Based Multiple-Choice Listening Assessments

    ERIC Educational Resources Information Center

    Koyama, Dennis; Sun, Angela; Ockey, Gary J.

    2016-01-01

    Multiple-choice formats remain a popular design for assessing listening comprehension, yet no consensus has been reached on how multiple-choice formats should be employed. Some researchers argue that test takers must be provided with a preview of the items prior to the input (Buck, 1995; Sherman, 1997); others argue that a preview may decrease the…

  18. The Impact of Test Dimensionality, Common-Item Set Format, and Scale Linking Methods on Mixed-Format Test Equating

    ERIC Educational Resources Information Center

    Öztürk-Gübes, Nese; Kelecioglu, Hülya

    2016-01-01

    The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…

  19. Formative Assessment Probes

    ERIC Educational Resources Information Center

    Eberle, Francis; Keeley, Page

    2008-01-01

    Formative assessment probes can be effective tools to help teachers build a bridge between students' initial ideas and scientific ones. In this article, the authors describe how using two formative assessment probes can help teachers determine the extent to which students make similar connections between developing a concept of matter and a…

  20. Formative Assessment in Context

    ERIC Educational Resources Information Center

    Oxenford-O'Brian, Julie

    2013-01-01

    This dissertation responds to critical gaps in current research on formative assessment practice which could limit successful implementation of this practice within the K-12 classroom context. The study applies a sociocultural perspective of learning to interpret a cross-case analysis of formative assessment practice occurring during one…

  3. The 4-Item Negative Symptom Assessment (NSA-4) Instrument

    PubMed Central

    Morlock, Robert; Coon, Cheryl; van Willigenburg, Arjen; Panagides, John

    2010-01-01

    Objective. To assess the ability of mental health professionals to use the 4-item Negative Symptom Assessment instrument, derived from the Negative Symptom Assessment-16, to rapidly determine the severity of negative symptoms of schizophrenia. Design. Open participation. Setting. Medical education conferences. Participants. Attendees at two international psychiatry conferences. Measurements. Participants read a brief set of 4-item Negative Symptom Assessment instructions and viewed a videotape of a patient with schizophrenia. Using the instrument's 1-to-6 severity rating scale, they rated the four negative symptom items and overall global negative symptoms. These ratings were compared with a consensus expert rating using frequency distributions and chi-square tests on the proportion of participant ratings within one point of the expert rating. Results. More than 400 medical professionals (293 physicians, 50% with a European practice, and 55% who reported past use of schizophrenia rating scales) participated. Between 82.1 and 91.1 percent of participants' determinations for the 4 items and the global rating were within one rating point of the consensus expert ratings. The difference between the percentage of participant ratings within one point of the consensus expert ratings and the percentage differing by more than one point was significant (p<0.0001). Participants' ratings of negative symptoms using the 4-item Negative Symptom Assessment did not generally differ by geographic region of practice, professional credentialing, or familiarity with schizophrenia symptom rating instruments. Conclusion. These findings suggest that clinicians from a variety of geographic practices can, after brief training, use the 4-item Negative Symptom Assessment effectively to rapidly assess negative symptoms in patients with schizophrenia. PMID:20805916

  4. Test Item Linguistic Complexity and Assessments for Deaf Students

    ERIC Educational Resources Information Center

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64…

  5. Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

    ERIC Educational Resources Information Center

    Wan, Lei; Henly, George A.

    2012-01-01

    Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…

  7. Alignment of Content and Effectiveness of Mathematics Assessment Items

    ERIC Educational Resources Information Center

    Kulm, Gerald; Dager Wilson, Linda; Kitchen, Richard

    2005-01-01

    Alignment has taken on increased importance given the current high-stakes nature of assessment. To make well-informed decisions about student learning on the basis of test results, assessment items need to be well aligned with standards. Project 2061 of the American Association for the Advancement of Science (AAAS) has developed a procedure for…

  9. Demonstrating Local Item Dependence for Recognition and Supply Format Tests.

    ERIC Educational Resources Information Center

    Bastick, Tony

    This study tested the hypothesis that the common approach to test construction in which recognition questions (RQs), such as multiple-choice items, are followed by constructed response questions (CRQs) encourages students to use the informationally rich RQs to gain marks on the CRQs, thus introducing Local Item Dependence (LID) and inflating the…

  10. Factors Influencing the Mantel and Generalized Mantel-Haenszel Methods for the Assessment of Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Su, Ya-Hui

    2004-01-01

    Eight independent variables (differential item functioning [DIF] detection method, purification procedure, item response model, mean latent trait difference between groups, test length, DIF pattern, magnitude of DIF, and percentage of DIF items) were manipulated, and two dependent variables (Type I error and power) were assessed through…

  11. Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…

  12. Disentangling Sources of Differential Item Functioning in Multilanguage Assessments.

    ERIC Educational Resources Information Center

    Ercikan, Kadriye

    2002-01-01

    Disentangled sources of differential item functioning (DIF) in a multilanguage assessment for which multiple factors were expected to be causing DIF. Data for the Third International Mathematics and Science study for four countries and two languages (3,000 to 11,000 cases in each comparison group) reveal amounts and sources of DIF. (SLD)

  13. Fundamentals of Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains multiple choice test items and assessment techniques for Missouri's fundamentals of marketing core curriculum. The core curriculum is divided into these nine occupational duties: (1) communications in marketing; (2) economics and marketing; (3) employment and advancement; (4) human relations in marketing; (5) marketing…

  14. Exploring Crossing Differential Item Functioning by Gender in Mathematics Assessment

    ERIC Educational Resources Information Center

    Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas

    2015-01-01

    The purpose of this article is to explore crossing differential item functioning (DIF) in a test drawn from a national examination of mathematics for 11-year-old pupils in England. An empirical dataset was analyzed to explore DIF by gender in a mathematics assessment. A two-step process involving the logistic regression (LR) procedure for…

  15. Goodness-of-Fit Assessment of Item Response Theory Models

    ERIC Educational Resources Information Center

    Maydeu-Olivares, Alberto

    2013-01-01

    The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate "p"-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model…

  16. A Framework for Dimensionality Assessment for Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Svetina, Dubravka; Levy, Roy

    2014-01-01

    A framework is introduced for considering dimensionality assessment procedures for multidimensional item response models. The framework characterizes procedures in terms of their confirmatory or exploratory approach, parametric or nonparametric assumptions, and applicability to dichotomous, polytomous, and missing data. Popular and emerging…

  18. TIFAID: A Test Item Format Selection Job Aid for Use by Instructional Developers.

    ERIC Educational Resources Information Center

    Llaneras, Robert E.; And Others

    1993-01-01

    Presents a job aid for determining test-item format called TIFAID (Test Item Format Job Aid), based on adequately constructed instructional objectives. The four sections of the job aid are described: (1) a task classification system; (2) task-related questions; (3) a flowchart; and (4) a tips and techniques guide. (Contains four references.) (LRW)

  19. The Importance of the Item Format with Respect to Gender Differences in Test Performance: A Study of Open-Format Items in the DTM Test.

    ERIC Educational Resources Information Center

    Wester, Anita

    1995-01-01

    The effect of different item formats (multiple choice and open) on gender differences in test performance was studied for the Swedish Diagrams, Tables, and Maps (DTM) test with 90 secondary school students. The change to open format resulted in no reduction in gender differences on the DTM. (SLD)

  20. IRT-Estimated Reliability for Tests Containing Mixed Item Formats

    ERIC Educational Resources Information Center

    Shu, Lianghua; Schwarz, Richard D.

    2014-01-01

    As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…
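Of the four coefficients named above, Cronbach's α is the most familiar; its classical (non-IRT) form can be sketched as follows, with an illustrative function name:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix (list of rows).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of totals),
    where k is the number of items.
    """
    n_items = len(scores[0])

    def var(xs):
        # Population variance across persons.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(n_items)]
    total_var = var([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

The article's contribution is deriving IRT-based analogues of such coefficients from model parameters rather than computing them from observed scores as done here.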

  2. The Impact of Reading Self-Efficacy and Task Value on Reading Comprehension Scores in Different Item Formats

    ERIC Educational Resources Information Center

    Solheim, Oddny Judith

    2011-01-01

    It has been hypothesized that students with low self-efficacy will struggle with complex reading tasks in assessment situations. In this study we examined whether perceived reading self-efficacy and reading task value uniquely predicted reading comprehension scores in two different item formats in a sample of fifth-grade students. Results showed…

  3. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  5. Assessing Differential Step Functioning in Polytomous Items Using a Common Odds Ratio Estimator

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2007-01-01

    Many statistics used in the assessment of differential item functioning (DIF) in polytomous items yield a single item-level index of measurement invariance that collapses information across all response options of the polytomous item. Utilizing a single item-level index of DIF can, however, be misleading if the magnitude or direction of the DIF…

  6. Relation of field independence and test-item format to student performance on written piagetian tests

    NASA Astrophysics Data System (ADS)

    López-Rupérez, F.; Palacios, C.; Sanchez, J.

    In this study we have investigated the relationship between the field-dependence-independence (FDI) dimension, as measured by the Group Embedded Figures Test (GEFT), and subject performance on the Longeot test, a pencil-and-paper Piagetian test, through the open or closed format of its items. The sample consisted of 141 high school students. Correlation and variance analyses show that GEFT scores correlate significantly with performance only on those Longeot test items that require formal reasoning. The effect of open- or closed-item format is found exclusively for formal items; only the open format discriminates significantly (at the 0.01 level) between field-dependent and field-independent subjects performing on this type of item. Some implications of these results for science education are discussed.

  7. The Impact of Item Format and Examinee Characteristics on Response Times

    ERIC Educational Resources Information Center

    Hess, Brian J.; Johnston, Mary M.; Lipner, Rebecca S.

    2013-01-01

    Current research on examination response time has focused on tests comprised of traditional multiple-choice items. Consequently, the impact of other innovative or complex item formats on examinee response time is not understood. The present study used multilevel growth modeling to investigate examinee characteristics associated with response time…

  9. Formative Assessment: Simply, No Additives

    ERIC Educational Resources Information Center

    Roskos, Kathleen; Neuman, Susan B.

    2012-01-01

    Among the types of assessment, the one closest to daily reading instruction is formative assessment. In contrast to summative assessment, which occurs after instruction, formative assessment involves forming judgments frequently in the flow of instruction. Key features of formative assessment include identifying gaps between where students are and…

  11. The Contribution of Constructed Response Items to Large Scale Assessment: Measuring and Understanding Their Impact

    ERIC Educational Resources Information Center

    Lissitz, Robert W.; Hou, Xiaodong; Slater, Sharon Cadman

    2012-01-01

    This article investigates several questions regarding the impact of different item formats on measurement characteristics. Constructed response (CR) items and multiple choice (MC) items obviously differ in their formats and in the resources needed to score them. As such, they have been the subject of considerable discussion regarding the impact of…

  12. Mathematics Strategy Use in Solving Test Items in Varied Formats

    ERIC Educational Resources Information Center

    Bonner, Sarah M.

    2013-01-01

    Although test scores from similar tests in multiple choice and constructed response formats are highly correlated, equivalence in rankings may mask differences in substantive strategy use. The author used an experimental design and participant think-alouds to explore cognitive processes in mathematical problem solving among undergraduate examinees…

  13. Rasch Based Analysis of Oral Presentation Assessment for Item Banking.

    ERIC Educational Resources Information Center

    Nakamura, Yuji

    The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful, irrespective of…
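    The relationship the abstract states, that success depends only on item difficulty and candidate ability, is the Rasch probability P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)). A minimal sketch (function name ours, not from the paper):

    ```python
    import math

    def rasch_probability(theta: float, b: float) -> float:
        """Probability of a correct response under the Rasch (1PL) model,
        where theta is the candidate's ability and b the item difficulty,
        both on the same logit scale."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # When ability equals difficulty, the success probability is exactly 0.5;
    # a candidate 2 logits above the item succeeds far more often.
    print(rasch_probability(0.0, 0.0))
    print(round(rasch_probability(1.0, -1.0), 3))
    ```

    Because difficulties and abilities sit on one common scale, items calibrated this way can be banked and reused across test forms, which is the point the abstract makes about item banking.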

  14. Descriptive Study of High-Stakes Science Assessments: Prevalence, Content, and the Possible Effect of Incorporating Innovative Item Types

    NASA Astrophysics Data System (ADS)

    Keller, Shani Malaika

    Framed by a discussion of the heightened importance of science education in the U.S., this paper describes the prevalence, content, and format of high-stakes science assessments in the U.S. and explores the possibility that differences in assessment format may affect score gaps among student subgroups. An analysis of proficiency rates for 2010-11 high school exit exams in science was inconclusive; however, score gaps among ethnic subgroups on the 2009 grade 12 NAEP science assessment were larger for multiple choice items than for performance-based components. Further, a comparison of subgroup score gaps on the 2009 NAEP science assessment with those on the ACT science subtest suggests that the assessment with more diverse and innovative items resulted in a smaller gap in subgroup test scores. These findings point to the need for greater investigation of the extent to which item type affects subgroup score differences on science assessments.

  15. Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments

    ERIC Educational Resources Information Center

    El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne

    2017-01-01

    Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…

  17. Primary Science Assessment Item Setters' Misconceptions Concerning the State Changes of Water

    ERIC Educational Resources Information Center

    Boo, Hong Kwen

    2006-01-01

    Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…

  18. Missouri Assessment Program (MAP), Spring 2000: Intermediate Science, Released Items, Grade 7.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This assessment sample provides information on the Missouri Assessment Program (MAP) for grade 7 science. The sample consists of seven items taken from the test booklet and scoring guides for the seven items. The items assess heat, minerals, graphing, and plant growth. (MM)

  19. Influence of Item Direction on Student Responses in Attitude Assessment.

    ERIC Educational Resources Information Center

    Campbell, Noma Jo; Grissom, Stephen

    To investigate the effects of wording in attitude test items, a five-point Likert-type rating scale was administered to 173 undergraduate education majors. The test measured attitudes toward college and self, and contained 38 positively-worded items. Thirty-eight negatively-worded items were also written to parallel the positive statements.…

  20. Cognitive Processing Requirements of Constructed Figural Response and Multiple-Choice Items in Architecture Assessment.

    ERIC Educational Resources Information Center

    Martinez, Michael E.; Katz, Irvin R.

    Contrasts between constructed response items and stem-equivalent multiple-choice counterparts typically have involved averaging item characteristics, and this aggregation has masked differences in statistical properties at the item level. Moreover, even aggregated format differences have not been explained in terms of differential cognitive…

  1. Constructing Better Second Language Assessments Based on Differential Item Functioning Analysis

    ERIC Educational Resources Information Center

    Allalouf, Avi; Abramzon, Andrea

    2008-01-01

    Differential item functioning (DIF) analysis can be used to great advantage in second language (L2) assessments. This study examined the differences in performance on L2 test items between groups from different first language backgrounds and suggested ways of improving L2 assessments. The study examined DIF on L2 (Hebrew) test items for two…

  2. Formative Assessment: A Critical Review

    ERIC Educational Resources Information Center

    Bennett, Randy Elliot

    2011-01-01

    This paper covers six interrelated issues in formative assessment (aka, "assessment for learning"). The issues concern the definition of formative assessment, the claims commonly made for its effectiveness, the limited attention given to domain considerations in its conceptualisation, the under-representation of measurement principles in…

  3. A Comparison of Equating/Linking Using the Stocking-Lord Method and Concurrent Calibration with Mixed-Format Tests in the Non-Equivalent Groups Common-Item Design under IRT

    ERIC Educational Resources Information Center

    Tian, Feng

    2011-01-01

    There has been a steady increase in the use of mixed-format tests, that is, tests consisting of both multiple-choice items and constructed-response items in both classroom and large-scale assessments. This calls for appropriate equating methods for such tests. As Item Response Theory (IRT) has rapidly become mainstream as the theoretical basis for…

  4. Formative Assessment Probes: Is It a Rock? Continuous Formative Assessment

    ERIC Educational Resources Information Center

    Keeley, Page

    2013-01-01

    A lesson plan is provided for a formative assessment probe entitled "Is It a Rock?" This probe is designed for teaching elementary school students about rocks through the use of a formative assessment classroom technique (FACT) known as the group Frayer Model. FACT activates students' thinking about a concept and can be used to…

  6. Item generation and design testing of a questionnaire to assess degenerative joint disease-associated pain in cats.

    PubMed

    Zamprogno, Helia; Hansen, Bernie D; Bondell, Howard D; Sumrell, Andrea Thomson; Simpson, Wendy; Robertson, Ian D; Brown, James; Pease, Anthony P; Roe, Simon C; Hardie, Elizabeth M; Wheeler, Simon J; Lascelles, B Duncan X

    2010-12-01

    The objective was to determine the items (question topics) for a subjective instrument to assess degenerative joint disease (DJD)-associated chronic pain in cats and to determine the instrument design most appropriate for use by cat owners. Subjects were 100 randomly selected client-owned cats from 6 months to 20 years old. Cats were evaluated to determine degree of radiographic DJD and signs of pain throughout the skeletal system. Two groups were identified: high DJD pain and low DJD pain. Owner-answered questions about activity and signs of pain were compared between the 2 groups to define items relating to chronic DJD pain. Interviews with 45 cat owners were performed to generate items. Fifty-three cat owners who had not been involved in any other part of the study, 19 veterinarians, and 2 statisticians assessed 6 preliminary instrument designs. Twenty-two cats were selected for each group; 19 important items were identified, resulting in 12 potential items for the instrument; and 3 additional items were identified from owner interviews. Owners and veterinarians selected a 5-point descriptive instrument design over 11-point or visual analogue scale formats. Behaviors relating to activity were substantially different between healthy cats and cats with signs of DJD-associated pain. Fifteen items were identified as being potentially useful, and the preferred instrument design was identified. This information could be used to construct an owner-based questionnaire to assess feline DJD-associated pain. Once validated, such a questionnaire would assist in evaluating potential analgesic treatments for these patients.

  7. Adaptive testing for psychological assessment: how many items are enough to run an adaptive testing algorithm?

    PubMed

    Wagner-Menghin, Michaela M; Masters, Geoff N

    2013-01-01

    Although the principles of adaptive testing were established in the psychometric literature many years ago (e.g., Weiss, 1977), and the practice of adaptive testing is established in educational assessment, it is not yet widespread in psychological assessment. One obstacle to adaptive psychological testing is a lack of clarity about the necessary number of items to run an adaptive algorithm. The study explores the relationship between item bank size, test length and measurement precision. Simulated adaptive test runs (allowing a maximum of 30 items per person) out of an item bank with 10 items per ability level (covering .5 logits, 150 items total) yield a standard error of measurement (SEM) of .47 (.39) after an average of 20 (29) items for 85-93% (64-82%) of the simulated rectangular sample. Expanding the bank to 20 items per level (300 items total) did not improve the algorithm's performance significantly. With a small item bank (5 items per ability level, 75 items total) it is possible to reach the same SEM as with a conventional test, but with fewer items or a better SEM with the same number of items.
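    The trade-off the study simulates (bank size vs. test length vs. measurement precision) can be illustrated with a toy maximum-information CAT loop under the Rasch model. This is our own sketch, not the authors' algorithm; the ability update is a crude stochastic approximation, and SEM = 1 / sqrt(accumulated item information):

    ```python
    import math
    import random

    def rasch_p(theta: float, b: float) -> float:
        """Rasch probability of a correct response."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def adaptive_test(true_theta: float, bank: list[float],
                      max_items: int = 30, target_sem: float = 0.47,
                      seed: int = 0) -> tuple[float, float, int]:
        """Toy CAT loop: administer the unused item most informative at the
        current ability estimate, update the estimate by a simple
        stochastic-approximation step, and stop once SEM hits the target."""
        rng = random.Random(seed)
        theta_hat, info, n = 0.0, 0.0, 0
        used: set[int] = set()
        while n < max_items:
            # Rasch item information p*(1-p) is maximal when the item's
            # difficulty matches the current ability estimate.
            item = min((i for i in range(len(bank)) if i not in used),
                       key=lambda i: abs(bank[i] - theta_hat))
            used.add(item)
            x = 1 if rng.random() < rasch_p(true_theta, bank[item]) else 0
            n += 1
            theta_hat += (x - rasch_p(theta_hat, bank[item])) * 2.0 / n
            q = rasch_p(theta_hat, bank[item])
            info += q * (1.0 - q)
            if 1.0 / math.sqrt(info) <= target_sem:
                break
        return theta_hat, 1.0 / math.sqrt(info), n

    # Bank mirroring the abstract's setup: 10 items per 0.5-logit level,
    # 150 items total spanning -3.5 to +3.5 logits.
    bank = [level / 2.0 for level in range(-7, 8) for _ in range(10)]
    theta, sem, n_used = adaptive_test(true_theta=0.8, bank=bank)
    print(n_used, round(sem, 3))
    ```

    With items available near the estimate, each response contributes roughly 0.25 information, so the 0.47 SEM target is reached after about 19-20 items, the same order of magnitude the simulation study reports.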

  8. Assessing the Efficiency of Item Selection in Computerized Adaptive Testing.

    ERIC Educational Resources Information Center

    Weissman, Alexander

    This study investigated the efficiency of item selection in a computerized adaptive test (CAT), where efficiency was defined in terms of the accumulated test information at an examinee's true ability level. A simulation methodology compared the efficiency of 2 item selection procedures with 5 ability estimation procedures for CATs of 5, 10, 15,…

  9. A Comparative Analysis of Several Methods of Assessing Item Bias.

    ERIC Educational Resources Information Center

    Ironson, Gail H.

    Four statistical methods for identifying biased test items were used with data from two ethnic groups (1,691 black and 1,794 white high school seniors). The data were responses to 150 items in five subtests including two traditional tests (reading and mathematics) and three nontraditional tests (picture number test of associative memory, letter…

  10. Reading Grade Levels and Mathematics Assessment: An Analysis of Texas Mathematics Assessment Items and Their Reading Difficulty

    ERIC Educational Resources Information Center

    Lamb, John H.

    2010-01-01

    Increased reading difficulty of mathematics assessment items has been shown to negatively affect student performance. The advent of high-stakes testing, which has serious ramifications for students' futures and teachers' careers, necessitates analysis of reading difficulty on state assessment items and student performance on those items. Using…

  11. Developing a Taxonomy of Item Model Types to Promote Assessment Engineering

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Zhou, Jiawen; Alves, Cecila

    2008-01-01

    An item model serves as an explicit representation of the variables in an assessment task. An item model includes the "stem", "options", and "auxiliary information". The "stem" is the part of an item which formulates context, content, and/or the question the examinee is required to answer. The "options" contain the alternative answers with one…
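    The stem/options/auxiliary-information decomposition the abstract describes maps naturally onto a small data structure. A hypothetical sketch (class and field names are ours, not part of the taxonomy):

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class ItemModel:
        """Minimal representation of the item-model parts named in the
        abstract: a stem, options (the key plus distractors), and auxiliary
        information such as figures or reading passages."""
        stem: str
        key: str
        distractors: list[str]
        auxiliary_information: list[str] = field(default_factory=list)

        @property
        def options(self) -> list[str]:
            # The alternative answers presented to the examinee.
            return [self.key] + self.distractors

    item = ItemModel(
        stem="What is the probability of rolling a 6 with a fair die?",
        key="1/6",
        distractors=["1/3", "1/2", "5/6"],
    )
    print(len(item.options))  # 4
    ```

    In assessment engineering, variables in the stem and options would be parameterized so that many cloned items can be generated from one such model.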

  12. Analysing Item Position Effects due to Test Booklet Design within Large-Scale Assessment

    ERIC Educational Resources Information Center

    Hohensinn, Christine; Kubinger, Klaus D.; Reif, Manuel; Schleicher, Eva; Khorramdel, Lale

    2011-01-01

    For large-scale assessments, usually booklet designs administering the same item at different positions within a booklet are used. Therefore, the occurrence of position effects influencing the difficulty of the item is a crucial issue. Not taking learning or fatigue effects into account would result in a bias of estimated item difficulty. The…

  13. Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…

  14. Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…

  15. Modified Multiple-Choice Items for Alternate Assessments: Reliability, Difficulty, and Differential Boost

    ERIC Educational Resources Information Center

    Kettler, Ryan J.; Rodriguez, Michael C.; Bolt, Daniel M.; Elliott, Stephen N.; Beddow, Peter A.; Kurz, Alexander

    2011-01-01

    Federal policy on alternate assessment based on modified academic achievement standards (AA-MAS) inspired this research. Specifically, an experimental study was conducted to determine whether tests composed of modified items would have the same level of reliability as tests composed of original items, and whether these modified items helped reduce…

  16. Calibration of an Item Bank for the Assessment of Basque Language Knowledge

    ERIC Educational Resources Information Center

    Lopez-Cuadrado, Javier; Perez, Tomas A.; Vadillo, Jose A.; Gutierrez, Julian

    2010-01-01

    The main requisite for a functional computerized adaptive testing system is the need of a calibrated item bank. This text presents the tasks carried out during the calibration of an item bank for assessing knowledge of Basque language. It has been done in terms of the 3-parameter logistic model provided by the item response theory. Besides, this…

  17. Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Wang, Ning; Lane, Suzanne

    This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…

  18. Using polytomous item response models to assess death anxiety.

    PubMed

    Gómez, Juana; Hidalgo, M Dolores; Tomás-Sábado, Joaquín

    2007-01-01

    The study of human attitudes toward death has given rise to a substantial body of empirical research. Psychometric instruments have been developed to measure fear of death, or death anxiety, and its psychological consequences in people who continually come into contact with stimuli related to mortality. The aim was to analyze the 20-item Death Anxiety Inventory (DAI) within the framework of item response theory (IRT) and using the generalized partial credit model. The sample comprised 154 men and 550 women and was drawn from nurses, doctors, industrial workers, teachers, undergraduates, and retired persons. Subjects completed the DAI, a self-administered, Likert-type questionnaire of 20 items, each with six response options. The DAI showed a relatively adequate fit to the generalized partial credit model. However, 4 of the 20 items presented a poor fit to the model. The analysis of item information and test information functions revealed that the 20-item test was appropriate for differentiating subjects with medium or high levels of death anxiety. The test information function was higher in this range of scores, indicating greater precision in the estimate of death anxiety for these subjects. The generalized partial credit model can be used to obtain detailed information about a clinical test and its items, and there are advantages to this approach when working with polytomous tests.
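    The generalized partial credit model used here assigns each of an item's ordered response categories a probability built from cumulative step logits: P(X = k) is proportional to exp(sum over j <= k of a * (theta - b_j)), with the empty sum for k = 0 defined as 0. A minimal sketch assuming a single slope a and step difficulties b_1..b_m (function name ours):

    ```python
    import math

    def gpcm_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
        """Category probabilities under the generalized partial credit model.
        `thresholds` are the step difficulties b_1..b_m for an item with
        m + 1 ordered categories; `a` is the item's slope parameter."""
        logits = [0.0]  # cumulative logit for category 0 (empty sum)
        for b in thresholds:
            logits.append(logits[-1] + a * (theta - b))
        denom = sum(math.exp(z) for z in logits)
        return [math.exp(z) / denom for z in logits]

    # A six-category Likert item like those of the DAI: 5 step difficulties.
    probs = gpcm_probs(theta=0.0, a=1.2, thresholds=[-2.0, -1.0, 0.0, 1.0, 2.0])
    print(len(probs), round(sum(probs), 6))
    ```

    Summing each item's category information over the test yields the test information function the abstract refers to; where it is high, death anxiety is estimated with more precision.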

  19. The DIF-Free-Then-DIF Strategy for the Assessment of Differential Item Functioning

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin; Sun, Guo-Wei

    2012-01-01

    The DIF-free-then-DIF (DFTD) strategy consists of two steps: (a) select a set of items that are the most likely to be DIF-free and (b) assess the other items for DIF (differential item functioning) using the designated items as anchors. The rank-based method together with the computer software IRTLRDIF can select a set of DIF-free polytomous items…

  1. Development of the Assessment Items of Debris Flow Using the Delphi Method

    NASA Astrophysics Data System (ADS)

    Byun, Yosep; Seong, Joohyun; Kim, Mingi; Park, Kyunghan; Yoon, Hyungkoo

    2016-04-01

    In recent years in Korea, typhoons and localized extreme rainfall caused by abnormal climate have increased. Accordingly, debris flow is becoming one of the most dangerous natural disasters. This study aimed to develop assessment items which can be used for conducting damage investigations of debris flows. The Delphi method was applied to classify the realms of the assessment items. As a result, 29 assessment items, classified into 6 groups, were determined.

  2. Formative Assessment as Mediation

    ERIC Educational Resources Information Center

    De Vos, Mark; Belluigi, Dina Zoe

    2011-01-01

    Whilst principles of validity, reliability and fairness should be central concerns for the assessment of student learning in higher education, simplistic notions of "transparency" and "explicitness" in terms of assessment criteria should be critiqued more rigorously. This article examines the inherent tensions resulting from CRA's links to both…

  3. Formative Assessment: A Cybernetic Viewpoint

    ERIC Educational Resources Information Center

    Roos, Bertil; Hamilton, David

    2005-01-01

    This paper considers alternative assessment, feedback and cybernetics. For more than 30 years, debates about the bi-polarity of formative and summative assessment have served as surrogates for discussions about the workings of the mind, the social implications of assessment and, as important, the role of instruction in the advancement of learning.…

  4. Assessment of health-related quality of life in arthritis: conceptualization and development of five item banks using item response theory

    PubMed Central

    Kopec, Jacek A; Sayre, Eric C; Davis, Aileen M; Badley, Elizabeth M; Abrahamowicz, Michal; Sherlock, Lesley; Williams, J Ivan; Anis, Aslam H; Esdaile, John M

    2006-01-01

    Background: Modern psychometric methods based on item response theory (IRT) can be used to develop adaptive measures of health-related quality of life (HRQL). Adaptive assessment requires an item bank for each domain of HRQL. The purpose of this study was to develop item banks for five domains of HRQL relevant to arthritis. Methods: About 1,400 items were drawn from published questionnaires or developed from focus groups and individual interviews and classified into 19 domains of HRQL. We selected the following 5 domains relevant to arthritis and related conditions: Daily Activities, Walking, Handling Objects, Pain or Discomfort, and Feelings. Based on conceptual criteria and pilot testing, 219 items were selected for further testing. A questionnaire was mailed to patients from two hospital-based clinics and a stratified random community sample. Dimensionality of the domains was assessed through factor analysis. Items were analyzed with the Generalized Partial Credit Model as implemented in Parscale. We used graphical methods and a chi-square test to assess item fit. Differential item functioning was investigated using logistic regression. Results: Data were obtained from 888 individuals with arthritis. The five domains were sufficiently unidimensional for an IRT-based analysis. Thirty-one items were deleted due to lack of fit or differential item functioning. Daily Activities had the narrowest range for the item location parameter (-2.24 to 0.55) and Handling Objects had the widest range (-1.70 to 2.27). The mean (median) slope parameter for the items ranged from 1.15 (1.07) in Feelings to 1.73 (1.75) in Walking. The final item banks are comprised of 31–45 items each. Conclusion: We have developed IRT-based item banks to measure HRQL in 5 domains relevant to arthritis. The items in the final item banks provide adequate psychometric information for a wide range of functional levels in each domain. PMID:16749932

  5. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

    ERIC Educational Resources Information Center

    Martinková, Patricia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.

    2017-01-01

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because…
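    One standard DIF approach in tutorials like this is logistic regression: regress each item's response on a matching variable (total score or ability) and group membership, and flag uniform DIF when the group coefficient is large. A self-contained sketch on synthetic data with a built-in one-logit DIF effect; we hand-roll gradient ascent instead of using a statistics package, and all names are ours:

    ```python
    import math
    import random

    def sigmoid(z: float) -> float:
        return 1.0 / (1.0 + math.exp(-z))

    def fit_logistic(X, y, lr=1.0, iters=1000):
        """Plain batch gradient ascent on the logistic log-likelihood;
        unregularized, which is enough for this illustration."""
        beta = [0.0] * len(X[0])
        n = len(X)
        for _ in range(iters):
            grad = [0.0] * len(beta)
            for xi, yi in zip(X, y):
                err = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
                for j, v in enumerate(xi):
                    grad[j] += err * v
            beta = [b + lr * g / n for b, g in zip(beta, grad)]
        return beta

    # Synthetic responses with uniform DIF: at equal ability, the item is
    # one logit easier for the focal group (g = 1).
    rng = random.Random(42)
    X, y = [], []
    for _ in range(1000):
        theta = rng.gauss(0.0, 1.0)   # matching variable (true ability here)
        g = rng.randint(0, 1)         # group membership
        p = sigmoid(theta + 1.0 * g)  # true DIF effect of +1 logit
        X.append([1.0, theta, float(g)])
        y.append(1 if rng.random() < p else 0)

    beta = fit_logistic(X, y)
    print(round(beta[2], 2))  # group coefficient; near zero would mean no DIF
    ```

    In practice one compares nested models (with and without the group term, and a group-by-score interaction for nonuniform DIF) using a likelihood-ratio test rather than eyeballing the coefficient.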

  6. Accessible or Not?: Academics' Hand Written Comments on Assessment Items Written by Students with Print Disabilities

    ERIC Educational Resources Information Center

    Harpur, Paul

    2010-01-01

    Most university courses involve students sitting examinations and submitting written research papers. Many universities provide each student with individual comments on their assessment items. Generally these comments are written throughout the assessment item by the marker to provide the student with guidance on where they can improve and what…

  7. Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example

    ERIC Educational Resources Information Center

    Li, Xiaomin; Wang, Wen-Chung

    2015-01-01

    The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…

  9. Development and community-based validation of eight item banks to assess mental health.

    PubMed

    Batterham, Philip J; Sunderland, Matthew; Carragher, Natacha; Calear, Alison L

    2016-09-30

    There is a need for precise but brief screening of mental health problems in a range of settings. The development of item banks to assess depression and anxiety has resulted in new adaptive and static screeners that accurately assess severity of symptoms. However, expansion to a wider array of mental health problems is required. The current study developed item banks for eight mental health problems: social anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder, adult attention-deficit hyperactivity disorder, drug use, psychosis and suicidality. The item banks were calibrated in a population-based Australian adult sample (N=3175) by administering large item pools (45-75 items) and excluding items on the basis of local dependence or measurement non-invariance. Item Response Theory parameters were estimated for each item bank using a two-parameter graded response model. Each bank consisted of 19-47 items, demonstrating excellent fit and precision across a range of -1 to 3 standard deviations from the mean. No previous study has developed such a broad range of mental health item banks. The calibrated item banks will form the basis of a new system of static and adaptive measures to screen for a broad array of mental health problems in the community.

  10. Posterior Predictive Assessment of Item Response Theory Models

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Johnson, Matthew S.; Stern, Hal S.

    2006-01-01

    Model checking in item response theory (IRT) is an underdeveloped area. There is no universally accepted tool for checking IRT models. The posterior predictive model-checking method is a popular Bayesian model-checking tool because it has intuitive appeal, is simple to apply, has a strong theoretical basis, and can provide graphical or numerical…

  11. Assessing Personality Traits through Response Latencies Using Item Response Theory

    ERIC Educational Resources Information Center

    Ranger, Jochen; Ortner, Tuulia M.

    2011-01-01

    Recent studies have revealed a relation between the given response and the response latency for personality questionnaire items in the form of an inverted-U effect, which has been interpreted in light of schema-driven behavior. In general, more probable responses are given faster. In the present study, the relationship between the probability of…

  12. Formative Assessment in Dance Education

    ERIC Educational Resources Information Center

    Andrade, Heidi; Lui, Angela; Palma, Maria; Hefferen, Joanna

    2015-01-01

    Feedback is crucial to students' growth as dancers. When used within the framework of formative assessment, or assessment for learning, feedback results in actionable next steps that dancers can use to improve their performances. This article showcases the work of two dance specialists, one elementary and one middle school teacher, who have…

  14. Development of an item list to assess the forgotten joint concept in shoulder patients.

    PubMed

    Giesinger, Johannes M; Kesterke, Nicolas; Hamilton, David F; Holzner, Bernhard; Jost, Bernhard; Giesinger, Karlmeinrad

    2015-03-24

    The aims were to generate an item list for the assessment of joint awareness in shoulder patients and to collect patient feedback on the comprehensibility of the items and the forgotten joint concept. Item content was generated on the basis of a literature search and expert ratings following a stepwise refinement procedure, including final evaluation by an international expert board (n = 12) whose members had various professional backgrounds. Items were translated from English to German and evaluated in 30 German-speaking shoulder patients in Switzerland and 30 shoulder patients in the UK. The literature search identified 45 questionnaires covering 805 issues potentially relevant for the assessment of joint awareness. Stepwise item selection resulted in 97 items to be evaluated by the international expert board, leaving 70 items for collecting patient feedback. The majority of patients indicated that the introductory text explaining the forgotten joint concept was easy or very easy to understand (79.3%) and that the items were clear (91.4%). We developed a list of 70 questions for the assessment of joint awareness in shoulder patients and obtained positive patient feedback on these. In a next step, we will administer the items to a large international patient sample to obtain data for psychometric analysis and development of a measurement model, which is the basis for the creation of computer-adaptive assessments or static short forms.

  15. Determining When Single Scoring for Constructed-Response Items Is as Effective as Double Scoring in Mixed-Format Licensure Tests

    ERIC Educational Resources Information Center

    Kim, Sooyeon; Moses, Tim

    2013-01-01

    The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…

  17. Assessment of perfluoroalkyl substances in food items at global scale.

    PubMed

    Pérez, Francisca; Llorca, Marta; Köck-Schulmeyer, Marianne; Škrbić, Biljana; Oliveira, Luis Silva; da Boit Martinello, Kátia; Al-Dhabi, Naif A; Antić, Igor; Farré, Marinella; Barceló, Damià

    2014-11-01

    This study assessed the levels of 21 perfluoroalkyl substances (PFASs) in 283 food items (38 from Brazil, 35 from Saudi Arabia, 174 from Spain and 36 from Serbia) among the most widely consumed foodstuffs in these geographical areas. These countries were chosen as representatives of the diet in South America, Western Asia, Mediterranean countries and South-Eastern Europe. The analysis of foodstuffs was carried out by turbulent flow chromatography (TFC) combined with liquid chromatography with triple quadrupole mass spectrometry (LC-QqQ-MS) using electrospray ionization (ESI) in negative mode. The analytical method was validated for the analysis of different foodstuff classes (cereals, fish, fruit, milk, ready-to-eat foods, oil and meat). The analytical parameters of the method fulfill the requirements specified in Commission Recommendation 2010/161/EU. Recovery rates were in the range between 70% and 120%. For all the selected matrices, the method limits of detection (MLOD) and the method limits of quantification (MLOQ) were in the range of 5 to 650 pg/g and 17 to 2000 pg/g, respectively. In general, the concentrations of PFASs were at the pg/g or pg/mL level. The most frequently detected compounds were perfluorooctane sulfonic acid (PFOS), perfluorooctanoic acid (PFOA) and perfluorobutanoic acid (PFBA). The prevalence of the eight-carbon-chain compounds in biota indicates the high stability and bioaccumulation potential of these compounds. At the same time, however, the high frequency of the shorter-chain compounds indicates the use of replacement compounds in new fluorinated materials. When the compound profiles and their relative abundances were compared across samples of diverse origin, differences were identified; in absolute amounts of total PFASs, however, no large differences were found between the studied countries. Fish and seafood were identified as the major PFASs contributors to the diet in all the countries. The total sum of

  18. Assessment of the Item Selection and Weighting in the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis

    PubMed Central

    Mahr, Alfred D.; Neogi, Tuhina; LaValley, Michael P.; Davis, John C.; Hoffman, Gary S.; McCune, W. Joseph; Specks, Ulrich; Spiera, Robert F.; St. Clair, E. William; Stone, John H.; Merkel, Peter A.

    2013-01-01

    Objective To assess the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis (BVAS/WG) with respect to its selection and weighting of items. Methods This study used the BVAS/WG data from the Wegener's Granulomatosis Etanercept Trial. The scoring frequencies of the 34 predefined items and any “other” items added by clinicians were calculated. Using linear regression with generalized estimating equations in which the physician global assessment (PGA) of disease activity was the dependent variable, we computed weights for all predefined items. We also created variables for clinical manifestations frequently added as other items, and computed weights for these as well. We searched for the model that included the items and their generated weights yielding an activity score with the highest R2 to predict the PGA. Results We analyzed 2,044 BVAS/WG assessments from 180 patients; 734 assessments were scored during active disease. The highest R2 with the PGA was obtained by scoring WG activity based on the following items: the 25 predefined items rated on ≥5 visits, the 2 newly created fatigue and weight loss variables, the remaining minor other and major other items, and a variable that signified whether new or worse items were present at a specific visit. The weights assigned to the items ranged from 1 to 21. Compared with the original BVAS/WG, this modified score correlated significantly more strongly with the PGA. Conclusion This study suggests possibilities to enhance the item selection and weighting of the BVAS/WG. These changes may increase this instrument's ability to capture the continuum of disease activity in WG. PMID:18512722
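
    The weighting procedure summarized above (regressing the physician global assessment, or PGA, on item indicators and comparing candidate scoring models by how well they predict the PGA) can be sketched as follows. This is an illustrative simplification using ordinary least squares in place of the study's GEE models, and all function and variable names are hypothetical.

```python
import numpy as np

def item_weights(items, pga):
    """Fit PGA = intercept + sum(w_j * item_j) by least squares and
    return the candidate item weights and the model R-squared.

    items : (n_assessments, n_items) 0/1 indicator matrix
    pga   : (n_assessments,) physician global assessment scores

    Note: the original study used generalized estimating equations
    to account for repeated assessments per patient; plain least
    squares is a simplification for illustration only.
    """
    X = np.column_stack([np.ones(len(pga)), items])
    coef, *_ = np.linalg.lstsq(X, pga, rcond=None)
    resid = pga - X @ coef
    r2 = 1.0 - (resid @ resid) / ((pga - pga.mean()) @ (pga - pga.mean()))
    return coef[1:], r2
```

    Candidate scoring models (different item subsets, or added variables such as fatigue and weight loss) can then be compared by the R-squared each achieves against the PGA, as in the study.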

  19. Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Albano, Anthony D.

    2015-01-01

    This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…

  20. Applying Unidimensional and Multidimensional Item Response Theory Models in Testlet-Based Reading Assessment

    ERIC Educational Resources Information Center

    Min, Shangchao; He, Lianzhen

    2014-01-01

    This study examined the relative effectiveness of the multidimensional bi-factor model and multidimensional testlet response theory (TRT) model in accommodating local dependence in testlet-based reading assessment with both dichotomously and polytomously scored items. The data used were 14,089 test-takers' item-level responses to the testlet-based…

  1. A Multidimensional Scaling Approach to Dimensionality Assessment for Measurement Instruments Modeled by Multidimensional Item Response Theory

    ERIC Educational Resources Information Center

    Toro, Maritsa

    2011-01-01

    The statistical assessment of dimensionality provides evidence of the underlying constructs measured by a survey or test instrument. This study focuses on educational measurement, specifically tests comprised of items described as multidimensional. That is, items that require examinee proficiency in multiple content areas and/or multiple cognitive…

  2. An Investigation of Alternative Methods for Item Mapping in the National Assessment of Educational Progress.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Senturk, Deniz; Wang, Joyce; Loomis, Susan Cooper

    2001-01-01

    Compared four item mapping methods using data from the physical science test of the National Assessment of Educational Progress and studied the opinions of science content-area experts about the difficulty of the items through a survey completed by 148 science teachers or scientists. Results of model-based mapping methods were more concordant with…

  3. Assessment of Preference for Edible and Leisure Items in Individuals with Dementia

    ERIC Educational Resources Information Center

    Ortega, Javier Virues; Iwata, Brian A.; Nogales-Gonzalez, Celia; Frades, Belen

    2012-01-01

    We conducted 2 studies on reinforcer preference in patients with dementia. Results of preference assessments yielded differential selections by 14 participants. Unlike prior studies with individuals with intellectual disabilities, all participants showed a noticeable preference for leisure items over edible items. Results of a subsequent analysis…

  5. Assessing the Validity of a Single-Item HIV Risk Stage-of-Change Measure

    ERIC Educational Resources Information Center

    Napper, Lucy E.; Branson, Catherine M.; Fisher, Dennis G.; Reynolds, Grace L.; Wood, Michelle M.

    2008-01-01

    This study examined the validity of a single-item measure of HIV risk stage of change that HIV prevention contractors were required to collect by the California State Office of AIDS. The single-item measure was compared to the more conventional University of Rhode Island Change Assessment (URICA). Participants were members of Los Angeles…

  6. An Approach to Scoring and Equating Tests with Binary Items: Piloting With Large-Scale Assessments

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    2016-01-01

    This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…

  7. Using Kernel Equating to Assess Item Order Effects on Test Scores

    ERIC Educational Resources Information Center

    Moses, Tim; Yang, Wen-Ling; Wilson, Christine

    2007-01-01

    This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test…

  11. International Assessment: A Rasch Model and Teachers' Evaluation of TIMSS Science Achievement Items

    ERIC Educational Resources Information Center

    Glynn, Shawn M.

    2012-01-01

    The Trends in International Mathematics and Science Study (TIMSS) is a comparative assessment of the achievement of students in many countries. In the present study, a rigorous independent evaluation was conducted of a representative sample of TIMSS science test items because item quality influences the validity of the scores used to inform…

  12. A HO-IRT Based Diagnostic Assessment System with Constructed Response Items

    ERIC Educational Resources Information Center

    Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei

    2011-01-01

    The aim of the present study was to develop an on-line assessment system with constructed response items in the context of the elementary mathematics curriculum. The system recorded the problem-solving process of constructed response items and transferred the process to response codes for further analyses. An inference mechanism based on artificial…

  13. Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

    ERIC Educational Resources Information Center

    Alsadaawi, Abdullah Saleh

    2017-01-01

    The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…

  15. An instrument to assess quality of life in relation to nutrition: item generation, item reduction and initial validation

    PubMed Central

    2010-01-01

    Background It is arguable that modification of diet, given its potential for positive health outcomes, should be widely advocated and adopted. However, food intake, as a basic human need, and its modification may be accompanied by sensations of both pleasure and despondency and may consequently affect quality of life (QoL). Thus, the feasibility and success of dietary changes will depend, at least partly, on whether potential negative influences on QoL can be avoided. This is of particular importance in the context of dietary intervention studies and in the development of new food products to improve health and well-being. Instruments to measure the impact of nutrition on quality of life in the general population, however, are few and far between. Therefore, the aim of this project was to develop an instrument for measuring QoL related to nutrition in the general population. Methods and results We recruited participants from the general population and followed standard methodology for quality of life instrument development (identification of population, item selection, n = 24; item reduction, n = 81; item presentation, n = 12; pretesting of questionnaire and initial validation, n = 2576; construct validation, n = 128; and test-retest reliability, n = 20). Of 187 initial items, 29 were selected for final presentation. Factor analysis revealed an instrument with 5 domains. The instrument demonstrated good cross-sectional divergent and convergent construct validity when correlated with scores of the 8 domains of the SF-36 (ranging from -0.078 to 0.562; 19 out of 40 tested correlations were statistically significant and 24 correlations were predicted correctly) and good test-retest reliability (intra-class correlation coefficients from 0.71 for symptoms to 0.90). Conclusions We developed and validated an instrument with 29 items across 5 domains to assess quality of life related to nutrition and other aspects of food intake.
The instrument demonstrated good face and

  16. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    NASA Astrophysics Data System (ADS)

    Greenberg, Ariela Caren

    Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship in which the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
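
    The Mantel-Haenszel log-odds ratio used in Study I can be sketched as follows for a single dichotomously scored item, stratifying examinees by total score. This is a minimal illustration, not the study's code; operational DIF analyses typically add score purification and a continuity correction for sparse strata, and the names below are hypothetical.

```python
import numpy as np

def mh_log_odds_ratio(responses, group, item):
    """Mantel-Haenszel log-odds ratio (MH-LOR) for one dichotomously
    scored item, stratifying on total score.

    responses : (n_examinees, n_items) array of 0/1 item scores
    group     : boolean array, True marks the focal group
    item      : column index of the studied item

    A positive value indicates the item favors the reference group.
    """
    total = responses.sum(axis=1)        # stratification variable
    num = den = 0.0
    for k in np.unique(total):
        stratum = total == k
        ref = stratum & ~group
        foc = stratum & group
        a = responses[ref, item].sum()   # reference, correct
        b = ref.sum() - a                # reference, incorrect
        c = responses[foc, item].sum()   # focal, correct
        d = foc.sum() - c                # focal, incorrect
        n = stratum.sum()
        num += a * d / n
        den += b * c / n
    return float(np.log(num / den))
```

    On the ETS delta metric this statistic is commonly rescaled as -2.35 * MH-LOR before classifying the severity of DIF.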

  17. A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Thurman, Carol

    2009-01-01

    The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…

  18. Formative Assessment in Primary Science

    ERIC Educational Resources Information Center

    Loughland, Tony; Kilpatrick, Laetitia

    2015-01-01

    This action learning study in a year three classroom explored the implementation of five formative assessment principles to assist students' understandings of the scientific topic of liquids and solids. These principles were employed to give students a greater opportunity to express their understanding of the concepts. The study found that the…

  20. A Classification Matrix of Examination Items to Promote Transformative Assessment

    ERIC Educational Resources Information Center

    McMahon, Mark; Garrett, Michael

    2016-01-01

    The ability to assess learning hinges on the quality of the instruments that are used. This paper reports on the first stage of the design of software to assist educators in ensuring assessment questions meet educational outcomes. A review of the literature within the field of instructional psychology was undertaken with a view towards…

  1. Dimensionality Assessment of Ordered Polytomous Items with Parallel Analysis

    ERIC Educational Resources Information Center

    Timmerman, Marieke E.; Lorenzo-Seva, Urbano

    2011-01-01

    Parallel analysis (PA) is an often-recommended approach for assessment of the dimensionality of a variable set. PA is known in different variants, which may yield different dimensionality indications. In this article, the authors considered the most appropriate PA procedure to assess the number of common factors underlying ordered polytomously…
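
    The basic PA procedure can be sketched as follows: compare the eigenvalues of the observed correlation matrix with those obtained from random data of the same dimensions, and retain the leading dimensions whose eigenvalues exceed chance. This is a simplified illustration using Pearson correlations and one particular threshold rule; as the abstract notes, PA variants (e.g., polychoric correlations for ordered polytomous items, mean versus percentile thresholds) may yield different dimensionality indications.

```python
import numpy as np

def parallel_analysis(data, n_sim=100, percentile=95, seed=0):
    """Suggest the number of dimensions via Horn's parallel analysis.

    Compares observed correlation-matrix eigenvalues against the
    chosen percentile of eigenvalues from random normal data of the
    same shape, and counts the leading eigenvalues above threshold.
    Simplification: Pearson correlations; ordered polytomous items
    are often better served by polychoric correlations.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sim = np.empty((n_sim, p))
    for i in range(n_sim):
        r = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        sim[i] = np.sort(np.linalg.eigvalsh(r))[::-1]
    threshold = np.percentile(sim, percentile, axis=0)
    k = 0
    while k < p and obs[k] > threshold[k]:  # stop at first failure
        k += 1
    return k
```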

  4. A Q3 Statistic for Unfolding Item Response Theory Models: Assessment of Unidimensionality with Two Factors and Simple Structure

    ERIC Educational Resources Information Center

    Habing, Brian; Finch, Holmes; Roberts, James S.

    2005-01-01

    Although there are many methods available for dimensionality assessment for items with monotone item response functions, there are few methods available for unfolding item response theory models. In this study, a modification of Yen's Q3 statistic is proposed for the case of these nonmonotone item response models. Through a simulation study, the…

  5. A New Method for Assessing the Statistical Significance in the Differential Functioning of Items and Tests (DFIT) Framework

    ERIC Educational Resources Information Center

    Oshima, T. C.; Raju, Nambury S.; Nanda, Alice O.

    2006-01-01

    A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests framework. In this new method, a cutoff score for each item is determined by obtaining a (1 - alpha) percentile rank score…
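
    The item parameter replication idea can be sketched as follows for a 2PL item: draw pairs of item parameter estimates under the no-DIF hypothesis, compute the NCDIF index for each pair, and take the (1 - alpha) percentile of those values as the cutoff. This is a hedged sketch in which independent normal perturbations stand in for the estimated sampling covariance of linked calibrations; all names are hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ncdif(theta_focal, a_ref, b_ref, a_foc, b_foc):
    """NCDIF: mean squared difference between the focal- and
    reference-calibrated response functions over the focal group."""
    d = p_2pl(theta_focal, a_foc, b_foc) - p_2pl(theta_focal, a_ref, b_ref)
    return float(np.mean(d ** 2))

def ncdif_cutoff(theta_focal, a, b, se_a, se_b,
                 alpha=0.05, n_rep=1000, seed=0):
    """Replicate pairs of parameter estimates around the same true
    values (i.e., no DIF) and return the (1 - alpha) percentile of
    the NCDIF values that such chance differences produce."""
    rng = np.random.default_rng(seed)
    vals = np.empty(n_rep)
    for i in range(n_rep):
        a1, b1 = a + se_a * rng.standard_normal(), b + se_b * rng.standard_normal()
        a2, b2 = a + se_a * rng.standard_normal(), b + se_b * rng.standard_normal()
        vals[i] = ncdif(theta_focal, a1, b1, a2, b2)
    return float(np.percentile(vals, 100.0 * (1.0 - alpha)))
```

    An item is then flagged when its observed NCDIF value exceeds the item-specific cutoff.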

  8. Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

    PubMed

    LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

    2015-04-01

    Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Teacher Learning of Technology Enhanced Formative Assessment

    NASA Astrophysics Data System (ADS)

    Feldman, Allan; Capobianco, Brenda M.

    2008-02-01

    This study examined the integration of technology enhanced formative assessment (FA) into teachers' practice. Participants were high school physics teachers interested in improving their use of a classroom response system (CRS) to promote FA. Data were collected using interviews, direct classroom observations, and collaborative discussions. The physics teachers engaged in collaborative action research (AR) to learn how to use FA and CRS to promote student and teacher learning. Data were analyzed using open coding, cross-case analysis, and content analysis. Results from data analysis allowed researchers to construct a model of the knowledge and skills necessary for the integration of technology enhanced FA into teachers' practice. The model is a set of four technologies: hardware and software; methods for constructing FA items; pedagogical methods; and curriculum integration. The model is grounded in the idea that teachers must develop these respective technologies as they interact with the CRS (i.e., hardware and software, item construction) and their existing practice (i.e., pedagogical methods, curriculum). Implications are that for teachers to make FA an integral part of their practice using CRS, they must: 1) engage in the four technologies; 2) understand the nature of FA; and 3) collaborate with other interested teachers through AR.

  10. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    PubMed Central

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2016-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568

  11. Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

    PubMed

    Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

    2016-04-07

    This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments. © The Author(s) 2016.

  12. Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

    ERIC Educational Resources Information Center

    Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

    2012-01-01

    Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…

  14. Issues in Grouping Items from the Neonatal Behavioral Assessment Scale.

    ERIC Educational Resources Information Center

    Sameroff, Arnold J.; And Others

    1978-01-01

    Discusses the structure, reliability, stability, validity and usefulness of the Brazelton Neonatal Behavioral Assessment Scale (NBAS) and the results of factor and regression analyses of data collected using the NBAS. (Author/BH)

  16. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

    PubMed Central

    Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.

    2017-01-01

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
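
One of the classical DIF procedures covered in tutorials such as this one is the Mantel-Haenszel method: 2x2 (group x correct) tables are pooled across total-score strata into a common odds ratio, and a value near 1 suggests no uniform DIF. The sketch below is our own minimal illustration, assuming numpy; it is not code from the tutorial, and the function name is ours.

```python
import numpy as np

def mantel_haenszel_or(correct, group, total_score):
    """Mantel-Haenszel common odds ratio for one item.

    `correct` is 0/1 item responses, `group` is 0 (reference) / 1 (focal),
    and `total_score` is the matching variable used to form strata.
    A ratio > 1 means the item favors the reference group within strata.
    """
    num = den = 0.0
    for s in np.unique(total_score):
        m = total_score == s
        ref = correct[m & (group == 0)]
        foc = correct[m & (group == 1)]
        if len(ref) == 0 or len(foc) == 0:
            continue  # stratum lacks one of the groups
        n = len(ref) + len(foc)
        a, b = ref.sum(), len(ref) - ref.sum()  # reference right / wrong
        c, d = foc.sum(), len(foc) - foc.sum()  # focal right / wrong
        num += a * d / n
        den += b * c / n
    return num / den
```

If reference and focal examinees answer identically within every score stratum, the pooled odds ratio is exactly 1, consistent with the article's point that group differences in total scores alone do not establish item bias.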

  17. Comparing concurrent versus fixed parameter equating with common items: using the dichotomous and partial credit models in a mixed-item format test.

    PubMed

    Taherbhai, Husein M; Seo, Daer Yong

    2007-01-01

    There has been some discussion among researchers as to the benefits of using one calibration process over the other during equating. Although the literature is rife with the pros and cons of the different methods, hardly any research has been done on anchoring (i.e., fixing item parameters to their pre-determined values on an established scale) as a method that is commonly used by psychometricians in large-scale assessments. This simulation research compares the fixed form of calibration with the concurrent method (where calibration of the different forms on the same scale is accomplished by a single run of the calibration process, treating all non-included items on the forms as missing or not reached), using the dichotomous Rasch (Rasch, 1960) and the Rasch partial credit (Masters, 1982) models, and the WINSTEPS (Linacre, 2003) computer program. Contrary to some researchers' contention that a concurrent run, with its larger n-counts for the common items, would provide greater accuracy in the estimation of item parameters, the results of this paper indicate that the relative accuracy of the two methods is confounded by sample size, the number of common items, and related factors, and that there is no real benefit to using one method over the other in the calibration and equating of parallel test forms.
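
For readers unfamiliar with the alternatives the abstract contrasts: fixed-parameter calibration pins the common items at their base-scale values, while separate calibration estimates each form freely and then links the scales through the anchors. A minimal sketch of the simplest such link, mean/mean linking of Rasch difficulties, is shown below (plain Python, function names ours; this illustrates the general anchoring idea, not the WINSTEPS procedure itself).

```python
def mean_mean_constant(anchor_new, anchor_base):
    """Mean/mean linking constant: average difference between the
    base-scale and new-form difficulty estimates of the anchor items."""
    assert len(anchor_new) == len(anchor_base)
    return sum(b - a for a, b in zip(anchor_new, anchor_base)) / len(anchor_new)

def link_to_base(new_difficulties, anchor_new, anchor_base):
    """Shift all new-form Rasch difficulties onto the base scale."""
    shift = mean_mean_constant(anchor_new, anchor_base)
    return [b + shift for b in new_difficulties]
```

If the anchors average 0.5 logits easier on the new form, every new-form difficulty is shifted up by 0.5 to land on the base scale.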

  18. Development and Calibration of an Item Bank for PE Metrics Assessments: Standard 1

    ERIC Educational Resources Information Center

    Zhu, Weimo; Fox, Connie; Park, Youngsik; Fisette, Jennifer L.; Dyson, Ben; Graber, Kim C.; Avery, Marybell; Franck, Marian; Placek, Judith H.; Rink, Judy; Raynes, De

    2011-01-01

    The purpose of this study was to develop and calibrate an assessment system, or bank, using the latest measurement theories and methods to promote valid and reliable student assessment in physical education. Using an anchor-test equating design, a total of 30 items or assessments were administered to 5,021 (2,568 boys and 2,453 girls) students in…

  20. An Examination of Differential Item Functioning on the Vanderbilt Assessment of Leadership in Education

    ERIC Educational Resources Information Center

    Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph

    2009-01-01

    The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…

  1. Identifying Items to Assess Methodological Quality in Physical Therapy Trials: A Factor Analysis

    PubMed Central

    Cummings, Greta G.; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd

    2014-01-01

    Background Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. Objective The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). Design A methodological research design was used, and an EFA was performed. Methods Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Results Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Limitation Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. Conclusions To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. 
Empirical evidence of the association among these items with treatment effects and a confirmatory factor analysis of these results are needed to validate these items.

  2. Assessing the validity of single-item life satisfaction measures: results from three large samples.

    PubMed

    Cheung, Felix; Lucas, Richard E

    2014-12-01

    The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS)-a more psychometrically established measure. Two large samples from Washington (N = 13,064) and Oregon (N = 2,277) recruited by the Behavioral Risk Factor Surveillance System and a representative German sample (N = 1,312) recruited by the Germany Socio-Economic Panel were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62-0.64; disattenuated r = 0.78-0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001-0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015-0.042). Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use.
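
The disattenuated correlations reported above come from Spearman's classic correction for attenuation, which divides the observed correlation by the square root of the product of the two measures' reliabilities. A one-line sketch (function name ours):

```python
import math

def disattenuated_r(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: the observed correlation
    adjusted for the unreliability of both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)
```

For example, an observed r of 0.60 between measures with reliabilities 0.81 and 1.00 disattenuates to 0.60 / 0.90, about 0.67. The reliabilities used in the study are not given in the abstract, so no attempt is made here to reproduce its exact figures.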

  3. Assessing the Validity of Single-item Life Satisfaction Measures: Results from Three Large Samples

    PubMed Central

    Cheung, Felix; Lucas, Richard E.

    2014-01-01

    Purpose The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS) - a more psychometrically established measure. Methods Two large samples from Washington (N=13,064) and Oregon (N=2,277) recruited by the Behavioral Risk Factor Surveillance System (BRFSS) and a representative German sample (N=1,312) recruited by the Germany Socio-Economic Panel (GSOEP) were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Results Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62 – 0.64; disattenuated r = 0.78 – 0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001 – 0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015 – 0.042). Conclusions Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use. PMID:24890827

  4. Identifying items to assess methodological quality in physical therapy trials: a factor analysis.

    PubMed

    Armijo-Olivo, Susan; Cummings, Greta G; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd

    2014-09-01

    Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). A methodological research design was used, and an EFA was performed. Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. Empirical evidence of the association among these items with treatment effects and a confirmatory factor analysis of these results are needed to validate these items.
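
Principal axis factoring, the extraction method named in the abstract, iteratively replaces the diagonal of the correlation matrix with communality estimates before eigendecomposition. The sketch below is our own minimal version, assuming numpy; the varimax rotation step is omitted, and the function name is ours.

```python
import numpy as np

def principal_axis(R, n_factors, n_iter=100, tol=1e-8):
    """Minimal principal axis factoring on a correlation matrix R.

    Communalities start at the squared multiple correlations, the reduced
    matrix (communalities on the diagonal) is eigendecomposed, and the
    communalities implied by the top loadings are fed back in until they
    stabilize. Returns the unrotated loading matrix.
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # squared multiple correlations
    loadings = None
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2_new = (loadings ** 2).sum(axis=1)
        if np.allclose(h2_new, h2, atol=tol):
            break
        h2 = h2_new
    return loadings
```

On a correlation matrix with an exact one-factor structure and loadings of 0.8, the iteration recovers loadings of 0.8 up to sign.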

  5. Development of a 32-item scale to assess postoperative dysfunction after upper gastrointestinal cancer resection.

    PubMed

    Nakamura, Misuzu; Kido, Yoshihiro; Egawa, Takako

    2008-06-01

    The purpose of this study was to develop a 32-item scale to assess postoperative dysfunction in patients who underwent surgery for gastric and oesophageal cancer and to evaluate its reliability and validity. For the objective assessment of postoperative dysfunction in patients with upper gastrointestinal cancer, we performed a preliminary survey by mail using a 34-item questionnaire as an initial version. The results of the survey were assessed by item analysis of the scale. The scale items were further refined by researchers and specialists, and a 32-item scale for the assessment of postoperative dysfunction (initial scale) was developed. Using this 32-item scale (initial scale), a mail survey was performed of 379 subjects selected by random sampling. The questionnaire was returned by 292 patients (77.1%) and 283 responses (74.7%) were valid. Of these, 221 respondents had gastric cancer and 62 oesophageal cancer. The mean age of respondents was 64.9 SD 9.8 (range 35-89) years. The mean total score of the 32-items on the initial version for the assessment of postoperative dysfunction was 60.8 SD 16.7. The mean total score for gastric cancer patients and oesophageal cancer patients was 58.1 SD 15.8 and 70.1 SD 16.7 respectively. After the elimination of scale items regarded as irrelevant based on statistical considerations and the judgement of experts, factor analysis was performed. Seven factors were valid: 'regurgitation reflux', 'limited activity because of decreased food consumption', 'passage dysfunction immediately after eating', 'dumping-like symptoms', 'transfer dysfunction', 'hypoglycaemic symptoms' and 'diarrhoea-like symptoms'. Scale reliability was confirmed by a Cronbach's alpha coefficient of 0.926. 
The Cronbach's alpha-coefficient for all 32 items on the initial version was 0.926, the Cronbach's alpha-coefficient for sub-items was 0.705-0.856, and Pearson's correlation coefficient of re-test for the total score
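
The reliability coefficient reported above, Cronbach's alpha, is computed as k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch, assuming numpy (function name ours):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

Two perfectly correlated items yield alpha = 1; uncorrelated items drive alpha toward 0, which is why alpha is read as internal consistency.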

  6. Examination of useful items for the assessment of fall risk in the community-dwelling elderly Japanese population.

    PubMed

    Demura, Shinichi; Sato, Susumu; Yokoya, Tomohisa; Sato, Toshiro

    2010-05-01

    The aim of this study was to select useful items for assessing fall risk in healthy elderly Japanese individuals. A total of 965 healthy elderly Japanese subjects aged ≥60 years (349 males, 70.4 ± 7.1 years; 616 females, 69.9 ± 7.1 years) participated in this study. Of these, 16.6% had suffered a previous fall. We assumed five fall risk factors: symptoms of falling, physical function, disease and physical symptoms, environment, and behavior and character. Eighty-six items were selected to represent these factors. To confirm the component items for each risk factor, we performed factor analysis (principal factor solution and varimax rotation). The high fall risk response rate was also calculated for each item, and significant differences in this rate were examined between groups of those who had and had not experienced a fall. Useful items were selected using the following criteria: (1) items showing a significant difference in high fall risk response rate between faller and non-faller groups were selected as useful items; (2) items showing low factor loading (<0.4) for any factor were deleted as inappropriate items; (3) among the representative items for each factor, the top two items showing the greatest difference in high fall risk response rate were selected. A total of 50 items were selected across the fall risk factors (symptoms of falling, 3 items; physical function, 22 items; disease and physical symptoms, 13 items; environment, 4 items; behavior and character, 8 items). Based on our results, the selected items can comprehensively assess the fall risk of a healthy elderly Japanese population. In addition, the assessment items for physical function comprised items of different levels of difficulty, and these are able to gradually and comprehensively assess physical function.

  7. Item response theory analysis of the Outpatient Physical Therapy Improvement in Movement Assessment Log (OPTIMAL).

    PubMed

    Elston, Beth; Goldstein, Marc; Makambi, Kepher H

    2013-05-01

    The Outpatient Physical Therapy Improvement in Movement Assessment Log (OPTIMAL) instrument was created to assess the perceived ability of patients receiving physical therapy in adult outpatient settings to perform actions or movements. Its properties must be studied to determine whether it accomplishes this goal. The objective of this study was to investigate the item properties of OPTIMAL with item response theory. This investigation was a retrospective cross-sectional item calibration study. Data were obtained from the American Physical Therapy Association, which collected information from outpatient physical therapy clinics through electronic charting databases that included OPTIMAL responses. Item response theory analyses were performed on the trunk, lower-extremity, and upper-extremity subscales of the Difficulty Scale of OPTIMAL. In total, 3,138 patients completed the Difficulty Scale of OPTIMAL at the baseline assessment. The subscale analyses met all item response theory assumptions. The items in each subscale showed fair discrimination. In all analyses, the subscales measured a narrow range of ability levels at the low end of the physical functioning spectrum. OPTIMAL was originally intended to be administered as a whole. In the present study, each subscale was analyzed separately, indicating how the subscales perform individually but not as a whole. Another limitation is that only the Difficulty Scale of OPTIMAL was analyzed, without consideration of the Confidence Scale. OPTIMAL best measures low physical functioning at the baseline assessment in adult outpatient physical therapy settings. The addition of categories to each item and the addition of more challenging items are recommended to allow measurements for a broader range of patients.

  8. Refining a Web-based goal assessment interview: item reduction based on reliability and predictive validity.

    PubMed

    Schwartz, Carolyn E; Li, Jei; Rapkin, Bruce D

    2016-09-01

    Goals are an important basis for patients' cognitive appraisal processes underlying quality-of-life (QOL) assessment because they are the foundation to one's frame of reference. We sought to identify the best of six goal delineation items and relevant themes for two new versions of the QOL Appraisal Profile: an interview tool using a subset of the best open-ended goal delineation items, and a shorter close-ended version for use in survey research. This is a secondary analysis of longitudinal data (n = 1126) of participants in the North American Research Committee on Multiple Sclerosis (MS) registry. The open-ended data were coded by at least two trained coders with moderately high inter-rater agreement. There were 31 themes reflecting goal content such as health, interpersonal, independence, mental health, and financial themes. Descriptive statistics identified the most prevalent themes. Reliability analysis (alpha, item-total correlations) and hierarchical linear modeling identified the best goal items. Based on these qualitative and quantitative analyses, Solve (item 2) is the best single item because it is a clear anchor for about a third of the goal themes, and explains the most variance in outcomes and demographic characteristics, suggesting that it taps into and reveals diversity in the sample. The next best items are Accomplish and Maintain (items 1 and 4), which are useful in tapping into and revealing diversity among people reporting cognitive deficits (Accomplish), and demographic factors (both Accomplish and Maintain items). The goal delineation items identified as best performers in this study will be used to develop a shorter open-ended version of the QOL Appraisal Profile, and an entirely close-ended version of the QOL Appraisal Profile for use in more standard survey research settings. These tools will enable coaching of patients in medical decision making as well as investigations of appraisal and response shift in QOL research.

  9. Dimensionality assessment of ordered polytomous items with parallel analysis.

    PubMed

    Timmerman, Marieke E; Lorenzo-Seva, Urbano

    2011-06-01

    Parallel analysis (PA) is an often-recommended approach for assessment of the dimensionality of a variable set. PA is known in different variants, which may yield different dimensionality indications. In this article, the authors considered the most appropriate PA procedure to assess the number of common factors underlying ordered polytomously scored variables. They proposed minimum rank factor analysis (MRFA) as an extraction method, rather than the currently applied principal component analysis (PCA) and principal axes factoring. A simulation study, based on data with major and minor factors, showed that all procedures consistently point at the number of major common factors. A polychoric-based PA slightly outperformed a Pearson-based PA, but convergence problems may hamper its empirical application. In empirical practice, PA-MRFA with a 95% threshold based on polychoric correlations or, in case of nonconvergence, Pearson correlations with mean thresholds appear to be a good choice for identification of the number of common factors. PA-MRFA is a common-factor-based method and performed best in the simulation experiment. PA based on PCA with a 95% threshold is second best, as this method showed good performances in the empirically relevant conditions of the simulation experiment.
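
Horn's parallel analysis, in the PCA-based variant discussed above, retains a component whenever its observed eigenvalue exceeds the chosen percentile of eigenvalues obtained from random data of the same size. A minimal Pearson-correlation sketch, assuming numpy; the article's preferred MRFA and polychoric variants are not implemented here, and the function name is ours.

```python
import numpy as np

def parallel_analysis(data, n_iter=200, percentile=95, seed=0):
    """PCA-based parallel analysis (Horn).

    Compares the eigenvalues of the observed correlation matrix against the
    given percentile of eigenvalues from `n_iter` random normal data sets
    of the same shape. Returns (n_retained, observed, thresholds).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand = np.empty((n_iter, p))
    for i in range(n_iter):
        r = rng.standard_normal((n, p))
        rand[i] = np.linalg.eigvalsh(np.corrcoef(r, rowvar=False))[::-1]
    thresholds = np.percentile(rand, percentile, axis=0)
    return int(np.sum(observed > thresholds)), observed, thresholds
```

On data generated from a single strong common factor, only the first observed eigenvalue clears its random-data threshold, so one component is retained.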

  10. Differential Effects of Question Formats in Math Assessment on Metacognition and Affect.

    ERIC Educational Resources Information Center

    O'Neil, Harold F., Jr.; Brown, Richard S.

    1998-01-01

    The effect of item format on metacognitive and affective processes of children in a large-scale mathematics assessment program was studied. Results from 1032 eighth graders indicate that open-ended and multiple choice items have differential effects, although these did not vary substantially as a function of gender and ethnicity. (SLD)

  11. Assessing Dimensionality of Noncompensatory Multidimensional Item Response Theory with Complex Structures

    ERIC Educational Resources Information Center

    Svetina, Dubravka

    2013-01-01

    The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in noncompensatory multidimensional item response models using dimensionality assessment procedures based on DETECT (dimensionality evaluation to enumerate contributing traits) and NOHARM (normal ogive harmonic analysis robust method). Five…

  12. Informed and Uninformed Naïve Assessment Constructors' Strategies for Item Selection

    ERIC Educational Resources Information Center

    Fives, Helenrose; Barnes, Nicole

    2017-01-01

    We present a descriptive analysis of 53 naïve assessment constructors' explanations for selecting test items to include on a summative assessment. We randomly assigned participants to an informed and uninformed condition (i.e., informed participants read an article describing a Table of Specifications). Through recursive thematic analyses of…

  13. Gender-Related Differential Item Functioning on a Middle-School Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    This study examined gender-related differential item functioning (DIF) using a mathematics performance assessment, the QUASAR Cognitive Assessment Instrument (QCAI), administered to middle school students. The QCAI was developed for the Quantitative Understanding: Amplifying Student Achievement and Reading (QUASAR) project, which focuses on…

  15. An Anthropologist among the Psychometricians: Assessment Events, Ethnography, and Differential Item Functioning in the Mongolian Gobi

    ERIC Educational Resources Information Center

    Maddox, Bryan; Zumbo, Bruno D.; Tay-Lim, Brenda; Qu, Demin

    2015-01-01

    This article explores the potential for ethnographic observations to inform the analysis of test item performance. In 2010, a standardized, large-scale adult literacy assessment took place in Mongolia as part of the United Nations Educational, Scientific and Cultural Organization Literacy Assessment and Monitoring Programme (LAMP). In a novel form…

  18. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    PubMed

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (understanding and communicating, getting around, self-care, getting along with others, life activities, and participation in society). The main purpose of this paper is the evaluation of the psychometric properties for each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology. Copyright © 2014 John Wiley & Sons, Ltd.
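
The Mokken scale model mentioned above is usually accompanied by Loevinger's scalability coefficient H, which compares observed Guttman errors with those expected under independence; H = 1 indicates a perfect Guttman scale. The sketch below handles only dichotomous items (the WHO-DAS II items are polytomous, so this is a simplified illustration; function name ours, numpy assumed).

```python
import numpy as np

def loevinger_h(X):
    """Scale-level Loevinger H for dichotomous items.

    `X` is an (n_persons, k_items) 0/1 matrix. For each item pair, a
    Guttman error is passing the harder item while failing the easier one;
    H = 1 - observed errors / errors expected under independence.
    """
    X = np.asarray(X)
    p = X.mean(axis=0)  # item popularities (proportion passing)
    errors = expected = 0.0
    k = X.shape[1]
    for i in range(k):
        for j in range(i + 1, k):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            errors += np.mean((X[:, easy] == 0) & (X[:, hard] == 1))
            expected += (1 - p[easy]) * p[hard]
    return 1 - errors / expected
```

A response matrix forming a perfect Guttman pattern (every person who passes a hard item also passes every easier one) produces no observed errors and hence H = 1, the "invariantly ordered severity indicators" property the abstract tests for.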

  19. Development of a questionnaire to assess patient satisfaction with allergen-specific immunotherapy in adults: item generation, item reduction, and preliminary validation

    PubMed Central

    Justícia, Jose Luis; Baró, Eva; Cardona, Victoria; Guardia, Pedro; Ojeda, Pedro; Olaguíbel, José Maria; Vega, José Maria; Vidal, Carmen

    2011-01-01

    Background: Allergen-specific immunotherapy (SIT) is a treatment capable of modifying the natural course of allergy, so ensuring good adherence to SIT is fundamental. Up until now there has not existed an instrument specifically developed to measure patient satisfaction with SIT, although its assessment could help us to comprehend better and improve treatment adherence and effectiveness. The aim of this study was to develop an instrument to measure adult patient satisfaction with SIT. Methods: Items were generated from a literature review, focus groups with allergic adult patients undergoing SIT, and a meeting with experts. Potential items were administered to allergic patients undergoing SIT in an observational, cross-sectional, multicenter study. Item reduction was based on quantitative and qualitative criteria. A preliminary assessment of feasibility, reliability, and validity of the retained items was performed. Results: An initial pool of 70 items was administered to 257 patients undergoing SIT. Fifty-four items were eliminated resulting in a provisional instrument with 16 items. Factor analysis yielded four factors that were identified as perceived efficacy, activities and environment, cost-benefit balance, and overall satisfaction, explaining 74.8% of variance. Ceiling and floor effects were negligible for overall score. Overall score was associated with the type and intensity of symptoms. Conclusion: This is the first attempt to develop a satisfaction with SIT measure from the perspective of the allergic patient, and evidence has been found in favor of its reliability and validity. PMID:21660106

  20. Development of a questionnaire to assess patient satisfaction with allergen-specific immunotherapy in adults: item generation, item reduction, and preliminary validation.

    PubMed

    Justícia, Jose Luis; Baró, Eva; Cardona, Victoria; Guardia, Pedro; Ojeda, Pedro; Olaguíbel, José Maria; Vega, José Maria; Vidal, Carmen

    2011-01-01

    Allergen-specific immunotherapy (SIT) is a treatment capable of modifying the natural course of allergy, so ensuring good adherence to SIT is fundamental. To date, no instrument has been developed specifically to measure patient satisfaction with SIT, although such an assessment could help us better understand and improve treatment adherence and effectiveness. The aim of this study was to develop an instrument to measure adult patient satisfaction with SIT. Items were generated from a literature review, focus groups with allergic adult patients undergoing SIT, and a meeting with experts. Potential items were administered to allergic patients undergoing SIT in an observational, cross-sectional, multicenter study. Item reduction was based on quantitative and qualitative criteria. A preliminary assessment of feasibility, reliability, and validity of the retained items was performed. An initial pool of 70 items was administered to 257 patients undergoing SIT. Fifty-four items were eliminated, resulting in a provisional 16-item instrument. Factor analysis yielded four factors, identified as perceived efficacy, activities and environment, cost-benefit balance, and overall satisfaction, explaining 74.8% of the variance. Ceiling and floor effects were negligible for the overall score. The overall score was associated with the type and intensity of symptoms. This is the first attempt to develop a measure of satisfaction with SIT from the perspective of the allergic patient, and evidence has been found in favor of its reliability and validity.

  1. The Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Cultural Competence (CC) Item Set

    PubMed Central

    Weech-Maldonado, Robert; Carle, Adam; Weidmer, Beverly; Hurtado, Margarita; Ngo-Metzger, Quyen; Hays, Ron D.

    2013-01-01

    Background There is a need for reliable and valid measures of cultural competence from the patient’s perspective. Objective This paper evaluates the reliability and validity of the Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Cultural Competence (CC) item set. Research Design Using 2008 survey data, we assessed the internal consistency of the CAHPS CC scales using Cronbach alphas, and examined the validity of the measures using exploratory and confirmatory factor analysis, multitrait scaling analysis, and regression analysis. Subjects A random stratified sample (based on race/ethnicity and language) of 991 enrollees, less than 65 years old, from two Medicaid managed care plans in California and New York. Measures CAHPS CC item set after excluding screener items and ratings. Results Confirmatory factor analysis (CFI = 0.98; TLI = 0.98; RMSEA = 0.06) provided support for a seven-factor structure: Doctor Communication-Positive Behaviors; Doctor Communication-Negative Behaviors; Doctor Communication-Health Promotion; Doctor Communication-Alternative Medicine; Shared Decision Making; Equitable Treatment; and Trust. Item-total correlations (corrected for item overlap) for the 7 scales exceeded 0.40. Exploratory factor analysis showed support for one additional factor: Access to Interpreter Services. Internal consistency reliability estimates ranged from 0.58 (Alternative Medicine) to 0.92 (Positive Behaviors), and were 0.70 or higher for four of the eight composites. All composites were positively and significantly associated with the overall doctor rating. Conclusions The CAHPS CC 26-item set demonstrates adequate measurement properties, and can be used as a supplemental item set to the CAHPS Clinician and Group Surveys in assessing culturally competent care from the patient’s perspective. PMID:22895226
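
    The multitrait scaling analysis mentioned above rests on corrected item-total correlations: each item is correlated with the sum of the *remaining* items, so the item's own variance does not inflate the estimate (this is the "corrected for item overlap" criterion in the abstract). A minimal sketch with simulated Likert-style responses; the data and the 0.40 criterion check are illustrative, not the CAHPS analysis itself.

```python
import numpy as np

def corrected_item_total(responses: np.ndarray) -> np.ndarray:
    """Correlate each item with the total score of the other items.

    responses: (n_persons, n_items) matrix of item scores.
    """
    total = responses.sum(axis=1)
    out = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]  # total score excluding the studied item
        out[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return out

# Simulated data: five 4-category items driven by one common trait plus noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=500)
items = np.clip(np.round(trait[:, None] + rng.normal(scale=0.8, size=(500, 5)) + 2), 0, 3)

print(corrected_item_total(items))  # each value comfortably exceeds the 0.40 criterion
```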

  2. Successful Student Writing through Formative Assessment

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover

    2010-01-01

    Use formative assessment to dramatically improve your students' writing. In "Successful Student Writing Through Formative Assessment", educator and international speaker Harry G. Tuttle shows you how to guide middle and high school students through the prewriting, writing, and revision processes using formative assessment techniques that work.…

  4. Formative Assessment: Responding to Your Students

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover

    2009-01-01

    This "how-to" book on formative assessment is filled with practical suggestions for teachers who want to use formative assessment in their classrooms. With practical strategies, tools, and examples for teachers of all subjects and grade levels, this book shows you how to use formative assessment to promote successful student learning. Topics…

  5. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments.

    PubMed

    Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A; McFarland, Jenny L; Price, Rebecca M

    2017-01-01

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups' total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable, and therefore more valid, scores from conceptual assessments. © 2017 P. Martinková et al.
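
    Of the methodological approaches such a tutorial covers, the Mantel-Haenszel procedure is the simplest to sketch: examinees are stratified on a matching score, a 2×2 (group × correct) table is formed in each stratum, and the tables are pooled into a common odds ratio, with values far from 1 flagging potential DIF. The simulation below is a hypothetical stand-in for the paper's data sets, with bias deliberately built into one item.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n, bias):
    """Rasch-style 0/1 responses to 10 items; `bias` makes item 0 harder."""
    theta = rng.normal(size=n)
    b = np.linspace(-1.5, 1.5, 10)
    b[0] += bias
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))
    return (rng.random((n, 10)) < p).astype(int)

ref = simulate(2000, 0.0)   # reference group
foc = simulate(2000, 0.7)   # focal group: item 0 biased against them

def mh_odds_ratio(ref, foc, item):
    """Mantel-Haenszel common odds ratio, matching on the rest-score."""
    rest_r = ref.sum(axis=1) - ref[:, item]  # studied item excluded from matching
    rest_f = foc.sum(axis=1) - foc[:, item]
    num = den = 0.0
    for k in range(10):                      # rest-score strata 0..9
        r, f = ref[rest_r == k, item], foc[rest_f == k, item]
        n = len(r) + len(f)
        if len(r) == 0 or len(f) == 0:
            continue
        a, b = r.sum(), len(r) - r.sum()     # reference: right, wrong
        c, d = f.sum(), len(f) - f.sum()     # focal: right, wrong
        num += a * d / n
        den += b * c / n
    return num / den

print(mh_odds_ratio(ref, foc, 0))  # well above 1: item 0 favors the reference group
print(mh_odds_ratio(ref, foc, 5))  # near 1: no DIF was built into item 5
```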

  6. Assessing the clinical significance of single items relative to summated scores.

    PubMed

    Sloan, Jeff A; Aaronson, Neil; Cappelleri, Joseph C; Fairclough, Diane L; Varricchio, Claudette

    2002-05-01

    How many items are needed to measure an individual's quality of life (QOL)? This article describes the strengths and weaknesses of single items and summated scores (from multiple items) as QOL measures. We also address the use of single global measures vs multiple subindices as measures of QOL. The primary themes that recur throughout this article are the relationships between well-defined research objectives, the research setting, and the choice of single items vs summated scores to measure QOL. The conceptual framework of the study, the conceptual fit with the measure, and the purpose of the assessment should all be considered when choosing a measure of QOL. No "gold standard" QOL measure can be recommended because no "one size fits all." Single items have the advantage of simplicity at the cost of detail. Multiple-item indices have the advantage of providing a complete profile of QOL component constructs at the cost of increased burden and of asking potentially irrelevant questions. The 2 types of indices are not mutually exclusive and can be used together in a single research study or in the clinical setting.

  7. Is Rasch model analysis applicable in small sample size pilot studies for assessing item characteristics? An example using PROMIS pain behavior item bank data.

    PubMed

    Chen, Wen-Hung; Lenderking, William; Jin, Ying; Wyrwich, Kathleen W; Gelhorn, Heather; Revicki, Dennis A

    2014-03-01

    Large samples are generally considered necessary for the Rasch model to obtain robust item parameter estimates. Recently, small-sample Rasch analysis has been suggested as a preliminary assessment of items' psychometric properties. This study evaluates Rasch analysis results obtained with small sample sizes. Ten PROMIS pain behavior items were used. Random samples of 30, 50, 100, and 250, and a targeted sample of 30, were drawn 10 times each from a total of 800 subjects. Rasch analysis was conducted for each of these samples and the full sample. In the full sample, there were 104 cases of extreme scores, no null categories, two incorrectly ordered items, and four misfit items. For samples of 250, 100, 50, 30, and the targeted 30, the average numbers of extreme scores were 42.2, 17.1, 9.6, 6.1, and 1.2; the average numbers of null categories were 1.0, 3.2, 8.7, 13.4, and 8.3; the average numbers of items with incorrectly ordered item parameters were 0.1, 0.8, 2.9, 4.7, and 3.7; and the average numbers of items with fit residuals exceeding ±2.5 were 0.8, 0.3, 0.1, 0.2, and 0.3, respectively. Rasch analysis based on small samples (≤50) identified a greater number of items with incorrectly ordered parameters than larger samples (≥100), but fewer items were identified as misfitting. Results from small samples thus led to conclusions opposite to those based on larger samples. Rasch analysis based on small samples should be used for exploratory purposes only, and with extreme caution.
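
    The sampling design described above is easy to reproduce: draw repeated random subsamples from a pooled data set and count, in each, the problems Rasch estimation is sensitive to, such as extreme scores (persons at the scale floor or ceiling, who carry no information about item parameters) and null categories (response options never observed in the sample). A sketch with synthetic 5-category data standing in for the PROMIS items; the counts illustrate the trend, not the paper's exact figures.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic pool of 800 persons x 10 five-category (0-4) items, skewed toward
# the floor so that extreme scores and rare top categories actually occur.
theta = rng.normal(size=800)
pool = np.clip(np.round(theta[:, None] + rng.normal(scale=0.7, size=(800, 10)) + 0.5),
               0, 4).astype(int)

def diagnostics(sample):
    """Count extreme total scores and unobserved (null) response categories."""
    totals = sample.sum(axis=1)
    extreme = int(np.sum((totals == 0) | (totals == sample.shape[1] * 4)))
    null_cats = sum(int((np.bincount(sample[:, j], minlength=5) == 0).sum())
                    for j in range(sample.shape[1]))
    return extreme, null_cats

results = {}
for n in (250, 100, 50, 30):
    reps = [diagnostics(pool[rng.choice(800, n, replace=False)]) for _ in range(10)]
    results[n] = tuple(np.mean(reps, axis=0))
    print(f"n={n:3d}  mean extreme scores={results[n][0]:5.1f}  "
          f"mean null categories={results[n][1]:4.1f}")
```

As in the study, raw extreme-score counts grow with sample size, while null categories pile up as samples shrink.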

  8. Psychometrical assessment and item analysis of the General Health Questionnaire in victims of terrorism.

    PubMed

    Delgado-Gomez, David; Lopez-Castroman, Jorge; de Leon-Martinez, Victoria; Baca-Garcia, Enrique; Cabanas-Arrate, Maria Luisa; Sanchez-Gonzalez, Antonio; Aguado, David

    2013-03-01

    There is a need to assess the psychiatric morbidity that appears as a consequence of terrorist attacks. The General Health Questionnaire (GHQ) has been used to this end, but its psychometric properties have never been evaluated in a population affected by terrorism. A sample of 891 participants included 162 direct victims of terrorist attacks and 729 relatives of the victims. All participants were evaluated using the 28-item version of the GHQ (GHQ-28). We examined the reliability and external validity of scores on the scale using Cronbach's alpha and Pearson correlation with the State-Trait Anxiety Inventory (STAI), respectively. The factor structure of the scale was analyzed with varimax rotation. Samejima's (1969) graded response model was used to explore the item properties. The GHQ-28 scores showed good reliability and item-scale correlations. The factor analysis identified 3 factors: anxious-somatic symptoms, social dysfunction, and depression symptoms. All factors showed good correlation with the STAI. Before rotation, the first, second, and third factor explained 44.0%, 6.4%, and 5.0% of the variance, respectively. Varimax rotation redistributed the percentages of variance accounted for to 28.4%, 13.8%, and 13.2%, respectively. Items with the highest loadings in the first factor measured anxiety symptoms, whereas items with the highest loadings in the third factor measured suicide ideation. Samejima's model found that high scores in suicide-related items were associated with severe depression. The factor structure of the GHQ-28 found in this study underscores the preeminence of anxiety symptoms among victims of terrorism and their relatives. Item response analysis identified the most difficult and significant items for each factor.
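
    The reliability statistic reported here is Cronbach's alpha, which rescales the share of total-score variance not attributable to item-specific variance by k/(k-1). A minimal sketch with simulated data; the 28 items mimic only the GHQ-28's length, not its content.

```python
import numpy as np

def cronbach_alpha(x: np.ndarray) -> float:
    """Cronbach's alpha for a (n_persons, k_items) score matrix."""
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# 400 simulated respondents: one common factor plus unit-variance item noise.
rng = np.random.default_rng(3)
factor = rng.normal(size=400)
scores = factor[:, None] + rng.normal(size=(400, 28))

print(f"alpha = {cronbach_alpha(scores):.2f}")  # high, since every item loads on the factor
```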

  9. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    ERIC Educational Resources Information Center

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is…

  11. How Do You Know if They're Getting It? Writing Assessment Items that Reveal Student Understanding

    ERIC Educational Resources Information Center

    Taylor, Melanie; Smith, Sean

    2009-01-01

    Through a project funded by the National Science Foundation, Horizon Research has been developing assessment items for students (in the process, compiling item-writing principles from several sources and adding their own). In this article, the authors share what they have learned about writing items that reveal student understanding, including…

  13. Investigation of Science Inquiry Items for Use on an Alternate Assessment Based on Modified Achievement Standards Using Cognitive Lab Methodology

    ERIC Educational Resources Information Center

    Dickenson, Tammiee S.; Gilmore, Joanna A.; Price, Karen J.; Bennett, Heather L.

    2013-01-01

    This study evaluated the benefits of item enhancements applied to science-inquiry items for incorporation into an alternate assessment based on modified achievement standards for high school students. Six items were included in the cognitive lab sessions involving both students with and without disabilities. The enhancements (e.g., use of visuals,…

  14. Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

    PubMed Central

    Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

    2014-01-01

    Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
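
    One of the classical-test-theory checks listed above, floor and ceiling effects, reduces to the share of respondents at the scale minimum or maximum; large shares mean the scale cannot separate people at the extremes. A sketch with hypothetical 0-4 item scores deliberately built to show a floor effect.

```python
import numpy as np

def floor_ceiling(scores: np.ndarray, item_min: int = 0, item_max: int = 4):
    """Fraction of respondents at the lowest and highest possible total score."""
    total = scores.sum(axis=1)
    k = scores.shape[1]
    return (total == item_min * k).mean(), (total == item_max * k).mean()

# Simulated scale whose items are too "hard" for this sample, so many
# respondents bottom out at the minimum total score.
rng = np.random.default_rng(1)
trait = rng.normal(size=300)
scores = np.clip(np.round(trait[:, None] + rng.normal(scale=0.5, size=(300, 6)) + 0.3),
                 0, 4).astype(int)

floor, ceiling = floor_ceiling(scores)
print(f"floor: {floor:.0%}  ceiling: {ceiling:.0%}")  # clear floor effect, no ceiling
```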

  15. Limitations of a single-item assessment of suicide attempt history: Implications for standardized suicide risk assessment.

    PubMed

    Hom, Melanie A; Joiner, Thomas E; Bernert, Rebecca A

    2016-08-01

    Although a suicide attempt history is among the single best predictors of risk for eventual death by suicide, little is known about the extent to which reporting of suicide attempts may vary by assessment type. The current study aimed to investigate the correspondence between suicide attempt history information obtained via a single-item self-report survey, multi-item self-report survey, and face-to-face clinical interview. Data were collected among a high-risk sample of undergraduates (N = 100) who endorsed a past attempt on a single-item prescreening survey. Participants subsequently completed a multi-item self-report survey, which was followed by a face-to-face clinical interview, both of which included additional questions regarding the timing and nature of previous attempts. Even though 100% of participants (n = 100) endorsed a suicide attempt history on the single-item prescreening survey, only 67% (n = 67) reported having made a suicide attempt on the multi-item follow-up survey. After incorporating ancillary information from the in-person interview, 60% of participants qualified for a Centers for Disease Control and Prevention (CDC)-defined suicide attempt. Of the 40% who did not qualify for a CDC-defined suicide attempt, 30% instead qualified for no attempt, 7% an aborted attempt, and 3% an interrupted attempt. These findings suggest that single-item assessments of suicide attempt history may result in the misclassification of prior suicidal behaviors. Given that such assessments are commonly used in research and clinical practice, these results emphasize the importance of utilizing follow-up questions and assessments to improve precision in the characterization and assessment of suicide risk.

  16. Limitations of a Single-Item Assessment of Suicide Attempt History: Implications for Standardized Suicide Risk Assessment

    PubMed Central

    Hom, Melanie A.; Joiner, Thomas E.; Bernert, Rebecca A.

    2015-01-01

    Although a suicide attempt history is among the single best predictors of risk for eventual death by suicide, little is known about the extent to which reporting of suicide attempts may vary by assessment type. The current study aimed to investigate the correspondence between suicide attempt history information obtained via a single-item self-report survey, multi-item self-report survey, and face-to-face clinical interview. Data were collected among a high-risk sample of undergraduates (N = 100) who endorsed a past attempt on a single-item prescreening survey. Participants subsequently completed a multi-item self-report survey, which was followed by a face-to-face clinical interview, both of which included additional questions regarding the timing and nature of previous attempts. Even though 100% of participants (n = 100) endorsed a suicide attempt history on the single-item prescreening survey, only 67% (n = 67) reported having made a suicide attempt on the multi-item follow-up survey. After incorporating ancillary information obtained from the in-person interview, 60% of participants qualified for a CDC-defined suicide attempt. Of the 40% who did not qualify for a CDC-defined suicide attempt, 30% instead qualified for no attempt, 7% an aborted attempt, and 3% an interrupted attempt. These findings suggest that single-item assessments of suicide attempt history may result in the misclassification of prior suicidal behaviors. Given that such assessments are commonly used in research and clinical practice, these results emphasize the importance of utilizing follow-up questions and assessments to improve precision in the characterization and assessment of suicide risk. PMID:26502202

  17. Student Think Aloud Reflections on Comprehensible and Readable Assessment Items: Perspectives on What Does and Does Not Make an Item Readable. Technical Report 48

    ERIC Educational Resources Information Center

    Johnstone, Christopher; Liu, Kristi; Altman, Jason; Thurlow, Martha

    2007-01-01

    This document reports on research related to large-scale assessments for students with learning disabilities in the area of reading. As part of a process of making assessments more universally designed, the authors examined the role of "readable and comprehensible" test items (Thompson, Johnstone, & Thurlow, 2002). In this research, they used think…

  18. Differential Item Functioning by Gender on a Large-Scale Science Performance Assessment: A Comparison across Grade Levels.

    ERIC Educational Resources Information Center

    Holweger, Nancy; Taylor, Grace

    The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…

  19. Implementing Formative Mathematics Assessments in Prekindergarten

    ERIC Educational Resources Information Center

    Komara, Cecile; Herron, Julie

    2012-01-01

    Authentic assessment "refers to the systematic collection of information about the naturally occurring behaviors of young children and families in their daily routines" (Neisworth & Bagnato, 2004, p. 204). In formative assessments, the assessment information informs instruction. Formative assessments are given periodically and should be used to…

  20. A flexible item to screen for depression in inner-city minorities during palliative care symptom assessment.

    PubMed

    Francoeur, Richard Benoit

    2006-03-01

    There is inconsistent evidence for the validity of a single item to screen depression. In inner-city minority populations, the "yes/no" forced-response option may encourage bias, especially in elders and men, who view depression as stigmatizing or the healthcare system as untrustworthy. In contrast, an open-choice format with a category for ambivalent and missing responses could be acceptable if administered during the legitimized context of a physical symptom assessment. Retrospective data were analyzed from 146 black and Latino inner-city patients receiving palliative care for various physical conditions. Bivariate analyses and ordinal regressions are based on the most recent comprehensive patient assessment conducted by a black female nurse and a bilingual Latina social worker. The depression item (no, unknown, yes) predicts pain and symptom attitude, which is more "hopeful" in older men with unknown depression status than in younger and older women with unknown depression status or no depression. The more "hopeful" pain and symptom attitudes by older men in the unknown category for depression suggest that depression, apathy, and resignation in older minority men may be hidden from clinicians in the absence of the open-choice depression item.

  1. A Study of Item Bias in the Maine Educational Assessment Test.

    ERIC Educational Resources Information Center

    Smith, James Brian

    A study used four statistical item bias analysis strategies to determine the French cross-cultural validity of the Maine Educational Assessment, a standardized test administered in six content areas to students in grades 4, 8, and 11. Analysis was performed on eighth grade pupil performance in test year 1988-89, in the areas of the 100 common…

  2. Differentials of a State Reading Assessment: Item Functioning, Distractor Functioning, and Omission Frequency for Disability Categories

    ERIC Educational Resources Information Center

    Kato, Kentaro; Moen, Ross E.; Thurlow, Martha L.

    2009-01-01

    Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional…

  3. Identifying Promising Items: The Use of Crowdsourcing in the Development of Assessment Instruments

    ERIC Educational Resources Information Center

    Sadler, Philip M.; Sonnert, Gerhard; Coyle, Harold P.; Miller, Kelly A.

    2016-01-01

    The psychometrically sound development of assessment instruments requires pilot testing of candidate items as a first step in gauging their quality, typically a time-consuming and costly effort. Crowdsourcing offers the opportunity for gathering data much more quickly and inexpensively than from most targeted populations. In a simulation of a…

  4. Applying Item Response Theory Methods to Design a Learning Progression-Based Science Assessment

    ERIC Educational Resources Information Center

    Chen, Jing

    2012-01-01

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1)…

  5. The Matching Criterion Purification for Differential Item Functioning Analyses in a Large-Scale Assessment

    ERIC Educational Resources Information Center

    Lee, HyeSun; Geisinger, Kurt F.

    2016-01-01

    The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…

  6. The Value of Item Response Theory in Clinical Assessment: A Review

    ERIC Educational Resources Information Center

    Thomas, Michael L.

    2011-01-01

    Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical…

  7. To Sum or Not to Sum: Taxometric Analysis with Ordered Categorical Assessment Items

    ERIC Educational Resources Information Center

    Walters, Glenn D.; Ruscio, John

    2009-01-01

    Meehl's taxometric method has been shown to differentiate between categorical and dimensional data, but there are many ways to implement taxometric procedures. When analyzing the ordered categorical data typically provided by assessment instruments, summing items to form input indicators has been a popular practice for more than 20 years. A Monte…

  8. Improving the Memory Sections of the Standardized Assessment of Concussion Using Item Analysis

    ERIC Educational Resources Information Center

    McElhiney, Danielle; Kang, Minsoo; Starkey, Chad; Ragan, Brian

    2014-01-01

    The purpose of the study was to improve the immediate and delayed memory sections of the Standardized Assessment of Concussion (SAC) by identifying a list of more psychometrically sound items (words). A total of 200 participants with no history of concussion in the previous six months (aged 19.60 ± 2.20 years; N = 93 men, N = 107 women)…

  11. High-Dimensional Explanatory Random Item Effects Models for Rater-Mediated Assessments

    ERIC Educational Resources Information Center

    Kelcey, Ben; Wang, Shanshan; Cox, Kyle

    2016-01-01

    Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…

  12. Assessing Model Data Fit of Unidimensional Item Response Theory Models in Simulated Data

    ERIC Educational Resources Information Center

    Kose, Ibrahim Alper

    2014-01-01

    The purpose of this paper is to give an example of how to assess the model-data fit of unidimensional IRT models in simulated data. Also, the present research aims to explain the importance of fit and the consequences of misfit by using simulated data sets. Responses of 1000 examinees to a dichotomously scoring 20 item test were simulated with 25…

  16. Randomised Items in Computer-Based Tests: Russian Roulette in Assessment?

    ERIC Educational Resources Information Center

    Marks, Anthony M.; Cronje, Johannes C.

    2008-01-01

    Computer-based assessments are becoming more commonplace, perhaps as a necessity for faculty to cope with large class sizes. These tests often occur in large computer testing venues in which test security may be compromised. In an attempt to limit the likelihood of cheating in such venues, randomised presentation of items is automatically…
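
    Randomised presentation of this sort is simple to implement. Below is a minimal sketch in Python; the seeding scheme and the identifiers are hypothetical illustrations, not taken from the article. Seeding a per-student generator keeps each examinee's ordering reproducible for later audit while still differing between neighbouring test-takers.

```python
import random

def randomized_order(item_ids, student_id, test_id):
    """Return a per-student ordering of the test items.

    Seeding the generator with the test and student identifiers
    (hypothetical inputs) makes the shuffle reproducible for review
    while still varying between test-takers sitting side by side.
    """
    rng = random.Random(f"{test_id}:{student_id}")
    order = list(item_ids)
    rng.shuffle(order)
    return order

items = ["Q1", "Q2", "Q3", "Q4", "Q5"]
print(randomized_order(items, student_id="s001", test_id="chem101"))
print(randomized_order(items, student_id="s002", test_id="chem101"))
```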

  17. Improving the Memory Sections of the Standardized Assessment of Concussion Using Item Analysis

    ERIC Educational Resources Information Center

    McElhiney, Danielle; Kang, Minsoo; Starkey, Chad; Ragan, Brian

    2014-01-01

    The purpose of the study was to improve the immediate and delayed memory sections of the Standardized Assessment of Concussion (SAC) by identifying a list of more psychometrically sound items (words). A total of 200 participants with no history of concussion in the previous six months (aged 19.60 ± 2.20 years; N = 93 men, N = 107 women)…

  18. PSSA Released Reading Items, 2000-2001. The Pennsylvania System of School Assessment.

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Education, Harrisburg. Bureau of Curriculum and Academic Services.

    This document contains materials directly related to the actual reading test of the Pennsylvania System of School Assessment (PSSA), including the reading rubric, released passages, selected-response questions with answer keys, performance tasks, and scored samples of students' responses to the tasks. All of these items may be duplicated to…

  19. Item Format as a Factor Affecting the Relative Standing of Countries in the Third International Mathematics and Science Study (TIMSS).

    ERIC Educational Resources Information Center

    O'Leary, Michael

    Data from the Third International Mathematics and Science Study (TIMSS) were examined to determine the extent to which the rank ordering of countries based on pupil test performance was consistent across three different item formats: multiple-choice, short-answer, and extended-response. Findings from the analysis are used to make the case that…

  20. Automatic Item Generation of Probability Word Problems

    ERIC Educational Resources Information Center

    Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

    2009-01-01

    Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…
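
    Template-based generation of the kind the abstract describes can be sketched as follows. The template, the marble scenario, and the parameter ranges are hypothetical examples of "predefined task parameters", not the study's actual framework; difficulty can be steered by the choice of parameter values.

```python
import itertools

TEMPLATE = ("A bag contains {r} red and {b} blue marbles. "
            "What is the probability of drawing a red marble?")

def generate_items(reds, blues):
    """Instantiate one word problem per combination of task parameters.

    Each generated item carries its stem and its answer key, so the
    whole bank is produced and scored without hand-authoring.
    """
    items = []
    for r, b in itertools.product(reds, blues):
        items.append({"stem": TEMPLATE.format(r=r, b=b),
                      "key": r / (r + b)})
    return items

bank = generate_items(reds=[2, 3], blues=[3, 5])
for item in bank:
    print(item["stem"], "->", round(item["key"], 3))
```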

  1. Automatic Item Generation of Probability Word Problems

    ERIC Educational Resources Information Center

    Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

    2009-01-01

    Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…

  2. Using a Constructed-Response Instrument to Explore the Effects of Item Position and Item Features on the Assessment of Students' Written Scientific Explanations

    NASA Astrophysics Data System (ADS)

    Federer, Meghan Rector; Nehm, Ross H.; Opfer, John E.; Pearl, Dennis

    2015-08-01

    A large body of work has been devoted to reducing assessment biases that distort inferences about students' science understanding, particularly in multiple-choice instruments (MCI). Constructed-response instruments (CRI), however, have invited much less scrutiny, perhaps because of their reputation for avoiding many of the documented biases of MCIs. In this study we explored whether known biases of MCIs—specifically item sequencing and surface feature effects—were also apparent in a CRI designed to assess students' understanding of evolutionary change using written explanation (Assessment of COntextual Reasoning about Natural Selection [ACORNS]). We used three versions of the ACORNS CRI to investigate different aspects of assessment structure and their corresponding effect on inferences about student understanding. Our results identified several sources of (and solutions to) assessment bias in this practice-focused CRI. First, along the instrument item sequence, items with similar surface features produced greater sequencing effects than sequences of items with dissimilar surface features. Second, a counterbalanced design (i.e., Latin Square) mitigated this bias at the population level of analysis. Third, ACORNS response scores were highly correlated with student verbosity, despite verbosity being an intrinsically trivial aspect of explanation quality. Our results suggest that as assessments in science education shift toward the measurement of scientific practices (e.g., explanation), it is critical that biases inherent in these types of assessments be investigated empirically.
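
    The counterbalanced (Latin square) design the study used to mitigate sequencing bias can be sketched with a cyclic construction: each item appears exactly once in every serial position across the set of forms, so position effects average out when forms are assigned to equal shares of examinees. The item labels below are placeholders.

```python
def latin_square(items):
    """Build a cyclic Latin square of item orderings.

    Row r is the item list rotated by r positions, so every item
    occupies every position exactly once across the forms.
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)]
            for row in range(n)]

forms = latin_square(["A", "B", "C", "D"])
for form in forms:
    print(form)
```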

  3. Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model

    ERIC Educational Resources Information Center

    Baghaei, Purya; Aryadoust, Vahid

    2015-01-01

    Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…

  4. Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model

    ERIC Educational Resources Information Center

    Baghaei, Purya; Aryadoust, Vahid

    2015-01-01

    Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…

  5. Formative Assessment: Guidance for Early Childhood Policymakers

    ERIC Educational Resources Information Center

    Riley-Ayers, Shannon

    2014-01-01

    This policy report provides a guide and framework to early childhood policymakers considering formative assessment. The report defines formative assessment and outlines its process and application in the context of early childhood. The substance of this document is the issues for consideration in the implementation of the formative assessment…

  6. Formative Assessment: Guidance for Early Childhood Policymakers

    ERIC Educational Resources Information Center

    Riley-Ayers, Shannon

    2014-01-01

    This policy report provides a guide and framework to early childhood policymakers considering formative assessment. The report defines formative assessment and outlines its process and application in the context of early childhood. The substance of this document is the issues for consideration in the implementation of the formative assessment…

  7. The Assessment of Quantitative Problem-Solving Skills with "None of the Above"--Items (NOTA Items).

    ERIC Educational Resources Information Center

    Dochy, Filip; Moerkerke, George; De Corte, Erik; Segers, Mien

    2001-01-01

    Focuses on the discussion of whether "none of the above" (NOTA) questions should be used on tests. Discusses a study in which a protocol analysis was conducted on written statements of examinees while answering NOTA items. Explains that a multiple-choice test was given to university students finding that NOTA options seem to be more attractive.…

  8. Formative and Summative Assessment in the Classroom

    ERIC Educational Resources Information Center

    Dixson, Dante D.; Worrell, Frank C.

    2016-01-01

    In this article, we provide brief overviews of the definitions of formative and summative assessment and a few examples of types of formative and summative assessments that can be used in classroom contexts. We highlight the points that these two types of assessment are complementary and the differences between them are often in the way these…

  9. Designing K-2 Formative Assessment Tasks

    ERIC Educational Resources Information Center

    Reed, Kristen E.; Goldenberg, E. Paul

    2016-01-01

    Formative assessment is a process used by teachers and students during instruction that provides feedback to adjust ongoing teaching and learning to improve students' achievements of intended instructional outcomes. Formative assessment means assessment embedded in instruction. That definition was adopted in 2006 by the Council of Chief State…

  10. Designing K-2 Formative Assessment Tasks

    ERIC Educational Resources Information Center

    Reed, Kristen E.; Goldenberg, E. Paul

    2016-01-01

    Formative assessment is a process used by teachers and students during instruction that provides feedback to adjust ongoing teaching and learning to improve students' achievements of intended instructional outcomes. Formative assessment means assessment embedded in instruction. That definition was adopted in 2006 by the Council of Chief State…

  11. Elementary Teacher Use of Formative Assessment

    ERIC Educational Resources Information Center

    Cotton, Donna McLamb

    2013-01-01

    This dissertation was designed to examine elementary teacher use of formative assessment and the impact formative assessment may have on student achievement as measured by benchmark assessments. The study was conducted in a school district in northwestern North Carolina. The teachers in this study have had NCFALCON training in the use of formative…

  12. Formative and Summative Assessment in the Classroom

    ERIC Educational Resources Information Center

    Dixson, Dante D.; Worrell, Frank C.

    2016-01-01

    In this article, we provide brief overviews of the definitions of formative and summative assessment and a few examples of types of formative and summative assessments that can be used in classroom contexts. We highlight the points that these two types of assessment are complementary and the differences between them are often in the way these…

  13. Using Item Response Theory (IRT) to Reduce Patient Burden When Assessing Desire for Hastened Death.

    PubMed

    Kolva, Elissa; Rosenfeld, Barry; Liu, Ying; Pessin, Hayley; Breitbart, William

    2016-06-09

    Desire for hastened death (DHD) represents a wish to die sooner than might occur by natural disease progression. Efficient and accurate assessment of DHD is vital for clinicians providing care to terminally ill patients. The Schedule of Attitudes Toward Hastened Death (SAHD) is a commonly used self-report measure of DHD. The goal of this study was to use methods grounded in item response theory (IRT) to analyze the psychometric properties of the SAHD and identify an abbreviated version of the scale. Data were drawn from 4 studies of psychological distress at the end of life. Participants were 1,076 patients diagnosed with either advanced cancer or AIDS. The sample was divided into 2 subsamples for scale analysis and development of the shortened form. IRT was used to estimate item parameters. A 6-item version of the SAHD (SAHD-A) was identified through examination of item parameter estimations. The SAHD-A demonstrated adequate convergent validity. Receiver operating characteristic analyses indicated comparable cut scores to identify patients with high levels of DHD. These analyses support the utility of the SAHD-A, which can be more easily integrated into research studies and clinical assessments of DHD.

  14. State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts

    ERIC Educational Resources Information Center

    Swanson, Leonard C.

    2010-01-01

    This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…

  15. Assessment of dietary fish consumption in pregnancy: comparing one-, four- and thirty-six-item questionnaires.

    PubMed

    Oken, Emily; Guthrie, Lauren B; Bloomingdale, Arienne; Gillman, Matthew W; Olsen, Sjurdur F; Amarasiriwardena, Chitra J; Platek, Deborah N; Bellinger, David C; Wright, Robert O

    2014-09-01

    Fish consumption influences a number of health outcomes. Few studies have directly compared dietary assessment methods to determine the best approach to estimating intakes of fish and its component nutrients, including DHA, and toxicants, including methylmercury. Our objective was to compare three methods of assessing fish intake. We assessed 30 d fish intake using three approaches: (i) a single question on total fish consumption; (ii) a brief comprehensive FFQ that included four questions about fish; and (iii) a focused FFQ with thirty-six questions about different finfish and shellfish. Obstetrics practices in Boston, MA, USA. Fifty-nine pregnant women who consumed ≤2 monthly fish servings. Estimated intakes of fish, DHA and Hg were lowest with the one-question screener and highest with the thirty-six-item fish questionnaire. Estimated intake of DHA with the thirty-six-item questionnaire was 4·4-fold higher (97 v. 22 mg/d), and intake of Hg was 3·8-fold higher (1·6 v. 0·42 μg/d), compared with the one-question screener. Plasma DHA concentration was correlated with fish intake assessed with the one-question screener (Spearman r = 0·27, P = 0·04), but not with the four-item FFQ (r = 0·08, P = 0·54) or the thirty-six-item fish questionnaire (r = 0·01, P = 0·93). In contrast, blood and hair Hg concentrations were similarly correlated with fish and Hg intakes regardless of the assessment method (r = 0·35 to 0·52). A longer questionnaire provides no advantage over shorter questionnaires in ranking intakes of fish, DHA and Hg compared with biomarkers, but estimates of absolute intakes can vary by as much as fourfold across methods.
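
    The rank correlations reported above are Spearman coefficients. A minimal implementation (ignoring tied ranks, a simplifying assumption; the data below are invented) ranks each variable and applies the standard formula:

```python
def spearman(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2)/(n*(n^2-1)).

    Assumes no tied values; with ties, average ranks and the
    Pearson-on-ranks form would be needed instead.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical intake estimates vs. a biomarker measurement.
print(spearman([22, 40, 15, 60], [0.9, 1.4, 0.7, 2.1]))
```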

  16. Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

    ERIC Educational Resources Information Center

    Parish, Jane A.; Karisch, Brandi B.

    2013-01-01

    Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
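
    The core quantities of a classical item analysis, item difficulty (proportion correct) and a point-biserial discrimination index, can be sketched as below. This is a generic illustration on 0/1-scored responses, not the program's actual procedure; discrimination is computed against the rest-score to avoid inflating the correlation with the item itself.

```python
from statistics import mean, pstdev

def item_analysis(responses):
    """Classical item statistics for dichotomously scored items.

    `responses` is a list of per-examinee 0/1 lists. For each item,
    difficulty is the proportion correct and discrimination is the
    correlation of the item with the total of the remaining items.
    """
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    stats = []
    for i in range(n_items):
        scores = [r[i] for r in responses]
        rest = [t - s for t, s in zip(totals, scores)]
        sx, sy = pstdev(scores), pstdev(rest)
        if sx == 0 or sy == 0:
            r_pb = 0.0
        else:
            mx, my = mean(scores), mean(rest)
            cov = mean((x - mx) * (y - my) for x, y in zip(scores, rest))
            r_pb = cov / (sx * sy)
        stats.append({"difficulty": mean(scores), "discrimination": r_pb})
    return stats

print(item_analysis([[1, 1], [1, 0], [0, 0]]))
```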

  17. PISA Test Items and School-Based Examinations in Greece: Exploring the Relationship between Global and Local Assessment Discourses

    ERIC Educational Resources Information Center

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-01-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…

  18. PISA Test Items and School-Based Examinations in Greece: Exploring the Relationship between Global and Local Assessment Discourses

    ERIC Educational Resources Information Center

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-01-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…

  19. A Beginning Validation of Causes of Local Item Dependence in a Large Scale Hands-On Science Performance Assessment.

    ERIC Educational Resources Information Center

    Ferrara, Steven; And Others

    A study was conducted to begin a process of validating hypothesized causes of local item dependence (LID) in large-scale performance assessments. Data for the study are item level scores from 26 science tasks from the 1993 edition of the Maryland School Performance Assessment Program. Causes of high LID were hypothesized from studies by Ferrara et…

  20. Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

    ERIC Educational Resources Information Center

    Parish, Jane A.; Karisch, Brandi B.

    2013-01-01

    Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…

  1. Differential item functioning between ethnic groups in the epidemiological assessment of depression.

    PubMed

    Breslau, Joshua; Javaras, Kristin N; Blacker, Deborah; Murphy, Jane M; Normand, Sharon-Lise T

    2008-04-01

    A potential explanation for the finding that disadvantaged minority status is associated with a lower lifetime risk for depression is that individuals from minority ethnic groups may be less likely to endorse survey questions about depression even when they have the same level of depression. We examine this possibility using a nonparametric item response theory approach to assess differential item functioning (DIF) in a national survey of psychiatric disorders, the National Comorbidity Survey. Of 20 questions used to assess depression symptoms, we found evidence of DIF in 3 questions when comparing non-Hispanic blacks with non-Hispanic whites and in 3 questions when comparing Hispanics with non-Hispanic whites. However, removal of the questions with DIF did not alter the relative prevalence of depression between ethnic groups. Ethnic differences do exist in response to questions concerning depression, but these differences do not account for the finding of relatively low prevalence of depression among minority groups.
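
    The study above used a nonparametric IRT approach; a common, simpler screen for the same question is the Mantel-Haenszel common odds ratio, which compares endorsement between groups within strata of matched total score. A minimal sketch (the counts are invented; an odds ratio far from 1.0 flags potential DIF):

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across score strata.

    Each stratum is a 2x2 table (a, b, c, d): focal group
    endorse/not-endorse, reference group endorse/not-endorse.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Two hypothetical score strata with identical endorsement odds.
print(mantel_haenszel_or([(10, 10, 10, 10), (20, 5, 20, 5)]))
```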

  2. An item response theory evaluation of three depression assessment instruments in a clinical sample

    PubMed Central

    2012-01-01

    Background This study investigates whether an analysis, based on Item Response Theory (IRT), can be used for initial evaluations of depression assessment instruments in a limited patient sample from an affective disorder outpatient clinic, with the aim of finding major advantages and deficiencies of the instruments. Methods Three depression assessment instruments, the depression module from the Patient Health Questionnaire (PHQ9), the depression subscale of the Affective Self Rating Scale (AS-18-D) and the Montgomery-Åsberg Depression Rating Scale (MADRS), were evaluated in a sample of 61 patients with affective disorder diagnoses, mainly bipolar disorder. A ‘3-step IRT strategy’ was used. Results In a first step, the Mokken non-parametric analysis showed that PHQ9 and AS-18-D had strong overall scalabilities of 0.510 [C.I. 0.42, 0.61] and 0.513 [C.I. 0.41, 0.63] respectively, while MADRS had a weak scalability of 0.339 [C.I. 0.25, 0.43]. In a second step, a Rasch model analysis indicated large differences concerning the item discriminating capacity and was therefore considered not suitable for the data. In a third step, applying a more flexible two-parameter model, all three instruments showed large differences in item information, and items had a low capacity to reliably measure respondents at low levels of depression severity. Conclusions We conclude that a stepwise IRT approach, as performed in this study, is a suitable tool for studying assessment instruments at early stages of development. Such an analysis can give useful information, even in small samples, in order to construct more precise measurements or to evaluate existing assessment instruments. The study suggests that the PHQ9 and AS-18-D can be useful for measurement of depression severity in an outpatient clinic for affective disorder, while the MADRS shows weak measurement properties for this type of patients. PMID:22721257
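
    The "item information" examined in the third step has a closed form under the two-parameter logistic model: I(theta) = a^2 * P * (1 - P), where P is the 2PL response probability. A short sketch (the parameter values are illustrative, not estimates from the study); information peaks at theta = b, which is why items with high difficulty parameters measure low-severity respondents poorly.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability/severity theta.

    a is the discrimination, b the difficulty (location) parameter;
    P is the 2PL probability of endorsing the item.
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical item: most informative near theta = 1.0.
for theta in (-2.0, 0.0, 1.0, 2.0):
    print(theta, round(item_information(theta, a=1.5, b=1.0), 4))
```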

  3. Assessment of the quality and applicability of an e-portfolio capstone assessment item within a bachelor of midwifery program.

    PubMed

    Baird, Kathleen; Gamble, Jenny; Sidebotham, Mary

    2016-09-01

    Education programs leading to professional licensing need to ensure that assessments throughout the program are constructively aligned and mapped to the specific professional expectations. Within the final year of an undergraduate degree, a student is required to transform and prepare for professional practice. Establishing assessment items that are authentic and able to reflect this transformation is a challenge for universities. This paper both describes the considerations around the design of a capstone assessment and evaluates, from an academic's perspective, the quality and applicability of an e-portfolio as a capstone assessment item for undergraduate courses leading to a professional qualification. The e-portfolio was seen to meet nine quality indicators for assessment. Academics evaluated the e-portfolio as an authentic assessment item that would engage the students and provide them with a platform for ongoing professional development and lifelong learning. The processes of reflection on strengths, weaknesses, opportunities and threats, comparison of clinical experiences with national statistics, preparation of a professional philosophy and development of a curriculum vitae, whilst recognised as comprehensive and challenging, were seen as highly valuable to the student transforming into the profession.

  4. Formative Assessments in a Professional Learning Community

    ERIC Educational Resources Information Center

    Stanley, Todd; Moore, Betsy

    2011-01-01

    The ideas and examples in this book help teachers successfully collaborate to raise student achievement through the use of formative assessments. Here, Todd Stanley and Betsy Moore, educators with over 40 years of combined experience, offer proven formative assessment strategies to teachers in a professional learning community. Contents include:…

  5. Implementation of Formative Assessment in the Classroom

    ERIC Educational Resources Information Center

    Edman, Elaina; Gilbreth, Stephen G.; Wynn, Sheila

    2010-01-01

    This report details the work defined by a doctoral team looking at the literacy and implementation of formative assessment in classrooms in Southwest Missouri. The mission of this project was to identify the formative assessment literacy levels and the degree of classroom implementation of these strategies in districts and the resulting…

  6. Harnessing Collaborative Annotations on Online Formative Assessments

    ERIC Educational Resources Information Center

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…

  7. Formative Assessments in a Professional Learning Community

    ERIC Educational Resources Information Center

    Stanley, Todd; Moore, Betsy

    2011-01-01

    The ideas and examples in this book help teachers successfully collaborate to raise student achievement through the use of formative assessments. Here, Todd Stanley and Betsy Moore, educators with over 40 years of combined experience, offer proven formative assessment strategies to teachers in a professional learning community. Contents include:…

  8. Improving Foreign Language Speaking through Formative Assessment

    ERIC Educational Resources Information Center

    Tuttle, Harry Grover; Tuttle, Alan Robert

    2012-01-01

    Want a quick way to get your students happily conversing more in the target language? This practical book shows you how to use formative assessments to gain immediate and lasting improvement in your students' fluency. You'll learn how to: (1) Imbed the 3-minute formative assessment into every lesson with ease; (2) Engage students in peer formative…

  9. Harnessing Collaborative Annotations on Online Formative Assessments

    ERIC Educational Resources Information Center

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…

  10. The Political Dilemmas of Formative Assessment

    ERIC Educational Resources Information Center

    Dorn, Sherman

    2010-01-01

    The literature base on using formative assessment for instructional and intervention decisions is formidable, but the history of the practice of formative assessment is spotty. Even with the pressures of high-stakes accountability, its definition is fuzzy, its adoption is inconsistent, and the prognosis for future use is questionable. A historical…

  11. Making Moves: Formative Assessment in Mathematics

    ERIC Educational Resources Information Center

    Duckor, Brent; Holmberg, Carrie; Becker, Joanne Rossi

    2017-01-01

    Research on teacher professional learning has shown that formative assessment can improve student learning more than most instructional practices. Empirical evidence indicates that thoughtfully implemented formative assessment practices improve students' learning, increase students' scores, and narrow achievement gaps between low-achieving…

  12. Learning Progressions that Support Formative Assessment Practices

    ERIC Educational Resources Information Center

    Alonzo, Alicia C.

    2011-01-01

    Black, Wilson, and Yao (this issue) lay out a comprehensive vision for the way that learning progressions (or other "road maps") might be used to inform and coordinate formative and summative purposes of assessment. As Black, Wilson, and others have been arguing for over a decade, the effective use of formative assessment has great potential to…

  13. Formative Assessment Probes: With a Purpose

    ERIC Educational Resources Information Center

    Keeley, Page

    2011-01-01

    The first thing that comes to mind for many teachers when they think of assessment is testing, quizzes, performance tasks, and other summative forms used for grading purposes. Such assessment practices represent only a fraction of the kinds of assessment that occur on an ongoing basis in an effective science classroom. Formative assessment is a…

  14. Test Industry Split over "Formative" Assessment

    ERIC Educational Resources Information Center

    Cech, Scott J.

    2008-01-01

    There's a war of sorts going on within the normally staid assessment industry, and it's a war over the definition of a type of assessment that many educators understand in only a sketchy fashion. Formative assessments, also known as "classroom assessments," are in some ways easier to define by what they are not. They're not like the long,…

  15. Test Industry Split over "Formative" Assessment

    ERIC Educational Resources Information Center

    Cech, Scott J.

    2008-01-01

    There's a war of sorts going on within the normally staid assessment industry, and it's a war over the definition of a type of assessment that many educators understand in only a sketchy fashion. Formative assessments, also known as "classroom assessments," are in some ways easier to define by what they are not. They're not like the long,…

  16. Do item weights matter? An assessment using the oral health impact profile.

    PubMed

    Allen, P F; Locker, D

    1997-09-01

    To determine whether or not item weights contribute to the performance of the Oral Health Impact Profile (OHIP), a comprehensive measure of the functional, social and psychological outcomes of oral disorders. Data were obtained as part of an oral health survey of older adults living in Ontario, Canada. Subjects completed a personal interview, clinical examination and a self-complete version of the OHIP. OHIP scores were calculated in three ways: a simple count method, an additive method and a method incorporating item weights derived from the Thurstone paired comparison technique. These scores were calculated for the full 49-item version of the measure and for a short form consisting of 14 selected items. The discriminant, concurrent and predictive validity of these scores for the two versions of the measure were ascertained. Complete data were obtained for 522 subjects. Just over half were female (56 per cent) and their mean age was 66 years. The OHIP discriminated between groups based on dental status (dentate/edentulous), presence of dry mouth (yes/no) and, for the dentate, according to the number of remaining teeth (less than 20/20 or more) irrespective of scoring method or the version of the questionnaire used. All scores showed significant associations with self-rated oral health, self-perceived need for dental care and dissatisfaction with oral health status. There was evidence to suggest that weighted scores were better at discriminating between groups than the simple count method but no better than the additive method. Similar findings emerged with respect to the ability of the scores to predict prosthodontic, surgical and restorative treatment needs. Although the data suggested that item weights did improve the performance of the OHIP, the fact that simple scoring methods were as good as more sophisticated ones might mean that the OHIP could be used in contexts, such as patient assessment for clinical care, where the calculation of weighted scores was not…
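
    The three scoring methods compared in the study can be sketched as below. The item names, response codes, and weights are placeholders, not the published OHIP items or Thurstone-derived weights; the point is only the structural difference between counting, summing, and weighting.

```python
def ohip_scores(responses, weights=None):
    """Three scoring methods for ordinal impact items.

    `responses` maps item -> frequency code (0 = never .. 4 = very
    often). Simple count: number of items reported 'fairly often' or
    more (code >= 3); additive: sum of codes; weighted: weight-
    multiplied sum (uniform weights if none supplied).
    """
    simple = sum(1 for v in responses.values() if v >= 3)
    additive = sum(responses.values())
    if weights is None:
        weights = {item: 1.0 for item in responses}
    weighted = sum(weights[item] * v for item, v in responses.items())
    return {"simple": simple, "additive": additive, "weighted": weighted}

# Hypothetical respondent with four items.
print(ohip_scores({"chewing": 4, "speech": 0, "pain": 3, "self_conscious": 1}))
```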

  17. The Development of Multiple-Choice Items Consistent with the AP Chemistry Curriculum Framework to More Accurately Assess Deeper Understanding

    ERIC Educational Resources Information Center

    Domyancich, John M.

    2014-01-01

    Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…

  18. Contextual Explanations of Local Dependence in Item Clusters in a Large Scale Hands-On Science Performance Assessment.

    ERIC Educational Resources Information Center

    Ferrara, Steven; Huynh, Huynh; Michaels, Hillary

    1999-01-01

    Provides hypothesized explanations for local item dependence (LID) in a large-scale hands-on science performance assessment involving approximately 55,000 students each at grades 3, 5, and 8. Items that appear to elicit locally dependent responses require examinees to answer and explain their answers or to use given or generalized information to…

  19. Exploring Individual and Item Factors that Affect Assessment Validity for Diverse Learners: Results from a Large-Scale Cognitive Lab

    ERIC Educational Resources Information Center

    Winter, Phoebe C.; Kopriva, Rebecca J.; Chen, Chen-Su; Emick, Jessica E.

    2006-01-01

    A cognitive lab technique (n=156) was used to investigate interactions between individual factors and item factors presumed to affect assessment validity for diverse students, including English language learners. Findings support the concept of "access"--an interaction between specific construct-irrelevant item features and individual…

  1. Item Difficulty Modeling of Paragraph Comprehension Items

    ERIC Educational Resources Information Center

    Gorin, Joanna S.; Embretson, Susan E.

    2006-01-01

    Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…

  3. The Relation Between Item Format and the Structure of the Eysenck Personality Inventory

    ERIC Educational Resources Information Center

    Velicer, Wayne F.; Stevenson, John F.

    1978-01-01

    A Likert seven-choice response format for personality inventories allows finer distinctions by subjects than the traditional two-choice format. The Eysenck Personality Inventory was employed in the present study to test the hypothesis that use of the expanded format would result in a clearer and more accurate indication of test structure…

  4. Psychometric features of an assessment instrument with likert and dichotomous response formats.

    PubMed

    Capik, Canturk; Gozum, Sebahat

    2015-01-01

    To assess the psychometric properties of a Likert-formatted assessment instrument after altering the responses to a dichotomous format. This methodological study used a 15-item instrument to obtain data from 183 participants who responded in both Likert and dichotomous formats. Response sets from each format were compared. Each response set underwent factor analysis, Kuder-Richardson 20, Cronbach's α coefficient, item-total correlation, and parallel form equivalence tests. Factor loadings of the instrument varied between .362 and .754 when responses were Likert-formatted and between .370 and .713 when responses were dichotomous. The Cronbach's α coefficient with Likert-formatted responses was .858; the Kuder-Richardson 20 coefficient of the dichotomous responses was .827. Parallel form equivalences were significant at the level of r = .753. The instrument yielded valid results whether Likert or dichotomous responses were obtained. © 2014 Wiley Periodicals, Inc.
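
    The two reliability coefficients reported above are closely related: for dichotomous 0/1 items, Cronbach's α is algebraically identical to Kuder-Richardson 20, so one function covers both. A minimal sketch with an invented data matrix:

```python
import numpy as np

# Sketch of the reliability coefficients compared in the study. For
# dichotomous 0/1 items this formula for Cronbach's alpha reduces to
# Kuder-Richardson 20. The data in the usage example are invented.
def cronbach_alpha(items):
    """items: (n_respondents, k_items) score matrix (list of lists or array)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

    With perfectly consistent dichotomous responses, e.g. `cronbach_alpha([[1, 1], [0, 0], [1, 1], [0, 0]])`, the coefficient is 1.0; uncorrelated items drive it toward 0.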

  5. Exploring Formative Assessment as a Tool for Learning: Students' Experiences of Different Methods of Formative Assessment

    ERIC Educational Resources Information Center

    Weurlander, Maria; Soderberg, Magnus; Scheja, Max; Hult, Hakan; Wernerson, Annika

    2012-01-01

    This study aims to provide a greater insight into how formative assessments are experienced and understood by students. Two different formative assessment methods, an individual, written assessment and an oral group assessment, were components of a pathology course within a medical curriculum. In a cohort of 70 students, written accounts were…

  6. Formative Assessment Requires Artistic Vision

    ERIC Educational Resources Information Center

    Macintyre Latta, Margaret; Buck, Gayle; Beckenhauer, April

    2007-01-01

    This two-year study focused on the lived terms of inquiry in middle-school science classrooms. The conditions that enable teachers to see and act on science learning as ongoing inquiry were deliberately sought in Year 2. Nine science teachers participated in search of capacities connecting curriculum, teaching, and assessment for greater student…

  7. Formative Assessment: Assessment Is for Self-Regulated Learning

    ERIC Educational Resources Information Center

    Clark, Ian

    2012-01-01

    The article draws from 199 sources on assessment, learning, and motivation to present a detailed decomposition of the values, theories, and goals of formative assessment. This article will discuss the extent to which formative feedback actualizes and reinforces self-regulated learning (SRL) strategies among students. Theoreticians agree that SRL…

  9. Formative Assessment at the Crossroads: Conformative, Deformative and Transformative Assessment

    ERIC Educational Resources Information Center

    Torrance, Harry

    2012-01-01

    The theory and practice of formative assessment seems to be at a crossroads, even an impasse. Different theoretical justifications for the development of formative assessment, and different empirical exemplifications, have been apparent for many years. Yet practice, while quite widespread, is often limited in terms of its scope and its utilisation…

  10. An item-level psychometric analysis of the personality assessment inventory: clinical scales in a psychiatric inpatient unit.

    PubMed

    Siefert, Caleb J; Sinclair, Samuel J; Kehl-Fie, Kendra A; Blais, Mark A

    2009-12-01

    Multi-item multiscale self-report measures are increasingly used in inpatient assessments. When considering a measure for this setting, it is important to evaluate the psychometric properties of the clinical scales and items to ensure that they are functioning as intended in a highly distressed clinical population. The present study examines scale properties for a self-report measure frequently employed in inpatient assessments, the Personality Assessment Inventory (PAI). In addition to examining internal consistency statistics, this study extends prior PAI research by considering key issues related to inpatient assessment (e.g., scale distinctiveness, ceiling effects). Coefficient alphas, interitem correlations, and item-scale relationships suggest that the PAI clinical scales and subscales are internally consistent. Items for respective clinical scales generally showed significantly higher item-scale correlations with their intended scale (as compared with their item-scale correlation with scales they were not intended to measure). In addition, scales' coefficient alpha scores were higher than their interscale correlations. Taken as a whole, these results support the hypothesis that PAI scales were measuring relatively distinct constructs in this inpatient sample. Findings are discussed with regard to the implications for scale interpretation in inpatient assessment, functioning of individual scales and subscales, and functioning of specific items. Limitations of the present study and directions for future research are discussed.
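
    Corrected item-scale correlations of the kind examined here are computed by correlating each item with the sum of the remaining items, so that an item's own variance does not inflate the coefficient. A sketch, with an invented data matrix in the usage example:

```python
import numpy as np

# Sketch of corrected item-scale (item-total) correlations of the kind
# reported for the PAI: each item is correlated with the total of the
# remaining items. The data in the usage example are invented.
def corrected_item_total(items):
    """items: (n_respondents, k_items) score matrix; returns one r per item."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])
```

    For cross-scale comparisons of the sort described in the abstract, the same function can be run against the totals of scales an item was not intended to measure, and the resulting correlations compared.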

  11. Promoting proximal formative assessment with relational discourse

    NASA Astrophysics Data System (ADS)

    Scherr, Rachel E.; Close, Hunter G.; McKagan, Sarah B.

    2012-02-01

    The practice of proximal formative assessment - the continual, responsive attention to students' developing understanding as it is expressed in real time - depends on students' sharing their ideas with instructors and on teachers' attending to them. Rogerian psychology presents an account of the conditions under which proximal formative assessment may be promoted or inhibited: (1) Normal classroom conditions, characterized by evaluation and attention to learning targets, may present threats to students' sense of their own competence and value, causing them to conceal their ideas and reducing the potential for proximal formative assessment. (2) In contrast, discourse patterns characterized by positive anticipation and attention to learner ideas increase the potential for proximal formative assessment and promote self-directed learning. We present an analysis methodology based on these principles and demonstrate its utility for understanding episodes of university physics instruction.

  12. An Investigation of Explanation Multiple-Choice Items in Science Assessment

    ERIC Educational Resources Information Center

    Liu, Ou Lydia; Lee, Hee-Sun; Linn, Marcia C.

    2011-01-01

    Both multiple-choice and constructed-response items have known advantages and disadvantages in measuring scientific inquiry. In this article we explore the function of explanation multiple-choice (EMC) items and examine how EMC items differ from traditional multiple-choice and constructed-response items in measuring scientific reasoning. A group…

  13. Innovations in Measuring Rater Accuracy in Standard Setting: Assessing "Fit" to Item Characteristic Curves

    ERIC Educational Resources Information Center

    Hurtz, Gregory M.; Jones, J. Patrick

    2009-01-01

    Standard setting methods such as the Angoff method rely on judgments of item characteristics; item response theory empirically estimates item characteristics and displays them in item characteristic curves (ICCs). This study evaluated several indexes of rater fit to ICCs as a method for judging rater accuracy in their estimates of expected item…
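
    One simple way to index rater fit of the kind described is to compare a rater's Angoff estimate with the model-implied probability at the cut score. A sketch under a 3PL model; all parameter values are invented, and the study's actual fit indexes aggregate such residuals in more elaborate ways.

```python
import math

# Hypothetical sketch: compare an Angoff rater's probability judgment with a
# 3PL item characteristic curve at the cut score (D = 1.7 scaling).
# Parameter values used in the test below are invented.
def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def rater_misfit(angoff_estimate, theta_cut, a, b, c):
    """Signed gap between the rater's judgment and the ICC at the cut score."""
    return angoff_estimate - icc_3pl(theta_cut, a, b, c)
```

    A rater whose estimates sit systematically above or below the ICCs across items would accumulate signed misfit of one direction, flagging the rater for review.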

  14. A Simulation Study of Methods for Assessing Differential Item Functioning in Computer-Adaptive Tests.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; And Others

    Simulated data were used to investigate the performance of modified versions of the Mantel-Haenszel and standardization methods of differential item functioning (DIF) analysis in computer-adaptive tests (CATs). Each "examinee" received 25 items out of a 75-item pool. A three-parameter logistic item response model was assumed, and…
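
    The classical (non-CAT) Mantel-Haenszel statistic that these modified versions build on estimates a common odds ratio across strata of a matching variable such as total score. A minimal sketch; the data layout in the test is invented for illustration.

```python
import numpy as np

# Minimal sketch of the classical Mantel-Haenszel common odds ratio used to
# screen items for DIF, stratifying on a matching variable (e.g., total
# score). The CAT-adapted variants in the study modify the matching.
def mantel_haenszel_odds_ratio(correct, group, stratum):
    """correct: 0/1 item responses; group: 0 = reference, 1 = focal;
    stratum: matching-variable level per examinee (numpy arrays)."""
    num = den = 0.0
    for s in np.unique(stratum):
        m = stratum == s
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return num / den  # values near 1.0 indicate little or no DIF
```

    Within each stratum this is the familiar ad/bc odds ratio; pooling across strata gives the common odds ratio, and values far from 1.0 flag the item for closer inspection.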

  15. A Comparison of Traditional Test Blueprinting and Item Development to Assessment Engineering in a Licensure Context

    ERIC Educational Resources Information Center

    Masters, James S.

    2010-01-01

    With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…

  16. Cognitive Processing Requirements of Constructed Figural Response and Multiple-Choice Items in Architecture Assessment.

    ERIC Educational Resources Information Center

    Martinez, Michael E.; Katz, Irvin R.

    1996-01-01

    Item level differences between a type of constructed response item (figural response) and comparable multiple choice items in the domain of architecture were studied. Data from 120 architects and architecture students show that item level differences in difficulty correspond to differences in cognitive processing requirements and that relations…

  18. Assessing birth experience in fathers as an important aspect of clinical obstetrics: how applicable is Salmon's Item List for men?

    PubMed

    Gawlik, Stephanie; Müller, Mitho; Hoffmann, Lutz; Dienes, Aimée; Reck, Corinna

    2015-01-01

    A validated questionnaire for assessing fathers' experiences during childbirth is lacking in routine clinical practice. Salmon's Item List is a short, validated method used for the assessment of birth experience in mothers in both English- and German-speaking communities. With little to no validated data available for fathers, this pilot study aimed to assess the applicability of the German version of Salmon's Item List, including a multidimensional birth experience concept, in fathers. In this longitudinal study, data were collected by questionnaire at a university hospital in Germany: the birth experiences of 102 fathers were assessed four to six weeks post partum using the German version of Salmon's Item List. Construct validity testing with exploratory factor analysis, using principal component analysis with varimax rotation, was performed to identify the dimensions of childbirth experiences; internal consistency was also analysed. Factor analysis yielded a four-factor solution comprising 17 items that accounted for 54.5% of the variance. The main domain was 'fulfilment', and the secondary domains were 'emotional distress', 'physical discomfort' and 'emotional adaption'. For fulfilment, Cronbach's α met conventional reliability standards (0.87). Salmon's Item List is an appropriate instrument to assess birth experience in fathers in terms of fulfilment. Larger samples need to be examined in order to prove the stability of the factor structure before this can be extended to routine clinical assessment. A reduced version of Salmon's Item List may be useful as a screening tool for general assessment. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Alternate item types: continuing the quest for authentic testing.

    PubMed

    Wendt, Anne; Kenny, Lorraine E

    2009-03-01

    Many test developers suggest that multiple-choice items can be used to evaluate critical thinking if the items are focused on measuring higher-order thinking ability. The literature supports the use of alternate item types to assess additional competencies, such as higher-level cognitive processing and critical thinking, as well as ways to allow examinees to demonstrate their competencies differently. This research study surveyed nurses after they took a test composed of alternate item types paired with multiple-choice items. The participants were asked to provide opinions regarding the items and the item formats; demographic information was also collected, along with response data recorded as the participants answered the items. The results of this study reveal that the participants thought that, in general, the alternate items were more authentic and allowed them to demonstrate their competence better than multiple-choice items did. Further investigation into the optimal blend of alternate items and multiple-choice items is needed.

  20. Assessment of the Assessment Tool: Analysis of Items in a Non-MCQ Mathematics Exam

    ERIC Educational Resources Information Center

    Khoshaim, Heba Bakr; Rashid, Saima

    2016-01-01

    Assessment is one of the vital steps in the teaching and learning process. The reported action research examines the effectiveness of an assessment process and inspects the validity of exam questions used for the assessment purpose. The instructors of a college-level mathematics course studied questions used in the final exams during the academic…

  1. Instruction and Learning through Formative Assessments

    ERIC Educational Resources Information Center

    Bossé, Michael J.; Lynch-Davis, Kathleen; Adu-Gyamfi, Kwaku; Chandler, Kayla

    2016-01-01

    Assessment and instruction are interwoven in mathematically rich formative assessment tasks, so employing these tasks in the classrooms is an exciting and time-efficient opportunity. To provide a window into how these tasks work in the classroom, this article analyzes summaries of student work on such a task and considers several students'…

  2. Formative Assessment Probes: Representing Microscopic Life

    ERIC Educational Resources Information Center

    Keeley, Page

    2011-01-01

    This column focuses on promoting learning through assessment. The author discusses the formative assessment probe "Pond Water," which reveals how elementary children will often apply what they know about animal structures to newly discovered microscopic organisms, connecting their knowledge of the familiar to the unfamiliar through…

  3. Formative Assessment: Possibilities, Boundaries and Limitations

    ERIC Educational Resources Information Center

    Elwood, Jannette

    2006-01-01

    This review essay describes in detail two recent publications: "Formative Assessment: Improving Learning in Secondary Classrooms" (Centre for Educational Research and Innovation [CERI], 2005, Paris, OECD) and "Towards Coherence Between Classroom Assessment and Accountability" ("The 103rd Yearbook of the National Society…

  4. Screencasts: Formative Assessment for Mathematical Thinking

    ERIC Educational Resources Information Center

    Soto, Melissa; Ambrose, Rebecca

    2016-01-01

    Increased attention to reasoning and justification in mathematics classrooms requires the use of more authentic assessment methods. Particularly important are tools that allow teachers and students opportunities to engage in formative assessment practices such as gathering data, interpreting understanding, and revising thinking or instruction.…

  6. Formative Assessment in Mathematics for Engineering Students

    ERIC Educational Resources Information Center

    Ní Fhloinn, Eabhnat; Carr, Michael

    2017-01-01

    In this paper, we present a range of formative assessment types for engineering mathematics, including in-class exercises, homework, mock examination questions, table quizzes, presentations, critical analyses of statistical papers, peer-to-peer teaching, online assessments and electronic voting systems. We provide practical tips for the…

  7. Summative and Formative Assessment: Perceptions and Realities

    ERIC Educational Resources Information Center

    Taras, Maddalena

    2008-01-01

    Assessment is critically important to education both for accreditation and to support learning. Yet the literature dealing with formative and summative assessment definitions and terminology is not aligned. This article reports an empirical small-scale study of lecturers in Education at an English university. The research posits that these…

  9. A Comparison of Three Test Formats to Assess Word Difficulty

    ERIC Educational Resources Information Center

    Culligan, Brent

    2015-01-01

    This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…

  10. Are CAFAS subscales and item weights valid? A preliminary investigation of the Child and Adolescent Functional Assessment Scale.

    PubMed

    Bates, Michael P; Furlong, Michael J; Green, Jennifer Greif

    2006-11-01

    Presents a psychometric analysis of the Child and Adolescent Functional Assessment Scale (CAFAS), one of the most commonly used measures of functional impairment in youths with emotional and behavioral disorders. Specific aims of the current investigation were to (a) examine the conceptual organization of the CAFAS items, (b) explore its scaling properties, and (c) investigate its construct validity. In Phase 1, a group of advanced graduate students and clinicians rated CAFAS items with respect to the degree that they reflect the originally assigned subscales. In Phase 2, additional raters assigned severity values to the subset of CAFAS items selected from Phase 1. Items were then scaled using simplified successive intervals scaling techniques. Results show differences between new empirically derived item weights and the original scoring method. This investigation highlights the benefits of continued examination and critique of level-of-functioning scaling for diagnosis, treatment, and prognosis in children and adolescents.

  11. Developing an item bank and short forms that assess the impact of asthma on quality of life.

    PubMed

    Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena

    2014-02-01

    The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) while avoiding confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032), we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses, to develop a 65-item unidimensional item bank and separate short-form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank, we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). The real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.
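
    The simulated computer-adaptive administration described above can be caricatured as repeatedly choosing the unused item with maximum Fisher information at the current trait estimate. A toy sketch under a 2PL model; the item parameters are invented, not the RAND-IAQL bank's calibrations.

```python
import math

# Toy sketch of computer-adaptive item selection: administer the unused item
# with maximum Fisher information at the current trait estimate. The 2PL
# parameters in the usage example are invented.
def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a * a * p * (1 - p)

def next_item(theta, bank, administered):
    """bank: dict item_id -> (a, b); returns the most informative unused item."""
    return max((i for i in bank if i not in administered),
               key=lambda i: item_information(theta, *bank[i]))
```

    With a hypothetical bank such as `{"q1": (1.0, -2.0), "q2": (1.0, 0.0), "q3": (1.0, 2.0)}` and a current estimate of 0.0, the item located nearest the estimate ("q2") is selected first, which is why a well-targeted bank can reach precise scores in a handful of items.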

  12. Teachers' Self-Assessment of the Effects of Formative and Summative Electronic Portfolios on Professional Development

    ERIC Educational Resources Information Center

    Beck, Robert J.; Livne, Nava L.; Bear, Sharon L.

    2005-01-01

    This study compared the effects of four electronic portfolio curricula on pre-service and beginning teachers' self-ratings of their professional development (n =207), using a 34 item electronic Portfolio Assessment Scale (ePAS). Three formative portfolios, A, C and D, had teacher development as a primary objective and used participants' narrative…

  14. What Do They Understand? Using Technology to Facilitate Formative Assessment

    ERIC Educational Resources Information Center

    Mitten, Carolyn; Jacobbe, Tim; Jacobbe, Elizabeth

    2017-01-01

    Formative assessment is so important to inform teachers' planning. A discussion of the benefits of using technology to facilitate formative assessment explains how four primary school teachers adopted three different apps to make their formative assessment more meaningful and useful.

  15. The WHO quality of life assessment instrument (WHOQOL-Bref): the importance of its items for cross-cultural research.

    PubMed

    Saxena, S; Carlson, D; Billington, R

    2001-01-01

    One of the fundamental issues in the area of assessment of quality of life is to determine what is important to the individuals' quality of life. This is even more crucial when the instrument is for use in diverse cultural settings. This paper reports on the importance ratings on WHOQOL-Bref items obtained as a part of WHOQOL pilot field trial on 4804 respondents from 15 centres from 14 developed and developing countries using 12 languages. All items were rated as moderately or more important, but this was expected because the items were selected by extensive qualitative research for their salience across the centres. Significant differences on mean importance ratings were found between centres, but rank orders of item for their importance showed highly significant correlations between centres. This was especially true for items in the top and the bottom thirds of the item list arranged by overall importance. Most items were rated as more important by women compared to men and by younger compared to older persons. The results are discussed for their relevance in cross-cultural research on quality of life assessment.
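
    The between-centre comparison of importance rank orders corresponds to a Spearman rank correlation. A minimal sketch without tie correction; the ratings in the usage example are invented.

```python
import numpy as np

# Sketch of the between-centre rank-order comparison: Spearman's rho
# computed as the Pearson correlation of ranks (no tie correction).
def spearman_rho(x, y):
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]
```

    Two centres that order the items identically give rho = 1.0 even if their mean importance ratings differ, which is exactly the pattern the abstract reports: different means, highly correlated rank orders.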

  16. Sex Differences in Item Functioning in the Comprehensive Inventory of Basic Skills-II Vocabulary Assessments

    ERIC Educational Resources Information Center

    French, Brian F.; Gotch, Chad M.

    2013-01-01

    The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1 through 6. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…

  17. Missouri Assessment Program (MAP), Spring 1999: High School Communication Arts, Released Items, Grade 11.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document deals with testing in communication arts for 11th graders in Missouri public schools. The document contains the following items from the Test Booklet: "Two Words" (Isabel Allende) (Session 1, Items 5, 6, and 7); "Gumshoes Turn to Internet for Spadework" (Nicole Gaouette) (Session 1, Item 5); a writing prompt; and…

  18. Naive Versus Sophisticated Item-Writers for the Assessment of Anxiety.

    ERIC Educational Resources Information Center

    Sharpley, Christopher F.; Rogers, H. Jane

    1985-01-01

    Compared items from psychologically naive vs. psychologically sophisticated item-writers vs. a standardized test (N=552). Results showed that nonpsychologists with no formal definition of the construct they were to measure were able to write items that were as valid as those elicited from psychologists. (BH)

  20. Missouri Assessment Program (MAP), Spring 2000: Intermediate Communication Arts, Released Items, Grade 7.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document deals with testing in intermediate communication arts for seventh graders in Missouri public schools. The document contains the following items from the Session 1 Test Booklet: "Swimming in Snow" (Diana C. Conway) (Items 1, 2, and 5); "Discovery" (Marion Dane Bauer) (Item 13); writing prompt; and a writer's…

  2. The Impact of Varied Discrimination Parameters on Mixed-Format Item Response Theory Model Selection

    ERIC Educational Resources Information Center

    Whittaker, Tiffany A.; Chang, Wanchen; Dodd, Barbara G.

    2013-01-01

    Whittaker, Chang, and Dodd compared the performance of model selection criteria when selecting among mixed-format IRT models and found that the criteria did not perform adequately when selecting the more parameterized models. It was suggested by M. S. Johnson that the problems when selecting the more parameterized models may be because of the low…

  4. TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

    ERIC Educational Resources Information Center

    Brese, Falk, Ed.

    2012-01-01

    The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…

  5. Assessing DSM-IV symptoms of panic attack in the general population: an item response analysis.

    PubMed

    Sunderland, Matthew; Hobbs, Megan J; Andrews, Gavin; Craske, Michelle G

    2012-12-20

    Unexpected panic attacks may represent a non-specific risk factor for future depression and anxiety disorders. The examination of panic symptoms and associated latent severity levels may lead to improvements in the identification, prevention, and treatment of panic attacks and subsequent psychopathology for 'at risk' individuals in the general population. The current study utilised item response theory to assess the DSM-IV symptoms of panic in relation to the latent severity level of the panic attack construct in a sample of 5913 respondents from the National Epidemiologic Survey on Alcohol and Related Conditions. Additionally, differential item functioning (DIF) was assessed to determine if each symptom of panic targets the same level of latent severity between different sociodemographic groups (male/female, young/old). Symptoms indexing 'choking', 'fear of dying', and 'tingling/numbness' are some of the more severe symptoms of panic whilst 'heart racing', 'short of breath', 'tremble/shake', 'dizzy/faint', and 'perspire' are some of the least severe symptoms. Significant levels of DIF were detected in the 'perspire' symptom between males and females and the 'fear of dying' symptom between young and old respondents. The current study was limited to examining cross-sectional data from respondents who had experienced at least one panic attack across their lifetime. The findings of the current study provide additional information regarding panic symptoms in the general population that may enable researchers and clinicians to further refine the detection of 'at-risk' individuals who experience threshold and sub-threshold levels of panic. Copyright © 2012 Elsevier B.V. All rights reserved.
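
    The severity ordering reported above comes from an item response model in which each symptom occupies a location on the latent panic-severity continuum. A minimal sketch of the two-parameter logistic (2PL) model commonly used for such analyses; the discrimination and severity values below are invented, chosen only to echo the reported ordering, and are not estimates from the study:

```python
import math

def icc_2pl(theta, a, b):
    """Probability of endorsing a symptom at latent severity theta
    under a two-parameter logistic IRT model
    (a = discrimination, b = severity/location)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Hypothetical severity locations echoing the reported ordering:
items = {"heart racing": -1.5, "perspire": -1.0,
         "choking": 1.2, "fear of dying": 1.5}

# A respondent of average latent severity (theta = 0) is far more
# likely to endorse the mild symptoms than the severe ones.
probs = {name: icc_2pl(0.0, 1.5, b) for name, b in items.items()}
```

    Under this model a symptom with a higher location parameter requires a higher latent severity before endorsement becomes likely, which is how 'choking' and 'fear of dying' end up flagged as the more severe indicators; DIF analysis then asks whether those locations differ between groups such as males and females.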

  6. Development of a self-report physical function instrument for disability assessment: item pool construction and factor analysis.

    PubMed

    McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K

    2013-09-01

    To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
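
    The fit indices quoted here (CFI, TLI, RMSEA) are simple functions of the model and baseline (independence-model) chi-square statistics. A small sketch of the standard formulas; the chi-square inputs below are invented for illustration, since the paper reports only the resulting indices:

```python
import math

def fit_indices(chi2, df, chi2_null, df_null, n):
    """Comparative Fit Index, Tucker-Lewis Index, and root mean square
    error of approximation, computed from the fitted model's chi-square
    and the baseline (independence) model's chi-square."""
    cfi = 1 - max(chi2 - df, 0) / max(chi2_null - df_null, chi2 - df, 0)
    tli = ((chi2_null / df_null) - (chi2 / df)) / ((chi2_null / df_null) - 1)
    rmsea = math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))
    return cfi, tli, rmsea

# Invented chi-square values for a sample of n = 1017
# (the claimant sample size reported in the abstract):
cfi, tli, rmsea = fit_indices(chi2=3000, df=1500,
                              chi2_null=40000, df_null=1600, n=1017)
```

    Conventional rules of thumb treat CFI/TLI near .95 and RMSEA below .06 as indicating good fit, which is the yardstick against which the reported values (.92-.98 and .04-.05) are judged.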

  7. Development of a Self-Report Physical Function Instrument for Disability Assessment: Item Pool Construction and Factor Analysis

    PubMed Central

    McDonough, Christine M.; Jette, Alan M.; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M.; Rasch, Elizabeth K.

    2014-01-01

    Objectives: To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Design: Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. Setting: In-person and semi-structured interviews; internet and telephone surveys. Participants: A sample of 1,017 SSA claimants, and a normative sample of 999 adults from the US general population. Interventions: Not Applicable. Main Outcome Measure: Model fit statistics. Results: The final item pool consisted of 139 items. Within the claimant sample 58.7% were white; 31.8% were black; 46.6% were female; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution which included more items and allowed separate characterization of: 1) Changing and Maintaining Body Position, 2) Whole Body Mobility, 3) Upper Body Function and 4) Upper Extremity Fine Motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples respectively were: Comparative Fit Index = 0.93 and 0.98; Tucker-Lewis Index = 0.92 and 0.98; Root Mean Square Error of Approximation = 0.05 and 0.04. Conclusions: The factor structure of the Physical Function item pool closely resembled the hypothesized content model. The four scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. PMID:23542402

  8. Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations

    NASA Astrophysics Data System (ADS)

    Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly

    2014-09-01

    A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.

  9. The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments.

    PubMed

    Tarrant, Marie; Knierim, Aimee; Hayes, Sasha K; Ware, James

    2006-12-01

    Multiple-choice questions are a common assessment method in nursing examinations. Few nurse educators, however, have formal preparation in constructing multiple-choice questions. Consequently, questions used in baccalaureate nursing assessments often contain item-writing flaws, or violations to accepted item-writing guidelines. In one nursing department, 2770 MCQs were collected from tests and examinations administered over a five-year period from 2001 to 2005. Questions were evaluated for 19 frequently occurring item-writing flaws, for cognitive level, for question source, and for the distribution of correct answers. Results show that almost half (46.2%) of the questions contained violations of item-writing guidelines and over 90% were written at low cognitive levels. Only a small proportion of questions were teacher generated (14.1%), while 36.2% were taken from testbanks and almost half (49.4%) had no source identified. MCQs written at a lower cognitive level were significantly more likely to contain item-writing flaws. While there was no relationship between the source of the question and item-writing flaws, teacher-generated questions were more likely to be written at higher cognitive levels (p<0.001). Correct answers were evenly distributed across all four options and no bias was noted in the placement of correct options. Further training in item-writing is recommended for all faculty members who are responsible for developing tests. Pre-test review and quality assessment is also recommended to reduce the occurrence of item-writing flaws and to improve the quality of test questions.
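
    The study's check on answer-key balance amounts to a chi-square goodness-of-fit test against a uniform distribution over the four option positions. A sketch with hypothetical per-position counts (only the pool size of 2770 is reported, not the breakdown):

```python
# Hypothetical distribution of 2770 correct answers over options A-D.
observed = [700, 690, 710, 670]
n = sum(observed)
expected = n / 4  # uniform key placement implies ~692.5 per position

# Pearson chi-square goodness-of-fit statistic.
chi2 = sum((o - expected) ** 2 / expected for o in observed)

# Critical value for df = 3 at alpha = .05 is 7.815.
biased = chi2 > 7.815
```

    With counts this close to uniform, the statistic stays well below the critical value, matching the study's finding that no bias was noted in the placement of correct options.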

  11. Data Collection Design for Equivalent Groups Equating: Using a Matrix Stratification Framework for Mixed-Format Assessment

    ERIC Educational Resources Information Center

    Mbella, Kinge Keka

    2012-01-01

    Mixed-format assessments are increasingly being used in large scale standardized assessments to measure a continuum of skills ranging from basic recall to higher order thinking skills. These assessments are usually comprised of a combination of (a) multiple-choice items which can be efficiently scored, have stable psychometric properties, and…

  13. Online Formative Assessments with Social Network Awareness

    ERIC Educational Resources Information Center

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    Social network awareness (SNA) has been used extensively as one of the strategies to increase knowledge sharing and collaboration opportunities. However, most SNA studies either focus on being aware of peer's knowledge context or on social context. This work proposes online formative assessments with SNA, trying to address the problems of online…

  14. Formative Assessment Probes: To Hypothesize or Not

    ERIC Educational Resources Information Center

    Keeley, Page

    2010-01-01

    Formative assessment probes are used not only to uncover the ideas students bring to their learning, they can also be used to reveal teachers' common misconceptions. Consider a process widely used in inquiry science--developing hypotheses. In this article, the author features the probe "Is It a Hypothesis?", which serves as an example of how…

  15. Targeting Instruction with Formative Assessment Probes

    ERIC Educational Resources Information Center

    Fagan, Emily R.; Tobey, Cheryl Rose; Brodesky, Amy R.

    2016-01-01

    This article introduces the formative assessment probe--a powerful tool for collecting focused, actionable information about student thinking and potential misconceptions--along with a process for targeting instruction in response to probe results. Drawing on research about common student mathematical misconceptions as well as the former work of…

  16. Maximizing the Effective Use of Formative Assessments

    ERIC Educational Resources Information Center

    Riddell, Nancy B.

    2016-01-01

    In the current age of accountability, teachers must be able to produce tangible evidence of students' concept mastery. This article focuses on implementation of formative assessments before, during, and after instruction in order to maximize teachers' ability to effectively monitor student achievement. Suggested strategies are included to help…

  20. Measuring Teaching Best Practice in the Induction Years: Development and Validation of an Item-Level Assessment

    ERIC Educational Resources Information Center

    Kingsley, Laurie; Romine, William

    2014-01-01

    Schools and teacher induction programs around the world routinely assess teaching best practice to inform accreditation, tenure/promotion, and professional development decisions. Routine assessment is also necessary to ensure that teachers entering the profession get the assistance they need to develop and succeed. We introduce the Item-Level…

  1. Teachers' Use of Test-Item Banks for Student Assessment in North Carolina Secondary Agricultural Education Programs

    ERIC Educational Resources Information Center

    Marshall, Joy Morgan

    2014-01-01

    Higher expectations are on all parties to ensure students successfully perform on standardized tests. Specifically in North Carolina agriculture classes, students are given a CTE Post Assessment to measure knowledge gained and proficiency. Prior to students taking the CTE Post Assessment, teachers have access to a test item bank system that…

  3. Developing Parallel Career and Occupational Development Objectives and Exercise (Test) Items in Spanish for Assessment and Evaluation.

    ERIC Educational Resources Information Center

    Muratti, Jose E.; And Others

    A parallel Spanish edition was developed of released objectives and objective-referenced items used in the National Assessment of Educational Progress (NAEP) in the field of Career and Occupational Development (COD). The Spanish edition was designed to assess the identical skills, attitudes, concepts, and knowledge of Spanish-dominant students…

  5. NAEP Validity Studies: Improving the Information Value of Performance Items in Large Scale Assessments. Working Paper No. 2003-08

    ERIC Educational Resources Information Center

    Pearson, P. David; Garavaglia, Diane R.

    2003-01-01

    The purpose of this essay is to explore both what is known and what needs to be learned about the information value of performance items "when they are used in large scale assessments." Within the context of the National Assessment of Educational Progress (NAEP), there is substantial motivation for answering these questions. Over the…

  6. A Historical Investigation into Item Formats of ACS Exams and Their Relationships to Science Practices

    ERIC Educational Resources Information Center

    Brandriet, Alexandra; Reed, Jessica J.; Holme, Thomas

    2015-01-01

    The release of the "NRC Framework for K-12 Science Education" and the "Next Generation Science Standards" has important implications for classroom teaching and assessment. Of particular interest is the implementation of science practices in the chemistry classroom, and the definitions established by the NRC makes these…

  7. Assessment formats in dental medicine: An overview

    PubMed Central

    Gerhard-Szep, Susanne; Güntsch, Arndt; Pospiech, Peter; Söhnel, Andreas; Scheutzel, Petra; Wassmann, Torsten; Zahn, Tugba

    2016-01-01

    Aim: At the annual meeting of German dentists in Frankfurt am Main in 2013, the Working Group for the Advancement of Dental Education (AKWLZ) initiated an interdisciplinary working group to address assessments in dental education. This paper presents an overview of the current work being done by this working group, some of whose members are also actively involved in the German Association for Medical Education's (GMA) working group for dental education. The aim is to present a summary of the current state of research on this topic for all those who participate in the design, administration and evaluation of university-specific assessments in dentistry. Method: Based on systematic literature research, the testing scenarios listed in the National Competency-based Catalogue of Learning Objectives (NKLZ) have been compiled and presented in tables according to assessment value. Results: Different assessment scenarios are described briefly in table form addressing validity (V), reliability (R), acceptance (A), cost (C), feasibility (F), and the influence on teaching and learning (EI) as presented in the current literature. Infoboxes were deliberately chosen to allow readers quick access to the information and to facilitate comparisons between the various assessment formats. Following each description is a list summarizing the uses in dental and medical education. Conclusion: This overview provides a summary of competency-based testing formats. It is meant to have a formative effect on dental and medical schools and provide support for developing workplace-based strategies in dental education for learning, teaching and testing in the future. PMID:27579365

  8. Assessment of item-writing flaws in multiple-choice questions.

    PubMed

    Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John

    2013-01-01

    This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.

  9. Formative Assessment Probes: Big and Small Seeds. Linking Formative Assessment Probes to the Scientific Practices

    ERIC Educational Resources Information Center

    Keeley, Page

    2016-01-01

    This column focuses on promoting learning through assessment. Formative assessment probes are designed to uncover students' ideas about objects, events, and processes in the natural world. This assessment information is then used throughout instruction to move students toward an understanding of the scientific ideas behind the probes. During the…

  11. For Which Boys and Which Girls Are Reading Assessment Items Biased Against? Detection of Differential Item Functioning in Heterogeneous Gender Populations

    ERIC Educational Resources Information Center

    Grover, Raman K.; Ercikan, Kadriye

    2017-01-01

    In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However DIF items do not necessarily disadvantage every member of a gender group to the same degree,…

  12. The Effect of Response Format on the Psychometric Properties of the Narcissistic Personality Inventory: Consequences for Item Meaning and Factor Structure.

    PubMed

    Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris

    2016-04-01

    The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed.

  13. A Simulation Study of Methods for Assessing Differential Item Functioning in Computerized Adaptive Tests.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; And Others

    1994-01-01

    Simulated data were used to investigate the performance of modified versions of the Mantel-Haenszel method of differential item functioning (DIF) analysis in computerized adaptive tests (CAT). Results indicate that CAT-based DIF procedures perform well and support the use of item response theory-based matching variables in DIF analysis. (SLD)
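
    The Mantel-Haenszel procedure referenced above pools 2x2 (group by correct/incorrect) tables across levels of a matching variable into a common odds ratio. A minimal sketch with invented counts; the delta conversion shown is the usual ETS scaling of the ratio:

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.
    Each stratum is (A, B, C, D): reference-group correct/incorrect,
    then focal-group correct/incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Invented counts for three levels of the matching variable:
strata = [
    (40, 10, 35, 15),   # low scorers
    (60, 5, 50, 12),    # middle scorers
    (80, 2, 70, 6),     # high scorers
]
alpha = mh_odds_ratio(strata)

# ETS delta scaling; values near 0 indicate negligible DIF.
delta = -2.35 * math.log(alpha)
```

    An alpha near 1 (delta near 0) indicates negligible DIF. The CAT-adapted versions studied by Zwick and colleagues replace the usual observed-score matching variable with an item response theory-based ability estimate, which is the modification the abstract reports performing well.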

  14. Missouri Assessment Program (MAP), Spring 2000: High School Communication Arts, Released Items, Grade 11.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document deals with testing in communication arts for 11th graders in Missouri public schools. The document contains the following items from Session 1 in the Test Booklet: "Thomas Hart Benton: Champion of the American Scene" (Jan Greenberg and Sandra Jordan) (Items 5, 6, and 7); "Rhythms of the River" (Rebecca Christian)…

  15. An Application of Cognitive Diagnostic Assessment on TIMMS-2007 8th Grade Mathematics Items

    ERIC Educational Resources Information Center

    Toker, Turker; Green, Kathy

    2012-01-01

    The least squares distance method (LSDM) was used in a cognitive diagnostic analysis of TIMSS (Trends in International Mathematics and Science Study) items administered to 4,498 8th-grade students from seven geographical regions of Turkey, extending analysis of attributes from content to process and skill attributes. Logit item positions were…

  16. Missouri Assessment Program (MAP), Spring 1999: Intermediate Communication Arts, Released Items, Grade 7.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document deals with testing in intermediate communication arts for seventh graders in Missouri public schools. The document contains the following items from the Test Booklet: "Under the Rice Moon" (Rhiannon Puck); "Dogspirit" (Gary Paulsen) (Session 1, Items 4, 5, 6, and 8); a writing prompt; and a writer's checklist. It…

  17. Assessing Impact, DIF, and DFF in Accommodated Item Scores: A Comparison of Multilevel Measurement Model Parameterizations

    ERIC Educational Resources Information Center

    Beretvas, S. Natasha; Cawthon, Stephanie W.; Lockhart, L. Leland; Kaye, Alyssa D.

    2012-01-01

    This pedagogical article is intended to explain the similarities and differences between the parameterizations of two multilevel measurement model (MMM) frameworks. The conventional two-level MMM that includes item indicators and models item scores (Level 1) clustered within examinees (Level 2) and the two-level cross-classified MMM (in which item…

  18. Efficiently Assessing Negative Cognition in Depression: An Item Response Theory Analysis of the Dysfunctional Attitude Scale

    ERIC Educational Resources Information Center

    Beevers, Christopher G.; Strong, David R.; Meyer, Bjorn; Pilkonis, Paul A.; Miller, Ivan R.

    2007-01-01

    Despite a central role for dysfunctional attitudes in cognitive theories of depression and the widespread use of the Dysfunctional Attitude Scale, form A (DAS-A; A. Weissman, 1979), the psychometric development of the DAS-A has been relatively limited. The authors used nonparametric item response theory methods to examine the DAS-A items and…

  20. Some Issues in Item Response Theory: Dimensionality Assessment and Models for Guessing

    ERIC Educational Resources Information Center

    Smith, Jessalyn

    2009-01-01

    Currently, standardized tests are widely used as a method to measure how well schools and students meet academic standards. As a result, measurement issues have become an increasingly popular topic of study. Unidimensional item response models are used to model latent abilities and specific item characteristics. This class of models makes…

  1. Cognitive assessment in stroke: feasibility and test properties using differing approaches to scoring of incomplete items.

    PubMed

    Lees, Rosalind A; Hendry Ba, Kirsty; Broomfield, Niall; Stott, David; Larner, Andrew J; Quinn, Terence J

    2017-10-01

    Cognitive screening is recommended in stroke, but test completion may be complicated by stroke-related impairments. We described feasibility of completion of three commonly used cognitive screening tools and the effect on scoring properties when cognitive testing was entirely/partially incomplete. We performed a cross-sectional study, recruiting sequential stroke patient admissions from two University Hospital stroke rehabilitation services. We assessed Folstein's mini-mental state examination (MMSE), Montreal cognitive assessment (MoCA) and Addenbrooke's cognitive examination (ACE-III). The multidisciplinary team gave an independent diagnostic formulation. We recorded numbers fully/partially completing tests, assistance and time required for testing. We calculated test discrimination metrics in relation to clinical assessment using four differing statistical approaches to account for incomplete testing. We recruited 51 patients. Direct assistance to complete cognitive tests was required for 33 (63%). At traditional cut-offs, the majority screened "positive" for cognitive impairment (ACE-III: 98%; MoCA: 98%; MMSE: 81%). Comparing against a clinical diagnosis, ACE-III and MoCA had excellent sensitivity but poor specificity. Partial completion of cognitive tests was common (ACE-III: 14/51, MMSE: 22/51; MoCA: 20/51 fully complete); the greatest non-completion was for test items that required copying or drawing. Adapting analyses to account for these missing data gave differing results; MMSE sensitivity ranged from 0.66 to 0.85, and specificity ranged from 0.44 to 0.71 depending on the approach employed. For cognitive screening in stroke, even relatively brief tools are associated with substantial non-completion. The way these missing data are accounted for in analyses impacts on apparent test properties. When choosing a cognitive screening tool, feasibility should be considered and approaches to handling missing data made explicit. Copyright © 2016 John Wiley & Sons, Ltd
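
    The abstract reports that test properties shift with how incomplete tests are handled. A minimal sketch of that effect, using invented data (not the study's), comparing complete-case analysis against scoring an incomplete test as failed:

```python
# Hypothetical illustration: how the handling of incomplete cognitive tests
# changes apparent sensitivity/specificity. All data below are invented.

def sens_spec(pairs, cutoff):
    """pairs: (score_or_None, impaired_bool); screen positive if score < cutoff.
    Tests with score None are skipped (complete-case analysis)."""
    tp = fp = fn = tn = 0
    for score, impaired in pairs:
        if score is None:
            continue
        positive = score < cutoff
        if impaired and positive:
            tp += 1
        elif impaired:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def impute_worst(pairs):
    """Alternative rule: treat an incomplete test as a failing score of 0."""
    return [(0 if s is None else s, d) for s, d in pairs]

patients = [(20, True), (None, True), (27, False), (None, False), (18, True), (29, False)]
print(sens_spec(patients, 24))                 # complete-case analysis
print(sens_spec(impute_worst(patients), 24))   # worst-score imputation
```

    With this toy sample, specificity drops under worst-score imputation while sensitivity is unchanged, which mirrors the abstract's point that the missing-data rule alters apparent test properties.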

  2. Item Banking. Basic Testing Series.

    ERIC Educational Resources Information Center

    Childs, Roy

    This pamphlet describes the exciting potential of item banking--a new approach to testing which combines comparability of scores with flexibility of test format. Item banks are collections of items in which the characteristics of each item are known, and these characteristics can be summed to describe a test made from such items. The principle…
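
    The pamphlet's core idea, known per-item statistics that can be aggregated to describe any test drawn from the bank, can be sketched as follows (bank contents and field names are invented for illustration):

```python
# Minimal item-bank sketch: each banked item carries known statistics, and a
# test assembled from the bank is described by aggregating those statistics.
bank = [
    {"id": "Q1", "difficulty": 0.30, "discrimination": 0.55},
    {"id": "Q2", "difficulty": 0.60, "discrimination": 0.40},
    {"id": "Q3", "difficulty": 0.75, "discrimination": 0.62},
    {"id": "Q4", "difficulty": 0.50, "discrimination": 0.48},
]

def describe_test(item_ids):
    """Summarise a test built from selected bank items."""
    items = [it for it in bank if it["id"] in item_ids]
    n = len(items)
    return {
        "n_items": n,
        "mean_difficulty": sum(it["difficulty"] for it in items) / n,
        "mean_discrimination": sum(it["discrimination"] for it in items) / n,
    }

print(describe_test({"Q1", "Q3", "Q4"}))
```

    Because the item statistics are stored once in the bank, any subset yields a comparable description, which is what makes scores from differently assembled tests comparable.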

  3. Using Distractor-Driven Standards-Based Multiple-Choice Assessments and Rasch Modeling to Investigate Hierarchies of Chemistry Misconceptions and Detect Structural Problems with Individual Items

    ERIC Educational Resources Information Center

    Herrmann-Abell, Cari F.; DeBoer, George E.

    2011-01-01

    Distractor-driven multiple-choice assessment items and Rasch modeling were used as diagnostic tools to investigate students' understanding of middle school chemistry ideas. Ninety-one items were developed according to a procedure that ensured content alignment to the targeted standards and construct validity. The items were administered to 13360…

  4. Innovative Application of a Multidimensional Item Response Model in Assessing the Influence of Social Desirability on the Pseudo-Relationship between Self-Efficacy and Behavior

    ERIC Educational Resources Information Center

    Watson, Kathy; Baranowski, Tom; Thompson, Debbe; Jago, Russell; Baranowski, Janice; Klesges, Lisa M.

    2006-01-01

    This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10-14 years). SocD (nine items), PASE (19 items) and FVSE (21…

  6. Assessing Middle and High School Mathematics & Science: Differentiating Formative Assessment

    ERIC Educational Resources Information Center

    Waterman, Sheryn Spencer

    2010-01-01

    For middle and high school teachers of mathematics and science, this book is filled with examples of instructional strategies that address students' readiness levels, interests, and learning preferences. It shows teachers how to formatively assess their students by addressing differentiated learning targets. Included are detailed examples of…

  7. Using Data Mining to Predict K-12 Students' Performance on Large-Scale Assessment Items Related to Energy

    ERIC Educational Resources Information Center

    Liu, Xiufeng; Ruiz, Miguel E.

    2008-01-01

    This article reports a study on using data mining to predict K-12 students' competence levels on test items related to energy. Data sources are the 1995 Third International Mathematics and Science Study (TIMSS), 1999 TIMSS-Repeat, 2003 Trend in International Mathematics and Science Study (TIMSS), and the National Assessment of Educational…

  9. Development of an Item Bank for Assessing Generic Competences in a Higher-Education Institute: A Rasch Modelling Approach

    ERIC Educational Resources Information Center

    Xie, Qin; Zhong, Xiaoling; Wang, Wen-Chung; Lim, Cher Ping

    2014-01-01

    This paper describes the development and validation of an item bank designed for students to assess their own achievements across an undergraduate-degree programme in seven generic competences (i.e., problem-solving skills, critical-thinking skills, creative-thinking skills, ethical decision-making skills, effective communication skills, social…

  11. Examination of the Assumptions and Properties of the Graded Item Response Model: An Example Using a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    1995-01-01

    Over 5,000 students participated in a study of the dimensionality and stability of the item parameter estimates of a mathematics performance assessment developed for the Quantitative Understanding: Amplifying Student Achievement and Reasoning (QUASAR) Project. Results demonstrate the test's dimensionality and illustrate ways to examine use of the…

  13. A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.

    2015-01-01

    Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three…

  15. A new ten-item questionnaire for assessing sensitive skin: the Sensitive Scale-10.

    PubMed

    Misery, Laurent; Jean-Decoster, Catherine; Mery, Sophie; Georgescu, Victor; Sibaud, Vincent

    2014-11-01

    Sensitive skin is common but until now there has been no scale for measuring its severity. The Sensitive Scale is a new scale with a 14-item and a 10-item version that was tested in 11 countries in different languages on 2,966 participants. The aim of this study was to validate the pertinence of using the Sensitive Scale to measure the severity of sensitive skin. The internal consistency was high. Correlations with the dry skin type, higher age, female gender, fair phototypes and Dermatology Life Quality Index were found. Using the 10-item version appeared to be preferable because it was quicker and easier to complete, with the same internal consistency and the 4 items that were excluded were very rarely observed in patients. The mean initial scores were around 44/140 and 37/100. The use of a cream for sensitive skin showed the pertinence of the scale before and after treatment.

  16. Modern psychometric methods for detection of differential item functioning: application to cognitive assessment measures.

    PubMed

    Teresi, J A; Kleinman, M; Ocepek-Welikson, K

    Cognitive screening tests and items have been found to perform differently across groups that differ in terms of education, ethnicity and race. Despite the profound implications that such bias holds for studies in the epidemiology of dementia, little research has been conducted in this area. Using the methods of modern psychometric theory (in addition to those of classical test theory), we examined the performance of the Attention subscale of the Mattis Dementia Rating Scale. Several item response theory models, including the two- and three-parameter dichotomous response logistic model, as well as a polytomous response model were compared. (Log-likelihood ratio tests showed that the three-parameter model was not an improvement over the two-parameter model.) Data were collected as part of the ten-study National Institute on Aging Collaborative investigation of special dementia care in institutional settings. The subscale KR-20 estimate for this sample was 0.92. IRT model-based reliability estimates, provided at several points along the latent attribute, ranged from 0.65 to 0.97; the measure was least precise at the less disabled tail of the distribution. Most items performed in similar fashion across education groups; the item characteristic curves were almost identical, indicating little or no differential item functioning (DIF). However, four items were problematic. One item (digit span backwards) demonstrated a large error term in the confirmatory factor analysis; item-fit chi-square statistics developed using BIMAIN confirm this result for the IRT models. Further, the discrimination parameter for that item was low for all education subgroups. Generally, persons with the highest education had a greater probability of passing the item for most levels of theta. Model-based tests of DIF using MULTILOG identified three other items with significant, albeit small, DIF. 
One item, for example, showed non-uniform DIF in that at the impaired tail of the latent distribution…
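
    The abstract compares two- and three-parameter logistic IRT models and inspects item characteristic curves across education groups. A minimal sketch of those curves and a crude DIF size (unsigned area between group curves); all parameter values are invented:

```python
import math

def icc(theta, a, b, c=0.0):
    """3PL item characteristic curve; with c=0 it reduces to the 2PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Invented parameters for one item estimated separately in two groups;
# identical curves would indicate no differential item functioning.
ref   = dict(a=1.2, b=0.0)
focal = dict(a=1.2, b=0.4)   # same discrimination, shifted difficulty: uniform DIF

thetas = [i / 10 for i in range(-40, 41)]
area = sum(abs(icc(t, **ref) - icc(t, **focal)) for t in thetas) * 0.1
print(round(area, 3))   # unsigned area between the curves over theta in [-4, 4]
```

    For a pure difficulty shift the area approaches the size of the shift, so near-zero area corresponds to the "almost identical" curves the abstract describes for most items.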

  17. Illustrating the Use of Nonparametric Regression To Assess Differential Item and Bundle Functioning among Multiple Groups.

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Bolt, Daniel M.

    2001-01-01

    Presents an overview of nonparametric regression as it applies to differential item functioning analysis and then provides three examples to illustrate how nonparametric regression can be applied to multilingual, multicultural data to study group differences. (SLD)

  18. PISA Test Items and School-Based Examinations in Greece: Exploring the relationship between global and local assessment discourses

    NASA Astrophysics Data System (ADS)

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-03-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek school-based biology examinations' test items in terms of the nature of their textual construction. This nature is determined by the interplay of the notions of classification (content specialisation) and formality (code specialisation) modulated by both the linguistic and the visual expressive modes. The results of the analysis reveal disparities between assessment discourses promoted at the global and the local level. In particular, while PISA test items convey their scientific message (specialised content and code) principally through their visual mode, the specialised scientific meaning of school-based examination test items is mainly conveyed through their linguistic mode. On the other hand, the linguistic mode of PISA test items is mainly compatible with textual practices of the public domain (non-specialised content and code). Such a mismatch between assessment discourses at local and global level is expected to place Greek students at different discursive positions, promoting different types of knowledge. The expected shift from the epistemic positioning promoted in Greece to the one promoted by PISA could significantly restrict Greek students' ability to infer the PISA discursive context and produce appropriate responses. This factor could provide a meaningful contribution to the discussion of the relatively low achievement of Greek students in PISA scientific literacy assessment.

  19. Impact of different scoring algorithms applied to multiple-mark survey items on outcome assessment: an in-field study on health-related knowledge.

    PubMed

    Domnich, A; Panatto, D; Arata, L; Bevilacqua, I; Apprato, L; Gasparini, R; Amicizia, D

    2015-01-01

    Health-related knowledge is often assessed through multiple-choice tests. Among the different types of formats, researchers may opt to use multiple-mark items, i.e. with more than one correct answer. Although multiple-mark items have long been used in the academic setting - sometimes with scant or inconclusive results - little is known about the implementation of this format in research on in-field health education and promotion. A study population of secondary school students completed a survey on nutrition-related knowledge, followed by a single-lecture intervention. Answers were scored by means of eight different scoring algorithms and analyzed from the perspective of classical test theory. The same survey was re-administered to a sample of the students in order to evaluate the short-term change in their knowledge. In all, 286 questionnaires were analyzed. Partial scoring algorithms displayed better psychometric characteristics than the dichotomous rule. In particular, the algorithm proposed by Ripkey and the balanced rule showed greater internal consistency and relative efficiency in scoring multiple-mark items. A penalizing algorithm in which the proportion of marked distracters was subtracted from that of marked correct answers was the only one that highlighted a significant difference in performance between natives and immigrants, probably owing to its slightly better discriminatory ability. This algorithm was also associated with the largest effect size in the pre-/post-intervention score change. The choice of an appropriate rule for scoring multiple-mark items in research on health education and promotion should consider not only the psychometric properties of single algorithms but also the study aims and outcomes, since scoring rules differ in terms of bias, reliability, difficulty, sensitivity to guessing and discrimination.
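
    A sketch of three of the scoring rules discussed above. The penalizing rule follows the abstract's own description; the Ripkey rule shown is one common formulation from the testing literature, an assumption here rather than something stated in the abstract:

```python
# Scoring rules for a multiple-mark item. `key` is the set of correct options,
# `marked` the examinee's selections, `options` all options shown.

def dichotomous(key, marked, options):
    """Full credit only for an exact match of the key; otherwise zero."""
    return 1.0 if marked == key else 0.0

def penalizing(key, marked, options):
    """Proportion of marked distracters subtracted from the proportion of
    marked correct answers (the rule described in the abstract)."""
    distracters = options - key
    hit = len(marked & key) / len(key)
    false_alarm = len(marked & distracters) / len(distracters) if distracters else 0.0
    return hit - false_alarm

def ripkey(key, marked, options):
    """One common formulation of the Ripkey rule (an assumption, not quoted
    from the abstract): proportional credit for correct marks, but zero if
    more options are marked than the key contains."""
    if len(marked) > len(key):
        return 0.0
    return len(marked & key) / len(key)

key, options = {"A", "C"}, {"A", "B", "C", "D"}
for rule in (dichotomous, penalizing, ripkey):
    print(rule.__name__, rule(key, {"A", "B"}, options))
```

    The same response ({"A", "B"} against key {"A", "C"}) earns different credit under each rule, which is exactly why the choice of algorithm changes the measured outcomes.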

  20. Review of Formative Assessment Use and Training in Africa

    ERIC Educational Resources Information Center

    Perry, Lindsey

    2013-01-01

    This literature review examines formative assessment education practices currently being utilized in Africa, as well as recent research regarding professional development on such assessments. Two main conclusions about formative assessment use and training, as well as a set of recommendations about teacher training on formative assessment, can be…

  2. Development of the AGREE II, part 2: assessment of validity of items and tools to support application

    PubMed Central

    Brouwers, Melissa C.; Kho, Michelle E.; Browman, George P.; Burgers, Jako S.; Cluzeau, Françoise; Feder, Gene; Fervers, Béatrice; Graham, Ian D.; Hanna, Steven E.; Makarski, Julie

    2010-01-01

    Background We established a program of research to improve the development, reporting and evaluation of practice guidelines. We assessed the construct validity of the items and user’s manual in the β version of the AGREE II. Methods We designed guideline excerpts reflecting high- and low-quality guideline content for 21 of the 23 items in the tool. We designed two study packages so that one low-quality and one high-quality version of each item were randomly assigned to each package. We randomly assigned 30 participants to one of the two packages. Participants reviewed and rated the guideline content according to the instructions of the user’s manual and completed a survey assessing the manual. Results In all cases, content designed to be of high quality was rated higher than low-quality content; in 18 of 21 cases, the differences were significant (p < 0.05). The manual was rated by participants as appropriate, easy to use, and helpful in differentiating guidelines of varying quality, with all scores above the mid-point of the seven-point scale. Considerable feedback was offered on how the items and manual of the β-AGREE II could be improved. Interpretation The validity of the items was established and the user’s manual was rated as highly useful by users. We used these results and those of our study presented in part 1 to modify the items and user’s manual. We recommend AGREE II (available at www.agreetrust.org) as the revised standard for guideline development, reporting and evaluation. PMID:20513779

  3. A comparison of item response theory-based methods for examining differential item functioning in object naming test by language of assessment among older Latinos

    PubMed Central

    Yang, Frances M.; Heslin, Kevin C.; Mehta, Kala M.; Yang, Cheng-Wu; Ocepek-Welikson, Katja; Kleinman, Marjorie; Morales, Leo S.; Hays, Ron D.; Stewart, Anita L.; Mungas, Dan; Jones, Richard N.; Teresi, Jeanne A.

    2012-01-01

    Object naming tests are commonly included in neuropsychological test batteries. Differential item functioning (DIF) in these tests due to cultural and language differences may compromise the validity of cognitive measures in diverse populations. We evaluated 26 object naming items for DIF due to Spanish and English language translations among Latinos (n=1,159), mean age of 70.5 years old (Standard Deviation (SD)±7.2), using the following four item response theory-based approaches: Mplus/Multiple Indicator, Multiple Causes (Mplus/MIMIC; Muthén & Muthén, 1998–2011), Item Response Theory Likelihood Ratio Differential Item Functioning (IRTLRDIF/MULTILOG; Thissen, 1991, 2001), difwithpar/Parscale (Crane, Gibbons, Jolley, & van Belle, 2006; Muraki & Bock, 2003), and Differential Functioning of Items and Tests/MULTILOG (DFIT/MULTILOG; Flowers, Oshima, & Raju, 1999; Thissen, 1991). Overall, there was moderate to near perfect agreement across methods. Fourteen items were found to exhibit DIF and 5 items observed consistently across all methods, which were more likely to be answered correctly by individuals tested in Spanish after controlling for overall ability. PMID:23471423

  4. Assessment of English-French differential item functioning of the Satisfaction with Appearance Scale (SWAP) in systemic sclerosis.

    PubMed

    Jewett, Lisa R; Kwakkenbos, Linda; Hudson, Marie; Baron, Murray; Thombs, Brett D

    2017-09-01

    The Satisfaction with Appearance Scale (SWAP) has been used to assess body image distress among people with the rare and disfiguring disease systemic sclerosis (SSc); however, it has not been validated across different languages groups. The objective was to examine differential item functioning of the SWAP among 856 Canadian English- or French-speaking SSc patients. Confirmatory factor analysis was used to evaluate the SWAP two-factor structure (Dissatisfaction with Appearance and Social Discomfort). The Multiple-Indicator Multiple-Cause model was utilized to assess differential item functioning. Results revealed that the established two-factor model of the SWAP demonstrated relatively good fit. Statistically significant, but small-magnitude differential item functioning was found for three SWAP items based on language; however, the cumulative effect on SWAP scores was negligible. Findings provided empirical evidence that SWAP scores from Canadian English- and French-speaking patients can be compared and pooled without concern that measurement differences may substantially influence results. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Item Order, Response Format, and Examinee Sex and Handedness and Performance on a Multiple-Choice Test.

    ERIC Educational Resources Information Center

    Kleinke, David J.

    Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…

  6. A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

    ERIC Educational Resources Information Center

    Yao, Lihua; Schwarz, Richard D.

    2006-01-01

    Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…

  9. Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

    PubMed

    Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

    2017-05-25

    The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
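
    The items above were scored dichotomously and fitted with the Rasch model, under which the probability of a correct response depends only on the gap between person ability and item difficulty. A minimal sketch with invented parameter values:

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: probability that a person with ability theta
    answers an item of difficulty b correctly, P = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A hard item set (difficulties mostly above the person's ability), echoing
# the abstract's "high level of difficulty"; values are invented.
difficulties = [0.5, 1.0, 1.5, 2.0]
theta = 1.0
expected_total = sum(rasch_p(theta, b) for b in difficulties)
print(round(rasch_p(1.0, 1.0), 3))   # 0.5: ability equals difficulty
print(round(expected_total, 2))      # expected raw score on the four items
```

    Fit statistics in packages such as RUMM2030 compare observed responses against these model-implied probabilities; items whose observed curves deviate are the ones flagged for revision.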

  10. Evolution of a Test Item

    ERIC Educational Resources Information Center

    Spaan, Mary

    2007-01-01

    This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…

  11. Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): applications (with illustrations) to measures of physical functioning ability and general distress.

    PubMed

    Teresi, Jeanne A; Ocepek-Welikson, Katja; Kleinman, Marjorie; Cook, Karon F; Crane, Paul K; Gibbons, Laura E; Morales, Leo S; Orlando-Edelen, Maria; Cella, David

    2007-01-01

    Methods based on item response theory (IRT) that can be used to examine differential item functioning (DIF) are illustrated. An IRT-based approach to the detection of DIF was applied to physical function and general distress item sets. DIF was examined with respect to gender, age and race. The method used for DIF detection was the item response theory log-likelihood ratio (IRTLR) approach. DIF magnitude was measured using the differences in the expected item scores, expressed as the unsigned probability differences, and calculated using the non-compensatory DIF index (NCDIF). Finally, impact was assessed using expected scale scores, expressed as group differences in the total test (measure) response functions. The example for the illustration of the methods came from a study of 1,714 patients with cancer or HIV/AIDS. The measure contained 23 items measuring physical functioning ability and 15 items addressing general distress, scored in the positive direction. The substantive findings were of relatively small magnitude DIF. In total, six items showed relatively larger magnitude (expected item score differences greater than the cutoff) of DIF with respect to physical function across the three comparisons: "trouble with a long walk" (race), "vigorous activities" (race, age), "bending, kneeling stooping" (age), "lifting or carrying groceries" (race), "limited in hobbies, leisure" (age), "lack of energy" (race). None of the general distress items evidenced high magnitude DIF; although "worrying about dying" showed some DIF with respect to both age and race, after adjustment. The fact that many physical function items showed DIF with respect to age, even after adjustment for multiple comparisons, indicates that the instrument may be performing differently for these groups. 
While the magnitude and impact of DIF at the item and scale level was minimal, caution should be exercised in the use of subsets of these items, as might occur with selection for clinical decisions or…
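
    The NCDIF index used above is commonly defined as the mean squared difference between focal- and reference-group expected item scores, evaluated over the focal group's abilities. A hedged sketch under a 2PL model with invented abilities and parameters:

```python
import math

def expected_score(theta, a, b):
    """Expected score for a dichotomous item under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ncdif(focal_thetas, ref_params, focal_params):
    """Non-compensatory DIF index: mean squared difference between the
    focal- and reference-group expected item scores, averaged over the
    focal group's ability estimates."""
    diffs = [expected_score(t, *focal_params) - expected_score(t, *ref_params)
             for t in focal_thetas]
    return sum(d * d for d in diffs) / len(diffs)

# Invented focal-group abilities and (a, b) item parameters for illustration.
thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]
print(round(ncdif(thetas, ref_params=(1.0, 0.0), focal_params=(1.0, 0.3)), 4))
```

    Items are flagged when this index exceeds a cutoff, which matches the abstract's rule of flagging items whose expected-score differences pass a threshold.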

  12. OASIS skin and wound integumentary assessment items: applying the WOCN guidance document.

    PubMed

    Baranoski, Sharon; Thimsen, Kathi

    2003-03-01

    This supplement provides home care nurses, therapists, and clinical managers a tool for understanding how the Wound, Ostomy, and Continence Nurses (WOCN) Guidance Document on OASIS Skin and Wound Status M0 Items 2001 should be used. The supplement is a pictorial guide that clarifies definitions by linking them to photos, using the skin integrity OASIS M0 items as an outline. The additional visual cues enable the clinician to clearly observe subtle characteristics that lead to a more reliable OASIS score and appropriate reimbursement by choosing the correct Home Health Resource Group (HHRG).

  13. A 14-Item Mediterranean Diet Assessment Tool and Obesity Indexes among High-Risk Subjects: The PREDIMED Trial

    PubMed Central

    Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón

    2012-01-01

    Objective: Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Design: Cross-sectional assessment of all participants in the “PREvención con DIeta MEDiterránea” (PREDIMED) trial. Subjects: 7,447 participants (55–80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Results: Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were −0.0066 (95% confidence interval, −0.0088 to −0.0049) for women and −0.0059 (−0.0079 to −0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥10 points versus ≤7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. Conclusions: A brief 14-item tool was able to capture a strong monotonic inverse association between

  14. A 14-item Mediterranean diet assessment tool and obesity indexes among high-risk subjects: the PREDIMED trial.

    PubMed

    Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón

    2012-01-01

    Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Cross-sectional assessment of all participants in the "PREvención con DIeta MEDiterránea" (PREDIMED) trial. 7,447 participants (55-80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥ 3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were -0.0066 (95% confidence interval, -0.0088 to -0.0049) for women and -0.0059 (-0.0079 to -0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥ 10 points versus ≤ 7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. A brief 14-item tool was able to capture a strong monotonic inverse association between adherence to a good quality dietary pattern (Mediterranean diet
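    The multivariable-adjusted odds ratios above come from regression models, but the basic unadjusted calculation behind such estimates can be illustrated from a 2×2 table. A minimal sketch with hypothetical counts, using the Woolf (log) method for the confidence interval:

    ```python
    import math

    def odds_ratio_ci(a, b, c, d, z=1.96):
        """Unadjusted odds ratio with a Woolf (log-method) 95% CI from a
        2x2 table: a/b = exposed with/without outcome, c/d = unexposed."""
        or_ = (a * d) / (b * c)
        se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        lower = math.exp(math.log(or_) - z * se_log)
        upper = math.exp(math.log(or_) + z * se_log)
        return or_, lower, upper
    ```

    With equal cell counts the odds ratio is exactly 1 and the interval straddles it, which is a quick sanity check for the formula.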

  15. Formative and Summative Assessment in Veterinary Pathology and Other Courses at a Mexican Veterinary College.

    PubMed

    Valero, Germán; Cárdenas, Paula

    2016-10-25

    The Faculty of Veterinary Medicine and Animal Science of the National Autonomous University of Mexico (UNAM) uses the Moodle learning management system for formative and summative computer assessment. The authors of this article (the teacher primarily responsible for Moodle implementation and a researcher who is a recent Moodle adopter) describe and discuss the students' and teachers' attitudes to summative and formative computer assessment in Moodle. Item analysis of quiz results helped us to identify and fix poorly performing questions, which greatly reduced student complaints and improved objective assessment. The use of Certainty-Based Marking (CBM) in formative assessment in veterinary pathology was well received by the students and should be extended to more courses. The importance of having proficient computer support personnel should not be underestimated. A properly translated language pack is essential for the use of Moodle in a language other than English.
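    Certainty-Based Marking, mentioned above, follows the Gardner-Medwin scheme on which Moodle's CBM behaviour is based: students rate their certainty as 1 (low), 2 (mid), or 3 (high); a correct answer earns 1, 2, or 3 marks, while a wrong one costs 0, −2, or −6. A minimal reimplementation of that scoring rule for illustration (not Moodle code):

    ```python
    # (correct_mark, wrong_mark) per certainty level in the Gardner-Medwin scheme
    CBM_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

    def cbm_mark(correct, certainty):
        """Mark for one response given correctness and stated certainty (1-3)."""
        reward, penalty = CBM_MARKS[certainty]
        return reward if correct else penalty
    ```

    The asymmetric penalties make overstating certainty a losing strategy in expectation, which is what rewards honest self-assessment in formative use.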

  16. Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

    ERIC Educational Resources Information Center

    Sachse, Karoline A.; Haag, Nicole

    2017-01-01

    Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…

  17. Construct and Differential Item Functioning in the Assessment of Prescription Opioid Use Disorders among American Adolescents

    ERIC Educational Resources Information Center

    Wu, Li-Tzy; Ringwalt, Christopher L.; Yang, Chongming; Reeve, Bryce B.; Pan, Jeng-Jong; Blazer, Dan G.

    2009-01-01

    DSM-IV's hierarchical distinction between abuse of and dependence on prescription opioids is not supported, since the symptoms of abuse in adolescents are not less severe than those of dependence. The finding is based on the examination of the DSM-IV criteria for opioid use disorders using item response theory.

  18. An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.

    2014-01-01

    As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
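    The RISE statistic referenced above compares an item's fitted parametric response curve against a nonparametric (e.g., kernel-smoothed) estimate. A minimal sketch under the assumption of a weighted ability grid; the two curve estimates are taken as given functions here, whereas the full method also involves estimating them from data:

    ```python
    import math

    def rise(p_parametric, p_nonparametric, thetas, weights):
        """Root Integrated Squared Error between a fitted parametric item
        response curve and a nonparametric estimate, approximated as a
        weighted average of squared differences over an ability grid."""
        sq = sum(w * (p_parametric(t) - p_nonparametric(t)) ** 2
                 for t, w in zip(thetas, weights))
        return math.sqrt(sq / sum(weights))
    ```

    A value of zero means the parametric curve reproduces the nonparametric one exactly; larger values flag potential misfit.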

  19. Missouri Assessment Program, Spring 2001: Communication Arts, Released Items, Grade 11.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This document deals with testing in communication arts for 11th graders in Missouri public schools. The document contains a short poem "Signs for My Father, Who Stressed the Bunt" (David Bottoms) for students to read and gives four questions for students to answer (Items 15, 16, 17, and 18) in Session 1. It also provides scoring guides…

  20. Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

    ERIC Educational Resources Information Center

    Wells, Craig S.; Bolt, Daniel M.

    2008-01-01

    Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

  1. Missouri Assessment Program, Spring 2002: Social Studies, Grade 8. Released Items [and] Scoring Guide.

    ERIC Educational Resources Information Center

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This booklet contains sample items from the Missouri social studies test for eighth graders. The first sample is based on a speech delivered by Elizabeth Cady Stanton in the mid-1880s, which proposed a new approach to raising girls. Students are directed to use their own knowledge and the speech excerpt to do three activities. The second sample…

  2. Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

    ERIC Educational Resources Information Center

    Wells, Craig S.; Bolt, Daniel M.

    2008-01-01

    Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

  3. Two Prophecy Formulas for Assessing the Reliability of Item Response Theory-Based Ability Estimates

    ERIC Educational Resources Information Center

    Raju, Nambury S.; Oshima, T.C.

    2005-01-01

    Two new prophecy formulas for estimating item response theory (IRT)-based reliability of a shortened or lengthened test are proposed. Some of the relationships between the two formulas, one of which is identical to the well-known Spearman-Brown prophecy formula, are examined and illustrated. The major assumptions underlying these formulas are…
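    One of the two formulas above is identical to the classical Spearman-Brown prophecy formula, which projects reliability when a test is lengthened or shortened by a factor k. A quick sketch of that classical case (the IRT-based variants proposed in the paper differ in how reliability is defined, not in this algebraic form):

    ```python
    def spearman_brown(rho, k):
        """Spearman-Brown prophecy: projected reliability of a test whose
        length is multiplied by k (k > 1 lengthens, k < 1 shortens)."""
        return k * rho / (1 + (k - 1) * rho)
    ```

    For example, doubling a test with reliability 0.50 projects a reliability of about 0.67, and k = 1 returns the original value unchanged.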

  4. Use of differential item functioning analysis to assess the equivalence of translations of a questionnaire.

    PubMed

    Petersen, Morten Aa; Groenvold, Mogens; Bjorner, Jakob B; Aaronson, Neil; Conroy, Thierry; Cull, Ann; Fayers, Peter; Hjermstad, Marianne; Sprangers, Mirjam; Sullivan, Marianne

    2003-06-01

    In cross-national comparisons based on questionnaires, accurate translations are necessary to obtain valid results. Differential item functioning (DIF) analysis can be used to test whether translations of items in multi-item scales are equivalent to the original. In data from 10,815 respondents representing 10 European languages, we tested for DIF in the nine translations of the EORTC QLQ-C30 emotional function scale when compared to the original English version. We tested for DIF using two different methods in parallel, a contingency table method and logistic regression. The DIF results obtained with the two methods were similar. We found indications of DIF in seven of the nine translations. At least two of the DIF findings seem to reflect linguistic problems in the translation. 'Imperfect' translations can affect conclusions drawn from cross-national comparisons. Given that translations can never be identical to the original, we discuss how findings of DIF can be interpreted and discuss the difference between linguistic DIF and DIF caused by confounding, cross-cultural differences, or DIF in other items in the scale. We conclude that testing for DIF is a useful way to validate questionnaire translations.
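    Of the two methods mentioned, the contingency-table approach is typically a Mantel-Haenszel analysis: item responses are cross-tabulated by group within strata of the total scale score, and a common odds ratio is pooled across strata. A minimal sketch with hypothetical counts (no continuity correction or significance test; the paper's exact procedure may differ):

    ```python
    def mantel_haenszel_or(strata):
        """Mantel-Haenszel common odds ratio pooled over score strata.
        Each stratum is (a, b, c, d): reference group right/wrong counts,
        then focal group right/wrong counts."""
        num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
        den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
        return num / den
    ```

    A pooled odds ratio near 1 indicates no DIF after conditioning on the total score; values far from 1 flag the translated item for review.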

  5. Construct and Differential Item Functioning in the Assessment of Prescription Opioid Use Disorders among American Adolescents

    ERIC Educational Resources Information Center

    Wu, Li-Tzy; Ringwalt, Christopher L.; Yang, Chongming; Reeve, Bryce B.; Pan, Jeng-Jong; Blazer, Dan G.

    2009-01-01

    DSM-IV's hierarchical distinction between abuse of and dependence on prescription opioids is not supported, since the symptoms of abuse in adolescents are not less severe than those of dependence. The finding is based on the examination of the DSM-IV criteria for opioid use disorders using item response theory.

  6. An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

    ERIC Educational Resources Information Center

    Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.

    2014-01-01

    As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…

  7. Student Perceptions of Formative Assessment in the Chemistry Classroom

    ERIC Educational Resources Information Center

    Haroldson, Rachelle Ann

    2012-01-01

    Research on formative assessment has focused on the ways teachers implement and use formative assessment to check student understanding in order to guide their instruction. This study shifted emphasis away from teachers to look at how students use and perceive formative assessment in the science classroom. Four key strategies of formative…

  8. Formative Assessment of Writing in English as a Foreign Language

    ERIC Educational Resources Information Center

    Burner, Tony

    2016-01-01

    Recognizing the importance of formative assessment, this mixed-methods study investigates how four teachers and 100 students respond to the new emphasis on formative assessment in English as a foreign language (EFL) writing classes in Norway. While previous studies have examined formative assessment in oral classroom interactions and focused on…

  9. Hitting the Reset Button: Using Formative Assessment to Guide Instruction

    ERIC Educational Resources Information Center

    Dirksen, Debra J.

    2011-01-01

    Using formative assessment gives students a second chance to learn material they didn't master the first time around. It lets failure become a learning experience rather than something to fear. Several types of formative assessment are discussed, including how to use summative assessments formatively. (Contains 2 figures.)

  10. Formative Assessment of Writing in English as a Foreign Language

    ERIC Educational Resources Information Center

    Burner, Tony

    2016-01-01

    Recognizing the importance of formative assessment, this mixed-methods study investigates how four teachers and 100 students respond to the new emphasis on formative assessment in English as a foreign language (EFL) writing classes in Norway. While previous studies have examined formative assessment in oral classroom interactions and focused on…

  11. Valuing a More Rigorous Review of Formative Assessment's Effectiveness

    ERIC Educational Resources Information Center

    Apthorp, Helen; Klute, Mary; Petrites, Tony; Harlacher, Jason; Real, Marianne

    2016-01-01

    Prior reviews of evidence for the impact of formative assessment on student achievement suggest widely different estimates of formative assessment's effectiveness, ranging from 0.40 to 0.70 standard deviations in one review. The purpose of this study is to describe variability in the effectiveness of formative assessment for promoting student…

  12. Formative Assessment Probes: Talk Moves. A Formative Assessment Strategy for Fostering Productive Probe Discussions

    ERIC Educational Resources Information Center

    Keeley, Page

    2016-01-01

    Formative assessment probes can be used to foster productive science discussions in which students make their thinking visible to themselves, their peers, and the teacher. During these discussions, there is an exchange between the teacher and students that encourages exploratory thinking, supports careful listening to others' ideas, asks for…

  13. Formative Assessment Probes: Talk Moves. A Formative Assessment Strategy for Fostering Productive Probe Discussions

    ERIC Educational Resources Information Center

    Keeley, Page

    2016-01-01

    Formative assessment probes can be used to foster productive science discussions in which students make their thinking visible to themselves, their peers, and the teacher. During these discussions, there is an exchange between the teacher and students that encourages exploratory thinking, supports careful listening to others' ideas, asks for…

  14. Formative Assessment Probes: Is It Melting? Formative Assessment for Teacher Learning

    ERIC Educational Resources Information Center

    Keeley, Page

    2013-01-01

    Formative assessment probes are effective tools for uncovering students' ideas about the various concepts they encounter when learning science. They are used to build a bridge from where the student is in his or her thinking to where he or she needs to be in order to construct and understand the scientific explanation for observed phenomena.…

  15. Formative Assessment Probes: Constructing Cl-Ev-R Explanations to Formative Assessment Probes

    ERIC Educational Resources Information Center

    Keeley, Page

    2015-01-01

    A distinguishing feature of all the formative assessment probes in the "Uncovering Student Ideas" series is that each probe has two parts: (1) a selected answer choice that usually mirrors the research on commonly held ideas students have about concepts or phenomena; and (2) an explanation that supports their answer choice. It is this…

  16. Negative affectivity and social inhibition in cardiovascular disease: evaluating type-D personality and its assessment using item response theory.

    PubMed

    Emons, Wilco H M; Meijer, Rob R; Denollet, Johan

    2007-07-01

    Individuals with increased levels of both negative affectivity (NA) and social inhibition (SI), referred to as type-D personality, are at increased risk of adverse cardiac events. We used item response theory (IRT) to evaluate NA, SI, and type-D personality as measured by the DS14. The objectives of this study were (a) to evaluate the relative contribution of individual items to the measurement precision at the cutoff to distinguish type-D from non-type-D personality and (b) to investigate the comparability of NA, SI, and type-D constructs across the general population and clinical populations. Data from representative samples including 1316 respondents from the general population, 427 respondents diagnosed with coronary heart disease, and 732 persons suffering from hypertension were analyzed using the graded response IRT model. In Study 1, the information functions obtained in the IRT analysis showed that (a) all items had highest measurement precision around the cutoff and (b) items are most informative at the higher end of the scale. In Study 2, the IRT analysis showed that measurements were fairly comparable across the general population and clinical populations. The DS14 adequately measures NA and SI, with highest reliability in the trait range around the cutoff. The DS14 is a valid instrument to assess and compare type-D personality across clinical groups.
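    The information functions referred to above quantify measurement precision as a function of the latent trait. For a dichotomous 2PL item the Fisher information is I(θ) = a²·P(θ)·(1 − P(θ)), which peaks at θ = b; the DS14 analysis used the polytomous graded response model, but the 2PL case shows the idea:

    ```python
    import math

    def item_information(theta, a, b):
        """Fisher information of a dichotomous 2PL item at ability theta:
        I(theta) = a^2 * P * (1 - P), maximal at theta = b."""
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)
    ```

    Placing item difficulties near a clinical cutoff therefore concentrates measurement precision exactly where the type-D/non-type-D classification is made.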

  17. A Faculty Toolkit for Formative Assessment in Pharmacy Education.

    PubMed

    DiVall, Margarita V; Alston, Greg L; Bird, Eleanora; Buring, Shauna M; Kelley, Katherine A; Murphy, Nanci L; Schlesselman, Lauren S; Stowe, Cindy D; Szilagyi, Julianna E

    2014-11-15

    This paper aims to increase understanding and appreciation of formative assessment and its role in improving student outcomes and the instructional process, while educating faculty on formative techniques readily adaptable to various educational settings. Included are a definition of formative assessment and the distinction between formative and summative assessment. Various formative assessment strategies to evaluate student learning in classroom, laboratory, experiential, and interprofessional education settings are discussed. The role of reflective writing and portfolios, as well as the role of technology in formative assessment, are described. The paper also offers advice for formative assessment of faculty teaching. In conclusion, the authors emphasize the importance of creating a culture of assessment that embraces the concept of 360-degree assessment in both the development of a student's ability to demonstrate achievement of educational outcomes and a faculty member's ability to become an effective educator.

  18. A Faculty Toolkit for Formative Assessment in Pharmacy Education

    PubMed Central

    Alston, Greg L.; Bird, Eleanora; Buring, Shauna M.; Kelley, Katherine A.; Murphy, Nanci L.; Schlesselman, Lauren S.; Stowe, Cindy D.; Szilagyi, Julianna E.

    2014-01-01

    This paper aims to increase understanding and appreciation of formative assessment and its role in improving student outcomes and the instructional process, while educating faculty on formative techniques readily adaptable to various educational settings. Included are a definition of formative assessment and the distinction between formative and summative assessment. Various formative assessment strategies to evaluate student learning in classroom, laboratory, experiential, and interprofessional education settings are discussed. The role of reflective writing and portfolios, as well as the role of technology in formative assessment, are described. The paper also offers advice for formative assessment of faculty teaching. In conclusion, the authors emphasize the importance of creating a culture of assessment that embraces the concept of 360-degree assessment in both the development of a student’s ability to demonstrate achievement of educational outcomes and a faculty member’s ability to become an effective educator. PMID:26056399

  19. Innovative learning: employing medical students to write formative assessments.

    PubMed

    Chamberlain, Suzanne; Freeman, Adrian; Oldham, James; Sanders, David; Hudson, Nicky; Ricketts, Chris

    2006-11-01

    Peninsula Medical School, UK, employed six students to write MCQ items for a formative applied medical knowledge item bank. The students successfully generated 260 quality MCQs in their six-week contracted period. Informal feedback from students and two staff mentors suggests that the exercise provided a very effective learning environment and that students felt they were 'being paid to learn'. Further research is under way to track the progress of the students involved in the exercise, and to formally evaluate the impact on learning.

  20. Assessing the Feasibility of a Test Item Bank and Assessment Clearinghouse: Strategies to Measure Technical Skill Attainment of Career and Technical Education Participants

    ERIC Educational Resources Information Center

    Derner, Seth; Klein, Steve; Hilber, Don

    2008-01-01

    This report documents strategies that can be used to initiate development of a technical skill test item bank and/or assessment clearinghouse and quantifies the cost of creating and maintaining such a system. It is intended to inform state administrators on the potential uses and benefits of system participation, test developers on the needs and…

  1. The influence of item order on intentional response distortion in the assessment of high potentials: assessing pilot applicants.

    PubMed

    Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander

    2014-04-01

    An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.

  2. Science Teachers' Use of a Concept Map Marking Guide as a Formative Assessment Tool for the Concept of Energy

    ERIC Educational Resources Information Center

    Won, Mihye; Krabbe, Heiko; Ley, Siv Ling; Treagust, David F.; Fischer, Hans E.

    2017-01-01

    In this study, we investigated the value of a concept map marking guide as an alternative formative assessment tool for science teachers to adopt for the topic of energy. Eight high school science teachers marked students' concept maps using an itemized holistic marking guide. Their marking was compared with the researchers' marking and the scores…

  3. Assessment of single-item literacy questions, age, and education level in the prediction of low health numeracy.

    PubMed

    Johnson, Tim V; Abbasi, Ammara; Kleris, Renee S; Ehrlich, Samantha S; Barthwaite, Echo; DeLong, Jennifer; Master, Viraj A

    2013-08-01

    Determining a patient's health literacy is important to optimum patient care. Single-item questions exist for screening written health literacy. We sought to assess the predictive potential of three common screening questions, along with patient age and education level, in the prediction of low health numerical literacy (numeracy). After demographic and educational information was obtained, 441 patients were administered three health literacy screening questions. The three-item Schwartz-Woloshin Numeracy Scale was then administered to assess for low health numeracy (score of 0 out of 3). This score served as the reference standard for Receiver Operating Characteristics (ROC) curve analysis. ROC curves were constructed and used to determine the area under the curve (AUC); a higher AUC indicates better discriminative ability. None of the three screening questions were significant predictors of low health numeracy. However, education level was a significant predictor of low health numeracy, with an AUC (95% CI) of 0.811 (0.720-0.902). This measure had a specificity of 95.3% at the cutoff of 12 years of education (<12 versus ≥12 years of education) but was not sensitive. Common single-item questions used to screen for written health literacy are ineffective screening tools for health numeracy. However, low education level is a specific predictor of low health numeracy.
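    The AUC reported above has a simple rank interpretation: it is the probability that a randomly chosen positive case is ranked above a randomly chosen negative case, with ties counting half. A minimal pairwise sketch of that computation with hypothetical scores (O(n·m), fine for illustration but not for large samples):

    ```python
    def auc(pos_scores, neg_scores):
        """ROC AUC as the probability that a randomly chosen positive case
        scores above a randomly chosen negative case (ties count 0.5)."""
        wins = 0.0
        for p in pos_scores:
            for n in neg_scores:
                if p > n:
                    wins += 1.0
                elif p == n:
                    wins += 0.5
        return wins / (len(pos_scores) * len(neg_scores))
    ```

    Perfect separation yields 1.0 and a completely uninformative predictor yields 0.5, which is why an AUC of 0.811 for education level counts as useful discrimination.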

  4. Evaluating the reliability of assessing home-packed food items using digital photographs and dietary log sheets.

    PubMed

    Gauthier, Alain P; Jaunzarins, Bridget T; MacDougall, Sarah-Jane; Laurence, Michelle; Kabaroff, J Lynn; Godwin, Alison A; Dorman, Sandra C

    2013-01-01

    To assess the reliability of manual data entry for home-packed food items by using digital photographs and dietary log sheets. Data from 60 lunches were entered by researchers A and B independently. Researcher B re-entered researcher A's items within 1 week. Researcher B then re-entered her own items 4 weeks from the initial entry point. The inter-rater reliability intraclass correlation coefficient (ICC) was 0.83 for total kilocalories and ranged from 0.75-0.87 for macronutrients. The intra-rater reliability ICC was 0.92 for total kilocalories and ranged from 0.90-0.92 for macronutrients. The inter-rater ICCs for the 5 selected micronutrients ranged from 0.33-0.83, whereas the intra-rater ICCs for these micronutrients ranged from 0.65-0.98. This method of data entry is feasible and its reliability is promising for macronutrient investigations. Continued assessment of this method for investigations related to micronutrient content is recommended. Copyright © 2013 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
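    The intraclass correlation coefficients above index agreement between raters. As an illustration, a one-way random-effects ICC(1,1) can be computed from a subjects-by-raters table of values; the study likely used a two-way model, so this is a simplified sketch of the general idea:

    ```python
    def icc_oneway(ratings):
        """One-way random-effects ICC(1,1) from a subjects-by-raters table
        (rows = subjects, columns = raters)."""
        n, k = len(ratings), len(ratings[0])
        grand = sum(sum(row) for row in ratings) / (n * k)
        row_means = [sum(row) / k for row in ratings]
        # Between-subjects and within-subjects mean squares from one-way ANOVA
        msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
        msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means)
                  for x in row) / (n * (k - 1))
        return (msb - msw) / (msb + (k - 1) * msw)
    ```

    Perfect agreement gives an ICC of 1.0, and rater disagreement pulls the coefficient toward zero, which is what the 0.33-0.83 micronutrient range above reflects.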

  5. Written formative assessment and silence in the classroom

    NASA Astrophysics Data System (ADS)

    Lee Hang, Desmond Mene; Bell, Beverley

    2015-09-01

    In this commentary, we build on Xinying Yin and Gayle Buck's discussion by exploring the cultural practices which are integral to formative assessment, when it is viewed as a sociocultural practice. First, we discuss the role of assessment, and in particular oral and written formative assessments, in both western and Samoan cultures, building on the account of assessment practices in the Chinese culture given by Yin and Buck. Secondly, we document the cultural practice of silence in Samoan classrooms, which has led to the use of written formative assessment as in the Yin and Buck article. We also discuss the use of written formative assessment as a scaffold for teacher development for formative assessment. Finally, we briefly discuss both studies on formative assessment as a sociocultural practice.

  6. Academic staff perspectives of formative assessment in nurse education.

    PubMed

    Koh, Lai Chan

    2010-07-01

    High-quality formative assessment has been linked to positive benefits on learning, while good feedback can make a considerable difference to the quality of learning. It is proposed that formative assessment and feedback are intricately linked to the enhancement of learning and have to be interactive. Underlying this proposition is the recognition of the importance of staff perspectives of formative assessment and their influence on assessment practice. However, there appears to be a paucity of literature exploring this area relevant to nurse education. The aim of the research was to explore the perspectives of twenty teachers of nurse education on formative assessment and feedback of theoretical assessment. A qualitative approach using semi-structured interviews was adopted. The interview data were analysed and the following themes identified: purposes of formative assessment, involvement of peers in the assessment process, ambivalence about the timing of assessment, types of formative assessment and quality of good feedback. The findings offer suggestions which may be of value to teachers facilitating formative assessment. The conclusion is that teachers require changes to the practice of formative assessment and feedback by believing that learning is central to the purposes of formative assessment and regarding students as partners in this process.

  7. Development of a Simple 12-Item Theory-Based Instrument to Assess the Impact of Continuing Professional Development on Clinical Behavioral Intentions

    PubMed Central

    Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

    2014-01-01

    Background Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Methods and Findings Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. Conclusion A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral intentions.

  8. Implication of formative assessment practices among mathematics teacher

    NASA Astrophysics Data System (ADS)

    Samah, Mas Norbany binti Abu; Tajudin, Nor'ain binti Mohd

    2017-05-01

    Formative assessment under school-based assessment (SBA) is implemented in schools as a move to improve the National Education Assessment System (NEAS). Formative assessment focuses on assessment for learning. There are various types of formative assessment instruments used by teachers of mathematics, namely observation, questioning protocols, worksheets and quizzes. This study aims to help teachers improve their skills in formative assessment during the teaching and learning (t&l) of Mathematics. One mathematics teacher was chosen as the study participant. Data were collected using document analysis, observation and interviews, and were analyzed narratively. The findings indicate that formative assessments can help teachers implement SBA and that formative assessment improves the effectiveness of students' learning in t&l.

  9. Behavioral Health Needs Assessment Survey (BHNAS): Overview of Survey Items and Measures

    DTIC Science & Technology

    2013-02-12

    Preserving the psychological health of U.S. military service members and their families is of paramount concern to...Ruggiero, Del Ben, Scotti, & Rabalais, 2003; Weathers et al., 1993). High test-retest reliability has been reported at .96 for 2–3 days and .88 for 1 week...past 2 weeks, the BHNAS uses a time frame of 4 weeks. Also, the wording of one of the original PHQ-9 items was modified slightly for use on the BHNAS

  10. Improving Content Assessment for English Language Learners: Studies of the Linguistic Modification of Test Items. Research Report. ETS RR-14-23

    ERIC Educational Resources Information Center

    Young, John W.; King, Teresa C.; Hauck, Maurice Cogan; Ginsburgh, Mitchell; Kotloff, Lauren; Cabrera, Julio; Cavalie, Carlos

    2014-01-01

    This article describes two research studies conducted on the linguistic modification of test items from K-12 content assessments. In the first study, 120 linguistically modified test items in mathematics and science taken by fourth and sixth graders were found to have a wide range of outcomes for English language learners (ELLs) and non-ELLs, with…

  11. Reducing the Item Number to Obtain Same-Length Self-Assessment Scales: A Systematic Approach Using Result of Graphical Loglinear Rasch Modeling

    ERIC Educational Resources Information Center

    Nielsen, Tine; Kreiner, Svend

    2011-01-01

    The Revised Danish Learning Styles Inventory (R-D-LSI) (Nielsen 2005), which is an adaptation of Sternberg-Wagner Thinking Styles Inventory (Sternberg, 1997), comprises 14 subscales, each measuring a separate learning style. Of these 14 subscales, 9 are eight items long and 5 are seven items long. For self-assessment, self-scoring and…

  12. A Comparison of Methods for Estimating Conditional Item Score Differences in Differential Item Functioning (DIF) Assessments. Research Report. ETS RR-10-15

    ERIC Educational Resources Information Center

    Moses, Tim; Miao, Jing; Dorans, Neil

    2010-01-01

    This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…
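    The logistic-regression strategy mentioned among these methods can be sketched as a uniform-DIF likelihood-ratio test: fit logit(correct) on the matching score alone, then again with a group indicator added, and compare log-likelihoods. This is a generic textbook illustration, not the report's implementation; the plain gradient-ascent fitter, the synthetic data, and all names below are assumptions.

```python
import numpy as np

def fit_logistic(X, y, lr=0.3, steps=10000):
    """Logistic regression by plain gradient ascent; returns weights and log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)   # gradient of the average log-likelihood
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-9, 1.0 - 1e-9)
    return w, float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def dif_likelihood_ratio(score, group, correct):
    """Uniform-DIF test for one item: G2 = 2*(LL_full - LL_base), ~ chi2(1) under no DIF."""
    n = len(score)
    base = np.column_stack([np.ones(n), score])           # intercept + matching score
    full = np.column_stack([np.ones(n), score, group])    # ... plus group membership
    _, ll_base = fit_logistic(base, correct)
    _, ll_full = fit_logistic(full, correct)
    return 2.0 * (ll_full - ll_base)
```

    A large G2 flags the item for uniform DIF; adding a score-by-group interaction column would extend the same comparison to nonuniform DIF.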

  13. Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

    PubMed

    Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

    2017-09-16

    This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). The four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of the scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
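    The internal-consistency figures reported here (Cronbach's α between 0.79 and 0.91) follow from the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal sketch of that computation, with an invented toy score matrix rather than the study's data:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

    Perfectly parallel items give α = 1; weakly related items pull α toward 0.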

  14. Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment.

    PubMed

    Fayers, Peter M

    2007-01-01

    We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs). IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research. Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transfer of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need for guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions. Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified.
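    As a concrete anchor for the IRT and CAT terminology used here, the following sketch implements the two-parameter logistic (2PL) item response function and the maximum-information item-selection rule that most CAT engines use. It is a generic textbook illustration under invented parameters, not any specific PRO instrument's engine.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def next_item(theta, a, b, administered):
    """CAT step: choose the not-yet-administered item with maximum information at theta."""
    info = item_information(theta, np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    info[list(administered)] = -np.inf
    return int(np.argmax(info))
```

    After each response the ability estimate is updated and the rule is applied again, which is why a CAT reaches a target precision with fewer items than a fixed form.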

  15. Written Formative Assessment and Silence in the Classroom

    ERIC Educational Resources Information Center

    Lee Hang, Desmond Mene; Bell, Beverley

    2015-01-01

    In this commentary, we build on Xinying Yin and Gayle Buck's discussion by exploring the cultural practices which are integral to formative assessment, when it is viewed as a sociocultural practice. First we discuss the role of assessment and in particular oral and written formative assessments in both western and Samoan cultures, building on the…

  17. Making Room for Formative Assessment Processes: A Multiple Case Study

    ERIC Educational Resources Information Center

    McEntarffer, Robert E.

    2012-01-01

    This qualitative instrumental multiple case study (Stake, 2005) explored how teachers made room for formative assessment processes in their classrooms, and how thinking about assessment changed during those formative assessment experiences. Data were gathered from six teachers over three months and included teacher interviews, student interviews,…

  18. Investigating an Invariant Item Ordering for Polytomously Scored Items

    ERIC Educational Resources Information Center

    Ligtvoet, Rudy; van der Ark, L. Andries; te Marvelde, Janneke M.; Sijtsma, Klaas

    2010-01-01

    This article discusses the concept of an invariant item ordering (IIO) for polytomously scored items and proposes methods for investigating an IIO in real test data. Method manifest IIO is proposed for assessing whether item response functions intersect. Coefficient H[superscript T] is defined for polytomously scored items. Given that an IIO…

  19. The School Age Gender Gap in Reading Achievement: Examining the Influences of Item Format and Intrinsic Reading Motivation

    ERIC Educational Resources Information Center

    Schwabe, Franziska; McElvany, Nele; Trendtel, Matthias

    2015-01-01

    The importance of reading competence for both individuals and society underlines the strong need to understand the gender gap in reading achievement. Beyond mean differences in reading comprehension, research has indicated that girls possess specific advantages on constructed-response items compared with boys of the same reading ability. Moreover,…

  20. [Assessment of criminal responsibility in paraphilic disorder. Can the severity of the disorder be assessed with items of standardized prognostic instruments?].

    PubMed

    Briken, P; Müller, J L

    2014-03-01

    Assessment of the severity of paraphilic disorders is an important aspect of psychiatric court reports for assessing criminal responsibility and placement in a forensic psychiatric hospital according to the German penal code (§§ 20, 21, 63 StGB). The minimum requirements for appraisal of criminal responsibility published by an interdisciplinary working group under the guidance of the German Federal Court of Justice define the standards for this procedure. This paper presents a research concept that aims to assess the severity of paraphilic disorders by using items of standardized prognostic instruments. In addition to a formal diagnosis according to the international classification of diseases (ICD) and the diagnostic and statistical manual of mental diseases (DSM) criteria, the items "deviant sexual interests" and "sexual preoccupations" from the prognosis instrument Stable 2007 are used to assess the severity of paraphilic disorders. Other criteria, such as "relationship deficits" are used to support the appraisal of the severity of the disorder. The items "sexual preoccupation", "emotional collapse" and "collapse of social support" from the prognosis instrument Acute 2007 are used to assess the capacity for self-control. In a next step the validity and reliability of this concept will be tested.

  1. Single-item vs multiple-item measures of stage of change in compliance with prescribed medications.

    PubMed

    Cook, Christopher L; Perri, Matthew

    2004-02-01

    The Stage of Change construct from the Transtheoretical Model of behavioral change has been widely utilized in the assessment of various health behaviors. The majority of these tests measure the Stage of Change construct using the single-item, multiple-choice format. This study validated the use of a single-item measure in assessing readiness to comply with taking a prescribed medication. A sample of 161 subjects tested the multiple-item Stage of Change measure; a refined multiple-item survey was then tested with 59 subjects. With the latter survey, it was difficult to discriminate between subjects at the differing stages of change. A correlation of .91 was found for stage classifications between ratings on the single-item and multiple-item scales. The use of the single-item measure seems reasonable when assessing stage of change in compliance with prescribed medication.

  2. Development and calibration of an item bank for the assessment of activities of daily living in cardiovascular patients using Rasch analysis

    PubMed Central

    2013-01-01

    Background To develop and calibrate the activities of daily living item bank (ADLib-cardio) as a prerequisite for a computer-adaptive test (CAT) for the assessment of ADL in patients with cardiovascular diseases (CVD). Methods After pre-testing for relevance and comprehension, a pool of 181 ADL items was answered on a five-point Likert scale by 720 CVD patients, who were recruited in fourteen German cardiac rehabilitation centers. To verify that the relationship between the items is due to one factor, a confirmatory factor analysis (CFA) was conducted. A Mokken analysis was computed to examine double monotonicity (i.e. every item generates an equivalent order of person traits, and every person generates an equivalent order of item difficulties). Finally, a Rasch analysis based on the partial credit model was conducted to test for unidimensionality and to calibrate the item bank. Results Results of CFA and Mokken analysis confirmed a one-factor structure and double monotonicity. In Rasch analysis, merging response categories and removing items with misfit, differential item functioning or local response dependency reduced the ADLib-cardio to 33 items. The ADLib-cardio fitted to the Rasch model with a nonsignificant item-trait interaction (chi-square=105.42, df=99; p=0.31). Person-separation reliability was 0.81 and unidimensionality could be verified. Conclusions The ADLib-cardio is the first calibrated, unidimensional item bank that allows for the assessment of ADL in rehabilitation patients with CVD. As such, it provides the basis for the development of a CAT for the assessment of ADL in patients with cardiovascular diseases. Calibrating the ADLib-cardio in cardiovascular patient settings other than rehabilitation would further increase its generalizability. PMID:23914735
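    The partial credit model used for this calibration gives each polytomous item a set of step difficulties; the probability of each score category is built from cumulative sums of (theta − step difficulty). A minimal, numerically stable sketch for a single item (function name and values are illustrative, not from the study):

```python
import numpy as np

def pcm_probs(theta, step_difficulties):
    """Partial credit model: category probabilities P(score = 0..m) for one item."""
    # Cumulative logits: score k gets sum over the first k steps of (theta - delta_step).
    steps = np.concatenate(([0.0], np.cumsum(theta - np.asarray(step_difficulties, dtype=float))))
    num = np.exp(steps - steps.max())   # subtract the max for numerical stability
    return num / num.sum()
```

    For a dichotomous item (a single step) this reduces to the ordinary Rasch model.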

  3. Assessment and treatment of problem behavior maintained by escape from attention and access to tangible items.

    PubMed

    Hagopian, L P; Wilson, D M; Wilder, D A

    2001-01-01

    The results obtained from two consecutive functional analyses conducted with a 6-year-old child with autism are described. In the initial functional analysis, the highest rates of problem behavior occurred in the play condition. In that condition, the delivery of attention appeared to occasion problem behaviors. A second functional analysis was conducted wherein an escape from attention condition and a tangible condition were added. In the second functional analysis, higher rates of responding were observed in the escape from attention and tangible conditions. The results suggested that problem behavior was maintained by negative reinforcement in the form of escape from attention and positive reinforcement in the form of gaining access to preferred tangible items. Problem behavior was treated using functional communication training combined with noncontingent reinforcement.

  4. Improving psychometric assessment of the Beck Depression Inventory using multidimensional item response theory.

    PubMed

    Fragoso, Tiago M; Cúri, Mariana

    2013-07-01

    We studied the latent factor structure of the Beck Depression Inventory (BDI) under the light of Multidimensional Item Response Theory models. Under a Bayesian Markov chain Monte Carlo setting, we chose the most adequate model, estimated its parameters and verified its fit to the data. An evaluation of the inventory in terms of the assumed dimensions seems to agree with previous investigations in the factor structure of the BDI present in the literature. Cognitive and somatic-affective latent traits were identified in the analysis making possible the interpretation of symptom evolution along these dimensions, in terms of probability of their appearance.

  5. Helping Poor Readers Demonstrate Their Science Competence: Item Characteristics Supporting Text-Picture Integration

    ERIC Educational Resources Information Center

    Saß, Steffani; Schütte, Kerstin

    2016-01-01

    Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…

  7. An approach for estimating item sensitivity to within-person change over time: An illustration using the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog).

    PubMed

    Dowling, N Maritza; Bolt, Daniel M; Deng, Sien

    2016-12-01

    When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer's Disease Assessment Scale-Cognitive subscale, including participants representing the full-spectrum of disease status who were enrolled in the multisite Alzheimer's Disease Neuroimaging Initiative. Five of the 13 Alzheimer's Disease Assessment Scale-Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline more consistent with theoretical expectations in comparison to the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition.

  8. Writing better test items.

    PubMed

    Aucoin, Julia W

    2005-01-01

    Professional development specialists have had little opportunity to learn how to write test items to meet the expectations of today's graduate nurse. Schools of nursing have moved away from knowledge-level test items and have had to develop more application and analysis items to prepare graduates for the National Council Licensure Examination (NCLEX). This same type of question can be used effectively to support a competence assessment system and document critical thinking skills.

  9. Formative Assessment Probes: Mountaintop Fossil: A Puzzling Phenomenon

    ERIC Educational Resources Information Center

    Keeley, Page

    2015-01-01

    This column focuses on promoting learning through assessment. This month's issue describes using formative assessment probes to uncover several ways of thinking about the puzzling discovery of a marine fossil on top of a mountain.

  10. Formative Assessment Probes: Is It Erosion or Weathering?

    ERIC Educational Resources Information Center

    Keeley, Page

    2016-01-01

    This column focuses on promoting learning through assessment. The formative assessment probe in this month's issue can be used as an initial elicitation before students are introduced to the formal concepts of weathering and erosion.

  13. Motivating student learning using a formative assessment journey

    PubMed Central

    Evans, Darrell J R; Zeun, Paul; Stanier, Robert A

    2014-01-01

    Providing formative assessment opportunities has been recognised as a significant benefit to student learning. The outcome of any formative assessment should be one that ultimately helps improve student learning through familiarising students with the levels of learning required, informing them about gaps in their learning and providing feedback to guide the direction of learning. This article provides an example of how formative assessments can be developed into a formative assessment journey where a number of different assessments can be offered to students during the course of a module of teaching, thus utilising a spaced-education approach. As well as incorporating the specific drivers of formative assessment, we demonstrate how approaches deemed to be stimulating, interactive and entertaining with the aim of maximising enthusiasm and engagement can be incorporated. We provide an example of a mixed approach to evaluating elements of the assessment journey that focuses on student reaction, appraisal of qualitative and quantitative feedback from student questionnaires, focus group analysis and teacher observations. Whilst it is not possible to determine a quantifiable effect of the assessment journey on student learning, usage data and student feedback show that formative assessment can achieve high engagement and positive response to different assessments. Those assessments incorporating an active learning element and a quiz-based approach appear to be particularly popular. A spaced-education format encourages a building block approach to learning that is continuous in nature rather than focussed on an intense period of study prior to summative examinations. PMID:24111930

  15. Psychometric Properties and Responsiveness to Change of 15- and 28-Item Versions of the SCORE: A Family Assessment Questionnaire.

    PubMed

    Hamilton, Elena; Carr, Alan; Cahill, Paul; Cassells, Ciara; Hartnett, Dan

    2015-09-01

    The SCORE (Systemic Clinical Outcome and Routine Evaluation) is a 40-item questionnaire for completion by family members 12 years and older to assess outcome in systemic therapy. This study aimed to investigate psychometric properties of two short versions of the SCORE and their responsiveness to therapeutic change. Data were collected at 19 centers from 701 families at baseline and from 433 of these 3-5 months later. Results confirmed the three-factor structure (strengths, difficulties, and communication) of the 15- and 28-item versions of the SCORE. Both instruments had good internal consistency and test-retest reliability. They also showed construct and criterion validity, correlating with measures of parent, child, and family adjustment, and discriminating between clinical and nonclinical cases. Total and factor scales of the SCORE-15 and -28 were responsive to change over 3-5 months of therapy. The SCORE-15 and SCORE-28 are brief psychometrically robust family assessment instruments which may be used to evaluate systemic therapy.

  16. Use of formative assessment as an educational tool.

    PubMed

    Jain, Vaishali; Agrawal, Vandana; Biswas, Shubho

    2012-01-01

    Though formative assessments are popular in medical education, data to establish their educational benefits are lacking. This study was conducted to determine whether participation and performance of MBBS students in regular formative assessments are associated with positive outcomes and have measurable effects on learning. One hundred and fifty MBBS students of semester II attending Biochemistry classes were studied by dividing them into two groups until the completion of a topic. End-of-topic summative assessment marks were analysed with respect to the effect of participation and performance in formative assessments. Participation in formative assessments had a statistically significant positive relationship with summative assessment marks. The mean difference in formative and summative assessment marks for the group that participated in formative assessments is 1.6 (95% CI = 0.9-2.4, p < 0.001). The mean difference in summative assessment marks for the two groups is 3.4 (95% CI = 2.3-4.6, p < 0.001). The mean difference in marks obtained by solving case studies given in the summative assessment for the two groups is 1.2 (95% CI = 0.7-1.6, p < 0.001). Formative assessment not only assesses students' achievements but also enables students to recognise the areas in which they are having difficulty and to concentrate their future efforts on those areas. An adequate frequency of formative assessment with immediate feedback is beneficial as it stimulates meaningful and multifaceted learning. The results of this study encourage the use of formative assessment as an educational tool in all MBBS subjects, for it has significant positive effects on learning.
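    The effect sizes above are mean differences with 95% confidence intervals. As a sketch of how such an interval is formed, here is a Welch-style standard error with a normal (z) approximation rather than the t distribution; the toy marks are invented, not the study's data:

```python
from statistics import NormalDist, mean, stdev

def mean_diff_ci(group_a, group_b, conf=0.95):
    """Difference of means with a z-approximation confidence interval (Welch SE)."""
    d = mean(group_a) - mean(group_b)
    se = (stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)   # about 1.96 for a 95% interval
    return d, (d - z * se, d + z * se)
```

    With small samples the t distribution gives slightly wider intervals; for samples of the size used in this study the z approximation is close.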

  17. Virginia Standards of Learning Assessments. Grade 8 Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that approximately 79,000 students in grade 8 were required to answer as part of the SOL assessments. These…

  18. Virginia Standards of Learning Assessments. Grade 3 Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) Assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that approximately 83,000 students in grade 3 were required to answer as part of the SOL assessments. These…

  19. Virginia Standards of Learning Assessments. Grade 5 Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that approximately 80,000 students in grade 5 were required to answer as part of the SOL assessments. These…

  20. Exploring Plausible Causes of Differential Item Functioning in the PISA Science Assessment: Language, Curriculum or Culture

    ERIC Educational Resources Information Center

    Huang, Xiaoting; Wilson, Mark; Wang, Lei

    2016-01-01

    In recent years, large-scale international assessments have been increasingly used to evaluate and compare the quality of education across regions and countries. However, measurement variance between different versions of these assessments often poses threats to the validity of such cross-cultural comparisons. In this study, we investigated the…

  1. Multilevel Item Response Modeling: Applications to Large-Scale Assessment of Academic Achievement

    ERIC Educational Resources Information Center

    Zheng, Xiaohui

    2009-01-01

    The call for standards-based reform and educational accountability has led to increased attention to large-scale assessments. Over the past two decades, large-scale assessments have been providing policymakers and educators with timely information about student learning and achievement to facilitate their decisions regarding schools, teachers and…

  2. Virginia Standards of Learning Assessments. End of Course Released Test Items, 1998.

    ERIC Educational Resources Information Center

    Virginia State Dept. of Education, Richmond. Div. of Assessment and Reporting.

    Beginning in Spring 1998, Virginia students participated in the Standards of Learning (SOL) assessments designed to test student knowledge of the content and skills specified in the state's standards. This document contains questions that students were required to answer as part of the SOL End-of-Course assessments. These questions are…

  3. Helping Teachers Interpret Item-Level Data: The New Hampshire Statewide Assessment.

    ERIC Educational Resources Information Center

    Cook, Nancy R.; Smith, Robert A.

    New Hampshire has adopted a standards-based statewide assessment, the New Hampshire Educational Improvement and Assessment Program (NHEIAP), which is designed to measure students' learning against proficiency standards at grades 3, 6, and 10. Because of the difficulty teachers had in interpreting the NHEIAP results, a custom-designed software…

  4. Common Core State Standards Benchmark Assessments: Item Alignment to the Shifts in Tennessee

    ERIC Educational Resources Information Center

    Stugart, Melissa

    2016-01-01

    Our nation is in the midst of one of the largest education reforms in decades centered on the adoption of the Common Core State Standards (CCSS) and aligned assessments. In an era of rising accountability measures and declining literacy proficiency, it is vital to ensure that educational resources, such as benchmark assessments, are appropriately…

  7. Category Scoring Techniques from National Assessment: Applications to Free Response Items from Career and Occupational Development.

    ERIC Educational Resources Information Center

    Phillips, Donald L.

    The Career and Occupational Development (COD) assessment of the National Assessment of Educational Progress (NAEP) was made up of about 70 percent free response exercises requiring hand scoring. This paper describes the techniques used in developing the "scoring guides" for these exercises and summarizes the results of two empirical…

  9. A Critical Item Analysis of the QABF: Development of a Short Form Assessment Instrument

    ERIC Educational Resources Information Center

    Singh, Ashvind N.; Matson, Johnny L.; Mouttapa, Michelle; Pella, Russell D.; Hill, B. D.; Thorson, Ryan

    2009-01-01

    Due to the relative inability of individuals with intellectual disabilities (ID) to provide an accurate and reliable self-report, assessment in this population is more difficult than with individuals in the general population. As a result, assessment procedures must be adjusted to compensate for the relative lack of information that the individual…

  11. Formative Assessment in the Visual Arts

    ERIC Educational Resources Information Center

    Andrade, Heidi; Hefferen, Joanna; Palma, Maria

    2014-01-01

    Classroom assessment is a hot topic in K-12 education because of compelling evidence that assessment in the form of feedback is a powerful teaching and learning tool (Hattie & Timperley, 2007). Although formal evaluation has been anathema to many art specialists and teachers (Colwell, 2004), informal assessment in the form of feedback is not.…

  13. Construct Validity in Formative Assessment: Purpose and Practices

    ERIC Educational Resources Information Center

    Rix, Samantha

    2012-01-01

    This paper examines the utilization of construct validity in formative assessment for classroom-based purposes. Construct validity pertains to the notion that interpretations are made by educators who analyze test scores during formative assessment. The purpose of this paper is to note the challenges that educators face when interpreting these…

  14. Formative Assessment Jump-Starts a Middle Grades Differentiation Initiative

    ERIC Educational Resources Information Center

    Doubet, Kristina J.

    2012-01-01

    A rural middle level school had stalled in its third year of a district-wide differentiation initiative. This article describes the way teachers and the leadership team engaged in collaborative practices to put a spotlight on formative assessment. Teachers learned to systematically gather formative assessment data from their students and to use…

  15. Connected Classroom Technology Facilitates Multiple Components of Formative Assessment Practice

    ERIC Educational Resources Information Center

    Shirley, Melissa L.; Irving, Karen E.

    2015-01-01

    Formative assessment has been demonstrated to result in increased student achievement across a variety of educational contexts. When using formative assessment strategies, teachers engage students in instructional tasks that allow the teacher to uncover levels of student understanding so that the teacher may change instruction accordingly. Tools…

  16. Formative Assessment: Improvement, Immediacy and the Edge for Learning

    ERIC Educational Resources Information Center

    Staunton, Mike; Dann, Chris

    2016-01-01

    Formative assessment is about strengthening student learning and can dramatically improve student achievement when it guides changes in day-to-day classroom practice. Any attempt to understand formative assessment must therefore be grounded in a notion of learning, which this paper approaches from a constructivist/experiential perspective.…

  17. Formative Assessment and Teachers' Sensitivity to Student Responses

    ERIC Educational Resources Information Center

    Haug, Berit S.; Ødegaard, Marianne

    2015-01-01

    Formative assessment, and especially feedback, is considered essential to student learning. To provide effective feedback, however, teachers must act upon the information that students reveal during instruction. In this study, we apply a framework of formative assessment to explore how sensitive teachers are to students' thoughts and ideas when…

  18. The Relationship between Formative Assessment and Teachers' Self-Efficacy

    ERIC Educational Resources Information Center

    Eufemia, Francine

    2012-01-01

    This exploratory study sought to examine the relationship between teachers' use of formative assessment and their self-efficacy beliefs. Specifically, this study involved a quantitative analysis of the relationship between teachers' beliefs, knowledge base, and the use of formative assessment to make informed instructional changes and their…

  19. Revisiting the Impact of Formative Assessment Opportunities on Student Learning

    ERIC Educational Resources Information Center

    Peat, Mary; Franklin, Sue; Devlin, Marcia; Charles, Margaret

    2005-01-01

    This project developed as a result of some inconclusive data from an investigation of whether a relationship existed between the use of formative assessment opportunities and performance, as measured by final grade. We were expecting to show our colleagues and students that use of formative assessment resources had the potential to improve…

  2. Leading Formative Assessment Change: A 3-Phase Approach

    ERIC Educational Resources Information Center

    Northwest Evaluation Association, 2016

    2016-01-01

    If you are seeking greater student engagement and growth, you need to integrate high-impact formative assessment practices into daily instruction. Read the final article in our five-part series to find advice aimed at leaders determined to bring classroom formative assessment practices district wide. Learn: (1) what you MUST consider when…

  3. Formative Assessment in Adult Language, Literacy and Numeracy

    ERIC Educational Resources Information Center

    Looney, Janet

    2007-01-01

    The paper reports on a multi-country review of formative assessment in adult education settings with specific reference to basic literacy and numeracy. It reports many examples of emergent innovative practice in Europe and elsewhere but concludes that there is a real need to develop a stronger conceptual base for formative assessment, as well as…

  4. The 4-Item Negative Symptom Assessment (NSA-4) Instrument: A Simple Tool for Evaluating Negative Symptoms in Schizophrenia Following Brief Training.

    PubMed

    Alphs, Larry; Morlock, Robert; Coon, Cheryl; van Willigenburg, Arjen; Panagides, John

    2010-07-01

    Objective. To assess the ability of mental health professionals to use the 4-item Negative Symptom Assessment instrument, derived from the Negative Symptom Assessment-16, to rapidly determine the severity of negative symptoms of schizophrenia. Design. Open participation. Setting. Medical education conferences. Participants. Attendees at two international psychiatry conferences. Measurements. Participants read a brief set of 4-item Negative Symptom Assessment instructions and viewed a videotape of a patient with schizophrenia. Using the 1-to-6 4-item Negative Symptom Assessment severity rating scale, they rated four negative symptom items and overall global negative symptoms. These ratings were compared with a consensus rating determination using frequency distributions and chi-square tests for the proportion of participant ratings within one point of the expert rating. Results. More than 400 medical professionals (293 physicians, 50% with a European practice, and 55% who reported past use of schizophrenia rating scales) participated. Between 82.1 and 91.1 percent of participants' ratings on the four items and the global rating were within one point of the consensus expert ratings. The difference between the percentage of participant ratings within one point of the consensus expert ratings and the percentage differing by more than one point was significant (p < 0.0001). Participants' ratings of negative symptoms using the 4-item Negative Symptom Assessment did not generally differ by geographic region of practice, professional credential, or familiarity with schizophrenia symptom rating instruments. Conclusion. These findings suggest that clinicians from a variety of geographic practices can, after brief training, use the 4-item Negative Symptom Assessment effectively to rapidly assess negative symptoms in patients with schizophrenia.
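    The "within one point of the expert rating" agreement criterion described above is simple to compute; the ratings and consensus value in this sketch are hypothetical, not data from the study:

```python
def within_one_point(participant_ratings, expert_rating):
    """Percentage of participant ratings that fall within one point of the
    consensus expert rating (the agreement criterion used in the study)."""
    hits = sum(1 for r in participant_ratings if abs(r - expert_rating) <= 1)
    return 100.0 * hits / len(participant_ratings)

# Hypothetical ratings on the 1-to-6 NSA-4 severity scale (not study data)
ratings = [3, 4, 4, 5, 3, 4, 2, 4, 4, 6]
pct = within_one_point(ratings, expert_rating=4)
```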

  5. e-GovQual: A Multiple-Item Scale for Assessing e-Government Service Quality

    ERIC Educational Resources Information Center

    Papadomichelaki, Xenia; Mentzas, Gregoris

    2012-01-01

    A critical element in the evolution of governmental services through the internet is the development of sites that better serve the citizens' needs. To deliver superior service quality, we must first understand how citizens perceive and evaluate online. Citizen assessment is built on defining quality, identifying underlying dimensions, and…

  6. Psychometrical Assessment and Item Analysis of the General Health Questionnaire in Victims of Terrorism

    ERIC Educational Resources Information Center

    Delgado-Gomez, David; Lopez-Castroman, Jorge; de Leon-Martinez, Victoria; Baca-Garcia, Enrique; Cabanas-Arrate, Maria Luisa; Sanchez-Gonzalez, Antonio; Aguado, David

    2013-01-01

    There is a need to assess the psychiatric morbidity that appears as a consequence of terrorist attacks. The General Health Questionnaire (GHQ) has been used to this end, but its psychometric properties have never been evaluated in a population affected by terrorism. A sample of 891 participants included 162 direct victims of terrorist attacks and…

  7. Assessing the Dimensionality of Item Response Matrices with Small Sample Sizes and Short Test Lengths.

    ERIC Educational Resources Information Center

    De Champlain, Andre; Gessaroli, Marc E.

    1998-01-01

    Type I error rates and rejection rates for three dimensionality assessment procedures were studied with data sets simulated to reflect short tests and small samples. Results show that the G-squared difference test (D. Bock, R. Gibbons, and E. Muraki, 1988) suffered from a severely inflated Type I error rate at all conditions simulated. (SLD)
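    The G-squared difference test referenced above compares nested models by referring the drop in the likelihood-ratio statistic to a chi-square distribution. The sketch below uses hypothetical fit statistics for one- and two-factor solutions and assumes an even degrees-of-freedom difference, for which the chi-square survival function has a closed form:

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X > x) for even df (closed form
    via the Poisson series; avoids external dependencies)."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

def g2_difference_test(g2_restricted, df_restricted, g2_full, df_full):
    """Likelihood-ratio (G-squared) difference test for nested models:
    the drop in G-squared is referred to a chi-square distribution with
    df equal to the difference in model degrees of freedom."""
    delta = g2_restricted - g2_full
    ddf = df_restricted - df_full
    return delta, ddf, chi2_sf_even_df(delta, ddf)

# Hypothetical fit statistics: one-factor (restricted) vs. two-factor model
delta, ddf, p = g2_difference_test(130.4, 54, 96.2, 44)
# A small p suggests the extra dimension improves fit (multidimensionality)
```

    The inflated Type I error the abstract reports means this test flags multidimensionality too often under short tests and small samples, even when the data are truly unidimensional.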

  9. Assessing Dimensionality of a Set of Items--Comparison of Different Approaches.

    ERIC Educational Resources Information Center

    Nandakumar, Ratna

    The performance of the following four methodologies for assessing unidimensionality was examined: (1) DIMTEST; (2) the approach of P. W. Holland and P. R. Rosenbaum; (3) linear factor analysis; and (4) non-linear factor analysis. Each method is examined and compared with other methods using simulated data sets and real data sets. Seven data sets,…

  10. Assessing Dimensionality of a Set of Item Responses--Comparison of Different Approaches.

    ERIC Educational Resources Information Center

    Nandakumar, Ratna

    1994-01-01

    Using simulated and real data, this study compares the performance of three methodologies for assessing unidimensionality: (1) DIMTEST; (2) the approach of Holland and Rosenbaum; and (3) nonlinear factor analysis. All three models correctly confirm unidimensionality, but they differ in their ability to detect the lack of unidimensionality.…

  11. Assessing Dimensionality of a Set of Items--Comparison of Different Approaches.

    ERIC Educational Resources Information Center

    Nandakumar, Ratna

    Performance in assessing the unidimensionality of tests was examined for four methods: (1) W. F. Stout's procedure (1987); (2) the approach of P. W. Holland and P. R. Rosenbaum (1986); (3) linear factor analysis; and (4) non-linear factor analysis. Each method was examined and compared with the others using simulated and real test data. Seven data…

  14. Test Item Construction and Validation: Developing a Statewide Assessment for Agricultural Science Education

    ERIC Educational Resources Information Center

    Rivera, Jennifer E.

    2011-01-01

    The State of New York Agriculture Science Education secondary program is required to have a certification exam for students to assess their agriculture science education experience as a Regents requirement towards graduation. This paper focuses on the procedure used to develop and validate two content sub-test questions within a…

  15. Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior

    ERIC Educational Resources Information Center

    Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia

    2016-01-01

    The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…

  17. Summative Assessment: The Missing Link for Formative Assessment

    ERIC Educational Resources Information Center

    Taras, Maddalena

    2009-01-01

    Assessment for learning is increasingly part of accepted orthodoxy: it receives massive government funding in England, is central to national assessment in Wales, and has been exported to the USA. Black et al.'s Assessment for learning: Putting it into practice (2003), the "bible" of assessment for learning, is set reading for trainee teachers across…

  18. U.S. Naval Unit Behavioral Health Needs Assessment Survey, Overview of Survey Items and Measures

    DTIC Science & Technology

    2014-05-20

    stress, coping behaviors, alcohol use, and sleep. The scores for each issue were trichotomized by risk level as green, yellow, or orange/red. The…military efforts. The Naval Unit Behavioral Health Needs Assessment Survey (NUBHNAS) will undertake the surveillance of Navy and Marine Corps personnel in… Behavioral health issues, including depression and posttraumatic stress disorder (PTSD), are an ongoing problem for U.S. military forces. Rates of diagnosed…

  19. Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior.

    PubMed

    Tassé, Marc J; Schalock, Robert L; Thissen, David; Balboni, Giulia; Bersani, Henry Hank; Borthwick-Duffy, Sharon A; Spreat, Scott; Widaman, Keith F; Zhang, Dalun; Navas, Patricia

    2016-03-01

    The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT modeling and a nationally representative standardization sample, the item set was reduced to 75 items that provide the most precise adaptive behavior information at the cutoff area determining the presence or not of significant adaptive behavior deficits across conceptual, social, and practical skills. The standardization of the DABS is described and discussed.
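    Selecting the items that provide the most information at a diagnostic cutoff, as described above, can be illustrated with Fisher information under a two-parameter logistic (2PL) model; the item parameters, pool, and cutoff below are hypothetical (the abstract does not report the DABS item parameters or its specific IRT model):

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta:
    I(theta) = a^2 * p * (1 - p), where p is the 2PL response probability."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical item pool as (discrimination a, difficulty b) pairs
pool = [(1.8, -0.1), (0.6, 0.0), (1.2, 2.5), (2.0, 0.2), (0.9, -2.0)]
cutoff = 0.0  # latent-trait level at the diagnostic cutoff

# Rank items by the information they provide at the cutoff;
# a short form would retain the top-ranked items
ranked = sorted(pool, key=lambda ab: item_information(cutoff, *ab), reverse=True)
```

    Items with high discrimination and difficulty near the cutoff rank first, which is why an IRT-driven reduction from 260 to 75 items can preserve precision exactly where the diagnostic decision is made.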

  20. Does Computer-Aided Formative Assessment Improve Learning Outcomes?

    ERIC Educational Resources Information Center

    Hannah, John; James, Alex; Williams, Phillipa

    2014-01-01

    Two first-year engineering mathematics courses used computer-aided assessment (CAA) to provide students with opportunities for formative assessment via a series of weekly quizzes. Most students used the assessment until they achieved very high (>90%) quiz scores. Although there is a positive correlation between these quiz marks and the final…