Marder, M.; Bansal, D.
We apply visualization and modeling methods for convective and diffusive flows to public school mathematics test scores from Texas. We obtain plots that show the most likely future and past scores of students, the effects of random processes such as guessing, and the rate at which students appear in and disappear from schools. We show that student outcomes depend strongly upon economic class, and identify the grade levels where flows of different groups diverge most strongly. Changing the effectiveness of instruction in one grade naturally leads to strongly nonlinear effects on student outcomes in subsequent grades.
Padilla, Raymond; Richards, Michael
To address a public relations problem faced by a large urban public school district in Texas, we conducted action research that resulted in an audited self-monitoring system for high-stakes testing environments. The system monitors violations of testing protocols while identifying and disseminating best practices to improve the education of…
For decades the No. 2 pencil and bubble sheet have ruled the student assessment process. The time has finally come to move all of those important tests online. High-stakes computer-based testing has been around for more than 10 years, with some states eagerly embracing it and others avoiding it like whooping cough. But the advent of national…
Considers the guide on high-stakes testing issued by the federal Office for Civil Rights, including the controversy which ensued upon release of the first draft, changes in the subsequent version, and the issue of differences in educational achievement among ethnic and racial groups of which differences in standardized test scores may be…
The nursing community is troubled by the growing use of standardized exit examinations as a graduation requirement. After years of preparation, a single test score could prevent a student from graduating or taking a licensing examination. The tremendous importance placed on exit examinations qualifies them as "high-stakes testing," a concept not well studied in nursing education. This concept analysis provides a greater understanding of the term to help operationalize its use in the nursing discipline.
Kardong-Edgren, Suzan; Mulcock, Pamela M.
The Angoff method is a commonly used and legally defensible method for setting passing or cut scores for high-stakes examinations. It can also be used for setting passing scores on clinical skill checklists. Two variations of the Angoff method were compared with a traditional, arbitrary 75% passing score, using a Foley catheter insertion checklist as an exemplar. Both Angoff methods produced slightly lower passing scores than our traditional 75% standard; however, because of "must pass" steps on our checklist, 12 of 13 students still failed the evaluation. The project uncovered multiple variations of checklists within different courses and variations in teaching practices for this skill.
High-stakes testing is not a new phenomenon in education. It has become part of the education system in many countries. These tests affect the school systems, teachers, students, politicians and parents, whether that is in a positive or negative sense. High-stakes testing is associated with concepts such as a school's accountability, funding and…
The effects of high stakes testing may be critical in the lives of public school students and may have many consequences for schools and teachers. There are no easy answers in measuring student achievement and in holding teachers accountable for learner progress. High stakes testing also involves responsibilities on the part of the principal who…
Educational stakeholders are aware that school administration has become an incredibly intricate dynamic that is too complex for principals to handle alone. Test-driven accountability has made the already daunting task of school administration even more challenging. Distributed leadership presents an opportunity to explore increased leadership…
Baker, Melissa; Johnston, Pattie
High-stakes testing plays a critical role in education today in the United States. Every state uses a high-stakes test to comply with the No Child Left Behind (NCLB) mandate. While many believe high-stakes testing is an acceptable and accurate way to measure students' learning, one has to ask whether high stakes testing is an effective measurement…
A rising tide of protest is sweeping across the nation as growing numbers of parents, teachers, administrators and academics take action against high-stakes testing. Instead of test-and-punish policies, which have failed to improve academic performance or equity, the movement is pressing for broader forms of assessment. From Texas to New York and…
Kearney, W. Sean; Smith, Page A.
The article discusses a case study that stems from actual events, targets the issue of ethics in schools, and is applicable for use in a variety of educational leadership courses. The article examines the issues related to ethical responsibilities and high-stakes testing in public schools. The administration must decide what actions to take…
This dissertation examined the impact of an intervention aimed at improving the standardized test scores for students on the mathematics portions of a high-stakes high school examination. Research shows that the achievement gap between high performing and low performing students on standardized tests continues to grow and that the long-term…
Walker, Sherry Freeland, Ed.
This theme issue focuses on the use and consequences of high stakes tests. The lead article, "High-Stakes Testing: Too Much? Too Soon?" by Sherry Freeland Walker, introduces the topic and related issues, outlining the pros and cons of high stakes testing by the states. The problem, some experts say, is that states have tried to do too much too…
Nichols, Sharon L.; Berliner, David C.
High-stakes testing is the practice of attaching important consequences to standardized test scores, and it is the engine that drives the No Child Left Behind (NCLB) Act. The rationale for high-stakes testing is that the promise of rewards and the threat of punishments will cause teachers to work more effectively, students to be more motivated,…
Fletcher, Jack M.; Stuebing, Karla K.; Hughes, Lisa C.
IQ test scores should be corrected for high stakes decisions that employ these assessments, including capital offense cases. If scores are not corrected, then diagnostic standards must change with each generation. Arguments against corrections, based on standards of practice, information present and absent in test manuals, and related issues,…
The assumptions that high-stakes testing is useful in raising educational standards for all students and that higher standards lead to higher educational performance for all students have not been tested in schools along the Texas border with Mexico. This study analyzed the effects of the high-stakes testing policy on students in a small rural…
Triplett, Cheri Foster; Barksdale, Mary Alice
This study examined elementary students' perceptions of high-stakes testing through the use of drawings and writings. On the day after students completed their high-stakes tests in the spring, 225 students were asked to "draw a picture about your recent testing experience." The same students then responded in writing to the prompt "tell me about…
Using the method of qualitative metasynthesis, this study analyzes 49 qualitative studies to interrogate how high-stakes testing affects curriculum, defined here as embodying content, knowledge form, and pedagogy. The findings from this study complicate the understanding of the relationship between high-stakes testing and classroom practice by…
Nichols, Sharon L.
I review the literature on the impact on student achievement of high-stakes testing. Its popularity as a mechanism for holding educators accountable has triggered studies to examine whether its promise to increase student learning has been fulfilled. The review concludes there is no consistent evidence to suggest high-stakes testing leads to…
Although it is important to evaluate the intended outcomes of high-stakes testing, it is also important to evaluate the unintended outcomes, which might be as important or more important than the intended outcomes. The purpose of this paper is to examine some of the unintended outcomes of high-stakes testing, including those related to: (a) using…
Nichols, Sharon L.; Berliner, David C.
Drawing on their extensive research, Nichols and Berliner document and categorize the ways that high-stakes testing threatens the purposes and ideals of the American education system. For more than a decade, the debate over high-stakes testing has dominated the field of education. This passionate and provocative book provides a fresh perspective…
Kruger, Louis J.; Wandle, Caroline; Struzziero, Joan
High stakes testing puts considerable pressure on schools, teachers, and students to achieve at high levels. Therefore, how schools and individuals cope with this major source of stress may have important implications for the success of high stakes testing. This article reviews relevant theory and research on stress as they relate to public…
Johnson, Dale; Johnson, Bonnie; Farenga, Steve; Ness, Daniel
This book is a compelling indictment of the use of high-stakes assessments with punitive consequences in public schools. The authors trace the history of the policy and document the inequities for children of poverty that undergird high-stakes testing practices. Lack of dental and medical care, environmental violence, insufficient school funding,…
The inevitable responses to high stakes testing, wherein students' test scores are highly consequential for teachers and administrators, include cheating, excessive test preparation, changes in test scoring and other forms of gaming to ensure that test scores appear high. Over the last decade this has been demonstrated convincingly in the USA, but…
Shymansky, James A.; Wang, Tzu-Ling; Annetta, Leonard A.; Yore, Larry D.; Everett, Susan A.
This paper is a report of a quasi-experimental study on the impact of a systemic 5-year, K-6 professional development (PD) project on the 'high stakes' achievement test scores of different student groups in rural mid-west school districts in the USA. The PD programme utilized regional summer workshops, district-based leadership teams and distance delivery technologies to help teachers learn science concepts and inquiry teaching strategies associated with a selection of popular science inquiry kits and how to adapt inquiry science lessons in the kits to teach and reinforce skills in the language arts, that is, to teach more than science when doing inquiry science. Analyses of the school district-level pre-post high-stakes achievement scores of 33 school districts participating in the adaptation of inquiry PD and a comparative group of 23 school districts revealed that both the Grade 3 and Grade 6 student-cohorts in the school districts utilizing adapted science inquiry lessons significantly outscored their student-cohort counterparts in the comparative school districts. The positive school district-level high-stakes test results, which serve as the basis for state and local decision making, suggest that an inquiry adaptation strategy and a combination of regional live workshop and distance delivery technologies with ongoing local leadership and support can serve as a viable PD option for K-6 science.
Koretz, Daniel M.; Hamilton, Laura S.
Previous studies of the validity of gains on high-stakes tests have compared trends in scores on a high-stakes test to trends on a lower-stakes test, such as NAEP. However, generalizability of gains is likely to be incomplete even when gains are meaningful because of differences in the inferences the two tests are designed to support. Therefore,…
Klenowski, Val; Wyatt-Smith, Claire
High stakes testing in Australia was introduced in 2008 by way of the National Assessment Program--Literacy and Numeracy (NAPLAN). Currently, every year all students in Years 3, 5, 7 and 9 are assessed on the same days using national tests in Reading, Writing, Language Conventions (Spelling, Grammar and Punctuation) and Numeracy. In 2010 the…
Pressures to help students pass high-stakes tests affect teachers' reading instruction, their responsiveness to students' learning needs, and their professional effectiveness. This article reports on how one reading specialist responded to testing pressures in her urban elementary school. She believed that what was "right" for her…
Denson, Bettina Coley
This study sought to determine whether a relationship existed between teacher-assigned classroom grades and high-stakes test scores. The study examined teacher-assigned math grades in correlation to the student scores on the Florida Comprehensive Assessment test (FCAT) in a selected Florida high school. It also sought to determine the relationship…
Shepard, Lorrie A.
Recounts historical experience with testing efforts. Recommends that in any testing program, the limitations of testing must be kept in mind in order for the tests to benefit student achievement. (DDR)
Mason, Emanuel J.
Validity and reliability of the new high stakes testing systems initiated in school systems across the United States in recent years in response to the accountability features mandated in the No Child Left Behind Legislation largely depend on item response theory and new rules of measurement. Reliability and validity in item response theory and…
Reich, Gabriel A.; Bally, David
Using personal narratives and research on teacher "communities of practice," the authors outline a proactive response to high-stakes testing policies that places teacher learning at its center. Although research on the effects of these policies is mixed, the authors are troubled by the ways in which the policies have been used to strip teachers of…
Gunzenhauser, Michael G.
Asserts that high stakes testing may lead to a default philosophy of education that holds in high regard a narrow bundle of knowledge and skills, offering suggestions for what educators can do in the current context (e.g., maintain dialogue in schools, expand internal accountability, engage high standards, connect to higher-order concepts, and…
A great number of teachers in the United States have found themselves wrestling with an internal conflict between their teaching beliefs and a need to revert to traditional modes of teaching in order to have their students demonstrate proficiency on high-stakes tests. While they want to include more non-traditional methods in their repertoire…
School and government officials, system administrators and other policymakers offer a variety of reasons for engaging in high stakes testing: to monitor student performance, to measure teacher and/or school effectiveness, to ensure accountability, etc. Some of these reasons are good; others not. But the best reason--one that is never offered,…
Gregg, Noel; Coleman, Chris; Davis, Mark; Chalk, Jill C.
The majority of high-stakes tests from elementary school through postsecondary education include the timed impromptu essay as a measure of writing performance. For adolescents with writing disorders, this type of evaluation often presents a significant barrier. The purpose of the current study was twofold. First, we investigated the influence of handwritten, typed, and typed/edited formats of an expository essay on the quality scores received by students with (n = 65) and without (n = 65) dyslexia. Second, we examined the contribution of spelling, handwriting, fluency, and vocabulary complexity to the quality scores that students with and without dyslexia received on the same writing task. Analyses indicated that vocabulary complexity, verbosity, spelling, and handwriting accounted for more variance in essay quality scores for writers with dyslexia than for their typically achieving peers. Both group and individual student outcomes are reported to better understand the needs of struggling writers with dyslexia. Implications for assessment, instruction, and accommodations are discussed with an eye toward reform efforts that target improved teaching and learning.
The purpose of this research paper was to examine the effects of standardized testing on the youth of America. It was intended to point out the shortcomings of the usage of such tests. There were comparisons of the effects testing has on different cultures of students as well as different socioeconomic classes. Court cases were brought into play…
Baker, Richard A., Jr.
This study examined high-stakes test scores for 37,222 eighth grade students enrolled in music and/or visual arts classes and those students not enrolled in arts courses. Students enrolled in music had significantly higher mean scores than those not enrolled in music (p < 0.001). Results for visual arts and dual arts were not as…
Ortiz-Marrero, Floris Wilma; Sumaryono, Karen
Learning a language can be a long and arduous journey, and there is a lot of pressure on teachers to get students ready for standardized tests quickly. Because of the high-stakes consequences attached to standardized tests in combination with consistently lower test scores among English language learners (ELLs), the tests greatly impact the…
Amrein, Audrey L.; Berliner, David C.
Found, based on data from 28 states, that there is scant evidence to support the proposition that high-stakes tests, including high-stakes high school graduation exams, increase student achievement. Also found that adoption of high-stakes testing policies leads to increased dropout rates, decreased graduation rates, and higher rates of younger…
Moran, Aldo Alfredo
With the recent increase in accountability due to No Child Left Behind, graduation rates and drop-out rates are important indicators of how well a school district is performing. High-stakes testing scores are at the forefront of a school's success and recognition as a school that is preparing and graduating students to meet society's challenging…
Huddleston, Andrew P.; Rockwell, Elizabeth C.
This historical critique of high-stakes testing in reading focuses on selected events from three historical movements: 1) the history of standardized testing, 2) the history of standardized reading tests, and 3) the history of high-stakes testing. These three interrelated histories have produced the high-stakes, standardized reading tests used in…
High-stakes, standardized tests have become ubiquitous in public education in the United States. Teachers across the country are feeling the intensified pressures from high-stakes testing policies and are responding to these pressures by teaching to the tests in varying ways (Renter et al., 2006). Given the hegemony of high-stakes testing in…
Braden, Jeffery P.
This article intends to help school psychologists understand the nature of high stakes tests, methods for analyzing and reporting high stakes test data, standards for tests and program evaluation, and application of appropriate practices to program planning and evaluation. Although it is readily acknowledged that high stakes test data are not…
Cothern, Rebecca L.
Science education is a key to economic success for a country in terms of promoting advances in national industry and technology and maximizing competitive advantage in a global marketplace. The December 2010 Program for International Student Assessment (PISA) ranked the United States 23rd of 65 countries in science. That dismal standing in science proficiency impedes the ability of American school graduates to compete in the global marketplace. Furthermore, the high-stakes science testing mandated under the No Child Left Behind (NCLB) Act beginning in 2007 has created an additional need for educators to find effective science pedagogy. Research has shown that inquiry-based science instruction is one of the predominant science instructional methods. Inquiry-based instruction is a multifaceted teaching method with its theoretical foundation in constructivism. A correlational survey research design was used to determine the relationship between levels of inquiry-based science instruction and student performance on a standardized state science test. A self-report survey, using a Likert-type scale, was completed by 26 fifth grade teachers. Participants' responses were analyzed and grouped as high-, medium-, or low-level inquiry instruction. The unit of analysis for the achievement variable was the student scale score average from the state science test. Spearman's rho correlation data showed a positive relationship between the level of inquiry-based instruction and student achievement on the state assessment. The findings can assist teachers and administrators by providing additional research on the benefits of the inquiry-based instructional method. Implications for positive social change include increases in student proficiency and in decision-making skills related to science policy issues, which can help make students more competitive in the global marketplace.
The driving force behind high-stakes testing may be attributed to the issue of education reform. In the last decade, high-stakes testing has generated intense controversy among educators and parents. The use of high-stakes testing in making decisions about student promotion and graduation is both controversial and significant. The purpose of the…
Null, Elizabeth Higgins
East Feliciana Parish (Louisiana) has raised achievement scores by involving students in hands-on projects related to community needs and resources. Project Connect, a hands-on science and math program begun by the Delta Rural Systemic Initiative, has expanded into a comprehensive place-based program. In response to new state standards, teams of…
Lievens, Filip; Sackett, Paul R.
This study used principles underlying item generation theory to posit competing perspectives about which features of situational judgment tests might enhance or impede consistent measurement across repeat test administrations. This led to 3 alternate-form development approaches (random assignment, incident isomorphism, and item isomorphism). The effects of these approaches on alternate-form consistency, mean score changes, and criterion-related validity were examined in a high-stakes context (N = 3,361). Generally, results revealed that even small changes in the context of the situations presented resulted in significantly lower alternate-form consistency. Conversely, placing more constraints on the alternate-form development process proved beneficial. The contributions, implications, and limitations of these results for the development of situational judgment tests and high-stakes testing are discussed.
Ashadi, Ashadi; Rice, Suzanne
High-stakes testing regimes, in which schools are judged on their capacity to attain high student results in national tests, are becoming common in both developed and developing nations, including the United States, Britain and Australia. However, while there has been substantial investigation around the impact of high-stakes testing on curriculum…
This study examines the impact of high-stakes, large-scale, standardized literacy testing on youth who have failed the Ontario Secondary School Literacy Test. Interviews with youth indicate that the unintended impact of high-stakes testing is more problematic than policy makers and educators may realize. In contrast to literacy policy's aims to…
Dawson, Heather S.
High-stakes testing has created challenges for teachers, administrators, parents, students, and other related education stakeholders in recent decades (Nichols & Berliner, 2007). While high-stakes tests have a long history (Ravitch, 2009) it was not until No Child Left Behind was signed into law in 2002 that the tests became law for most…
Existing evidence suggests that high stakes exams result in little increased learning among students. Yet, given the federal mandates for greater accountability, such as No Child Left Behind (NCLB) legislation and Race to the Top policies, and the "pervasive testing culture," the use of high-stakes tests is presently an accepted…
This article argues that high-stakes educational testing, along with the attendant questions of power, education access, education management and social selection, cannot be considered in isolation from society at large. Thus, high-stakes testing practices bear numerous implications for democratic conditions in society. For decades, advocates of…
Davis, Melissa Ferman
The purpose of this study was to examine influences of standards-based reform and high-stakes testing on teacher practices and perceptions in high school science classrooms. A literature review suggested that teacher practices and perceptions are affected by emphasis on standards-based reform and high-stakes testing and that state level…
High-stakes testing has been a part of American education since its inception. The laws that govern the use of high-stakes tests include language that mandates the inclusion of students in special education. These laws play an influential role in the new large-scale assessments aligned with the Common Core State Standards (CCSS). The assessments…
Decuir, Erica L.
High stakes testing is popularly examined in educational research, but contemporary analyses tend to reflect a qualitative or quantitative research design (e.g., Au, 2007; Cochran-Smith & Lytle, 2006; Gamble, 2010). Exhaustive debate over the relative success or failure of high stakes testing is often framed between competing visions of…
This National Reading Conference Policy Brief provides information related to high stakes reading tests and reading assessment. High stakes reading tests are those with highly consequential outcomes for students, teachers, and schools. These outcomes may include student promotion or retention, student placement in reading groups, school funding…
Solorzano, Ronald W.
This article discusses the issues and implications of high stakes tests on English language learners (ELLs). As ELLs are being included in all high stakes assessments tied to accountability efforts (e.g., No Child Left Behind), it is crucial that issues related to the tests be critically evaluated relative to their use. In this case, academic…
High-stakes tests in undergraduate nursing education are those assessments used to make critical decisions for student progression and graduation. The purpose of this study was to explore the different ways students experience multiple high-stakes tests for progression in one undergraduate BSN program. Research participants were prelicensure…
This paper analyses how high-stakes, standardised testing became the policy tool in the U.S. that it is today and discusses its role in advancing an ideology of meritocracy that fundamentally masks structural inequalities related to race and economic class. This paper first traces the early history of high-stakes testing within the U.S. context,…
Au, Wayne W.
The effects of high-stakes, standardized testing on the curriculum are discouraging the teaching of multicultural, anti-racist content. Test-influenced educational environments contribute to the reproduction of racial and cultural inequality in education. Using the lens of sociolinguistics, the author asserts that high-stakes, standardized tests…
Au, Wayne W.
High-stakes, standardized testing has become the central tool for educational reform and regulation in many industrialized nations in the world, and it has been implemented with particular intensity in the United States and the United Kingdom. Drawing on research on high-stakes testing and its effect on classroom practice and pedagogic discourse…
Johnson, Dale D.; Johnson, Bonnie
High Stakes brings the voices of students and teachers to national debates over school accountability and educational reform. Recounting the experiences of two classrooms during one academic year, the book offers a critical exploration of excessive state-mandated monitoring, high-stakes testing pressures, and inequities in public school funding…
High-stakes standardized literacy testing is not neutral and continues to build upon the legacy of dominant power relations in the state in its ability to sort, select and rank students and ultimately produce and name some youth as illiterate in contrast to an ideal white, male, literate citizen. I trace the effects of high-stakes standardized…
High-stakes, standardized testing is regularly used within accountability narratives as a tool for achieving racial equality in schools. Using the frameworks of "racial projects" and "neoliberal multiculturalism," and drawing on historical and empirical research, this article argues that not only does high-stakes,…
Winfield, Lisa M.
This study analyzed the legal documents of cases involving the denial of a high school diploma as the result of not passing a high stakes exam in public education. The qualitative extrapolation of consistent themes in the court documents revealed information regarding the court's interpretation of the intersection of state authority to…
Shymansky, James A.; Wang, Tzu-Ling; Annetta, Leonard A.; Yore, Larry D.; Everett, Susan A.
This paper is a report of a study that examines the relationship between teacher participation in a multi-year, K-6 professional development effort and the "high stakes" science test scores of different student groups in 33 rural mid-west school districts in the USA. The professional development program involved 1,269 elementary school…
Waber, Deborah P.; Gerber, Emily B.; Turcios, Viana Y.; Wagner, Erin R.; Forbes, Peter W.
High-stakes achievement testing is a centerpiece of education reform. Children from socially disadvantaged backgrounds typically perform more poorly than their more advantaged peers. The authors evaluated 91 fifth-grade children from low-income urban schools using clinical neuropsychological tests and behavioral questionnaires and obtained fourth-grade scores on state mandated standards-based testing. Goals were to determine whether executive functions are selectively diminished in children from poor urban environments and to evaluate to what extent integrity of executive functions is associated with test scores. Neuropsychological variables (particularly executive functions) accounted for 40% of the variance in English scores and 30% in mathematics. Efforts to improve children's academic achievement should consider developmental factors as well as curricular content.
Vogler, Kenneth E.; Virtue, David
The authors discuss the impact of standards and testing on curriculum and instruction. They begin with a brief history of the growth and development of academic standards and high-stakes testing. Next, they review relevant research on the impact high-stakes testing has had on curriculum and instruction and discuss ways that high-stakes testing has…
Segool, Natasha K.; Carlson, John S.; Goforth, Anisa N.; von der Embse, Nathan; Barterian, Justin A.
This study explored differences in test anxiety on high-stakes standardized achievement testing and low-stakes testing among elementary school children. This is the first study to directly examine differences in young students' reported test anxiety between No Child Left Behind (NCLB) achievement testing and classroom testing. Three hundred…
Mohler, Marie Elaine
There are many reasons a person may fail a high stakes test such as the National Council Licensure Examination for Registered Nurses (NCLEX-RN®). Sleep deprivation, illness, life stressors, knowledge deficit, and test anxiety are some of the common explanations. A student with test anxiety may feel threatened by this evaluation process. This…
Meijer, Rob R.
Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)
The College English Test (CET) is an English language test designed for educational purposes, administered on a very large scale, and used for making high-stakes decisions. This paper discusses the key issues facing the CET during the course of its development in the past two decades. It argues that the most fundamental and critical concerns of…
Meijer, Rob R.
Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…
von der Embse, Nathaniel; Hasson, Ramzi
With the enactment of standards-based accountability in education, high-stakes tests have become the dominant method for measuring school effectiveness and student achievement. Schools and educators are under increasing pressure to meet achievement standards. However, there are variables which may interfere with the authentic measurement of…
Stenlund, Tova; Eklöf, Hanna; Lyrén, Per-Erik
This study investigated whether different groups of test-takers vary in their reported test-taking behaviour in a high-stakes test situation. A between-group design (N = 1129) was used to examine whether high and low achievers, as well as females and males, differ in their use of test-taking strategies, and in level of reported test anxiety and…
This study draws upon a qualitative case study to investigate the impact of the high-stakes test environment on an elementary teacher's identities and the influence of identity maintenance on science teaching. Drawing from social identity theory, I argue that we can gain deep insight into how and why urban elementary science teachers engage in defining and negotiating their identities in practice. In addition, we can further understand how and why science teachers of poor urban students make teaching decisions that accommodate both school demands and students' need to succeed on high-stakes tests. This paper presents in-depth experiences of one elementary teacher as she negotiates her identities and teaches science in school settings that emphasize high-stakes testing. I found that a teacher's identities generate tensions while teaching science when: (a) schools prioritize high-stakes tests as the benchmark of teacher success and student success; (b) activity-based and participatory science teaching is deemphasized; (c) the teacher's identity as a science teacher of minority students is threatened or questioned; and (d) a teacher perceives a threat to her identities in the context of high-stakes testing. Further, the results suggest that stronger links to identities generate more positive values in teachers and greater possibilities for positive actions in science classrooms that support minority students' success in science.
Madaus, George F.; Clarke, Marguerite
This paper examines four aspects of current high-stakes testing that impact minority students and others traditionally underserved by American education. Data from research conducted at Boston College over 30 years highlight four issues: high-stakes, high-standards tests do not have a markedly positive effect on teaching and learning; high stakes…
von der Embse, Nathaniel P.; Witmer, Sara E.
This study examined the relationship between student anxiety about high-stakes testing and their subsequent test performance. The FRIEDBEN Test Anxiety Scale was administered to 1,134 11th-grade students, and data were subsequently collected on their statewide assessment performance. Test anxiety was a significant predictor of test performance…
Hong, Won-Pyo; Youngs, Peter
This article draws on research from Texas and Chicago to examine whether high-stakes testing enables low-income and racial minority students to acquire cultural capital. While students' performance on state or district tests rose after the implementation of high-stakes testing and accountability policies in Texas and Chicago in the 1990s, several…
Harvard Education Press, 2003
This inaugural volume of our Spotlight Series features recent "Harvard Education Letter" articles on testing and new reports never before published on this important topic. Contributors address such issues as how educators can manage the "avalanche" of tests; whether the benefits of high-stakes tests justify the risks to…
Carter, Erik W.; Wehby, Joseph; Hughes, Carolyn; Johnson, Stephen M.; Plank, Don R.; Barton-Arwood, Sally M.; Lunsford, Lauren B.
Recent policy initiatives promoting high-stakes testing for graduation present a significant challenge to practitioners charged with educating students with high-incidence disabilities. The purpose of this study was to examine the effects of test-taking strategy instruction on the test performance of secondary students with high-incidence…
High-stakes testing has increased since the passage of the federal No Child Left Behind Act (NCLB) of 2001. Many teachers are using teacher-centered activities with memorization and testing coach books instead of creating student-centered higher-order thinking activities. Some school districts are eliminating subjects that are not tested on state…
Meylani, Rusen; Bitter, Gary G.; Castaneda, Rene
In this study, regression-based and neural-network-based methods are used to predict statewide high-stakes test results for middle school mathematics using the scores obtained from third-party tests throughout the school year. Such prediction is of utmost significance for school districts to live up to the state's educational standards mandated by the…
Hill, Kathryn; McNamara, Tim
Those who work in second- and foreign-language testing often find Koretz's concern for validity inferences under high-stakes (VIHS) conditions both welcome and familiar. While the focus of the article is more narrowly on the potential for two instructional responses to test-based accountability, "reallocation" and "coaching,"…
"High stakes testing" is to be understood as testing with serious consequences for students, their teachers and their educational institutions. It plays a central role in holding teachers and educational institutions to account. In a recent article Randall Curren seeks to refute a number of philosophical arguments developed in my "The Limits of…
Mills, Craig N.; Stocking, Martha L.
Computerized adaptive testing (CAT), while well-grounded in psychometric theory, has had few large-scale applications for high-stakes, secure tests in the past. This is now changing as the cost of computing has declined rapidly. As is always true where theory is translated into practice, many practical issues arise. This paper discusses a number…
Story, Lauren L.
Many leading scholars believe that students' emotions affect their learning and performance, and have voiced concerns about the need to investigate these emotions in the context of classrooms and schools. Apart from the emotion of anxiety, very little research has assessed students' achievement emotions toward tests. As high-stakes tests have…
Polesel, John; Rice, Suzanne; Dulfer, Nicole
Debates continue about how high-stakes testing regimes influence schools at all levels: their impact on teaching practices, distribution of resources and curriculum provision, and whether they achieve the intended increases in student achievement in targeted areas. In 2008, the Australian government introduced a national testing scheme, the "…
Nichols, Sharon L.; Glass, Gene V.; Berliner, David C.
The present research is a follow-up study of earlier published analyses that looked at the relationship between high-stakes testing pressure and student achievement in 25 states. Using the previously derived Accountability Pressure Index (APR) as a measure of state-level policy pressure for performance on standardized tests, a series of…
High-stakes testing is one of the most controversial issues in American education. Advocates contend that these tests encourage students to work harder, provide teachers with a stronger understanding of students' strengths and weaknesses, and allow educators to target failing schools for extra help. Critics claim that they narrow and distort the…
Stevenson, Howard; Wood, Phil
High-stakes testing has long been established in the English school system. In this article, we seek to demonstrate how testing has become pivotal to securing the neo-liberal restructuring of schools, which commenced during the Thatcher era and is reaching a critical point at the current time. Central to this project has been the need to assert…
Heilig, Julian Vasquez
Background/Context: The prevailing theory of action underlying No Child Left Behind's high-stakes testing and accountability ratings is that schools and students held accountable to these measures will automatically increase educational output as educators try harder, schools will adopt more effective methods, and students will learn more. In…
This ethnographic study reports on one elementary literacy coach's response to high-stakes testing and her approach to support third- through fifth-grade teachers in a Title I school in Texas. Sources of data included field notes and observations of classes and meetings, audio/video recordings, and transcribed interviews. The findings illustrate…
Shriberg, David; Kruger, Louis J.
This overview article addresses the different meanings of high-stakes testing, taking into consideration accountability at different levels, such as teacher, school, and state. In this regard, "high-stakes" may mean different things in different states or countries. We will advance an argument for why school psychologists should (a) be…
McDaniel, Sheneatha Lashelle Alexander
The purpose of this research was to examine the relationship between high-stakes tests and stress among secondary teachers. Furthermore, this study investigated whether veteran teachers experience more stress than novice teachers and whether or not self-efficacy, gender, accountability status, and years of experience influence teacher stress as it…
Over almost two decades, high-stakes testing has become increasingly central to New York's schools. In the 1990s, the State Department of Education began requiring that secondary students pass five standardized exams to graduate. In 2002, the federal No Child Left Behind Act required students in grades three through eight to take math and…
Brown, Duane; Galassi, John P.; Akos, Patrick
Two studies of school counselors' perceptions of the impact of the North Carolina ABC (high-stakes) testing program are reported in this article. (For ease of interpretation, percentages were rounded to the nearest whole number in both studies.) One hundred forty-one counselors who attended their state association's professional conference…
Turner, Steven L.
Over the last decade, high-stakes test preparation has crept into the inventory of developmentally responsive middle level instructional practices. Amid calls for increased accountability and more rigorous curriculum and academic standards, the middle school movement now finds itself in a spotlight of intense scrutiny. This article examines the…
Hoffman, Lynn M.; Nottis, Katharyn E. K.
This mixed-methods study examines young adolescents' perceptions of strategies implemented before a state-mandated "high-stakes" test. Survey results for Grade 8 students (N = 215) are analyzed by sex, academic group, and preparation team. Letters to the principal are reviewed for convergence and additional themes. Although students were most…
Moon, Tonya R.
The myth equating high-stakes testing with rigor and difficulty is one that can be debunked given the empirical work that has been conducted in this area. To completely debunk this myth in gifted education, the field must centralize efforts. Educators need to consider alternatives to the current system of assessment and the delivery of…
Thompson, Gail L.; Allen, Tawannah G.
Since the 1980s, U.S. policymakers have been trying to improve the K-12 public school system in order to ensure that American students are competitive with students in other countries. Recent reform efforts have led to the current high-stakes testing movement, which measures student achievement and school effectiveness mainly by standardized test…
This article examines how high-stakes testing policies can constrain the way teachers at predominantly Latina/o high schools teach literacy and subsequently influence the success of Latina/o students in college. It is based on a year-and-a-half study of seven Latina/o students making the transition from a high school to a community college or…
Johnson, Karin Pogna
The purpose of this study was to describe my experiences as a campus principal in facilitating the use of participatory formative assessment (PFA) in an environment of accountability and high-stakes testing. The methodology I employed was autoethnography (Chang, 2008; Ellis, 2004; Reed-Danahay, 1997; Stinson, 2009). I kept journals over a period…
Lim, Hyo Jin
The present study investigated longitudinal changes in reading achievement among schools populated with English learners. It also examined the heterogeneity within the English learner group in terms of students' performance on high-stakes reading tests. Historically, English learners have often been considered the students who are in the process…
Smyth, Emer; Banks, Joanne
There is now a well developed literature on the impact of high stakes testing on teaching approaches and student outcomes. However, the student perspective has been neglected in much research. This article draws on a mixed method longitudinal study of secondary students in the Republic of Ireland to explore the impact of two sets of high stakes…
Rodriguez, Jessica M.; Arellano, Lucy
This study explores the influence high-stakes testing has on Latina/o student aspirations and subsequent college enrollment. It quantitatively examines the critical juncture of high school exit and college entry at a school district serving a predominantly Latino population. Findings confirm a strong correlation between the math and English…
Lewis, Steven; Hardy, Ian
This paper provides insights into teacher and school-based administrators' responses to policy demands for improved outcomes on high-stakes, standardised literacy and numeracy tests in Australia. Specifically, the research reveals the effects of the National Assessment Program--Literacy and Numeracy (NAPLAN), and associated policies, in the state…
Christian, Sonya Colman
This study investigated the relationship between high-stakes testing and the stress levels of secondary teachers in the Jackson Public School District. The independent variables of age, gender, subject taught, teaching experience, degree, and school level were used to determine the differences among the various groups. A survey was piloted and…
Dianis, Judith Browne; Jackson, John H.; Noguera, Pedro
The only thing that more testing will tell us is what we already know: The schools that disadvantaged children attend are not being given the supports necessary to produce achievement gains. Students cannot be tested out of poverty, and while NCLB did take us a step forward by requiring schools to produce evidence that students were learning, it…
Daniel, Tracy Demetrie
Determining if the investment in educational technology will improve student achievement is complicated and multifarious. The purpose of this study was to evaluate the influence of teacher technology integration on student achievement as measured by the Mississippi Subject Area Testing Program (SATP) and to explore the relationship between…
In this world of increasing competition for jobs and accountability in the workplace, adults are facing many new pressures, one of which is passing tests as part of the application process. This is especially difficult for adults who are academically challenged or did not go far enough with their education to feel comfortable in testing…
Yamashita, Mika Yoder
This study examined how a total of eight math and science elementary school teachers changed their classroom instruction in response to high-stakes and low-stakes testing in one school district. The district introduced a new assessment in the 2005–06 school year to meet the requirement set forth by the No Child Left Behind Act (NCLB) that the assessment be aligned with the state academic standards. I conducted interviews with teachers and school administrators at two elementary schools, district officials, and a representative of a non-profit organization during the 2007–08 school year to examine how the new assessment introduced in 2005–06 had shaped classroom instruction. Concepts from New Institutional Theory and cognitive approaches to policy implementation guided the design of this study. This study focused on how materials and activities associated with high-stakes testing promoted ideas about good instruction, and how these ideas were carried to teachers. The study examined how teachers received messages about instruction and how they responded to those messages. The study found that high-stakes testing influenced teachers' classroom instruction more than low-stakes testing; however, the instructional changes teachers made in response to state testing were at the content level. The teachers' instructional strategies did not change. The teachers' instructional changes varied with the degree of implementation of the existing math curriculum and with the degree of support they received in understanding the meaning of assessment results. The study concluded that, among the six teachers I studied, high-stakes testing was not a sufficient intervention for changing teachers' instructional strategies. The study also addressed the challenges of aligning instructional messages across assessment, standards, and curriculum.
Boevé, Anja J; Meijer, Rob R; Albers, Casper J; Beetsma, Yta; Bosker, Roel J
The introduction of computer-based testing in high-stakes examining in higher education is developing rather slowly due to institutional barriers (the need for extra facilities, ensuring test security) and teacher and student acceptance. From the existing literature it is unclear whether computer-based exams yield results similar to those of paper-based exams and whether student acceptance can change as a result of administering computer-based exams. In this study, we compared results from a computer-based and a paper-based exam in a sample of psychology students and found no differences in total scores across the two modes. Furthermore, we investigated student acceptance and change in acceptance of computer-based examining. After taking the computer-based exam, fifty percent of the students preferred paper-and-pencil exams over computer-based exams and about a quarter preferred a computer-based exam. We conclude that computer-based exam total scores are similar to paper-based exam scores, but that for the acceptance of high-stakes computer-based exams it is important that students practice and become familiar with this new mode of test administration.
Segool, Natasha Katherine
The current study explored differences in test anxiety on high-stakes standardized achievement testing and classroom testing among elementary school children. This is the first study to directly examine differences in student test anxiety across two testing conditions with different stakes among young children. Three hundred and thirty-five…
Hebert, Terri Richardson
The appropriate methods used by school districts across the United States to measure student academic achievement have found an established place in the headlines of state and national newspapers, professional journals, and political offices. However, we seldom reach out to those in the classroom and engage in a meaningful dialogue about the pros and cons of high-stakes, state-mandated testing. Therefore, this study is designed to investigate the impact of the Texas Assessment of Knowledge and Skills (TAKS) test upon three fifth-grade science teachers' instructional practices. The participating school, nestled within a large East Texas school district, was selected because of its high test scores as well as its creative approach to teaching. The selected teachers were chosen primarily for their recognized abilities within a science classroom, specifically as they work to reach a diverse group of students at varying levels of ability and instill within them the ability to master necessary scientific concepts found on the state-mandated, high-stakes test. Using the portraiture methodology for this qualitative study (Lawrence-Lightfoot & Davis, 1997), data were collected that provide a rich texture of the fifth-grade classes within the elementary school setting. Through close observations, formal and informal interviews, and attention to the teachers' reflective work, the woven tapestry emerged in conjunction with the voices of the teachers.
Atalmis, Erkan Hasan
Multiple-choice (MC) items are commonly used in high-stakes tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed valid item-writing guidelines for constructing high-quality MC items in order to increase test reliability and…
With the implementation of the No Child Left Behind Act in January of 2002, curricula in high schools in the United States have adjusted to make room for test preparation activities and high stakes testing. This involves teaching skills and content in the format of the test only, drilling students on specific skills and content areas that will be…
Sackett, Paul R; Borneman, Matthew J; Connelly, Brian S
The authors review criticisms commonly leveled against cognitively loaded tests used for employment and higher education admissions decisions, with a focus on large-scale databases and meta-analytic evidence. They conclude that (a) tests of developed abilities are generally valid for their intended uses in predicting a wide variety of aspects of short-term and long-term academic and job performance, (b) validity is not an artifact of socioeconomic status, (c) coaching is not a major determinant of test performance, (d) tests do not generally exhibit bias by underpredicting the performance of minority group members, and (e) test-taking motivational mechanisms are not major determinants of test performance in these high-stakes settings.
Lievens, Filip; Sackett, Paul R; Buyse, Tine
This study fills a key gap in research on response instructions in situational judgment tests (SJTs). The authors examined whether the assumptions behind the differential effects of knowledge and behavioral tendency SJT response instructions hold in a large-scale high-stakes selection context (i.e., admission to medical college). Candidates (N = 2,184) were randomly assigned to a knowledge or behavioral tendency response instruction SJT, while SJT content was kept constant. Contrary to prior research in low-stakes settings, no meaningfully important differences were found between mean scores for the response instruction sets. Consistent with prior research, the SJT with knowledge instructions correlated more highly with cognitive ability than did the SJT with behavioral tendency instructions. Finally, no difference was found between the criterion-related validity of the SJTs under the two response instruction sets.
It has been frequently suggested that personal characteristics (e.g., language deficiencies, atypical schooling) may be responsible for the tendency of individuals to produce aberrant response patterns on high-stakes tests. This has not, however, been adequately validated using empirical data. This research uses datasets from seven mathematics, English and science papers to investigate the consistency with which individuals respond aberrantly across papers. Pupils who responded aberrantly on one paper were more likely to do so on other papers in the same subject. Also, pupils who responded aberrantly on one paper of one subject were more likely to do so on papers of another subject. Logistic multilevel models using the generation of aberrant response patterns as a dependent variable suggested non-negligible intra-pupil and intra-school correlations.
High-stakes tests have been employed widely to engineer curriculum innovation, or achieve intended washback in education. But our understanding of the role of high-stakes tests as an agent for change is limited due to the small number of empirical studies available on this issue. This paper reports on a washback study which focuses on the writing…
Howe, Harold II
Critiquing Nina and Sol Horowitz's article advocating high-stakes tests, the author deplores deleterious effects of too-rigorous standards on poor students and recent immigrants. Without large-scale initiatives to affect their lives out of school, urban youngsters' prospects are dim. A National Research Council report offers testing guidelines.…
Banerjee, Manju; Gregg, Noel
Unprecedented increases in the use of technologies throughout postsecondary education and the workplace are redefining traditional concepts of accessibility during testing for college students with learning disabilities. High stakes testing practices are under pressure to change. The challenge for professionals is to ensure that tests are designed…
Steele, Marcee M.
This article reviews characteristics of high school students with learning disabilities and presents instructional modifications and study skills to help them succeed in algebra and geometry courses and on high stakes mathematics assessments.
Current and former leaders of many major urban school districts, including Washington, D.C.'s Michelle Rhee and New Orleans' Paul Vallas, have sought to use tests to evaluate teachers. In fact, the use of high-stakes standardized tests to evaluate teacher performance in the manner of value-added measurement (VAM) has become one of the cornerstones…
Jerome, Diane C.
This study explored how science teachers and school administrators perceive the use of the affective domain during science instruction situated within a high-stakes testing environment. Through a multimethodological inquiry using phenomenology and critical ethnography, the researcher conducted semi-structured interviews with six fifth-grade…
Tingey, RaShel Anderson
The purpose of this study was to investigate the impact of high-stakes testing under the No Child Left Behind (NCLB) Act on school culture. Individual interviews and focus groups were conducted with first grade through sixth grade teachers and principals from two of Nebo School District's schools located in Utah. Their responses were categorized…
Mina Shaughnessy continues to exert powerful influences over Basic Writing practices, discourses and pedagogy thirty-five years after her death: Basic Writing remains in some ways trapped by Shaughnessy's legacy in what Min-Zhan Lu labeled as essentialism, accommodationism and linguistic innocence. High-stakes writing tests, a troubling hallmark…
Taylor-Smith, Carol J.
This comparative qualitative study examined the impact of the achievement gap on the lack of highly qualified teachers instructing African American students consistently from K-12th grades and its effects on high-stakes testing. In addition, the study examined teacher perceptions that could also be contributing factors of the…
The following paper provides a case study of the resistance of the New York Performance Standards Consortium to the state's unitary high stakes testing policy from 1998 to 2006. After detailing the history of the grass roots actions undertaken by the group of alternative high schools called "The Consortium," the analysis seeks to apply…
The application of the principles of scientific management within the structure, organization, and curriculum of public schools in the US became dominant during the early 1900s. Based upon research evidence from the modern day era of high-stakes testing in US public education, the fundamental logics guiding scientific management have resurfaced…
Kang, Jee Sun Emily
This study explored how inquiry-based teaching and learning processes occurred in two teachers' diverse 8th grade Physical Science classrooms in a Program Improvement junior high school within the context of high-stakes standardized testing. Instructors for the courses examined included not only the two 8th grade science teachers, but also graduate fellows from a nearby university. Research was drawn from inquiry-based instruction in science education, the achievement gap, and the high stakes testing movement, as well as situated learning theory to understand how opportunities for inquiry were negotiated within the diverse classroom context. Transcripts of taped class sessions; student work samples; interviews of teachers and students; and scores from the California Standards Test in science were collected and analyzed. Findings indicated that the teachers provided structured inquiry in order to support their students in learning about forces and to prepare them for the standardized test. Teachers also supported students in generating evidence-based explanations, connecting inquiry-based investigations with content on forces, proficiently using science vocabulary, and connecting concepts about forces to their daily lives. Findings from classroom data revealed constraints to student learning: students' limited language proficiency, peer counter culture, and limited time. Supports were evidenced as well: graduate fellows' support during investigations, teachers' guided questioning, standardized test preparation, literacy support, and home-school connections. There was no statistical difference in achievement on the Forces Unit test or science standardized test between classes with graduate fellows and without fellows. There was also no statistical difference in student performance between the two teachers' classrooms, even though their teaching styles were very different. However, there was a strong correlation between students' achievement on the chapter test and
Mason, Janet Harmon
The purpose of this study was to explore the impact of high-stakes testing and accountability on teachers' perceptions of their professional identities. Teachers' instructional practice, work environments, and personal factors are now immersed in the context of high-stakes testing and accountability. This context colors the decisions teachers make…
Walden, Lavada M.; Kritsonis, William Allan
The author looks at critical dialogue surrounding the causes of the alarmingly high numbers of high school dropouts in states that use high-stakes standardized testing mandated by the No Child Left Behind Act, and investigates the perceived correlations between high-stakes testing and high numbers of high school dropouts among minority students.
Seymour, Clancy; Garrison, Mark
Building on recent discussions regarding how current national standards for physical education promote cognitive outcomes over physical outcomes, the authors explore how a new era in high-stakes testing is also contributing to an emphasis on the cognitive over the physical. While high-stakes testing has been linked to reducing the amount of…
Kessner, Micheal J.
The purpose of this study was to assess the effects of hands-on, inquiry-based instruction on student science achievement in a high-stakes testing environment. Hands-on, inquiry-based science has become a popular way of teaching science because it is inviting and interesting for students. However, the question remains: Does implementation of inquiry-based science instruction in a high-stakes testing environment affect fifth-grade student science achievement? A quasi-experimental design employing quantitative and qualitative methods was used. The quantitative portion consisted of data collected from Student Surveys and individual science achievement scores for fifth-grade students at three participating schools in a large, suburban school district. The qualitative portion consisted of data collected using a Science Kit Usage Checklist, an open-ended Teacher Survey of 5 fifth-grade science teachers, and Teacher Interviews with 3 fifth-grade science teachers. Descriptive analysis was used, and emerging codes and themes were identified for teacher education, science kit training, and understanding and implementation of science kits. Data and methods triangulation were employed (Berg, 2006; Patten, 2005). All data were used to determine whether implementation of Science Kits affected science achievement scores in a high-stakes testing environment. Results indicated a general improvement in the proportion of students meeting mastery on the fifth-grade science state assessment when kits were implemented. Teacher fidelity and high implementation were validated with Student and Teacher Surveys. Themes emerged involving training, time, student response, impact on instruction, impact on achievement scores, instructional organization, and instructional changes in future implementation. District-supported training and materials led to teacher and student enjoyment of science kits, which led to implementation. Implementation then led to higher fifth-grade science achievement scores.
Pollard, Tracie L.
The No Child Left Behind (NCLB) Act of 2001 requires all schools to be accountable for student performance. High-stakes accountability represents a growing concern within the field of education. The literature indicates that teachers are vital to the success of students; however, the impact of high-stakes testing on instructional practice is changing…
Waitoller, Federico R.; Pazey, Barbara L.
In this chapter, we examine tensions that can materialize at the intersection of high-stakes accountability assessments and the rights of parents of students with dis/abilities. We bring to the surface and analyze the competing notions of social justice that have fueled the implementation of both high-stakes testing and the inclusion of students…
Boger, John Charles
This paper examines student resegregation by race and socioeconomic class, high stakes accountability measures aimed at affecting educators' decisions on student promotion and graduation, and continuing disparities in school resources and finance, all of which intensified in 2002, particularly in North Carolina and the U.S. south. The paper…
Peters, Susan; Oliver, Laura Ann
While great progress has been made by the international community to promote inclusive education for all children, regardless of race, ethnicity, socio-economic status, gender or disability, many countries still continue to marginalize and exclude students in educational systems across the globe. High-stakes assessments in market-driven economies…
Pringle, Rose M.; Martin, Sarah Carrier
In 1983, the National Commission on Excellence in Education in the United States issued a report called A Nation at Risk: The Imperative for Educational Reform. This report and other policy initiatives such as the No Child Left Behind legislation recommended that the individual states institute assessments to hold schools accountable. This research explored the potential impact of impending standardised testing on teaching science in elementary schools in one school district in Florida. We explored the teachers' concerns about the upcoming high-stakes tests in science, the possible impact on their curriculum, and what changes, if any, will be made in the approach to science teaching and learning in their classrooms. As the teachers look toward the implementation of high-stakes testing in science, they have recognised the need to teach science. This recognition is not born of the importance of science learning for elementary school children, but rather of fear of failure and the effects of the tangible rewards or punishments that accompany high-stakes testing. In anticipation, the teachers are preparing to align their teaching to the science standards while aggressively searching for test preparatory materials. Schools are also involved in professional development and structural changes to facilitate the teaching of science.
Sharkey, Patrick; Schwartz, Amy Ellen; Ellen, Ingrid Gould; Lacoe, Johanna
This paper examines the effect of exposure to violent crime on students' standardized test performance among a sample of students in New York City public schools. To identify the effect of exposure to community violence on children's test scores, we compare students exposed to an incident of violent crime on their own blockface in the week prior…
Pelkey, Ramona K.
Gender, ethnicity, family economic status, reading score, mathematics score, and number of science semesters successfully completed were examined for their contribution to a student's science score on a high-stakes high school exit examination. Path analysis and analysis of variance procedures were used to quantify each variable's influence on science score. Gender, ethnicity, and family economic status were found to be moderators, while reading proved to be a mediator within the model. The path model was created using a calibration sample and cross-validated using a hold-out validation sample. Bootstrapping was used to verify the goodness of fit of the model. A predictive equation explained 66% (R² = .66) of the variance in observed TAKS science scores.
Plank, Stephen B.; Condliffe, Barbara Falk
High-stakes tests are the most heavily weighted measures in accountability systems developed in response to No Child Left Behind. While some studies show high-stakes accountability being related to test score gains, others suggest these policies do not improve achievement and often result in unintended consequences. To understand mechanisms…
Bovaird, James A., Ed.; Geisinger, Kurt F., Ed.; Buckendahl, Chad W., Ed.
Educational assessment and, more broadly, educational research in the United States have entered into an era characterized by a dramatic increase in the prevalence and importance of test score use in accountability systems. This volume covers a selection of contemporary issues about testing science and practice that impact the nation's public…
This article asserts that the health of public schools depends on defining a new model of accountability, one that is balanced and comprehensive. This new model needs to be one that involves much more than test scores. This article outlines the premises behind this argument, asking for what, to whom, and by what means schools should be held…
Many schools of nursing are adopting progression policies to ensure that school licensure pass rates remain above acceptable levels. These policies prevent students who are predicted to fail the licensure examination from taking the examination, usually by preventing graduation from or completion of the nursing program. Progression policies frequently rely on a single test score from a predictive exit examination, such as the Health Education Systems, Inc. (HESI) Exit Examination, as a measure of whether students are likely to pass or fail the licensure examination and, therefore, whether they are permitted to graduate from the nursing program. In this article, questions about the HESI Exit Examination's test-use validity are explored. Best practices in testing and assessment require faculty to perform a more comprehensive assessment of students' abilities and to not rely on one predictor alone when making important educational decisions. Recommendations and suggestions are provided to guide faculty in decision making about progression policies.
Jerome, Diane C.
This study explored how science teachers and school administrators perceive the use of the affective domain during science instruction situated within a high-stakes testing environment. Through a multimethodological inquiry using phenomenology and critical ethnography, the researcher conducted semi-structured interviews with six fifth-grade science teachers and two administrators from two Texas school districts. Data reconstructions from interviews formed a bricolage of diagrams that trace the researcher's steps through a reflective exploration of these phenomena. This study addressed the following research questions: (a) What are the attitudes, interests, and values (affective domain) that fifth-grade science teachers integrate into science instruction? (b) How do fifth-grade science teachers attempt to integrate attitudes, interests, and values (affective domain) in science instruction? and (c) How do fifth-grade science teachers manage to balance the tension between the apparent pressures of a high-stakes testing environment and the integration of attitudes, interests, and values (affective domain) in science instruction? The findings from this study indicate that as teachers tried to integrate the affective domain during science instruction, (a) their work was set within a framework of institutional values, (b) teaching science for understanding looked different before and after the onset of the science Texas Assessment of Knowledge and Skills (TAKS), and (c) upon administration of the science TAKS, teachers broadened their aim, raised their expectations, and furthered their professional development. The integration of the affective domain fell into two distinct categories: 1) teachers targeted student affect and 2) teachers modeled affective behavior.
Norris, Stephen P.; Leighton, Jacqueline P.; Phillips, Linda M.
Many significant changes in perspective have to take place before efforts to learn the content and capabilities of children's minds can hold much sway in educational testing. The language of testing, especially of high stakes testing, remains firmly in the realm of "behaviors", "performance" and "competency" defined…
Rector, L. D.
The purpose of this study was to examine the relationship between the size of high schools, their percentage of SED (socio-economic disadvantaged) students, and API (academic performance index) scores in California, and determine if teacher preparation is a contributing factor. The 2010 API scores and median income of all 52 counties, and the 2010…
Winters, Marcus A.; Greene, Jay P.; Trivitt, Julie R.
School systems across the nation have adopted policies that reward or sanction particular schools on the basis of their students' performance on standardized math and reading tests. One of the most frequently raised concerns regarding such "high-stakes testing" policies is that they oblige schools to focus on subjects for which they are…
Winters, Marcus A.; Trivitt, Julie R.; Greene, Jay P.
An important criticism of high-stakes testing policies--policies that reward or sanction schools based on their students' performance on standardized tests--is that they provide schools with an incentive to focus on those subjects that play a role in the accountability system while decreasing attention to those subjects that are not part of the…
The purpose of this study is to determine the predictive validity of early literacy curriculum-based measurement on high-stakes reading tests administered in grades 3, 4 and 5. This study will examine curriculum-based measurement subtests kindergarten Dynamic Indicators of Early Literacy Skills (DIBELS) Letter-Naming Fluency (LNF) and 1st grade…
Young, I. Phillip; Cox, Edward P.; Buckman, David G.
To assess satisfactory job performance of superintendents on the basis of school districts' high-stakes testing outcomes, existing teacher models were reviewed and critiqued as potential options for retrofit. For these models, specific problems were identified relative to the choice of referent groups. An alternate referent group (statewide…
Dutro, Elizabeth; Selland, Makenzie
A significant body of research articulates concerns about the current emphasis on high-stakes testing as the primary lever of education reform in the United States. However, relatively little research has focused on how children make sense of the assessment policies in which they are centrally located. In this article, we share analyses of…
Giouroukakis, Vicky; Honigsfeld, Andrea
This multicase study investigated the impact of high-stakes testing on the literacy practices of teachers of high school English language learners (ELLs) in three Long Island, New York, school districts, in one of the most racially and socioeconomically segregated regions of the United States. The goal of the study was to explore what kinds of…
De Lisle, Jerome; Smith, Peter; Keller, Carol; Jules, Vena
High-stakes placement testing at eleven plus remains a central and constant feature of education systems in the Anglophone Caribbean. In the Republic of Trinidad and Tobago, the Eleven Plus has been retained well into the era of universal secondary education, with a perceived legitimacy founded on the belief that examinations provide the fairest…
Johnson, Dale D.; Johnson, Bonnie
This book connects the educational conditions created by high-stakes testing to the students and teachers who are influenced or victimized by the currents driving this movement. The authors left their positions as teacher-educators and taught grades 3 and 4 for 1 year as regular teachers in one of America's most impoverished schools. Redbud…
This paper argues that recent changes to two national high-stakes tests for English, the National Certificate of Educational Achievement (NCEA) in Aotearoa New Zealand and the General Certificate of Secondary Education (GCSE) in England, have shifted the assessment emphasis further away from poetry than previously and have significantly…
Young, I. Phillip; Fawcett, Paul
Several teacher models exist for using high-stakes testing outcomes to make continuous employment decisions for principals. These models are reviewed, and specific flaws are noted if these models are retrofitted for principals. To address these flaws, a different methodology is proposed on the basis of actual field data. Specifically addressed are…
Many nursing programs use standardized testing packages to evaluate students' content mastery and to predict the probability of passing the National Council Licensure Examination for Registered Nurses (NCLEX-RN). Rather than using the results to diagnose weak content areas, programs implement testing policies in the belief that such policies ensure student success…
Henderson, Karen D.
The purpose of this study is to evaluate the extent to which special education teachers experience anxiety administering the different versions of the Criterion Referenced Competency Test (CRCT) and Criterion Referenced Competency Test-Modified (CRCT-M). This human service profession (education) was selected because there is very little literature…
Lievens, Filip; Sackett, Paul R.
This study used principles underlying item generation theory to posit competing perspectives about which features of situational judgment tests might enhance or impede consistent measurement across repeat test administrations. This led to 3 alternate-form development approaches (random assignment, incident isomorphism, and item isomorphism). The…
Amrein-Beardsley, Audrey; Berliner, David C.; Rideau, Sharon
Educators are under tremendous pressure to ensure that their students perform well on tests. Unfortunately, this pressure has caused some educators to cheat. The purpose of this study was to investigate the types of, and degrees to which, a sample of teachers in Arizona were aware of, or had themselves engaged in test-related cheating practices as…
Rhone, Angela E.
This article offers guidance for teachers seeking to improve instruction and their students' success rates, especially for those in urban schools. Furthermore, it discusses the advantages and disadvantages of in-class preparation tests, which teachers often administer to help increase students' chances for success on state-mandated tests. In-class…
Describes personal and professional pressures a first-year teacher faced when confronted with mandated standards in student academic achievement for at-risk students. Focuses on the teacher's efforts to: (1) examine student needs, including addressing test anxiety; (2) adapt classroom methods, including uses of practice testing; and (3)…
Sackett, Paul R.; Borneman, Matthew J.; Connelly, Brian S.
The authors review criticisms commonly leveled against cognitively loaded tests used for employment and higher education admissions decisions, with a focus on large-scale databases and meta-analytic evidence. They conclude that (a) tests of developed abilities are generally valid for their intended uses in predicting a wide variety of aspects of…
Belov, Dmitry I.
This article presents the Variable Match Index (VM-Index), a new statistic for detecting answer copying. The power of the VM-Index relies on two-dimensional conditioning as well as the structure of the test. The asymptotic distribution of the VM-Index is analyzed by reduction to Poisson trials. A computational study comparing the VM-Index with the…
In the twenty-first century, the use of standardized tests as the primary means to evaluate schools and teachers in the United States has contributed to severe dilemmas, including misleading information on what students know, lower-level instruction, cheating, less collaboration, unfair treatment of teachers, and biased teaching. This article…
Huynh, Huynh; Meyer, J. Patrick; Gallant, Dorinda J.
This study examined the effect of oral administration accommodations on test structure and student performance on the mathematics portion of the South Carolina High School Exit Examination (HSEE). The examination was given at Grade 10 and was untimed. Three groups of students were studied. Two groups took the regular form. One group had recorded…
General cognitive diagnostic models (CDM) such as the generalized deterministic input, noisy, "and" gate (G-DINA) model are flexible in that they allow for both compensatory and noncompensatory relationships among the subskills within the same test. Most of the previous CDM applications in the literature have been add-ons to simulation…
Jones, M. Gail; Jones, Brett D.; Hardin, Belinda; Chapman, Lisa; Yarbrough, Tracie; Davis, Marcia
Under North Carolina's ABC's accountability program, public schools are labeled "exemplary," "meets expectations," "adequate performance," or "low performance." Teachers are given $1,500 bonuses if their schools exceed expectations. A survey found that mandated tests increased student anxiety and negatively…
Jackson, Julie; Castro, Angela
Schools are under increasing pressure to meet accountability requirements and show growth in student achievement across tested content areas. As a result, throughout the school year, student achievement data are analyzed to discover data trends that highlight both student gains and gaps in learning. Achievement gaps are identified and addressed…
Inquiry-based teaching and assessment approaches are superior to standardized tests for measuring students' progress. Historical thinking skills employed in Leopold von Ranke's 19th-century seminars have been refined to consider point of view, credibility of evidence, historical context, causality, and multiple perspectives--benchmarks of…
Andrich, David; Styles, Irene; Mercer, Annette; Puddey, Ian B
The possibility that the validity of assessment is compromised by repeated sittings of highly competitive and high-profile selection tests has been documented and is of concern to stakeholders. An illustrative example is the Undergraduate Medicine and Health Sciences Admission Test (UMAT) used by some medical and dental courses in Australia and New Zealand. The proficiencies of all applicants who sat the UMAT from one to four times between 2006 and 2012 were estimated on the same metric using the probabilistic Rasch model. A fit index characterising each profile's degree of conformity to the model was also calculated. Confirming expectations, mean proficiencies increased with repeated sittings on all three UMAT scales, with the greatest difference (which was nevertheless relatively small) between the first two sittings. The fit index showed that the increases in proficiency estimates arose from additional easier items being answered correctly on repeated sittings rather than additional more difficult ones, suggesting that the improvements reflect skill in answering the questions rather than gains on the substantive construct being assessed. Although strategies for dealing with the increase in proficiency estimates on repeated sittings could be canvassed, these results suggest that the validity of results on repeated sittings was not compromised. Accordingly, it might be concluded that although particular individuals might improve substantially between sittings, validity is not likely to be compromised, with the possibility that for some applicants the second sitting might be the most valid.
Teachers' Perceptions of the Impact of High Stakes Testing on Instructional Content, Instructional Strategies, Motivation and Morale, and Pressure to Improve Student Performance in Relation to Their Views on Accountability and Its Effect on Students with Learning Disabilities
The purpose of the study was to examine teachers' perceptions of the impact of high stakes testing on instructional content, instructional strategies, motivation and morale, and pressure to improve student performance in relation to their views on accountability. It also sought to identify teachers' perceptions of the effect of high stakes testing…
Bethell, George; Zabulionis, Algirdas
Since the break up of the USSR, its former republics have seen the emergence and rapid expansion of an examinations industry that was, to all intents and purposes, unknown in Soviet times. New national assessment agencies have been established and been charged with, amongst other things, developing high-stakes exams to replace the diverse and…
McLarnon, Matthew J W; Goffin, Richard D; Schneider, Travis J; Johnston, Norman G
Including equal numbers of positively and negatively keyed items is common in Five-Factor Model (FFM) personality measures. Much literature has demonstrated the presence of positive and negative keying factors in low-stakes testing situations, but there is a dearth of research investigating these factors in high-stakes testing. To address this gap, we investigated whether an FFM measure used in high-stakes testing was influenced by positive and negative keying factors. We also examined the overlap of the positive and negative keying factors with social desirability, rule-consciousness, acquiescence, and cognitive ability. Confirmatory factor analysis supported the inclusion of distinct factors associated with positively and negatively keyed items and suggested that the keying factors accounted for a substantial portion of variation in responses to FFM items. Social desirability and rule-consciousness were found to have significant relations with both keying factors, whereas acquiescence was only related to the negative keying factor. Implications for the construct validity of FFM measures used in high-stakes testing and directions for future research are discussed.
Miller, Kelli Caldwell; Bell, Sherry Mee; McCallum, R. Steve
Because of the increased emphasis on standardized testing results, scores from a high-stakes, end-of-year test (Tennessee Comprehensive Assessment Program [TCAP] Reading Composite) were used as the standard against which scores from a group-administered, curriculum-based measure (CBM), Monitoring Instructional Responsiveness: Reading (MIR:R), were…
There has been a growing consensus among educational measurement experts and psychometricians that test taker characteristics may unduly affect performance on tests. This may lead to construct-irrelevant variance in the scores and thus render the test biased. Hence, it is incumbent on test developers and users alike to provide evidence…
Akom, George Viche
Formative assessment, as a strategy used to improve student learning, encounters several obstacles in its implementation. This study explores changes in teachers' views and practices as they are introduced to formative assessment in a high-stakes testing and limited-resource environment. The study examines the extent to which teachers use the technique of formative assessment to engage students in authentic learning while not sacrificing high test scores on summative assessments. A case study methodology was employed to address the research topic. Science teachers in the West African country of Cameroon were engaged in a process of lesson planning and implementation to collaboratively build lessons with large amounts of formative assessment. Qualitative data from written surveys, group discussions, classroom and workshop observations, and teacher reflections reveal the extent to which lesson fidelity is preserved from views to planning to implementation. The findings revealed that although the teachers possess knowledge of a variety of assessment methods, they do not systematically use these methods to collect information that could help in improving student learning. Oral questioning remained the dominant method of student assessment. The study also showed that the teachers made minimal to substantial changes depending on the particular aspect of formative assessment being considered. For aspects that required only behavioral adaptations, the changes were significant, but for those that required the acquisition of more pedagogical knowledge and skills, the changes were minimal. Among constraints on the practice of formative assessment, the teachers commonly cited large class size and lack of teaching materials. When provided with the opportunity to acquire teaching materials, however, they did not effectively use it. The study revealed a need for the acquisition of inquiry skills by the teachers which can serve as a platform for the
Røykenes, Kari; Smith, Kari; Larsen, Torill M B
Test anxiety affects the learning, performance and well-being of students, and it increases as the stakes get higher. Norwegian nursing students must pass a drug calculation test with a flawless performance if they are to qualify as nurses. The aim of the current study was to investigate the test anxiety experiences of students faced with such a high-stakes test. We used a mixed methods approach where the data were collected using a survey questionnaire and a focus group interview. In total, 203 freshman nursing students completed the questionnaire, six of whom also participated in the focus group interview. The survey results showed that 44.3% of the students reported high mathematics test anxiety in the months before the drug calculation test. More than 12% of the high-anxiety students reported a low mathematics self-concept. High and medium self-concept students also experienced high test anxiety. Our analysis of the focus group interview data confirmed that the high stakes of the test increased the test anxiety dramatically.
Starr, Joshua P.; Spellings, Margaret
More than 40 states plan to assess student performance with new tests tied to the Common Core State Standards. In summer 2013, results from Common Core-aligned tests in New York showed a steep decline in outcomes. Common Core advocates hailed the scores as an honest accounting of school and student performance, while others worried that they…
In the current climate of high stakes testing and tough love rhetoric, many educational stakeholders have become increasingly reliant on standardized test scores to determine whether or not individual students, teachers, and schools--and even entire districts and states--are successful. In contrast to the black and white picture that test-driven…
South Korea's students consistently outperform their counterparts in almost every country in reading and math. Experts have concluded, however, that the South Korean education system has produced students who score well on tests, but fall short on creativity and innovative thinking. They blame these shortcomings on schools' emphasis on rote…
Hughes, Jan N; Chen, Qi; Thoemmes, Felix; Kwok, Oi-Man
The association between grade retention in first grade and passing the third-grade state accountability tests, the Texas Assessment of Knowledge and Skills (TAKS) reading and math, was investigated in a sample of 769 students who were recruited into the study when they were in first grade. Of these 769 students, 165 were retained in first grade and 604 were promoted. Using propensity matching, we created five imputed datasets (average N = 321) in which promoted and retained students were matched on 67 comprehensive covariates. Using GEE models, we obtained the association between retention and passing the third-grade TAKS reading and math tests. The positive association between retention and math scores was significant, while the association for reading scores was only marginally significant.
Banerjee, Jayanti; Papageorgiou, Spiros
The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Brinckerhoff, Loring C.; Banerjee, Manju
The process of submitting documentation to testing agencies as proof of a disability can be time consuming, expensive, and even intimidating to test takers with learning disabilities. Misconceptions about the accommodations review process employed by testing agencies add to the anxiety that many test takers feel around obtaining approval for…
This article examines major trends in testing and accountability reform in the United States over the past decade. The review covers the apex and decline of the national experimentation with a range of alternative assessments and the rise of test-based accountability as a central policy initiative. These trends signify that testing has become a…
Zhan, Ying; Wan, Zhi Hong
Test takers' beliefs or experiences have been overlooked in most validation studies in language education. Meanwhile, a mutual exclusion has been observed in the literature, with little or no dialogue between validation studies and studies concerning the uses and consequences of testing. To help fill these research gaps, a group of Senior III…
Drummond, Todd W.
Cross-lingual tests are assessment instruments created in one language and adapted for use with another language group. Practitioners and researchers use cross-lingual tests for various descriptive, analytical and selection purposes both in comparative studies across nations and within countries marked by linguistic diversity (Hambleton, 2005).…
Brown, Christopher P.
Background: Policymakers' use of high-stakes exams to improve students' academic achievement affects teachers and their tenure in the field at all levels of schooling. Novice teachers now being inducted into the field have been educated almost exclusively in these high-stakes learning environments. Yet, how their familiarity with these contexts…
Michaelides, Michalis P.
Student examinees are key stakeholders in large-scale, high-stakes, public examination systems. How they perceive the purpose, comprehend the technical characteristics of testing and how they interpret scores influence their response to the system demands and their preparation for the examinations; this information relates to intended and…
Because of the growing focus on the production of favorable academic standardized test scores, schools have become increasingly resistant to sponsoring nonacademic programming, such as tobacco cessation services for students. Nevertheless, the need for such programs has not diminished. The purpose of this article is to provide descriptive information about the logistics of establishing and delivering a health intervention in schools that are resistant to nonacademic programming. The data were collected as part of a qualitative retrospective process evaluation of Full Court Press, a 5-year youth tobacco demonstration project funded by the Robert Wood Johnson Foundation and implemented in Tucson, Arizona. Lessons learned about recruiting schools, integrating programs, and managing facilitators are presented.
DeWitt, Scott W.; Patterson, Nancy; Blankenship, Whitney; Blevins, Brooke; DiCamillo, Lorrei; Gerwin, David; Gradwell, Jill M.; Gunn, John; Maddox, Lamont; Salinas, Cinthia; Saye, John; Stoddard, Jeremy; Sullivan, Caroline C.
This study indicates that the state-mandated high-stakes social studies assessments in four states do not require students to demonstrate that they have met the cognitive demands articulated in the state-mandated learning standards. Further, the assessments do not allow students to demonstrate the critical thinking skills required by the…
Tuerk, Peter W
The federal No Child Left Behind Act (NCLB; 2001), mandating standardized testing in public schools, provides researchers with unprecedented opportunities for scientific comparison. At the same time, the climate of high-stakes testing encouraged by the law merits empirical scrutiny from psychologists across an array of specialties. If researchers wish to advance policy through psychological science, they must take care to construct research designs that are meaningful to policymakers and professionals in other disciplines. The present study used data from 1,450 Virginia schools to provide a model of scientifically grounded research that is also informed by current legal and political contexts. Results indicate that student poverty and geography are associated with differential access to highly qualified teachers, and that differential access to qualified teachers is uniquely associated with performance on high-stakes achievement tests. Psychologists, with their unique training, are encouraged to take a more active role in using NCLB data.
Administrators and teachers in several large districts nationwide have cheated on standardized tests to make achievement levels look better than they actually were. The offenses range from giving students advance answers to questions on standardized tests, to erasing and changing unsatisfactory answers. As a result of district and state…
Clement, Mary C.
The risks are high when it comes to hiring a new principal. A principal is accountable for the safety, well-being and achievement of all the children in a school, as well as for representing the school to the community. With increasing demands on building administrators, the hiring of principals certainly may be considered high stakes. For several…
How Principals Level the Playing Field of Accountability in Florida's High-Poverty/Low-Performing Schools--Part I: The Intersection of High-Stakes Testing and Effects of Poverty on Teaching and Learning.
Acker-Hocevar, Michele; Touchton, Debra
Case study examines attitudes of principals in selected high-poverty/low-performing Florida schools toward the state's system of school improvement and accountability. Finds that principals oppose the high-stakes nature of the Florida Comprehensive Assessment Test because of unfair comparisons with more affluent schools. Principals believe that…
Tavakol, Mohsen; Dennick, Reg
Because great emphasis is rightly placed on assessment as a means of judging the quality of our future healthcare professionals, it is important not only to choose the most appropriate assessment method but also to monitor the quality of the tests themselves continually, in the hope of steadily improving the process. This article stresses the importance of quality control mechanisms in the exam cycle and briefly outlines some of the key psychometric concepts, including reliability measures, factor analysis, generalisability theory and item response theory. The importance of such analyses for the standard setting procedures is emphasised. This article also accompanies two new AMEE Guides in Medical Education (Tavakol M, Dennick R. Post-examination analysis of objective tests: AMEE Guide No. 54; and Tavakol M, Dennick R. 2012. Post-examination analysis of objective test data: Monitoring and improving the quality of high-stakes examinations: AMEE Guide No. 66), which provide the reader with practical examples of analysis and interpretation to help develop valid and reliable tests.
Alavi, Seyed Mohammad; Bordbar, Soodeh
Differential item functioning (DIF) analysis is a key element in evaluating educational test fairness and validity. Gender, which plays an important role in the university entrance exam, is one of the most frequently cited sources of construct-irrelevant variance; it can therefore cause bias and consequently undermine test validity. The present study aims…
Sackett, P R; Schmitt, N; Ellingson, J E; Kabin, M B
Cognitively loaded tests of knowledge, skill, and ability often contribute to decisions regarding education, jobs, licensure, or certification. Users of such tests often face difficult choices when trying to optimize both the performance and ethnic diversity of chosen individuals. The authors describe the nature of this quandary, review research on different strategies to address it, and recommend using selection materials that assess the full range of relevant attributes using a format that minimizes verbal content as much as is consistent with the outcome one is trying to achieve. They also recommend the use of test preparation, face-valid assessments, and the consideration of relevant job or life experiences. Regardless of the strategy adopted, it is unreasonable to expect that one can maximize both the performance and ethnic diversity of selected individuals.
East, Martin; King, Chris
In the listening component of the IELTS examination, candidates hear the input once, delivered at "normal" speed. This format for listening can be problematic for test takers, who often perceive normal-speed input to be too fast for effective comprehension. The study reported here investigated whether using computer software to slow down…
This research studies the effects of mobility on the high-stakes test scores of a Title I school district in South Central Texas. The study involved ten 5th-grade elementary feeder-school populations moving on to the 6th grade in 3 middle schools. The researcher compared the first-administration scores of the Texas Assessment of Knowledge and Skills…
Tan, May; Turner, Carolyn E.
In Quebec, the high-stakes Secondary Five ESL exit writing exam developed by the Education Ministry (MELS) is administered and corrected by classroom teachers. In this distinctive situation, the MELS works toward aligning classroom-based assessment (CBA) and the writing exam by making ongoing teacher involvement part of its development and…
Putwain, David William; Symes, Wendy
Previous work has examined how messages communicated to students prior to high-stakes exams, that emphasise the importance of avoiding failure for subsequent life trajectory, may be appraised as threatening. In two studies, we extended this work to examine how students may also appraise such messages as challenging or disregard them as being of…
Nelson, J. Ruth
The present study examined the intended and unintended consequences of Minnesota's high-stakes graduation exam on students with disabilities. Historically, little empirical data have been collected, and the scant data available suggest some significant unintended consequences for educational accountability systems (e.g., the retention of students…
Lievens, Filip; Patterson, Fiona
In high-stakes selection among candidates with considerable domain-specific knowledge and experience, investigations of whether high-fidelity simulations (assessment centers; ACs) have incremental validity over low-fidelity simulations (situational judgment tests; SJTs) are lacking. Therefore, this article integrates research on the validity of knowledge tests, low-fidelity simulations, and high-fidelity simulations in advanced-level high-stakes settings. A model and hypotheses of how these 3 predictors work in combination to predict job performance were developed. In a sample of 196 applicants, all 3 predictors were significantly related to job performance. Both the SJT and the AC had incremental validity over the knowledge test. Moreover, the AC had incremental validity over the SJT. Model tests showed that the SJT fully mediated the effects of declarative knowledge on job performance, whereas the AC partially mediated the effects of the SJT.
Stillman, Jamy; Anderson, Lauren
Considerable research indicates that high-stakes accountability policies have the capacity to influence language arts instruction, particularly in urban, high-needs schools where pressure to increase test scores tends to be most acute. This article utilizes Cultural Historical Activity Theory to critically examine the constraints and affordances…
Ditkowsky, Ben; Koonce, Danel A.
The current study examines the predictive relationship of "Dynamic Indicators of Basic Early Literacy Skills" oral reading fluency (DORF) scores to high-stakes test performance. Data were collected from 423 students. The participants were divided into three groups based on the level of progress that was made from the fall to the spring benchmark…
McKenzie, Kathryn Bell
In this era of accountability and high stakes testing, district and school administrators are vigilant in their attention to student test scores and the ramifications these have for district and school performance labels. In other words, no school or district wants to be labeled "low performing." This case, based on a real situation, demonstrates…
Newhouse, C. Paul; Tarricone, Pina
High-stakes external assessment for practical courses is fraught with problems impacting on the manageability, validity and reliability of scoring. Alternative approaches to assessment using digital technologies have the potential to address these problems. This paper describes a study that investigated the use of these technologies to create and…
Thomas, P. Ann
The focus of the investigation is a sixth-grade population that was not reading on grade level and not achieving proficiency on high-stakes tests, causing the school to fail to make adequate yearly progress (AYP). The lack of reading skills causes the students to repeat grades in middle school and high school. Reading technology instruction is the…
Briggs, Derek C.
The use of large-scale assessments for making high stakes inferences about students and the schools in which they are situated is premised on the assumption that tests are sensitive to good instruction. An increase in the quality of classroom instruction should cause, on the average, an increase in test scores. In work with a number of colleagues…
Kiany, Gholam Reza; Shayestefar, Parvaneh; Samar, Reza Ghafar; Akbari, Ramin
A steady stream of studies on high-stakes tests such as University Entrance Examinations (UEEs) suggests that high-stakes test reforms serve as the leverage for promoting quality of learning, standards of teaching, and credible forms of accountability. However, such remediation is often not as effective as hoped, and success is not necessarily…
Ketter, Jean; Pool, Jonelle
Explores the effects of a high-stakes, direct writing test on three teachers and their students. Suggests that an emphasis on test preparation diminished the likelihood of the teachers' engaging in reflective practice that is sensitive to the needs of individual students, and that the high-stakes assessment process discounted the validity of…
High stakes testing is a given in many public school districts in the United States. This paper reports the chilling effect high stakes testing had on the pedagogy of one teacher. The study took place in a large Midwestern urban district where a university consultant observed a fifth-grade classroom. This researcher was able to observe and…
McNeil, Linda McSpadden; Coppola, Eileen; Radigan, Judy; Heilig, Julian Vasquez
In the state of Texas, whose standardized, high-stakes test-based accountability system became the model for the nation's most comprehensive federal education policy, more than 135,000 youth are lost from the state's high schools every year. Dropout rates are highest for African American and Latino youth, more than 60% for the students we…
Greuel, Dirk; Deeken, Jan; Suslov, Dmitry; Schäfer, Klaus; Schlechtriem, Stefan
The LOX/LH2 Staged Combustion Rocket Engine Demonstrator (SCORE-D) is part of ESA's Future Launcher Preparatory Program (FLPP). SCORE-D serves as a technology demonstrator with a view to the development of the High Thrust Engine (HTE), which is designated as a candidate for the main stage engine of the Next Generation Launcher (NGL). To develop and test the SCORE-D engine, ESA is investigating configurations of the test benches P3.2 and P5 at the DLR test site in Lampoldshausen. For the SCORE-D Hot Combustion Devices (HCD) development, i.e. the pre-burner (PB) and thrust chamber assembly (TCA), the P3.2 test facility has to be modified for further use. Recently, the first steps in this endeavor have been taken with the evaluation of the necessary modifications to the facility. To accommodate the SCORE-D engine, it is foreseen to modify the P5 test facility in the coming years. In the last year, DLR has started the design phase for these modifications. In preparatory test programs at the P8 test facility, Astrium has conducted sub-scale hot combustion devices tests. While Astrium designed and manufactured the sub-scale assembly of the pre-burner and the main combustion chamber (MCC) for SCORE-D, DLR operated the P8 test facility.
Yerdelen-Damar, Sevda; Elby, Andrew
This study investigates how elite Turkish high school physics students claim to approach learning physics when they are simultaneously (i) engaged in a curriculum that led to significant gains in their epistemological sophistication and (ii) subject to a high-stakes college entrance exam. Students reported taking surface (rote) approaches to learning physics, largely driven by college entrance exam preparation and therefore focused on algorithmic problem solving at the expense of exploring concepts and real-life examples more deeply. By contrast, in recommending study strategies to "Arzu," a hypothetical student who doesn't need to take a college entrance exam and just wants to understand physics deeply, the students focused more on linking concepts and real-life examples and on making sense of the formulas and concepts—deep approaches to learning that reflect somewhat sophisticated epistemologies. These results illustrate how students can epistemically compartmentalize, consciously taking different epistemic stances—different views of what counts as knowing and learning—in different contexts even within the same discipline.
Fields, Henry W; Fields, Anne M; Beck, F Michael
The purpose of this study was to determine whether gender affects high-stakes test performance among dental students. Our sample consisted of 128 women and 323 men from six consecutive dental classes, for which we recorded AADSAS overall and science predental GPAs; Dental Admission Test (DAT) scores; National Board Dental Examination (NBDE) I and II scores and pass/fail status; North East Regional Board of Dental Examiners (NERB) pass/fail status; and cumulative GPAs following the spring quarter of year two and summer quarter of year four of dental school. DAT scores, when controlled for previous academic performance, revealed that men significantly outperformed women in all areas except reading comprehension, where women's scores significantly exceeded men's, and biology, where scores were comparable. NBDE I results favored men and approached significance (p = 0.066), while on Part II men significantly outscored women. NBDE I and II and NERB pass rates showed no significant differences. These board results were also controlled for previous academic performance. Although we found that gender differences existed, which appear to reflect the classic high-stakes dilemma (women do as well as men in the classroom and on course-related tests, but less well on gatekeeper board exams), the context mitigates their operational effects. DAT differences are likely reduced by most admissions processes, but may be problematic when selected predictive algorithms are used. Practically, the NBDE I and II results are unlikely to meaningfully influence women's academic progress in dental school or postgraduate education admissions, given their magnitude and timing.
Koretz, Daniel; Jennings, Jennifer L.; Ng, Hui Leng; Yu, Carol; Braslow, David; Langi, Meredith
Test-based accountability often produces score inflation. Most studies have evaluated inflation by comparing trends on a high-stakes test and a lower stakes audit test. However, Koretz and Beguin (2010) noted weaknesses of audit tests and suggested self-monitoring assessments (SMAs), which incorporate audit items into high-stakes tests. This…
Boulet, John R
Throughout their careers, physicians are exposed to a wide array of assessments, including those aimed at evaluating knowledge, clinical skills, and clinical decision-making. While many of these assessments are used as part of formative evaluation activities, others are employed to establish competence and, as a byproduct, to promote patient safety. In the past 10 years, simulations have been successfully incorporated in a number of high-stakes physician certification and licensure exams. In developing these simulation-based assessments, testing organizations were able to promote novel test administration protocols, build enhanced assessment rubrics, advance sophisticated scoring and equating algorithms, and promote innovative standard-setting methods. Moreover, numerous studies have been conducted to identify potential threats to the validity of test score interpretations. As simulation technology expands and new simulators are invented, this groundbreaking work can serve as a basis for organizations to build or expand their summative assessment activities. Although there will continue to be logistical and psychometric problems, many of which will be specialty- or simulator-specific, past experience with performance-based assessments suggests that most challenges can be addressed through focused research. Simulation, whether it involves standardized patients (SPs), computerized case management scenarios, part-task trainers, electromechanical mannequins, or a combination of these methods, holds great promise for high-stakes assessment.
Royal, Kenneth D.; Gilliland, Kurt O.; Kernick, Edward T.
Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high-stakes testing arena rely on classical test theory…
McDermott, Paul A; Watkins, Marley W; Rhoad, Anna M
Assessor bias variance exists for a psychological measure when some appreciable portion of the score variation that is assumed to reflect examinees' individual differences (i.e., the relevant phenomena in most psychological assessments) instead reflects differences among the examiners who perform the assessment. Ordinary test reliability estimates and standard errors of measurement do not inherently encompass assessor bias variance. This article reports on the application of multilevel linear modeling to examine the presence and extent of assessor bias in the administration of the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) for a sample of 2,783 children evaluated by 448 regional school psychologists for high-stakes special education classification purposes. It was found that nearly all WISC-IV scores conveyed significant and nontrivial amounts of variation that had nothing to do with children's actual individual differences and that the Full Scale IQ and Verbal Comprehension Index scores evidenced quite substantial assessor bias. Implications are explored.
Zwerling, Harris L.
In the context of controversy over the use of high stakes testing, the Pennsylvania State Education Association (PSEA) asked for an evaluation of the performance levels and cut score of the Pennsylvania System of School Assessment (PSSA) mathematics and reading tests. While awaiting technical documentation from the Pennsylvania Department of…
Accountability for student achievement is required by legislation and demanded by the public. Testing is the method of choice for determining student achievement and for informing teachers, parents and students about what students know and still need to learn. State tests that meet the demands of federal legislation have far-reaching consequences…
Escudier, M. P.; Newton, T. J.; Cox, M. J.; Reynolds, P. A.; Odell, E. W.
This study compared higher education dental undergraduate student performance in online assessments with performance in traditional paper-based tests and investigated students' perceptions of the fairness and acceptability of online tests, and showed performance to be comparable. The project design involved two parallel cross-over trials, one in…
Taylor, Grace; Shepard, Lorrie; Kinner, Freya; Rosenthal, Justin
Using a random sample of 1,000 Colorado teachers, this study surveyed the effects of standards, the Colorado Student Assessment Program (CSAP), and school report cards on instruction and test-related practices. Findings show that standards were perceived to have greater impact on improving instruction than did testing. Teachers said they aligned…
The use of aptitude and competency testing in the public school system can have some undesired effects on students in terms of the stress and anxiety tests can impose. Adolescence is already a time in a child's life when many different pressures are coming to bear upon the psyche. Among the most undesirable consequences that can result from the…
Tavakol, Mohsen; Dennick, Reg
The purpose of this Guide is to provide both logical and empirical evidence for medical teachers to improve their objective tests by appropriate interpretation of post-examination analysis. This requires a description and explanation of some basic statistical and psychometric concepts derived from both Classical Test Theory (CTT) and Item Response Theory (IRT), such as descriptive statistics, exploratory and confirmatory factor analysis, Generalisability Theory and Rasch modelling. CTT is concerned with the overall reliability of a test, whereas IRT can be used to identify the behaviour of individual test items and how they interact with individual student abilities. We have provided the reader with practical examples clarifying the use of these frameworks in test development and for research purposes.
Glover, Todd A.; Reddy, Linda A.; Kettler, Ryan J.; Kunz, Alexander; Lekwa, Adam J.
The accountability movement and high-stakes testing fail to attend to ongoing instructional improvements based on the regular assessment of student skills and teacher practices. Summative achievement data used for high-stakes accountability decisions are collected too late in the school year to inform instruction. This is especially problematic…
Yeager, Elizabeth Anne, Ed.; Davis, O. L., Ed.
The chapters in this volume illustrate how teachers are bringing creativity, higher-order thinking, and meaningful learning activities into particular school settings despite pressures of standards and testing. The editors chose the word wise for the title of this book, and they use it frequently to describe the pedagogical practices they have…
Clarke, Marguerite; Shore, Arnold; Rhoades, Kathleen; Abrams, Lisa; Miao, Jing; Li, Jie
The goal of this study was to identify the effects of state-level standards-based reform on teaching and learning, paying particular attention to the state test and associated stakes. On-site interviews were conducted with 360 educators (elementary, middle, and high school teachers) in 3 states (120 in each state) attaching different stakes to the…
Trinkle, James M., II
Relatively recent federal education initiatives, such as No Child Left Behind (NCLB; 2001), have focused on school accountability for student achievement including achievement of traditionally at-risk populations, such as students in special education, students from low-income or high poverty areas, and students who speak English as a new second…
Attali, Yigal; Lewis, Will; Steier, Michael
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Keller, Shani Malaika
Framed by a discussion of the heightened importance of science education in the U.S., this paper describes the prevalence, content, and format of high-stakes science assessments in the U.S. and explores the possibility that differences in assessment format may affect score gaps among student subgroups. An analysis of proficiency rates for 2010-11 high school exit exams in science was inconclusive; however, score gaps among ethnic subgroups on the 2009 grade 12 NAEP science assessment were larger for multiple-choice items than for performance-based components. Further, a comparison of subgroup score gaps on the 2009 NAEP science assessment and those on the ACT science subtest suggests that the assessment with more diverse and innovative items resulted in a smaller gap in subgroup test scores. These findings point to the need for greater investigation of the extent to which item type affects subgroup score differences on science assessments.
Thibodeau, Janice J.
A diagnostic-prescriptive scheme is illustrated using subtests of the Slingerland Screening Tests for Identifying Children with Specific Language Disability and the Detroit Tests of Learning Aptitude. The scheme is intended to focus on the child's learning style by examining the task and the strategies employed.
Gregory, Kelvin; Clarke, Marguerite
Presents an overview of the English and Singaporean education systems, focusing on the high stakes assessment systems operating at the elementary level in both countries. The effects of these high stakes assessments on teachers and students are described, noting potential lessons for U.S. educators related to establishing credibility, engaging the…
This article provides an introduction to the kind of computer software that is used to score student writing in some high stakes testing programs, and that is being promoted as a teaching and learning tool to schools. It sketches the state of play with machines for the scoring of writing, and describes how these machines work and what they do.…
Putwain, David W.; Daly, Anthony L.; Chamberlain, Suzanne; Sadreddini, Shireen
Background: Prior research has shown that test anxiety is negatively related to academic buoyancy, but it is not known whether test anxiety is an antecedent or outcome of academic buoyancy. Furthermore, it is not known whether academic buoyancy is related to performance on high-stakes examinations. Aims: To test a model specifying reciprocal…
Breithaupt, Krista; Hare, Donovan R.
Many challenges exist for high-stakes testing programs offering continuous computerized administration. The automated assembly of test questions to exactly meet content and other requirements, provide uniformity, and control item exposure can be modeled and solved by mixed-integer programming (MIP) methods. A case study of the computerized…
Thibeault, Matthew D.
The author argues that concerts create pressures on the music curriculum similar to those high-stakes tests generate on the general curriculum. Three similarities are presented and discussed using the example of a concert the author organized: first, teaching to the test and the narrowing of curricular goals; second, evaluation by a single source…
Wold, Donald C.
In the 20 years since the federal report on education "A Nation at Risk" appeared, much has been written on test scores of students in the United States versus their counterparts elsewhere. One of the issues is whether their scores are in fact inferior, or merely a statistical difference due to their universal schooling philosophy. Since…
Uhlmann, Eric Luis; Barnes, Christopher M.
High-stakes team competitions can present a social dilemma in which participants must choose between concentrating on their personal performance and assisting teammates as a means of achieving group objectives. We find that despite the seemingly strong group incentive to win the NBA title, cooperative play actually diminishes during playoff games, negatively affecting team performance. Thus team cooperation decreases in the very high stakes contexts in which it is most important to perform well together. Highlighting the mixed incentives that underlie selfish play, personal scoring is rewarded with more lucrative future contracts, whereas assisting teammates to score is associated with reduced pay due to lost opportunities for personal scoring. A combination of misaligned incentives and psychological biases in performance evaluation bring out the “I” in “team” when cooperation is most critical. PMID:24763384
Bensnes, Simon Søbstad
Pollen is known to cause allergic reactions and affect cognitive performance in around 20% of the population. Although pollen season peaks when students take high-stakes exams, the effect of pollen allergies on school performance has received almost no attention from economists. Using a student fixed effects model and administrative Norwegian data, this paper finds that a one-standard-deviation increase in ambient pollen levels at the mean leads to a decrease in test scores of 2.5% of a standard deviation, with potentially larger effects for allergic students. There also appear to be longer-run effects. The findings imply that random increases in pollen counts reduce the test scores of allergic students relative to their peers, leaving them at a disadvantage when competing for jobs or higher education. This paper contributes to the literature by illuminating the interplay between individual health and human capital accumulation, which in turn can impact long-run economic growth.
Witte, John F.; Wolf, Patrick J.; Cowen, Joshua M.; Carlson, Deven E.; Fleming, David J.
This article considers the impact of a high-stakes testing and reporting requirement on students using publicly funded vouchers to attend private schools. We describe how such a policy was implemented during the course of a previously authorized multi-year evaluation of the Milwaukee Parental Choice Program, which provided us with data on voucher…
Baines, Lawrence A.; Stanley, Gregory Kent
High-stakes testing costs up to $50 billion per annum, has no impact on student achievement, and has changed the focus of American public schools. This article analyzes the benefits and costs of the accountability movement and discusses its roots in the eugenics movements of the early 20th century.
Margo, Desiree Marie
The intense focus on standards and accountability is rapidly altering the education environment. Performance on high-stakes state tests is often the gauge used to measure school effectiveness. In this retrospective cohort comparison study, I examine the relation between the use of curriculum-based measures (CBMs) for reading and change on a state…
Pandya, Jessica Zacher
This timely book explores what is often overlooked in policy debates about the education of English language learners: how the day-to-day dynamics of the classroom are affected by high-stakes testing and the pressures students and teachers experience and internalize as a result. The author presents and analyzes classroom observations, student…
Vogler, Kenneth E.
The purpose of this study was to determine whether the public release of student results on high-stakes, state-mandated performance assessments influences instructional practices and, if so, in what manner. The research focused on changes in teachers' instructional practices, and on factors that may have influenced such changes, since the public release of high-stakes, state-mandated student performance assessment scores. The data for this study were obtained from a 54-question survey instrument given to a stratified random sample of teachers teaching at least one section of 10th grade English, mathematics, or science in an academic public high school within Massachusetts. Two hundred and fifty-seven (257) teachers, or 62% of the total sample, completed the survey instrument. An analysis of the data found that teachers are making changes in their instructional practices. The data show notable increases in the use of open-response questions, creative/critical thinking questions, problem-solving activities, rubrics or scoring guides, writing assignments, and inquiry/investigation. Teachers have also decreased their use of multiple-choice and true-false questions, textbook-based assignments, and lecturing. The data also show that teachers felt the changes in their instructional practices were most influenced by an "interest in helping my students attain MCAS assessment scores that will allow them to graduate high school" and by an "interest in helping my school improve student (MCAS) assessment scores." Finally, mathematics teachers and teachers with 13 to 19 years of experience report making significantly more changes than did others. The data suggest that the use of state-mandated student performance assessments, and the high stakes attached to this type of testing program, contributed to changes in teachers' instructional practices. The changes in teachers' instructional practices have included increases in the use of instructional practices deemed
Bellinger, David B; DeCaro, Marci S; Ralston, Patricia A S
Mindfulness enhances emotion regulation and cognitive performance. A mindful approach may be especially beneficial in high-stakes academic testing environments, in which anxious thoughts disrupt cognitive control. The current studies examined whether mindfulness improves the emotional response to anxiety-producing testing situations, freeing working memory resources, and improving performance. In Study 1, we examined performance in a high-pressure laboratory setting. Mindfulness indirectly benefited math performance by reducing the experience of state anxiety. This benefit occurred selectively for problems that required greater working memory resources. Study 2 extended these findings to a calculus course taken by undergraduate engineering majors. Mindfulness indirectly benefited students' performance on high-stakes quizzes and exams by reducing their cognitive test anxiety. Mindfulness did not impact performance on lower-stakes homework assignments. These findings reveal an important mechanism by which mindfulness benefits academic performance, and suggest that mindfulness may help attenuate the negative effects of test anxiety.
Koch, Martha J.; DeLuca, Christopher
In this article we rethink validation within the complex contexts of high-stakes assessment. We begin by considering the utility of existing models for validation and argue that these models tend to overlook some of the complexities inherent to assessment use, including the multiple interpretations of assessment purposes and the potential…
Brown, Dianne C.
The decline in standardized test scores during the 1960s and 1970s is well documented and is seen in both aptitude and achievement test scores. This paper describes and analyzes the test score trends over the 1960s, 1970s and early 1980s for five aptitude tests: (1) the Scholastic Aptitude Test; (2) the American College Test; (3) the Preliminary…
Levin, Henry M.
Around the world we hear considerable talk about creating world-class schools. Usually the term refers to schools whose students get very high scores on the international comparisons of student achievement such as PISA or TIMSS. The practice of restricting the meaning of exemplary schools to the narrow criterion of achievement scores is usually…
van Hover, Stephanie; Hicks, David; Sayeski, Kristin
To provide greater support for students with disabilities in inclusive classrooms within high-stakes testing contexts, some schools have implemented co-teaching models. This qualitative case study explores how one special education teacher (Anna) and one general education history teacher (John) make sense of working together in an inclusive…
Azzam, Tarek; Levine, Bret
The role of politics has often been discussed in evaluation theory and practice. The political influence of the situation can have major effects on the evaluation design, approach, and methods. Politics also has the potential to influence the decisions made from the evaluation findings. The current study focuses on the influence of the political context on stakeholder decision making. Using a simulation scenario, this study compares stakeholder decision making in high- and low-stakes evaluation contexts. Findings suggest that high-stakes political environments are more likely than low-stakes environments to lead to reduced reliance on technically appropriate measures and increased dependence on measures that better reflect the broader political environment.
Nicewander, W Alan
The most widely used general index of measurement precision for psychological and educational test scores is the reliability coefficient, the ratio of the true variance of a test score to its true-plus-error variance. In item response theory (IRT) models for test scores, the information function is the central, conditional index of measurement precision. In this inquiry, conditional reliability coefficients for a variety of score types are derived as simple transformations of information functions. It is shown, for example, that the conditional reliability coefficient for an ordinary number-correct score, X, is ρ(X, X'|θ) = I(X, θ) / [I(X, θ) + 1], where θ is a latent variable measured by an observed test score, X; ρ(X, X'|θ) is the conditional reliability of X at a fixed value of θ; and I(X, θ) is the score information function. This is a surprisingly simple relationship between the two basic indices of measurement precision from IRT and classical test theory (CTT). The relationship holds for item scores as well as for test scores based on sums of item scores, and it holds for dichotomous as well as polytomous items, or a mix of both item types. Conditional reliabilities are also derived for computerized adaptive test scores and for θ-estimates used as alternatives to number-correct scores. These conditional reliabilities are all related to information in a manner similar or identical to the one given above for the number-correct (NC) score.
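To make the relationship ρ(X, X'|θ) = I(X, θ) / [I(X, θ) + 1] concrete, the sketch below applies it to a number-correct score under a Rasch (1PL) model. This is an illustrative assumption, not the article's derivation: the item difficulties are hypothetical, and the score information is computed with the standard score-information formula, (dE[X|θ]/dθ)² / Var(X|θ), which under the Rasch model reduces to the sum of P(1 − P) over items.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def nc_score_information(theta: float, difficulties: list[float]) -> float:
    """Score information I(X, theta) for the number-correct score X.

    Uses (dE[X|theta]/dtheta)**2 / Var(X|theta). Under the Rasch model the
    slope of the expected score and the conditional variance both equal
    sum(P * Q), so the ratio reduces to sum(P * Q).
    """
    pq = [p * (1.0 - p) for p in (rasch_prob(theta, b) for b in difficulties)]
    slope = sum(pq)     # dE[X|theta]/dtheta for Rasch items
    variance = sum(pq)  # Var(X|theta), assuming local independence
    return slope ** 2 / variance


def conditional_reliability(info: float) -> float:
    """rho(X, X'|theta) = I / (I + 1), the relationship stated in the abstract."""
    return info / (info + 1.0)


if __name__ == "__main__":
    items = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties
    for theta in (-2.0, 0.0, 2.0):
        info = nc_score_information(theta, items)
        print(f"theta={theta:+.1f}  I={info:.3f}  rho={conditional_reliability(info):.3f}")
```

Because ρ rises toward 1 only as information grows, the conditional reliability is highest near the difficulties of the items and falls off at extreme θ values, mirroring the shape of the information function.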
Rawlusyk, Kevin James
Test items used to assess learners' knowledge on high-stakes science examinations contain contextualized questions that unintentionally assess reading skill along with conceptual knowledge. Students who are not proficient readers may therefore be unable to comprehend the text within a test item well enough to demonstrate their level of science knowledge. The purpose of this quantitative study was to understand which reading attributes were required to successfully answer the Biology 30 Diploma Exam. The research also sought to understand the cognitive relationships among the reading attributes through quantitative analysis structured by the Attribute Hierarchy Model (AHM). The research consisted of two phases: (1) cognitive development, in which the cognitive attributes of the Biology 30 Exam were specified and hierarchy structures were developed; and (2) psychometric analysis, which statistically tested the attribute hierarchy using the Hierarchy Consistency Index (HCI) and calculated attribute probabilities. Phase one of the research used the January 2011 Biology 30 Diploma Exam, while phase two used archival data for the 9,985 examinees who took the assessment on January 24, 2011. Phase one identified ten specific reading attributes, of which five were identified as unique subsets of vocabulary, two as reading visual representations, and three as general reading skills. Four hierarchical cognitive models were proposed and then analyzed using the HCI as a mechanism to explain the relationships among the attributes. Model A had the highest HCI value (0.337), which still indicated poor overall data fit; however, for the top-achieving examinees the model had an excellent fit, with an HCI value of 0.888, and for examinees who scored over 60% there was a moderate fit (HCI = 0.592). Linear regressions of the attribute probability estimates suggest that there is a cognitive relationship among six of the ten reading attributes (R² = 0.958 and 0
Haladyna, Thomas M.
This article argues that the validity of standardized achievement test-score interpretation and use is problematic; consequently, confidence and trust in such test scores may often be unwarranted. The problem is particularly severe in high-stakes situations. This essay provides a context for understanding standardized achievement testing, then…
Traditionally, the test score represented by the number of items answered correctly was taken as an indicator of the examinee's ability level. Researchers still tend to think that the number-correct score is a way of ordering individuals with respect to the latent trait. The objective of this study is to depict the benefits of using ability…
Immekus, Jason C.; McGee, Dean
Student effort on large-scale assessments has important implications for the interpretation and use of scores to guide decisions. Within the United States, English Language Learners (ELLs) are generally outperformed on large-scale assessments by non-ELLs, prompting research to examine factors associated with test performance. There is a gap in the literature regarding the test-taking motivation of ELLs compared to non-ELLs and whether existing measures have similar psychometric properties across groups. The Student Opinion Scale (SOS; Sundre, 2007) was designed to be administered after completion of a large-scale assessment to operationalize students' test-taking motivation. Based on data obtained from 5,257 10th grade students (41.8% ELL), the purpose of the study was to test the measurement invariance of the SOS across ELLs and non-ELLs based on completion of low- and high-stakes assessments. Preliminary item analyses supported the removal of two SOS items (Items 3 and 7), which improved internal consistency for each of the two SOS subscales: Importance and Effort. A subsequent multi-sample confirmatory factor analysis (MCFA) supported the measurement invariance of the scale's two-factor model across language groups, indicating that it met strict factorial invariance (Meredith, 1993). A follow-up latent means analysis found that ELLs reported higher effort on both the low- and high-stakes assessments, with a small effect size. Effect size estimates indicated negligible differences on the Importance factor. Although the instrument can be expected to function similarly across diverse language groups, which may be of direct use to test users and to research into factors associated with large-scale test performance, continued research is recommended. Implications for SOS use in applied and research settings are discussed. PMID:27672375
Clausing, Gerhard; Senko, Donna
Cloze testing and language performance are discussed, as are two techniques for awarding partial credit: the quick performance measurement and feedback technique, and the three-stage scoring hierarchy for partial credit. A figure and tables are included. (EJS)
Tempel, Melissa Bollow
Computerized testing, including the widely used MAP test, has infiltrated the public schools in Milwaukee and across the nation, bringing with it a frightening future for public education. High-stakes standardized tests can be scored almost immediately via the internet, and testing companies can now easily link districts to their online data…
The Jomtien conference in 1990 on Education for All is seen by many as a turning point for the introduction of increased monitoring and evaluation of the quality of education systems around the world. Internationally, debates have arisen about the nature and frequency of assessment and its impact on education systems with its intended and…
Kukucka, Susan R.
Mandates that followed from the No Child Left Behind Act (NCLB, 2002) led to changes in curriculum and classroom instruction. Teachers felt pressured to alter their curriculum and instructional practices. To ensure that students receive a quality education, teachers' perceptions of instructional assessment and curriculum are of paramount concern,…
As states begin to demand more rigor on their high-stakes tests--and the tests evolve to incorporate revised academic standards--many officials are gambling that an initial wave of lower scores will give way to greater student achievement in the future. Changes to statewide tests and subsequent plummeting scores sparked controversy and emergency…
Vannest, Kimberly J.; Parker, Richard I.; Davis, John L.; Soares, Denise A.; Smith, Stacey L.
More and more, schools are considering the use of progress monitoring data for high-stakes decisions such as special education eligibility, program changes to more restrictive environments, and major changes in educational goals. Those high-stakes types of data-based decisions will need methodological defensibility. Current practice for…
Vernaza, Natasha A.
Teachers are catalysts to the success of high-stakes accountability policies, yet noticeably absent from previous studies is an examination of teachers' responses toward being held accountable for their students' performance on state-mandated, high-stakes assessments in low socioeconomic status (SES) school settings. An on-line survey instrument…
Weinstein, Lawrence; Laverghetta, Antonio; Alexander, Ralph; Stewart, Megan
The current study extends a previous investigation of teacher greetings to students. The present investigation examined teacher greetings of college students in relation to academic performance (test scores). We report data on university students' in-class test performance. Students in introductory psychology who received teachers'…
McIntosh, James; Munk, Martin D.
Latent class Poisson count models are used to analyse a sample of Danish test score results from a cohort of individuals born in 1954-1955, tested in 1968, and followed until 2011. The procedure takes account of unobservable effects as well as excessive zeros in the data. We show that the test scores measure manifest or measured ability as it has…
Schwartz, Sarah M; Evans, Cathy; Agur, Anne M R
Students in health care professional programs face many stressful tests that determine successful completion of their program. Test anxiety during these high-stakes examinations can affect working memory and lead to poor outcomes. Methods of decreasing test anxiety include lengthening the time available to complete examinations or evaluating students using untimed examinations. There is currently no consensus in the literature regarding whether untimed examinations provide a benefit to test performance in clinical anatomy. This study aimed to determine the impact of timed versus untimed practical tests on Master of Physical Therapy students' anatomy performance and test anxiety. Test anxiety was measured using the State-Trait Anxiety Inventory (STAI). Differences in performance, anxiety scores, and time taken were compared using paired-sample Student's t-tests. Eighty-one of the 84 students completed the study and provided feedback. Students scored significantly higher on the untimed test (P = 0.005), with a significant reduction in test anxiety (P < 0.001). Students who were unsuccessful on the timed test showed the greatest improvement on the untimed test (x̄ = 20.4 ± 10%). Eighty-three percent (n = 69) of students preferred the untimed test, 8.4% (n = 7) the timed test, and 8.4% (n = 7) had no preference. Students took on average eight minutes longer on the untimed test. This study found that physical therapy students perform better on untimed tests, which may be related to a reduction in test anxiety. If the intended goal of evaluating health care professional students is to determine fundamental competencies, these factors should be considered when designing future curricula.
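The paired-sample t-test used in this study compares each student with themselves rather than comparing two independent groups. A minimal sketch of the test statistic follows, using only the Python standard library; the function name and the scores in the usage example are hypothetical and are not the study's data. A p-value would then come from a t distribution with n − 1 degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel`, which this sketch does not use).

```python
import math
import statistics


def paired_t(timed: list[float], untimed: list[float]) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a paired-sample t-test.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the within-student
    differences (untimed minus timed) and sd uses the n - 1 denominator.
    """
    if len(timed) != len(untimed) or len(timed) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [u - t for t, u in zip(timed, untimed)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical percentage scores for four students on each test format.
    timed_scores = [60.0, 70.0, 80.0, 90.0]
    untimed_scores = [65.0, 74.0, 86.0, 93.0]
    t_stat, df = paired_t(timed_scores, untimed_scores)
    print(f"t = {t_stat:.3f}, df = {df}")
```

Pairing removes between-student variation in ability from the comparison, which is why the same students taking both test formats gives a more sensitive test than comparing separate timed and untimed groups.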
Jencks, Christopher, Ed.; Phillips, Meredith, Ed.
The 15 chapters of this book address issues related to the continuing test score gap between black and white students. The editors argue against traditional explanations which emphasize differences in economic resources and demographic factors, and they urge that more emphasis be put on psychological and cultural factors. The book suggests studies…
Miller, Steven C.
The Wyoming Department of Education (WDE) has invested time and money developing standardized achievement test score reports designed to give teachers data about each of their students' levels of mastery of particular concepts in order to differentiate their instruction. The purpose of this study was to determine the extent to which eighth-grade…
Current thinking on validity suggests that educational institutions and individuals should evaluate their uses of test scores in the context of their fundamental goals. Regression coefficients and other traditional criterion-related validity statistics provide relevant information, but often do not, by themselves, address the fundamental reasons…
Smith, Vernon G.; Szymanski, Antonia
This article is for practicing or aspiring school administrators. The demand for excellence in public education has led to an emphasis on standardized test scores. This article explores the development of a professional enhancement program designed to prepare teachers to teach higher order thinking skills. Higher order thinking is the primary…
Singapore students have scored exceedingly well on international tests in mathematics. In response, there has been a desire in the United States--both at the policy level and at the school level--to emulate Singapore. Because what can be identified most easily about Singapore's school mathematics can be gleaned from curriculum documents from the…
Koretz, Daniel M.; McCaffrey, Daniel F.
Given current high-stakes uses of tests, one of the most pressing and difficult problems confronting the field of measurement is to develop better methods for distinguishing between meaningful gains in performance and score inflation. This study explores the potential usefulness of adapting differential item functioning (DIF) techniques for this…
The purpose of this study was to determine if special education and at-risk students educated exclusively in a school-within-a-school setting showed improved high-stakes standardized reading test scores after learning the strategic instruction model (SIM) inference strategy. This study was focused on four groups of eighth-grade students attending…
Reeves, Edward B.
The system of high-stakes accountability in the Kentucky public schools raises the question of whether teachers and administrators should be held accountable if test scores are influenced by external factors over which educators have no control. This study investigates whether such external factors, or "contextual effects," bias the…
Performance of students in low-stakes testing situations has been a concern and focus of recent research. However, researchers who have examined the effect of stakes on performance have not been able to compare low-stakes performance to truly high-stakes performance of the same students. Results of such a comparison are reported in this article.…
This cutting-edge guide presents multiple approaches to teaching poetry at the middle and high school levels. The author provides field-tested activities with detailed how-to instructions, as well as advice for how educators can "justify" their teaching within a high-stakes curriculum environment. "Let's Poem" will show pre- and inservice teachers…
Johnson, Karen A.
The enactment of No Child Left Behind (2002) and the reauthorization of the Individuals with Disabilities Education Act had a significant impact on how schools and their students are held accountable through high-stakes testing. In particular, students with educational disabilities who were previously exempted from any performance accountability on…
Cizek, Gregory J.
Advances in validity theory and alacrity in validation practice have suffered because the term "validity" has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test…
The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…
Chafin, Carl K.
For as long as there have been standardized tests that provide "objective" data about student performance, there has been an understandable, though often misguided, inclination to use that data to judge the performance of schools, teachers and students. Paralleling the rise of high-stakes statewide achievement testing in recent years, that…
von Schrader, Sarah; Ansley, Timothy
Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…
Test anxiety is one of the most confronting issues in modern times with the increase in the number of standardised and high-stakes testing. Research has established that there is a direct link between test anxiety and cognitive deficits. The aim of this study is to determine the test anxiety scores of the students with intellectual disabilities in…
Accountability has become a primary function of large-scale testing in the United States. The pressure on educators to raise scores is vastly greater than it was several decades ago. Research has shown that high-stakes testing can generate behavioral responses that inflate scores, often severely. I argue that because of these responses, using…
What Works Clearinghouse, 2016
Most colleges and universities in the United States require students to take the SAT or ACT as part of the college application process. These tests are high stakes in at least three ways. First, most universities factor scores on these tests into admissions decisions. Second, higher scores can increase a student's chances of being admitted to…
Jennings, Jennifer L.; Beveridge, Andrew A.
Analyzing data from a large urban district in Texas, this study examines how high-stakes test exemptions alter officially reported scores and asks whether test exemption has implications for the academic achievement of special education students. Test exemption inflated overall passing rates but especially affected the passing rates of African…
Abrams, Lisa M.; Pedulla, Joseph J.; Madaus, George F.
Discusses teachers' views on state-mandated testing. Data from a literature review and teacher surveys indicate that high stakes, state-mandated testing can lead to instruction that contradicts teachers' views of sound educational practice. Teachers feel that pressure to raise test scores encourages them to emphasize instructional and assessment…
Strader, Douglas A.
There are many advantages supporting the use of computers as an alternate mode of delivery for high stakes testing: cost savings, increased test security, flexibility in test administrations, innovations in items, and reduced scoring time. The purpose of this study was to determine if the use of computers as the mode of delivery had any…
McCollough, Cherie A.
The current reform movement in education has two forces that appear contradictory in nature. The first is an emphasis on rigor and accountability that is assessed through high-stakes testing. The second is the recommendation to use student-centered approaches to teaching and learning, especially those that emphasize inquiry methodology and constructivist pedagogy. The literature reports that current reform efforts involving accountability through high-stakes tests are detrimental to student learning and contradict student-centered teaching approaches. However, by focusing attention on teachers who "teach against the grain" and raise the achievement levels of students from diverse backgrounds, the instructional strategies and personal characteristics of exemplary teachers can be identified. This mixed-methods research study investigated four exemplary urban high school science teachers in high-stakes (TAKS) tested science classrooms. Classroom observations, teacher and student interviews, pre-/post-content tests, and the Constructivist Learning Environment Survey (CLES) (Johnson & McClure, 2004) provided the main data sources. The How People Learn (National Research Council, 2000) theoretical framework provided evidence of elements of inquiry-based, student-centered teaching. Descriptive case analysis (Yin, 1994) and quantitative analysis of the pre/post tests and the CLES revealed the following results. First, all participating teachers included elements of learner-centeredness, knowledge-centeredness, assessment-centeredness, and community-centeredness in their teaching, as recommended by the National Research Council (2000), thus creating student-centered classroom environments. Second, by establishing a climate of caring in which students felt supported and motivated to learn, teachers managed tensions resulting from the incorporation of student-centered elements and the accountability-based instructional mandates outlined by their school district and state
Baker, Richard Allen, Jr.
The purpose of this study was to examine the policy implications of allowing administrators to exempt a student from required arts instruction if the student obtained unsatisfactory scores on the high-stakes, state-mandated tests in English and mathematics. This study examined English language arts and math test scores for 37,222 eighth grade students…
Sachar, Jane; Suppes, Patrick
The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)
Parker, Richard I.; Vannest, Kimberly J.; Davis, John L.; Clemens, Nathan H.
Within a response to intervention model, educators increasingly use progress monitoring (PM) to support medium- to high-stakes decisions for individual students. For PM to serve these more demanding decisions requires more careful consideration of measurement error. That error should be calculated within a fixed linear regression model rather than…
Describes the controversy over high stakes gambling operations on Tuscarora and Mohawk reservation lands that has shaken Iroquois communities. Outlines the arguments of both sides, and suggests that tribal ownership and control of gambling operations, which has worked satisfactorily for the Seneca, may provide a resolution. (SV)
Paugh, Patricia; Carey, Jane; King-Jackson, Valerie; Russell, Shelley
This article focuses on the evolution of the classroom literacy block as a learning space where teachers and students renegotiated activities for independent vocabulary and word work within a high-stakes reform environment. When a second grade classroom teacher and literacy support specialist decided to co-teach, they invited all students in the…
Buxton, Cory; Provenzo, Eugene F., Jr.
Science curriculum and instruction in K-12 settings in the United States is currently dominated by an emphasis on the science standards movement of the 1990s and the resulting standards-based high-stakes assessment and accountability movement of the 2000s. We argue that this focus has moved the field away from important philosophical…
Objectives. To describe the implementation of a high-stakes rubric to assess student professionalism in introductory and advanced pharmacy practice experiences (IPPEs and APPEs) to promote the professional socialization of students in the doctor of pharmacy (PharmD) program at Western New England University (WNE). Findings. A professionalism rubric was adapted from the literature to assess the professional behavior of students enrolled in experiential courses based on evaluation of the following criteria: appropriate communication skills with patients and providers, appearance and dress code, timeliness, and initiative. The rubric was implemented in the fall semester of 2013 as a high-stakes component of the assessment within all experiential courses. Students were required to meet expectations for each of the four criteria in order to pass the practice experience, independent of their performance in other course components. Students were assessed by their preceptors at the midpoint and end of each practice experience using the appropriate evaluation tool. Each of the IPPE and APPE evaluation tools included the professionalism rubric as a requirement for assessment. Use of the Professionalism Rubric as a high-stakes assessment tool highlighted professionalism as an important component of the program, making expectations explicit to students and providing leverage to preceptors for holding students accountable. Summary. The Office of Experiential Affairs at WNE has raised awareness of the importance of professionalism and promoted the professional socialization of PharmD students with the use of a high-stakes professionalism rubric. PMID:28289309
Jones, Ken; Whitford, Betty Lou
Designed to monitor school accountability, KIRIS (Kentucky Instructional Results Information System) offers a powerful lesson about how high-stakes accountability systems can distort and undermine original visions for effective curriculum, instruction, and assessment practices. Changes have been influenced by several interconnected elements:…
Matsumura, Lindsay Clare; Wang, Elaine
In the present exploratory qualitative study we examine the contextual factors that influenced the implementation of a multi-year comprehensive literacy-coaching program (Content-Focused Coaching, CFC). We argue that principals' sensemaking of the dialogic instructional strategies promoted by the program in light of high-stakes accountability…
Trujillo, Tina M.
This case study of an urban school board's experiences under high-stakes accountability demonstrates how the district leaders eschewed democratic governance processes in favor of autocratic behaviors. They possessed narrowly defined goals for teaching and learning that emphasized competitive, individualized means of achievement. Their decision…
Gunzenhauser, Michael G.
In this article, I use concepts from Michel Foucault to analyze the ways in which the high-stakes accountability movement has appropriated the technology of the examination to redefine the educated subject as a normalized case. Partly this has become possible because of the role that educational research has played in laying out the conditions for…
Martin, Susan D.; Chase, Maggie; Cahill, Mary Ann; Gregory, Anne E.
As four teacher educators teaching a course associated with state-mandated assessment of literacy subject matter knowledge and instructional practices, we conducted a self-study of our experiences. In this article, we describe how high-stakes assessment further compounds the problematic nature of teaching and learning literacy in coursework. We…
Community colleges are typically assumed to be nonselective, open-access institutions. Yet access to college-level courses at such institutions is far from guaranteed: the vast majority of two-year institutions administer high-stakes exams to entering students that determine their placement into either college-level or remedial education. Despite…
Newhouse, C. Paul
This paper reports on the outcomes of a three-year study investigating the use of digital technologies to increase the authenticity of high-stakes summative assessment in four Western Australian senior secondary courses. The study involved 82 teachers and 1015 students and a range of digital forms of assessment using computer-based exams, digital…
Kuentzel, Jeffrey G.; Hetterscheidt, Lesley A.; Barnett, Douglas
The rigors of standardized testing make for numerous opportunities for examiner error, including simple computational mistakes in scoring. Although experts recommend that test scoring be double-checked, the extent to which independent double-checking would reduce scoring errors is not known. A double-checking procedure was established at a…
Liu, Ou Lydia; Bridgeman, Brent; Gu, Lixiong; Xu, Jun; Kong, Nan
Research on examinees' response changes on multiple-choice tests over the past 80 years has yielded some consistent findings, including that most examinees make score gains by changing answers. This study expands the research on response changes by focusing on a high-stakes admissions test--the Verbal Reasoning and Quantitative Reasoning measures…
Rubin, Daniel Ian
There has been a universal movement towards government-regulated standardisation and high-stakes assessment. In the United States, this has resulted in the No Child Left Behind Act (2001). Because of the predominant focus on high-stakes reading and writing assessments required by NCLB, teachers in the subject area of English/Language Arts (ELA)…
In all the elementary schools in the county, benchmark assessments were given six times a year in math and three times in reading; they were modeled after the questions anticipated on the Maryland School Assessment (MSA). Although results were sent to the school board, there were no cosmic consequences for the hourlong tests; they were supposed to…
Brennan, Robert L.
Koretz, in his article published in this issue, provides compelling arguments that the high stakes currently associated with accountability testing lead to behavioral changes in students, teachers, and other stakeholders that often have negative consequences, such as inflated scores. Koretz goes on to argue that these negative consequences require…
This article reports an empirical study that examined the pattern of test preparation for College English Test Band 4 (CET4) and the differential effects of test preparation practices on its scores, thereby drawing implications for CET4 score validity. Data collection involved 1,003 test takers of CET4. A pretest was administered at the beginning…
Hall, John D.; Howerton, D. Lynn; Jones, Craig H.
The No Child Left Behind Act and the accountability movement in public education caused many states to develop criterion-referenced academic achievement tests. Scores from these tests are often used to make high stakes decisions. Even so, these tests typically do not receive independent psychometric scrutiny. We evaluated the 2005 Arkansas…
Behizadeh, Nadia; Engelhard, George, Jr.
In his focus article, Koretz (this issue) argues that accountability has become the primary function of large-scale testing in the United States. He then points out that tests being used for accountability purposes are flawed and that the high-stakes nature of these tests creates a context that encourages score inflation. Koretz is concerned about…
Chevalier, Shirley A.
In conventional practice, most educators and educational researchers score cognitive tests using a dichotomous right-wrong scoring system. Although simple and straightforward, this method does not take into consideration other factors, such as partial knowledge or guessing tendencies and abilities. This paper discusses alternative scoring models:…
40. Champagne, D., & Roberts, E., An Exercise in Freedom: A Place Where Test Scores Appear to Be Rising. 3. Acland, H., If Reading Scores Are... of the nation's young teachers. Scientific, Engineering, Technical Manpower Comments, November 1979.
Wise, Vicki L.; Wise, Steven L.; Bhola, Dennison S.
Accountability for educational quality is a priority at all levels of education. Low-stakes testing is one way to measure the quality of education that students receive and make inferences about what students know and can do. Aggregate test scores from low-stakes testing programs are suspect, however, to the degree that these scores are influenced…
To fairly and accurately interpret candidates’ Pharmacy College Admission Test (PCAT) scores as listed on their official transcripts, it is important to understand how these scores reflect candidates’ performances on cognitive tasks involving the identification, interpretation, analysis, and evaluation of information assumed to have been covered in pre-pharmacy science, math, and general education coursework. This paper attempts to facilitate this understanding by explaining how candidates’ responses to PCAT test items relate to their scaled scores and percentile ranks and how their writing scores reflect their performance. This paper also suggests how differences between candidates’ PCAT subtest scores may reflect different personal experiences, educational backgrounds, and cognitive abilities. PMID:28289307
Johnson, Kary A.; Wilson, Celia M.; Williams-Rossi, Dara
This exploratory study investigated how reading comprehension was conceptualized on the new high-stakes test, the 2011-2012 State of Texas Assessment of Academic Readiness (STAAR). Specifically, comprehension, rate, and accuracy scores on the Gray Oral Reading Test 4 (GORT-4) from a group of struggling, low-SES, Hispanic middle school students (n…
With the increase in high-stakes testing and its consequences, it is essential that educators understand the validity of the scores these tests produce and of the inferences based on them. The purpose of this study was to determine whether a relationship exists between the underlying constructs of the grade six reading TAKS, the Test of Reading…
Sachar, Jane; Suppes, Patrick
It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Wilkins, M. Elaine
In 2001, No Child Left Behind introduced "highly qualified" status for K-12 teachers, which required passing scores on a series of high-stakes tests; among these is the Pre-Professional Skills Test (PPST), or PRAXIS I. The PPST measures basic K-12 skills in reading, writing, and mathematics. The mathematics subtest is a national…
Schafer, William D.; Hou, Xiaodong
This study discusses, and presents an example of, the use of spline functions to establish and report test scores using a moderated system of any number of cut scores. Our main goals include studying the need for moderated standards, establishing them, and creating a reporting scale that is referenced to all the standards. Our secondary goals are to make…
Foster, David; Noyce, Pendred
In this article, the authors describe a collaborative effort involving 30 school districts in California's Silicon Valley that are seeking to overcome the ill effects of mandatory high-stakes standardized testing in mathematics. These districts administer, score, and analyze a common set of performance assessments in mathematics in a way that…
Feldt, Leonard S.
In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
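To make the idea of reliability under differential weighting concrete, the sketch below applies a classical formula for the reliability of a weighted composite: one minus the weighted error variance divided by the composite variance. It assumes measurement errors are uncorrelated across parts and is not necessarily one of the methods Feldt describes; the function name and its inputs (weights, part covariance matrix, part reliabilities) are illustrative assumptions.

```python
import numpy as np


def weighted_composite_reliability(weights, cov, part_reliabilities):
    """Reliability of the composite C = sum_i w_i * X_i.

    Assumes measurement errors are uncorrelated across parts, so the
    composite error variance is sum_i w_i^2 * var_i * (1 - rho_i).
    weights: length-k weights w_i (hypothetical inputs for illustration)
    cov: k x k observed covariance matrix of the part scores
    part_reliabilities: length-k reliabilities rho_i of the parts
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(cov, dtype=float)
    rho = np.asarray(part_reliabilities, dtype=float)
    if s.shape != (w.size, w.size) or rho.size != w.size:
        raise ValueError("weights, cov, and reliabilities must agree in size")
    composite_var = w @ s @ w                     # observed variance of C: w' S w
    part_vars = np.diag(s)                        # observed variance of each part
    error_var = np.sum(w**2 * part_vars * (1.0 - rho))  # weighted error variance
    return 1.0 - error_var / composite_var


# Two parts with variances 4 and 9, covariance 2, reliabilities 0.8 and 0.9.
print(weighted_composite_reliability([1, 1], [[4, 2], [2, 9]], [0.8, 0.9]))
print(weighted_composite_reliability([2, 1], [[4, 2], [2, 9]], [0.8, 0.9]))
```

With equal weights the composite variance is 17 and the error variance 1.7, giving 0.90; doubling the weight on the less reliable first part lowers the estimate to about 0.876.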
Riggs, Rose M.
The effect of a calibration strategy requiring students to predict their scores for each topic on a high stakes test was investigated. The utility of self-efficacy towards predicting achievement and calibration accuracy was also explored. One hundred and ten sixth grade math students enrolled in an urban middle school participated. Students were…
Furnham, Adrian; Guenole, Nigel; Levine, Stephen Z; Chamorro-Premuzic, Tomas
This study presents new analyses of NEO Personality Inventory-Revised (NEO-PI-R) responses collected from a large British sample in a high-stakes setting. The authors show the appropriateness of the five-factor model underpinning these responses in a variety of new ways. Using the recently developed exploratory structural equation modeling (ESEM) technique, the authors show that model fits improve markedly over conventional confirmatory factor analyses (CFA) of the same data set, but that (a) factor interpretations do not change under ESEM analyses, (b) ESEM factor scores, just like CFA factor scores, correlate at near unity with sums of observed scores, (c) NEO-PI-R facets under ESEM analyses are invariant across gender, and (d) ESEM highlights the inappropriateness of alpha and beta as a higher order representation of NEO-PI-R facets, whereas a CFA approach might lead researchers to believe in the appropriateness of these higher order factors. These results, coupled with the existing validity evidence for the NEO-PI-R, suggest that the five-factor structure is the most parsimonious structure for summarizing NEO-PI-R responses from high-stakes settings in the United Kingdom.
Otto, Charlotte A.; Everett, Susan A.; Moyer, Richard H.; Zitzewitz, Paul W.
In this study, we looked at the impact of our specially designed inquiry-based science courses for pre-service elementary teachers on their science content knowledge as measured by a high-stakes state certification test for elementary education. We conducted a pre/post-analysis of the certification test scores of 1,003 pre-service teachers. Cohort…
McCabe, Deborah; Hilmo, Joellen
The Goodenough-Harris Draw-a-Person Test, if given at regular intervals during periods of remediation, may show clear evidence of improvement in behavior and attitude of learning disabled students. (CL)
Blowers, E. A.
The efficiency of several visual and auditory predictors of the Metropolitan Readiness Test was examined utilizing 106 grade 1 subjects considered by their teachers to show learning difficulties. (Author/JC)
As a public school English teacher, the author observes standardized testing season each year with a sort of grim fascination. "So this is it," she thinks as she paces around her silent classroom, peering over kids' shoulders at articles about parasailing. Line graphs tracking the rainfall in Tulsa. Parts of speech. Functions of "x." "These are…
Turnipseed, Stephan; Darling-Hammond, Linda
The number one quality business leaders look for in employees is creativity, and yet the U.S. education system undermines the development of the higher-order skills that promote creativity through its dogged focus on multiple-choice tests. Stephan Turnipseed and Linda Darling-Hammond discuss the kind of rich accountability system that will help students…
This article presents three strategies for teaching students who are taking the IELTS speaking test. The first strategy is aimed at improving confidence and uses a variety of self-help materials from the field of popular psychology. The second encourages students to think critically and invokes a range of academic perspectives. The third strategy…
Horst, S. Jeanne
Despite high-stakes applications of assessment findings, assessment data are frequently collected in situations that are of low-stakes to examinees. Because low-stakes tests are of little consequence to the examinees, test-taking motivation and thus the validity of inferences drawn from unmotivated examinees' scores are of concern. The current…
Frary, Robert B.
Multiple-choice response and scoring methods that attempt to determine an examinee's degree of knowledge about each item in order to produce a total test score are reviewed. There is apparently little advantage to such schemes; however, they may have secondary benefits such as providing feedback to enhance learning. (SLD)
Kurz, Terri Barber
Multiple-choice tests are generally scored using a conventional number right scoring method. While this method is easy to use, it has several weaknesses. These weaknesses include decreased validity due to guessing and failure to credit partial knowledge. In an attempt to address these weaknesses, psychometricians have developed various scoring…
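One long-standing alternative to number-right scoring, the correction for guessing (formula scoring), can be sketched as follows. This is a minimal illustration and not necessarily one of the methods Kurz reviews; it assumes every wrong answer is a blind guess among k equally attractive options and that omitted items are neither credited nor penalized. The function names are hypothetical.

```python
def formula_score(num_right: float, num_wrong: float, num_options: int) -> float:
    """Correction for guessing: S = R - W / (k - 1).

    Assumes wrong answers are blind guesses among k equally attractive
    options; omitted items are neither credited nor penalized.
    """
    if num_options < 2:
        raise ValueError("each item needs at least two options")
    return num_right - num_wrong / (num_options - 1)


def expected_blind_guess_score(num_items: int, num_options: int) -> float:
    """Expected formula score when every item is answered by blind guessing."""
    expected_right = num_items / num_options      # chance of a correct guess is 1/k
    expected_wrong = num_items - expected_right
    return formula_score(expected_right, expected_wrong, num_options)


# 40 four-option items: 30 right, 8 wrong, 2 omitted.
print(formula_score(30, 8, 4))
print(expected_blind_guess_score(40, 4))
```

Under blind guessing on 40 four-option items, the expected counts are 10 right and 30 wrong, so the expected formula score is 10 - 30/3 = 0; this is the sense in which the correction removes the average gain from guessing. It does not, however, credit partial knowledge, the other weakness noted above.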
Sireci, Stephen G.; Talento-Miller, Eileen
Admissions data and first-year grade point average (GPA) data from 11 graduate management schools were analyzed to evaluate the predictive validity of Graduate Management Admission Test® (GMAT®) scores and the extent to which predictive validity held across sex and race/ethnicity. The results indicated GMAT verbal and quantitative scores had…
Lyman, Howard B.
The first edition of this book was written to give information about testing to people whose work gave them access to test results, but whose training included little or nothing about the use and interpretation of tests. Later editions have been intended for a broader audience as the need for understanding what test scores really mean has…
Ramirez, Gerardo; Beilock, Sian L
Two laboratory and two randomized field experiments tested a psychological intervention designed to improve students' scores on high-stakes exams and to increase our understanding of why pressure-filled exam situations undermine some students' performance. We expected that sitting for an important exam leads to worries about the situation and its consequences that undermine test performance. We tested whether having students write down their thoughts about an upcoming test could improve test performance. The intervention, a brief expressive writing assignment that occurred immediately before taking an important test, significantly improved students' exam scores, especially for students habitually anxious about test taking. Simply writing about one's worries before a high-stakes exam can boost test scores.
The responsible and innovative use of media, not only in test score reporting but also in other guidance functions, may free more of the counselor's time for work with clients in counseling relationships. (Author)
Verona, Gail S.; Young, John W.
The New Jersey High School Proficiency Test (HSPT) is a "high stakes" test administered as a graduation requirement to all 11th grade students in New Jersey high schools. High school principals have been held increasingly accountable for successful HSPT scores. This study used Leithwood's model of transformational leadership (K.…
Haberman, Shelby J; Yao, Lili; Sinharay, Sandip
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.
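The simplest case of a best linear predictor of a true score in classical test theory is Kelley's regressed estimate, which uses a single observed score X, its reliability rho, and the group mean mu: rho * X + (1 - rho) * mu. The sketch below covers only that one-score case under the usual classical assumptions; it does not implement the multi-source predictors or the repeater-based estimation of error variances and covariances that Haberman, Yao, and Sinharay describe. The function name is hypothetical.

```python
def kelley_true_score(observed: float, reliability: float, group_mean: float) -> float:
    """Kelley's regressed true-score estimate: rho * X + (1 - rho) * mu.

    Shrinks the observed score toward the group mean in proportion to the
    share of observed-score variance that is error (1 - rho).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return reliability * observed + (1.0 - reliability) * group_mean


# Observed score 120, reliability 0.8, group mean 100.
print(kelley_true_score(120, 0.8, 100))
```

An observed score of 120 with reliability 0.8 in a group whose mean is 100 is shrunk to about 116; the lower the reliability, the further the estimate moves toward the group mean, which is why more reliable scoring makes the predictor more informative.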
It is challenging for parents and the general public to make sense of the reports on test scores that appear in the mass media. This article offers some things for readers to consider as they bring a critical eye to what is read in the papers. Usually reports on test scores in the media are quite short and focus on one or two aspects of test…
Bedard, Kelly; Ferrall, Christopher
Compares the distribution of test scores at age 13 in 1964 and 1982 and wages later in life across 11 countries. Finds that wage dispersion later in life is never greater than test-score dispersion. For three countries (U.S., UK, and Japan), finds evidence of skill-biased changes in wage dispersion between the early 1970s and the late 1980s.…
Teachers interact with their students on behalf of the entire educational system. The aim of this study is to explore how biology teachers understand and construct their practice in a high-stakes accountability environment that is likely to be riddled with tensions. By critically questioning the technical paradigms of accountability, this study challenges the fundamental assumptions of accountability. Such a critical approach may help teachers develop empowerment strategies that can free them from the de-skilling effects of the educational accountability system. This interpretive case study of a high school in Maryland is grounded in three streams of research literature: quality science instruction based on scientific inquiry, the effects of educational accountability on the curriculum, and the influence of policy on classroom practice, with a specific focus on how teachers balance competing tensions. Theoretically, this study is situated at the intersection of educational accountability and pedagogy. In terms of data collection, I conduct two interviews with all six biology teachers in the school. I observe each teacher for at least fifteen class periods. I review high-stakes accountability policy documents from the federal, state, and district levels of the education system. Three themes emerge from the research. The first theme, "re-defining science teaching," captures how deeply accountability structures have penetrated the science curriculum. The second theme, "the pressure mounts," explores how high-stakes accountability in science has increased the stress placed on teachers. The third theme, "teaching-in-between," explores how teachers compromise between accountability mandates and their own understandings of quality teaching. Together, the three themes shed light on the high-stakes climate in which teachers currently work. This study's findings inform the myriad paradoxes at all levels of the educational system. As Congress and advocacy groups battle over
Willingham, Warren W.; Pollack, Judith M.; Lewis, Charles
Proposed a framework of possible differences between grades and test scores and tested the framework with data on 8,454 high school seniors from the National Education Longitudinal Study. Identified differences and correlations among achievement factors. Differences between grades and tests give these measures complementary strengths in…
Leuba, Richard J.
Explains how multiple choice test items can be devised to measure higher-order learning, including engineering problem solving. Discusses the value and information provided in item analysis procedures with machine-scored tests. Suggests elements to consider in test design. (ML)
van der Linden, Wim J.; Luecht, Richard M.
Derives a set of linear conditions of item-response functions that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly. An example illustrates the use of the model for an item pool from the Law School Admissions Test (LSAT). (SLD)
Kopriva, Rebecca J.; Thurlow, Martha L.; Perie, Marianne; Lazarus, Sheryl S.; Clark, Amy
This article argues that test takers are as integral to determining validity of test scores as defining target content and conditioning inferences on test use. A principled sustained attention to how students interact with assessment opportunities is essential, as is a principled sustained evaluation of evidence confirming the validity or calling…
Dearborn Public Schools, MI.
The purpose of the fifth annual Dearborn Achievement Test Score report is to summarize and to help interpret the test results so that Dearborn citizens and educators will have a better understanding of the educational achievements of Dearborn students. The District-wide Testing Program assesses reading readiness, scholastic aptitude, academic…
As more colleges move to "test optional" admissions policies, the debate over the utility and interpretation of standardized-test scores continues. In this article, the author interviews Daniel Koretz, a professor of education at Harvard University and author of "Measuring Up: What Educational Testing Really Tells Us". Koretz…
Shievitz, A. L.; Tudiver, F.; Araujo, A.; Sanghe, P.; Boyle, E.
OBJECTIVE: To determine whether Mini-Mental State Examination (MMSE) scores of elderly family medicine patients are different when the test is administered at home rather than at the clinic. DESIGN: Cross-sectional comparison study. SETTING: University family practice unit in an urban area. PARTICIPANTS: A convenience sample of family practice clinic patients 70 years or older was referred to the study in the sequence seen at the clinic. Of 171 patients approached in person or by telephone, 77 agreed to participate. METHOD: The MMSE was administered at home and at the clinic on the same day for all subjects. Testing site order was randomized across patients. MAIN FINDINGS: Of the 77 patients who agreed to be subjects, only 13 (16.9%) had low MMSE scores (< or = 24). Five (41.7%) of these had normal scores (> 24) at home, but low scores in the clinic. Subjects had significantly higher scores on MMSEs administered at home (P < .01) on the same day. CONCLUSIONS: Previous research has shown patients achieve higher MMSE scores at home; this study demonstrated it in a representative family medicine population. Primary care physicians should be cautious about classifying elderly patients as possibly cognitively impaired based on clinic testing alone. Testing at home could avoid many unnecessary referrals to specialist services for further assessment and diagnostic tests that use up precious health care resources. PMID:9721421
The uses and misuses of standardized test results used for program evaluation as seen by a staff member of an Elementary Secondary Education Act (ESEA) Title I Technical Assistance Center are described. In ESEA Title I, test scores are used to select students for the program. Although federal requirements do not require using standardized test…
Wise, Steven L.
Whenever the purpose of measurement is to inform an inference about a student's achievement level, it is important that we be able to trust that the student's test score accurately reflects what that student knows and can do. Such trust requires the assumption that a student's test event is not unduly influenced by construct-irrelevant factors…
A widely held view is that good schools are essential to a nation's international economic success and that high test scores on international tests of academic skills and knowledge indicate how good a nation's schools are. The widespread belief that good schools are an important contributor to a nation's economic success in the world is supported…
The prevalence of childhood overweight and obesity increased dramatically in the United States during the past three decades. This increase has adverse public health implications, but its implication for children's academic outcomes is less clear. This paper uses data from five waves of the Early Childhood Longitudinal Study-Kindergarten to examine how children's weight is related to their scores on standardized tests and to their teachers' assessments of their academic ability. The results indicate that children's weight is more negatively related to teacher assessments of their academic performance than to test scores.
Banerjee, Manju; Shaw, Stan F.
Given the latest reauthorization of the Individuals with Disabilities Education Act (IDEA) and evolving views on the identification of cognitive disabilities in special education, many high school graduates with learning disabilities and/or attention-deficit/hyperactivity disorder will have a Summary of Performance (SOP) in lieu of a recent…
Rangvid, Beatrice Schindler
We combine data from three studies for Denmark in the PISA 2000 framework to investigate differences in the native-immigrant test score gap by country of origin. In addition to the controls available from PISA data sources, we use student-level data on home background and individual migration histories linked from administrative registers. We find…
Teaching Music, 2007
A recent study found that students in high-quality school music education programs score higher on standardized tests compared to students in schools with deficient music education programs. The study, which was published in the Winter 2006 issue of MENC's Journal for Research in Music Education, is the first to examine the quality of school music…
van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas
This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
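Two of the single-administration coefficients named above, Cronbach's alpha and Guttman's lambda-2, can be computed directly from the item covariance matrix. The minimal sketch below assumes an examinees-by-items score matrix and NumPy; the function name is hypothetical, and it does not implement the latent class reliability coefficient the authors propose.

```python
import numpy as np


def alpha_and_lambda2(scores):
    """Cronbach's alpha and Guttman's lambda-2 from an n x k item-score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    if k < 2:
        raise ValueError("need at least two items")
    cov = np.cov(x, rowvar=False)           # k x k item covariance matrix (ddof=1)
    total_var = cov.sum()                   # variance of the total (sum) score
    item_var_sum = np.trace(cov)            # sum of item variances
    lambda1 = 1.0 - item_var_sum / total_var
    alpha = k / (k - 1) * lambda1           # alpha = k/(k-1) * (1 - sum var_i / var_X)
    off_diag = cov - np.diag(np.diag(cov))  # inter-item covariances only
    lambda2 = lambda1 + np.sqrt(k / (k - 1) * np.sum(off_diag**2)) / total_var
    return alpha, lambda2


# Simulated example: one common factor plus item-specific noise.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 1)) + rng.normal(size=(200, 5))
print(alpha_and_lambda2(data))
```

For the same data, lambda-2 is never smaller than alpha; when all items are identical, both equal 1.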
Petrilli, Michael J.; Wright, Brandon L.
At a time when the national conversation is focused on lagging upward mobility, it is no surprise that many educators point to poverty as the explanation for mediocre test scores among U.S. students compared to those of students in other countries. If American teachers in struggling U.S. schools taught in Finland, says Finnish educator Pasi…
Kposowa, Augustine J.; Valdez, Amanda D.
Objectives: The primary objective of the study was to investigate the relationship between ubiquitous laptop use and academic achievement. It was hypothesized that students with ubiquitous laptops would score on average higher on standardized tests than those without such computers. Methods: Data were obtained from two sources. First, demographic…
Homeschooling, one of the fastest growing educational alternatives, is enjoying increasing respect from educators and parents alike. This is partly because homeschooling children score as well and often better on standardized tests than their publicly schooled counterparts. However, the vast majority of homeschooled students come from the…
Fahle, Erin; Reardon, Sean
Describing the variation in test scores between and within school districts is critical (1) for policy-related and descriptive work that investigates the sorting of students among districts and the differential effectiveness of those districts and (2) for methodological work planning future experiments or interventions. Intraclass…
Dougherty, Jack; Harelson, Jeffrey; Maloney, Laura; Murphy, Drew; Smith, Russell; Snow, Michael; Zannoni, Diane
Home buyers exercise school choice when shopping for a private residence due to its location in a public school district or attendance area. In this quantitative study of one Connecticut suburban district, we measure the effect of elementary school test scores and racial composition on home buyers' willingness to purchase single-family homes over…
To achieve perpetually better test results each year as mandated by the No Child Left Behind Act (NCLB), teachers in successful schools such as Leroy Anderson Elementary in San Jose, California, will "try anything" to raise scores, as the school's principal stated in an interview with "The San Jose Mercury News." In schools…
Negron, Maggie; Breindel, Matthew
This assessment of placement test scores in reading, math, and sentence skills from incoming students at College of the Desert (California) shows that students are overwhelmingly underprepared for study at the college. Only 15% of students were prepared in sentence skills, 27% in reading skills, 7% in math skills; only 3% were prepared in all 3…
Brennan, Robert L.
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
Minor, Elizabeth Covay
Research on achievement gaps has found that achievement gaps are larger for students who take advanced mathematics courses compared to students who do not. Focusing on the advanced mathematics student achievement gap, this study found that African American advanced mathematics students have significantly lower test scores and are less likely to be…
Reinhart, Robert M G; Woodman, Geoffrey F
We can tune attention more precisely to highly rewarding objects than to other objects in our environment, but how our brains do this is unknown. After a few trials of searching for the same object, subjects' electrical brain activity indicated that they handed off the memory representations used to control attention from working memory to long-term memory. However, when a large reward was possible, the neural signature of working memory returned as subjects recruited working memory to supplement the cognitive control afforded by the representations accumulated in long-term memory. The amplitude of this neural signature of working memory predicted the magnitude of the subsequent behavioral reward-based attention effects across tasks and individuals, showing the ubiquity of this cognitive reaction to high-stakes situations.
Essay marking is a subjective intellectual exercise in which score reliability can be influenced by many factors, such as the test design, the marker's interpretation of the marking criteria, and the procedure and method used in the marking process. After conducting a literature review, this study investigated the reliability of essay marking…
Dimitrov, Dimiter M.
This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…
Jones, Sarah B; Knapik, Joseph J; Sharp, Marilyn A; Darakjy, Salima; Jones, Bruce H
Epidemiological studies often have to rely on participants' self-reported information, so the validity of the self-report instrument is an important consideration in any study. The purpose of this investigation was to determine the validity of self-reported Army Physical Fitness Test (APFT) scores. The APFT is administered to all soldiers in the U.S. Army twice a year and consists of the maximum number of push-ups completed in 2 minutes, the maximum number of sit-ups completed in 2 minutes, and a 2-mile run for time. Army mechanics responded to a questionnaire in March and June 2004 asking them to report the exact scores of each event on their most recent APFT. Actual APFT scores were obtained from the soldiers' military units. The mean ± standard deviation (SD) of actual and self-reported numbers of push-ups was 61 ± 14 and 65 ± 13, respectively. The mean ± SD of actual and self-reported numbers of sit-ups was 66 ± 10 and 68 ± 10, respectively. The mean ± SD of actual and self-reported run times (minutes) was 14.8 ± 1.4 and 14.6 ± 1.4, respectively. Correlations between actual and self-reported push-ups, sit-ups, and run times were 0.83, 0.71, and 0.85, respectively. On average, soldiers tended to slightly over-report performance on all APFT events, and Bland-Altman plots showed that individual self-reported scores could vary widely from actual scores. Despite this, the close correlations between actual and self-reported scores suggest that self-reported values are adequate for most epidemiological military studies involving larger sample sizes.
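The agreement analysis described in this abstract (mean bias, Bland-Altman limits of agreement, and the Pearson correlation between paired actual and self-reported scores) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the conventional 95% limits (bias ± 1.96 SD of the paired differences), the function names are hypothetical, and the sample numbers are invented rather than taken from the study.

```python
from statistics import mean, stdev


def bland_altman(actual, reported):
    """Bias and 95% limits of agreement for paired measurements.

    Differences are taken as reported - actual, so a positive bias means
    self-reports are higher than the recorded scores on average.
    """
    if len(actual) != len(reported) or len(actual) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [r - a for a, r in zip(actual, reported)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical push-up counts for four soldiers (not study data).
actual = [50, 60, 70, 80]
reported = [54, 62, 75, 81]
bias, lower, upper = bland_altman(actual, reported)
print(f"bias={bias:.2f}, LoA=({lower:.2f}, {upper:.2f}), r={pearson_r(actual, reported):.3f}")
```

A high correlation and wide limits of agreement can coexist, which is exactly the pattern the abstract reports: correlation measures how well the two sets of scores move together, while the limits of agreement show how far an individual's self-report may stray from the recorded value.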
Qian, David D.
In recent years, school-based assessment (SBA) has been incorporated into the English Language subject of a traditional high-stakes public examination, the Hong Kong Certificate of Education Examination. As reactions from various stakeholder groups have been mixed, it was necessary to review this new practice. This paper reports on a study of 33…
Lavigne, Alyson Leah
Background/Context: The stakes are getting higher for teachers daily as more and more states adopt hiring, firing, and tenure-granting policies based on teacher evaluations. Even more concerning is the limited discussion about whether or not high-stakes teacher evaluation can meet the intended outcome of improved student achievement, and at what…
Background/Context: Considerable controversy surrounds the issue of whether high-stakes statewide accountability programs have led to more equitable educational opportunities for all students. Some researchers suggest that these programs have focused attention on improving the achievement of students of color from low socioeconomic backgrounds.…
Theoharis, George; Causton, Julie; Tracy-Bronson, Chelsea P.
Students identified with disabilities are increasingly being educated with the assistance of support services within heterogeneous (i.e., general education) classrooms (United States Department of Education, 2011). Yet, in this era of high stakes accountability, students are labeled, sorted, and differentially treated according to their academic…
Symes, Wendy; Putwain, David W.; Remedios, Richard
Prior to high stakes examinations, teachers may engage in instructional practices to encourage their students to prepare well for their exams, including the use of "fear appeals". The current study examined whether academic buoyancy played a role in student appraisals of fear appeals as threatening or challenging. High school students…
This article explores teachers' experiences under high-stakes accountability and shows how the narrowing of curriculum depleted teachers' intrinsic work rewards. The article analyzes data from an ethnographic study of teachers' work in two high-poverty urban public schools. The study shows that as instructional mandates emphasized a narrowed…
Brown, Christopher P.; Bay-Borelli, Debra E.; Scott, Jill
High-stakes education reforms across the United States and the globe continue to alter the landscape of teaching and teacher education. One key but understudied aspect of this reform process is the experiences of first-year teachers, particularly those who participated in these high-stakes education systems as students and as a…
Anicich, Eric M.; Swaab, Roderick I.; Galinsky, Adam D.
Functional accounts of hierarchy propose that hierarchy increases group coordination and reduces conflict. In contrast, dysfunctional accounts claim that hierarchy impairs performance by preventing low-ranking team members from voicing their potentially valuable perspectives and insights. The current research presents evidence for both the functional and dysfunctional accounts of hierarchy within the same dataset. Specifically, we offer empirical evidence that hierarchical cultural values affect the outcomes of teams in high-stakes environments through group processes. Experimental data from a sample of expert mountain climbers from 27 countries confirmed that climbers expect that a hierarchical culture leads to improved team coordination among climbing teams, but impaired psychological safety and information sharing compared with an egalitarian culture. An archival analysis of 30,625 Himalayan mountain climbers from 56 countries on 5,104 expeditions found that hierarchy both elevated and killed in the Himalayas: Expeditions from more hierarchical countries had more climbers reach the summit, but also more climbers die along the way. Importantly, we established the role of group processes by showing that these effects occurred only for group, but not solo, expeditions. These findings were robust to controlling for environmental factors, risk preferences, expedition-level characteristics, country-level characteristics, and other cultural values. Overall, this research demonstrates that endorsing cultural values related to hierarchy can simultaneously improve and undermine group performance. PMID:25605883
Martin, John D.; And Others
The degree of relationship between scores on the Barron Ego Strength Scale and the scores on the Bender-Gestalt Test was investigated on a sample of college students. Correlations were moderate to low. Racial differences were observed on the Bender-Gestalt Test. (Author/JKS)
Pedulla, Joseph J.; Abrams, Lisa M.; Madaus, George F.; Russell, Michael K.; Ramos, Miguel A.; Miao, Jing
Results from a national survey of teachers are reported for five types of state testing programs, those with: (1) high stakes for districts, schools, or teachers, and students; (2) high stakes for districts, schools, and teachers, and moderate stakes for students; (3) high stakes for districts, schools, and teachers, and low stakes for students;…
Takano, Keisuke; Gutenbrunner, Charlotte; Martens, Kris; Salmon, Karen; Raes, Filip
Reduced specificity of autobiographical memories is a hallmark of depressive cognition. Autobiographical memory (AM) specificity is typically measured by the Autobiographical Memory Test (AMT), in which respondents are asked to describe personal memories in response to emotional cue words. Because of this free descriptive response format, the AMT relies on experts' hand scoring for subsequent statistical analyses. This manual coding potentially impedes research activities in big data analytics, such as large epidemiological studies. Here, we propose computerized algorithms to automatically score AM specificity for the Dutch (adult participants) and English (youth participants) versions of the AMT by using natural language processing and machine learning techniques. The algorithms performed reliably in discriminating specific and nonspecific (e.g., overgeneralized) autobiographical memories in independent testing data sets (area under the receiver operating characteristic curve > .90). Furthermore, outcome values of the algorithms (i.e., decision values of support vector machines) showed a gradient across similar (e.g., specific and extended memories) and different (e.g., specific memory and semantic associates) categories of AMT responses, suggesting that, for both adults and youth, the algorithms capture well the extent to which a memory has features of specific memories.
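The AUC criterion in this abstract has a simple probabilistic reading: it is the chance that a randomly chosen specific memory receives a higher decision value than a randomly chosen nonspecific one, with ties counted as half. A minimal sketch under that definition follows; it is not the authors' pipeline, the function name is hypothetical, and the decision values are invented for illustration.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(n*m) pairwise form is equivalent to the
    Mann-Whitney U statistic divided by n*m.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier decision values (e.g. SVM outputs), not study data.
specific = [1.4, 0.9, 0.2]
nonspecific = [-0.8, 0.3, -1.1]
print(roc_auc(specific, nonspecific))
```

The pairwise loop is chosen for clarity; for large test sets a rank-based computation of the same statistic is faster.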
Gaddis, S Michael; Lauen, Douglas Lee
Since at least the 1960s, researchers have closely examined the respective roles of families, neighborhoods, and schools in producing the black-white achievement gap. Although many researchers minimize the ability of schools to eliminate achievement gaps, the No Child Left Behind Act (NCLB) increased pressure on schools to do so by 2014. In this study, we examine the effects of NCLB's subgroup-specific accountability pressure on changes in black-white math and reading test score gaps using a school-level panel dataset on all North Carolina public elementary and middle schools between 2001 and 2009. Using difference-in-difference models with school fixed effects, we find that accountability pressure reduces black-white achievement gaps by raising mean black achievement without harming mean white achievement. We find no differential effects of accountability pressure based on the racial composition of schools, but schools with more affluent populations are the most successful at reducing the black-white math achievement gap. Thus, our findings suggest that school-based interventions have the potential to close test score gaps, but differences in school composition and resources play a significant role in the ability of schools to reduce racial inequality.
Hsu, Wen-Chuin; Chu, Yi-Chuan; Fung, Hon-Chung; Wai, Yau-Yau; Wang, Jiun-Jie; Lee, Jiann-Der; Chen, Yi-Chun
Mounting evidence shows that hyperhomocysteinemia is a risk factor for cognitive decline. This study enrolled subjects with normal serum levels of B12 and folate and performed thorough neuropsychological assessments to illuminate the independent role of homocysteine in cognitive functions. Participants aged 50 to 85 years were enrolled if they had a Modified Hachinski ischemic score of <4, adequate visual and auditory acuity to allow neuropsychological testing, and good general health. Subjects with cognitive impairment resulting from secondary causes were excluded. Each participant completed evaluations of general intellectual function, including the Mini-Mental State Examination, Cognitive Abilities Screening Instrument, Clinical Dementia Rating, and a battery of neuropsychological assessments. This study enrolled 225 subjects (90 subjects younger than 65 years and 135 subjects aged 65 years or older). The sex proportion was similar between the 2 age groups. Years of education were significantly fewer in the elderly (7.49 ± 5.40 years) than in the young (9.76 ± 4.39 years, P = 0.001). There was no significant difference in body mass index or levels of vitamin B12 and folate between the 2 age groups. Homocysteine levels were significantly higher in the elderly group than in the younger group (10.8 ± 2.7 vs. 9.5 ± 2.5 μmol/L, respectively, P = 0.0006). After adjusting for age, sex, and education, only the Digit Symbol Substitution (DSS) score was significantly lower in subjects with hyperhomocysteinemia (homocysteine >12 μmol/L) than in those with homocysteine ≤12 μmol/L in the elderly group (DSS score: 7.1 ± 2.7 and 9.0 ± 3.0, respectively, beta = −1.6, 95% confidence interval [CI] = −2.8 to −0.5, P = 0.001), and borderline significance was noted in the combined age group (beta = −1.1, 95% CI = −2.1 to −0.1, P = 0.04). We did not find an association between…
Hedrick, Wanda B., Ed.
There's accountability, and then there's the testing craze, an iatrogenic practice that undermines real learning. Hedrick documents the negative effects of testing, giving teachers another weapon in their arsenal against mindless preparation for high-stakes tests.
Wyman, Leisy; Marlow, Patrick; Andrew, Ciquyaq Fannie; Miller, Gayle; Nicholai, Cikigaq Rachel; Rearden, Yurrliq Nita
A growing body of research documents how educational policies and accountability systems can open or close "ideological and implementational spaces" for bilingual education, shaping the language planning efforts of Indigenous communities. Using collaborative research, Indigenous and non-Indigenous researchers investigated the…
Perry, Tonya, Ed.
In this article, the author discusses how to consolidate schools where enrollment has dropped significantly. In addition to financial concerns, effective and developmentally appropriate curriculum choices are important. The author states that the "core" academic courses must be offered no matter the configuration of the schools, but…
Uy, Chin; Manalo, Ronaldo A.; Cabauatan, Ronaldo R.
In the Philippines, students seeking admission to a university are usually required to meet certain entrance requirements, including passing the entrance examinations with questions on IQ and English, mathematics, and science. This paper aims to determine the factors that affect the performance of entrants into business programmes in high-stakes…
Prior research suggests that the No Child Left Behind Act (NCLB) is having an adverse effect on school music programs, particularly in schools that have not made "adequate yearly progress." In many instances, music programs are being reduced or eliminated, music teachers are being required to assist with the teaching of other subjects,…
Inserra, Albert; Bossert, Kenneth R.
The No Child Left Behind Act of 2001, sponsored by President George W. Bush, calls for 100 percent proficiency in reading and mathematics by 2014. This Federal mandate has caused all public schools in the United States to examine the programs in use to meet these requirements. In addition, states across the country have implemented a series of…
Fink, Rosalee, Ed.; Samuels, S. Jay, Ed.
Although recent U.S. legislation has had a profound impact on reading instruction and student achievement, some students continue to fall behind. This provocative text addresses this gap with a new perspective on reading instruction that goes beyond the realms of teacher content knowledge and methodology. The book shows how motivation and interest…
Mathis, Janelle B.; Albright, Lettie K.
Teachers have the privilege and responsibility of helping children discover the joy of reading. This principle underlies the mission of The Children's Literature Assembly (CLA) of the National Council of Teachers of English (NCTE). Unfortunately, as teachers and librarians in this country face the demands of the "No Child Left Behind…
Dougherty Stahl, Katherine A.; Schweid, Jason
Successful implementation of the Common Core State Standards (2010) will require an alignment between learning standards, effective instruction, and the new assessments designed by the Partnership for the Assessment of Readiness for College and Career (PARCC) and Smarter Balanced (SB) consortia. Both the CCSS and the new generation of assessments…
Cole, James S.; Osterlind, Steven J.
There is increasing pressure for institutions of higher education in the United States to objectively document student learning outcomes. Criticism of higher education is mounting, with the result being that education institutions need to be more accountable for student learning on their campuses. As a result, there is increased interest in the…
Giambo, Debra A.
The educational accountability systems of both the No Child Left Behind (NCLB) Act of 2001 and the state of Florida (as of 1999) were modeled after Texas' system, despite its flaws. NCLB reaches for all students to achieve academic proficiency and designates students with limited English proficiency (LEP) as an important subgroup. As we work with…
Dworkin, A. Gary
Central to sociology is the assumption that virtually all forms of social action and public policies have unanticipated consequences for their actors and social systems. Sociologists seek to explore these unanticipated consequences and delineate how they will affect people, policies, and practices. This essay focuses on the unanticipated…
Bullying, a prevalent form of school violence, threatens development and learning. This article reports the findings of a qualitative study conducted in an elementary school, designed to gain an ecological understanding of bullying perceptions of this school community. The three research questions were: (a) How do individuals and groups within…
Rader, Laura Pope
Social promotion is an ongoing issue in education and is frequently seen as a dichotomy with retention. While retention is a commonly researched topic, the information regarding the academic and behavioral outcomes of socially promoted students is much sparser. The problem is that many students who are socially promoted into high school after…
Chin, Margaret M.; Newman, Katherine S.
Two public policy shifts in the past 10 years--the move from welfare to work and the end of social promotion in school--are intertwined in their implementation in the lives of working poor families. This report draws on ethnographic data from a 6-year study of working poor families in New York City over the period in which welfare reform became a…
This book helps educators improve students' ability to write clear, coherent essays in response to on-demand writing prompts. While it focuses on students' abilities to succeed at on-demand writing, it also promotes the teaching of writing as an expression of art and self. For grades 4-12, it provides examples of responses to narrative and…
Cashman, Timothy G.; McDermott, Benjamin R.
A recently constructed border wall stands within walking distance of Border High School (BHS) and was created to impede the flow of people, goods, fauna, and contraband from Mexico into the United States (U.S.). The reality, however, is that this geopolitical border is fluid, allowing connections between sociopolitical zones. The researchers…
Hoffman, Lynn M.
I conducted surveys and focus group interviews with fifty-four yearbook students from five rural high schools, and analyzed their yearbooks, to investigate students' process of yearbook construction and to determine what was meaningful and memorable to them throughout their high school experience. Chang's (1992) construct of an adolescent ethos, including…
A remarkable radiation of completely eyeless, cave-obligate spider species (Cicurina) has been described from limestone caves of Texas. This radiation includes over 50 described species, with a large number of hypothesized single-cave endemics, and four species listed as US Federally Endangered. Because of this conservation importance, species delimitation in the group is 'high-stakes': it is imperative that species hypotheses are data rich, objective, and robust. This study focuses on a complex of four cave-dwelling Cicurina distributed on the northwestern edge of Austin, Texas. Several of the existing species hypotheses in this complex are weak, being based on morphological comparisons of small samples of adult female specimens; one species description (for C. wartoni) is based on a single adult specimen. Species limits in this group were newly assessed using morphological, mitochondrial, and nuclear DNA sequence data, analysed using a variety of approaches. All data support a clear lineage separation between C. buwata and the C. travisae complex (including C. travisae, C. wartoni, and C. reddelli). Observed congruence across multiple analyses indicates that the C. travisae complex represents a single species, and the formal species synonymy presented here has important conservation implications. The integrative framework used in this study serves as a potential model for other Texas cave Cicurina, including US Federally Endangered species. More generally, this study illustrates how and why taxon-focused conservation efforts must prioritize modern species delimitation research (if the existing taxonomy is weak) before devoting precious downstream resources to conservation efforts. The study also highlights the issue of taxonomic type II error that diversity biologists increasingly face as species delimitation moves into the genomics era.
Ali, Samina; Thomson, Denise; Graham, Timothy A D; Rickard, Sean E; Stang, Antonia S
Background: The high-paced, unpredictable environment of the emergency department (ED) contributes to errors in patient safety. The ED setting becomes even more challenging when dealing with critically ill patients, particularly children, where variations in size, weight, and form present practical difficulties in many aspects of care. In this commentary, we will explore the impact of health care providers’ emotional reactions while caring for critically ill patients, and how this can be interpreted and addressed as a patient safety issue. Discussion: ED health care providers encounter high-stakes, high-stress clinical scenarios, such as pediatric cardiac arrest or resuscitation. The stress, and at times distress, of these health care providers, and its potential contribution to medical error, is underrepresented in the current medical literature. Most patient safety research is limited to error reporting systems, especially medication-related ones, an approach that ignores the effects of health care provider stress as a source of error and limits our ability to learn from the event. Ways to mitigate this stress and avoid this type of patient safety concern might include simulation training for rare, high-acuity events, use of pre-determined clinical order sets, and post-event debriefing. Conclusion: While there are physiologic and anatomic differences that contribute to patient safety, we believe that they are insufficient to explain the need to address critical life-threatening event-related patient safety issues for both adults and, especially, children. Many factors make patient safety during critical medical events distinct from general patient safety issues, but it is, perhaps, this heightened high-stress, emotional climate that is the most distinct and important part of all. We believe that consideration of this concept is essential when discussing safety improvement in critical medical events. PMID:28176924
One hundred and sixty persons aged 10 to 69 years (106 women, 54 men) with healthy eyes were studied with the Farnsworth-Munsell 100 hue (FM100) test. Mean total scores and mean individual box scores were calculated for the right and left eyes, and the total score was also calculated separately for women and men. The test was administered under the illumination of a Macbeth Easel lamp (1000 lux), and the right eye was tested first. The results were calculated in six age groups (10-19 years, 20-29 years, etc.). The mean total score in the right eye varied from 7.44 ± 2.46 (SD) to 10.07 ± 2.03 across age groups, and in the left eye from 7.56 ± 2.36 to 10.16 ± 2.68. The scores changed significantly with age: linear regression of test scores on age gave significant correlations in the right eye (R = 0.308, P = 0.0001) and in the left eye (R = 0.246, P = 0.0021). By establishing normal error scores for the FM100 test and its individual boxes in persons aged 10-69 years, the present study gives clinicians working with colour vision defects a means to judge whether their patients' results are normal or abnormal.
Walter, Richard Barry
This study investigated the relationship between instructional level scores as determined by a cloze test and instructional level scores as determined by an informal reading inventory (IRI). Fifty male and 50 female subjects were randomly selected from the total fifth grade population of five schools chosen from a total of 22 midwestern elementary…
Craven, B J
This paper describes a method for scoring the Farnsworth-Munsell 100-Hue test, based on maximum-likelihood estimation, which in theory reduces test-to-test variability in scores and which is therefore better able to discriminate between different levels of overall colour discrimination than is the original Farnsworth scoring system. Error scores produced by the method are directly comparable to error scores produced by the traditional scoring system. It is hoped that this work will provoke further consideration of the efficiency of the scoring system as far as test-to-test variability is concerned, including the efficient detection of polarity in the subject's hue discrimination function.
Bunch, George C.; Aguirre, Julia M.; Tellez, Kip
Assessing the preparation of preservice candidates for quality teaching, both for mainstream students and for ELs, requires reliable and valid assessments that pay close attention to context, process, and reflection, factors that traditional evaluations of teaching either ignore or undervalue. In this article, the authors focus on one high-stakes…
West, Suzanne M.
Course grades, which often include non-achievement factors such as effort and behavior and are subject to individual teacher grading philosophies, suffer from issues of unreliability. Yet, course grades continue to be utilized as a primary tool for reporting academic achievement to students and parents and are used by most colleges and…
Smith, Teresa C.; Smith, Billy L.
Examined Visual Aural Digit Span Test (VADS) and Bender-Gestalt (BG) scores as predictors of Wide Range Achievement Test-Revised (WRAT-R) scores among 115 elementary school students referred for low academic achievement. Divided children into three age groups. Results suggest BG and VADS Test can be effective screening devices for young children…
te Nijenhuis, Jan; van Vianen, Annelies E. M.; van der Flier, Henk
IQ scores provide the best general predictor of success in education, job training, and work. However, there are many ways in which IQ scores can be increased, for instance by means of retesting or participation in learning potential training programs. What is the nature of these score gains? Jensen [Jensen, A. R. (1998a). "The g factor: The…
Lin, Miao-Hsiang; Hsiung, Chao A.
Two simple empirical approximate Bayes estimators are introduced for estimating domain scores under binomial and hypergeometric distributions respectively. Criteria are established regarding use of these functions over maximum likelihood estimation counterparts. (SLD)
Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J.
Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as is the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…
Friedman, A F; Wakefield, J A; Sasek, J; Schroeder, D
A new scoring procedure to be used with Spraings' technique for administering the Bender-Gestalt test in a multiple choice format is presented. Scoring weights are used instead of simply scoring each item right or wrong. The evidence presented suggests that this method of scoring would increase the value of Spraings' test in the diagnosis of perceptual deficits.
Livingston, Samuel A.; Lewis, Charles
This paper presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including the formation of a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate its effective test length in terms of…
Kane, Thomas J.; Staiger, Douglas O.
By the spring of 2000, forty states had begun using student test scores to rate school performance. Twenty states have gone a step further and are attaching explicit monetary rewards or sanctions to a school's test performance. In this paper, the authors focus on accountability programs in which states measure the effectiveness of individual…
Rich, John D., Jr.; Fullard, William; Overton, Willis
One hundred and twelve Latino students from Philadelphia participated in this study, which examined the development of deductive reasoning across adolescence, and the relation of reasoning to test anxiety and standardized test scores. As predicted, 11th and ninth graders demonstrated significantly more advanced reasoning than seventh graders.…
Zimmerman, Donald W.
Results of this study indicate that the correlation between half-test scores over repeated splits, over persons, and over repeated testings resulting in different sets of observed scores, is given by Kuder-Richardson Formula 21. (RF)
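Kuder-Richardson Formula 21, cited in this abstract, estimates reliability from only the number of items k, the mean total score M, and the total-score variance σ²: r = (k / (k − 1)) · (1 − M(k − M) / (kσ²)). The sketch below assumes dichotomously scored items and uses the population variance of the total scores; some texts use the sample variance instead, so treat that choice as an assumption. The function name and data are hypothetical.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson Formula 21 reliability estimate.

    total_scores: number-right totals, one per examinee (items scored 0/1).
    n_items: number of items k on the test.
    Uses the population variance of the totals (a convention choice).
    """
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    m = mean(total_scores)
    var = pvariance(total_scores)
    if var == 0:
        raise ValueError("KR-21 is undefined when total scores do not vary")
    k = n_items
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


# Hypothetical totals for four examinees on a 10-item test.
print(round(kr21([2, 4, 6, 8], n_items=10), 4))
```

Because KR-21 needs only summary statistics, it can be computed without item-level data; it equals KR-20 when all items are equally difficult and is otherwise a lower estimate.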
Hageman, Barbara H.; Sigman, Clayton B.; Koslosky, John T.
A Test/Score/Report capability is currently being developed for the Transportable Payload Operations Control Center (TPOCC) Advanced Spacecraft Simulator (TASS) system, which will automate testing of the Goddard Space Flight Center (GSFC) Payload Operations Control Center (POCC) and Mission Operations Center (MOC) software in three areas: telemetry decommutation, spacecraft command processing, and spacecraft memory load and dump processing. Automated computer control of the acceptance test process is one of the primary goals of a test team. With the proper simulation tools and user interface, acceptance testing, regression testing, and the repetition of specific test procedures for a ground data system can become simpler. Ideally, complete automation would mean plugging the operational deliverable into the simulator, pressing the start button, executing the test procedure, accumulating and analyzing the data, scoring the results, and reporting them to the test team along with a go/no-go recommendation. In practice, this may not be possible because of inadequate test tools, schedule pressures, limited resources, etc. Most tests are accomplished using a certain degree of automation and test procedures that are labor intensive. This paper discusses some simulation techniques that can improve the automation of the test process. The TASS system tests the POCC/MOC software and provides a score based on the test results. The TASS system displays statistics on the success of the POCC/MOC system processing in each of the three areas as well as event messages pertaining to the Test/Score/Report processing. The TASS system also provides formatted reports documenting each step performed during the tests and the results of each step. A prototype of the Test/Score/Report capability is available and currently being used to test some POCC/MOC software deliveries. When this capability is fully operational it should greatly reduce the time necessary…
High-stakes testing is one of the hottest topics in education today. Although most states use some form of testing, fewer than half administer tests linked to state education standards and goals, often called criterion referenced tests. Fewer still use statewide tests with high stakes for both the students enrolled in the public schools and the…
Dorans, Neil J.
Score equity assessment (SEA) is introduced, and placed within a fair assessment context that includes differential prediction or fair selection and differential item functioning. The notion of subpopulation invariance of linking functions is central to the assessment of score equity, just as it has been for differential item functioning and…
Hansen, Karsten; Heckman, James J.; Mullen, Kathleen J.
This study developed two methods for estimating the effect of schooling on achievement test scores that control for the endogeneity of schooling by postulating that both schooling and test scores are generated by a common unobserved latent ability. The methods were applied to data on schooling and test scores. Estimates from the two methods are in…
Severo, Milton; Gaio, A. Rita; Povo, Ana; Silva-Pereira, Fernanda; Ferreira, Maria Amélia
In theory the formula scoring methods increase the reliability of multiple-choice tests in comparison with number-right scoring. This study aimed to evaluate the impact of the formula scoring method in clinical anatomy multiple-choice examinations, and to compare it with that from the number-right scoring method, hoping to achieve an…
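The abstract contrasts formula scoring with number-right scoring without stating the formula. A minimal sketch, assuming the classic correction-for-guessing form R - W/(k - 1), where R is the number right, W the number wrong, k the number of options per item, and omitted items score zero; the study may have used a different variant, and the function names and data are hypothetical.

```python
def number_right(responses, key):
    """Count of correct answers; wrong and omitted (None) items both score 0."""
    return sum(1 for chosen, correct in zip(responses, key) if chosen == correct)


def formula_score(responses, key, n_options):
    """Correction-for-guessing score R - W / (k - 1), with k options per item.

    Omitted items (None) are neither right nor wrong, so they add nothing.
    """
    right = number_right(responses, key)
    wrong = sum(1 for chosen, correct in zip(responses, key)
                if chosen is not None and chosen != correct)
    return right - wrong / (n_options - 1)


# Hypothetical five-item test with five options per item.
key = ["A", "B", "C", "D", "A"]
responses = ["A", "B", "D", None, "C"]
print(number_right(responses, key))      # 2
print(formula_score(responses, key, 5))  # 1.5
```

The penalty W/(k - 1) is the number of right answers expected from blind guessing on the items answered wrong, which is why omitting an item is never worse in expectation than guessing blindly.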
Royal, Kenneth D; Gilliland, Kurt O; Kernick, Edward T
Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high-stakes testing arena rely on classical test theory (CTT) methods. However, advances in item response theory software have made the application of these techniques much more accessible to classroom instructors. The purpose of this research is to analyze a common medical school anatomy examination using both the traditional CTT scoring method and a Rasch measurement scoring method to determine which technique provides more robust findings, and which set of psychometric indicators will be more meaningful and useful for anatomists looking to improve the psychometric quality and functioning of their examinations. Results produced by the more robust and meaningful methodology will undergo a rigorous psychometric validation process to evaluate construct validity. Implications of these techniques and additional possibilities for advanced applications are also discussed.
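To make the CTT-versus-Rasch contrast concrete, here is a hedged sketch (not the authors' analysis) of how the two approaches score one examinee. Under CTT the score is the raw number correct; under the dichotomous Rasch model, P(correct) = 1/(1 + exp(-(theta - b))), and the maximum-likelihood ability theta solves "raw score = sum of P_i(theta)" given item difficulties b that are assumed to be already calibrated. Function names and difficulty values are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_ability(raw_score, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability for a raw score, given known item difficulties.

    Solves raw_score = sum_i P_i(theta) by Newton-Raphson. The estimate is
    infinite for zero or perfect scores, so those are rejected.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("ML ability is infinite for zero or perfect raw scores")
    theta = math.log(raw_score / (n - raw_score))  # log-odds starting value
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - expected) / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# CTT score: 1 of 4 correct. Rasch ability for 4 items of difficulty 0:
print(round(rasch_ability(1, [0.0, 0.0, 0.0, 0.0]), 4))  # -1.0986, i.e. log(1/3)
```

In the Rasch model the raw score is a sufficient statistic for ability, so two examinees with the same raw score on the same items receive the same theta; what Rasch adds is a logit scale on which persons and items can be compared.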
Handeland, Katina; Kjellevold, Marian; Wik Markhus, Maria; Eide Graff, Ingvild; Frøyland, Livar; Lie, Øyvind; Skotheim, Siv; Stormark, Kjell Morten; Dahl, Lisbeth; Øyen, Jannike
Assessment of adolescents’ dietary habits is challenging. Reliable instruments to monitor dietary trends are required to promote healthier behaviours in this group. The purpose of this cross-sectional study was to assess adolescents’ adherence to Norwegian dietary recommendations with a diet score and to report results from, and test-retest reliability of, the score. The diet score involved seven food groups and one physical activity indicator, and was applied to answers from a semi-quantitative food frequency questionnaire (FFQ) administered twice. Reproducibility of the score was assessed with Cohen’s Kappa (κ statistics) at an interval of three months. The setting was eight lower-secondary schools in Hordaland County, Norway, and subjects were adolescents (n = 472) aged 14–15 years and their caregivers. Results showed that the proportion of adolescents consistently classified by the diet score was 87.6% (κ = 0.465). For food groups, proportions ranged from 74.0% to 91.6% (κ = 0.249 to κ = 0.573). Less than 40% of the participants were found to adhere to recommendations for frequencies of eating fruits, vegetables, added sugar, and fish. Highest compliance with recommendations was seen for choosing water as a beverage and limiting the intake of red meat. The score was associated with parental socioeconomic status. The diet score was found to be reproducible at an acceptable level. Health-promoting work targeting adolescents should emphasize increasing the intake of recommended foods to approach nutritional guidelines. PMID:27483312
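The abstract reports both the proportion of adolescents consistently classified (87.6%) and Cohen's kappa (0.465). The sketch below, using hypothetical ratings from two administrations, shows why kappa is lower than raw agreement: it discounts the agreement expected by chance from each administration's marginal category frequencies. Function names and labels are illustrative.

```python
from collections import Counter


def percent_agreement(first, second):
    """Proportion of subjects classified the same way on both occasions."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first, second):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    observed = percent_agreement(first, second)
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement: product of the two marginal proportions, summed over categories.
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical adherence classifications at the first and second FFQ.
ffq1 = ["adh", "adh", "non", "non", "adh", "non", "adh", "non"]
ffq2 = ["adh", "non", "non", "non", "adh", "non", "adh", "adh"]
print(percent_agreement(ffq1, ffq2))  # 0.75
print(cohens_kappa(ffq1, ffq2))       # 0.5
```

Kappa is undefined when chance agreement equals 1 (every subject in one category on both occasions), which this sketch surfaces as a division-by-zero error.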
Opponents of so-called high-stakes testing complain that such intense pressure causes teachers to devote virtually all classroom time and resources to preparing students for the standardized test. This phenomenon is called "teaching to the test." Proponents of high-stakes testing respond that that is exactly as it should be. They argue…
Dutro, Elizabeth; Selland, Makenzie K.; Bien, Andrea C.
Drawing on the combined theoretical lenses of positioning theory and academic literacies, this article presents case studies of four children from one urban classroom, two of whom scored at or above proficient on the large-scale writing assessments required by their district and state and two of whom scored below. Using criteria from state…
In order to meet the goals of No Child Left Behind, standardized testing is preeminent as the sole indicator determining whether states all across America demonstrate adequate yearly progress regarding the improvement of student achievement in literacy education. This book will help teachers and parents raise children's scores on standardized…
Hafner, Anne L.
Using a quasi-experimental analysis of variance (ANOVA) design, this project examined the effects of the use of accommodations with students of limited English proficiency (LEP) and non-LEP students and whether the use of accommodations affected the validity of test score interpretations. Major accommodations examined were extra time, and extra…
Blue-Terry, Misty; Letowski, Tomasz
The Callsign Acquisition Test (CAT) is a speech intelligibility test developed by the US Army Research Laboratory. The test has been used to evaluate speech transmission through various communication systems but has not yet been sufficiently standardised and validated. The aim of this study was to compare CAT and Modified Rhyme Test (MRT) performance in the presence of white noise across a range of signal-to-noise ratios (SNRs). A group of 16 normal-hearing listeners participated in the study. The speech items were presented at 65 dB(A) in the background of white noise at SNRs of -18, -15, -12, -9 and -6 dB. The results showed a strong positive association (75.14%) between the two tests, but significant differences between the CAT and MRT absolute scores in the range of investigated SNRs. Based on the data, a function to predict CAT scores from existing MRT scores and vice versa was formulated. STATEMENT OF RELEVANCE: This work compares performance data of a common speech intelligibility test (MRT) with a new test (CAT) in the presence of white noise. The results can be used as part of the standardisation procedures and provide insights into the predictive capabilities of the CAT to quantify speech intelligibility in high-noise military environments.
Matton, Nadine; Vautier, Stephane; Raufaste, Eric
Mean gain scores for cognitive ability tests between two sessions in a selection setting are now a robust finding, yet not fully understood. Many authors do not attribute such gain scores to an increase in the target abilities. Our approach consists of testing a longitudinal SEM model suitable to this view. We propose to model the scores' changes…
Reardon, Sean F.; Shear, Benjamin R.; Castellano, Katherine E.; Ho, Andrew D.
Test score distributions of schools or demographic groups are often summarized by frequencies of students scoring in a small number of ordered proficiency categories. We show that heteroskedastic ordered probit (HETOP) models can be used to estimate means and standard deviations of multiple groups' test score distributions from such data. Because…
Omirin, M. S.
The study compared the difficulty and discrimination indices of three multiple-choice tests using the confidence scoring procedure (CSP). The study also set out to determine whether or not the difficulty and discrimination indices would be improved if the tests were scored by the confidence scoring procedure. Two null…
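The abstract does not define its indices, and the confidence scoring procedure itself is not reproduced here. As a sketch of the conventional classical definitions it presumably builds on (an assumption): difficulty is the proportion of examinees answering an item correctly, and the upper-lower discrimination index is the proportion correct in the top-scoring group minus that in the bottom group, with the groups commonly formed from the top and bottom 27% of total scores. Names and data are hypothetical.

```python
def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)


def item_discrimination(item_scores, total_scores, fraction=0.27):
    """Upper-lower index D = p_upper - p_lower.

    The upper and lower groups are the top and bottom `fraction` of examinees
    ranked by total test score (at least one examinee per group).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending totals
    lower, upper = order[:group_size], order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data for ten examinees: item responses and total test scores.
item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(item_difficulty(item))              # 0.5
print(item_discrimination(item, totals))  # 1.0
```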
Azad, Aftab Mohammad; Al Juma, Saad; Bhatti, Junaid Ahmad; Delaney, J Scott
Background: Balance testing is an important part of the initial concussion assessment. There is no research on the differences in Modified Balance Error Scoring System (M-BESS) scores when tested in real world as compared to control conditions. Objective: To assess the difference in M-BESS scores in athletes wearing their protective equipment and cleats on different surfaces as compared to control conditions. Methods: This cross-sectional study examined university North American football and soccer athletes. Three observers independently rated athletes performing the M-BESS test in three different conditions: (1) wearing shorts and T-shirt in bare feet on firm surface (control); (2) wearing athletic equipment with cleats on FieldTurf; and (3) wearing athletic equipment with cleats on firm surface. Mean M-BESS scores were compared between conditions. Results: 60 participants were recruited: 39 from football (all males) and 21 from soccer (11 males and 10 females). Average age was 21.1 years (SD=1.8). Mean M-BESS scores were significantly lower (p<0.001) for cleats on FieldTurf (mean=26.3; SD=2.0) and for cleats on firm surface (mean=26.6; SD=2.1) as compared to the control condition (mean=28.4; SD=1.5). Females had lower scores than males for cleats on FieldTurf condition (24.9 (SD=1.9) vs 27.3 (SD=1.6), p=0.005). Players who had taping or bracing on their ankles/feet had lower scores when tested with cleats on firm surface condition (24.6 (SD=1.7) vs 26.9 (SD=2.0), p=0.002). Conclusions: Total M-BESS scores for athletes wearing protective equipment and cleats standing on FieldTurf or a firm surface are around two points lower than M-BESS scores performed on the same athletes under control conditions. PMID:27900181
Stickley, Christopher D; Hetzler, Ronald K; Wages, Jennifer J; Freemyer, Bret G; Kimura, Iris F
This study examined the appropriate magnitude of allometric scaling of the Wingate anaerobic test (WAnT) power data for body mass (BM) and established normative data for the WAnT for adult men. Eighty-three men completed a standard WAnT using 0.1 kg·kg(-1) BM resistance. Allometric exponents and percentile ranks for 1-second peak power (PP), 5-second PP, and mean power (MP) were established. The Predicted Residual Sum of Squares (PRESS) procedure was used to assess external validity while avoiding data splitting. The mean 1-second PP, 5-second PP, and MP were 1,049.1 ± 168.8 W, 1,013.4 ± 158.6 W, and 777.9 ± 105.0 W, respectively. Allometric exponents for 1-second PP, 5-second PP, and MP scaled for BM were b = 0.89, 0.88, and 0.86, respectively. Correlations between allometrically scaled 1-second PP, 5-second PP, and MP, and BM were r = -0.03, -0.03, and -0.02, respectively, suggesting that the allometric exponents derived were effective in partialling out the effect of BM on WAnT values. The PRESS procedure values resulted in small decreases in R² (0.03, 0.04, and 0.02 for 1-second PP, 5-second PP, and MP, respectively), suggesting acceptable levels of external validity when applied to independent samples. The allometric exponents and normative values provide a useful tool for comparing WAnT scores in adult men without the confounding effect of BM. It is suggested that exponents of b = 0.89 (1-second PP), b = 0.88 (5-second PP), and b = 0.86 (MP) be used for allometrically scaling WAnT power values in healthy adult men and that the confidence limits for these allometric exponents be considered as 0.66-1.0 for PP and 0.69-1.0 for MP. The use of these exponents in allometric scaling of male WAnT power values provides coaches and practitioners with a valid means for comparing power production between individuals without the confounding influence of BM.
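Allometric scaling divides power by body mass raised to an exponent b rather than by body mass itself. A minimal sketch using the exponents reported above; the log-log least-squares estimator of b is the usual approach but is an assumption about how the authors fitted it, the PRESS validation step is not shown, and the function names and example body mass are hypothetical.

```python
import math

# Exponents reported in the abstract for healthy adult men.
EXPONENTS = {"peak_power_1s": 0.89, "peak_power_5s": 0.88, "mean_power": 0.86}


def allometric_scale(power_w, body_mass_kg, exponent):
    """Power expressed in W per kg**b, i.e. P / BM**b."""
    return power_w / body_mass_kg ** exponent


def estimate_exponent(powers, masses):
    """Least-squares slope of ln(P) on ln(BM), the usual log-log estimate of b."""
    xs = [math.log(m) for m in masses]
    ys = [math.log(p) for p in powers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Scale the reported mean 1-second PP for a hypothetical 80 kg athlete.
print(allometric_scale(1049.1, 80.0, EXPONENTS["peak_power_1s"]))
```

A well-chosen b leaves the scaled values essentially uncorrelated with body mass, which is the check the abstract reports (r of about -0.03 to -0.02).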
Zapata-Rivera, Diego, Ed.; Zwick, Rebecca, Ed.
This volume includes 3 papers based on presentations at a workshop on communicating assessment information to particular audiences, held at Educational Testing Service (ETS) on November 4th, 2010, to explore some issues that influence score reports and new advances that contribute to the effectiveness of these reports. Jessica Hullman, Rebecca…
Van Patten, James J.
High-stakes testing is associated with controversy and dialogue in this era of calls for accountability on the part of educators, and the controversy has been strengthened by the national testing plans included in the No Child Left Behind Act (Elementary and Secondary Education Act of 2001). A look at the literature on high-stakes testing and…
Gentry, Ruben; Stokes, Dorothy
Many African Americans were imbued with the cliché that they must work twice as hard as others to be a success in life. Entering college, students with this belief put extensive effort into earning top grades to ensure quality preparation for their chosen career; yet, some fail to earn top scores. Why? This is the million dollar question, but the…
Carraway, Cassandra T.
A study was conducted to determine whether participation in a test-taking strategy seminar significantly decreased test anxiety in first-year nursing students. The study also sought to compare nursing test scores of first-year nursing students who participated in the seminar with those who did not. The sample consisted of 30 first-year nursing…
Viliūnas, V; Lukauskiene, R; Svegzda, A; Zukauskas, A
The scoring artefact in the Farnsworth-Munsell 100-Hue test arising from the grouping of the caps into four boxes was investigated. Two scoring methods were compared: the traditional method, in which the numbers of the anchor caps are disregarded, and an alternative method, in which the numbers of the anchor caps are employed. For the traditional method, we found an increase in the error score of the outside (end-box) caps when the total error score was above 240. By contrast, when scoring employed the numbers of the anchor caps, the difference between the error score of the outside caps and the average error per cap was not significant. To mitigate the end-box artefact and improve the reliability of the Farnsworth-Munsell 100-Hue test, corrections to the traditional method of scoring are proposed.
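Under the commonly described Farnsworth convention, a cap's score is the sum of the absolute differences between its number and the numbers of its two neighbours, and its error score is that sum minus 2, so a perfectly ordered box scores 0. A minimal sketch of the variant that employs the anchor-cap numbers, so the outside caps are compared against the fixed anchors; it assumes a hypothetical box whose numbering does not wrap around (the real test's cap 85 to cap 1 transition is not handled), and the function name is illustrative.

```python
def fm_cap_error_scores(arrangement, left_anchor, right_anchor):
    """Per-cap error scores for one box, using the anchor caps as end neighbours.

    arrangement: movable cap numbers in the order the examinee placed them.
    Error score of a cap = |n - left neighbour| + |n - right neighbour| - 2.
    """
    row = [left_anchor] + list(arrangement) + [right_anchor]
    return [abs(row[i] - row[i - 1]) + abs(row[i] - row[i + 1]) - 2
            for i in range(1, len(row) - 1)]


# Hypothetical box: anchors numbered 0 and 9, movable caps 1..8.
perfect = fm_cap_error_scores([1, 2, 3, 4, 5, 6, 7, 8], 0, 9)
swapped = fm_cap_error_scores([1, 2, 4, 3, 5, 6, 7, 8], 0, 9)
print(sum(perfect))  # 0
print(sum(swapped))  # 4: one transposition penalizes the two swapped caps and their neighbours
```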
21 CFR § 866.6050, Ovarian adnexal mass assessment score test system (Title 21, Food and Drugs, vol. 8; Immunological Test Systems; editions of 2012-04-01, 2013-04-01, and 2014-04-01). (a) Identification. An ovarian/adnexal mass assessment test system is a device that measures one or more proteins in serum...
Blackburn, McKinley L.
Previous research has suggested that skills reflected in test-score performance on tests such as the Armed Forces Qualification Test (AFQT) can account for some of the racial differences in average wages. I use a more complete set of test scores available with the National Longitudinal Survey of Youth 1979 Cohort to reconsider this evidence, and…
Rosselli, M; Ardila, A; Bateman, J R; Guzmán, M
Limited information is currently available about performance of Spanish-speaking children on different neuropsychological tests. This study was designed to (a) analyze the effects of age and sex on different neuropsychological test scores of a randomly selected sample of Spanish-speaking children, (b) analyze the value of neuropsychological test scores for predicting school performance, and (c) describe the neuropsychological profile of Spanish-speaking children with learning disabilities (LD). Two hundred ninety (141 boys, 149 girls) 6- to 11-year-old children were selected from a school in Bogotá, Colombia. Three age groups were distinguished: 6- to 7-, 8- to 9-, and 10- to 11-year-olds. Performance was measured utilizing the following neuropsychological tests: Seashore Rhythm Test, Finger Tapping Test (FTT), Grooved Pegboard Test, Children's Category Test (CCT), California Verbal Learning Test-Children's Version (CVLT-C), Benton Visual Retention Test (BVRT), and Bateria Woodcock Psicoeducativa en Español (Woodcock, 1982). Normative scores were calculated. Age effect was significant for most of the test scores. A significant sex effect was observed for 3 test scores. Intercorrelations were performed between neuropsychological test scores and academic areas (science, mathematics, Spanish, social studies, and music). In a post hoc analysis, children presenting very low scores on the reading, writing, and arithmetic achievement scales of the Woodcock battery were identified in the sample, and their neuropsychological test scores were compared with a matched normal group. Finally, a comparison was made between Colombian and American norms.
Morash, Valerie S; McKerracher, Amanda
The most common and advocated assessment approach when a child cannot access visual materials is to use the verbal subscales of a test the psychologist already has and is familiar with. However, previous research indicates that children with visual impairments experience atypical verbal development. This raises the question of whether verbal subscale scores retain their reliability and interpretation validity when given to children with visual impairments. To answer this question, we administered a vocabulary subscale from a common intelligence test along with several nonverbal subscales to 15 early-blind adolescents (onset at ≤2 years). Only the vocabulary test scores had reliability insufficient for high-stakes testing. This finding points to the broader issue of difficulties in assessing populations of exceptional children who experience atypical developmental trajectories, possibly making their assessment with common tests inappropriate.
Troll, Lillian E.; And Others
After seven years, a group (N=32) of originally nonemployed poverty-level older people (over 60) now employed as foster grandparents were retested with the WAIS. Three subtest scores showed stability and Digit Span showed a statistically significant drop. Neither age nor initial level of health or WAIS scores was related to test-score changes over…
Zwick, Rebecca; Zapata-Rivera, Diego; Hegarty, Mary
Research has shown that many educators do not understand the terminology or displays used in test score reports and that measurement error is a particularly challenging concept. We investigated graphical and verbal methods of representing measurement error associated with individual student scores. We created four alternative score reports, each…
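One standard way to quantify the measurement error around an individual student's score, offered as background rather than as one of the four report designs the study tested, is the standard error of measurement, SEM = SD * sqrt(1 - reliability), together with a band of score ± z * SEM. The SD, reliability, and score below are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: the score SD shrunk by sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(score, sd, reliability, z=1.96):
    """Score plus or minus z standard errors (z = 1.96 gives a ~95% band)."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


# Hypothetical scale: SD 15, reliability 0.91, observed score 100.
print(standard_error_of_measurement(15, 0.91))  # about 4.5
print(score_band(100, 15, 0.91))                # about (91.18, 108.82)
```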
Over the past five years, both DC Public Schools (DCPS) and public charter schools (PCS) have seen significant growth in secondary reading and math scores on the state test known as the District of Columbia Comprehensive Assessment System (DC CAS). However, scores have not improved as much at the elementary level. Reading and math scores for DCPS…
Cobb, Ora, Jr.; Lindle, Jane Clark; Rinehart, James S.
This paper explores Kentucky's Education Reform Act (KERA) for improving at-risk students' scores to see if the strategies in one middle school improved standardized and state-performance-based assessment results. The study encompasses two purposes: to use a forced-entry regression model to detect which independent variables were predictors of…
The purpose of this study was to determine if there was a significant difference in scores on the Mississippi Algebra I SATP2 between a group that was allowed to use programs on TI-84 calculators and a group that was not. A second purpose of the study was to determine if there was a significant difference in the…
Brannigan, Gary G.; And Others
Compares two scoring systems for the Bender-Gestalt test, the Qualitative Scoring System and the Developmental Scoring System, in predicting achievement on the Metropolitan Achievement Test (MAT). In this study, first through fourth graders (n=409) from regular elementary schools were assessed with both systems; both systems correlated significantly with school…
Das, Jishnu; Dercon, Stefan; Habyarimana, James; Krishnan, Pramila; Muralidharan, Karthik; Sundararaman, Venkatesh
Empirical studies of the relationship between school inputs and test scores typically do not account for the fact that households will respond to changes in school inputs. We present a dynamic household optimization model relating test scores to school and household inputs, and test its predictions in two very different low-income country…
Klesch, Heather S.
The reporting of scores on educational tests is at times misunderstood, misinterpreted, and potentially confusing to examinees and other stakeholders who may need to interpret test scores. In reporting test results to examinees, there is a need for clarity in the message communicated. As pressure rises for students to demonstrate performance at a…
Carroll, John B.
The problem of determining relative weights for quantity and quality in scoring foreign language speaking and writing fluency tests is studied. French speaking and writing fluency tests were administered to students of French in several schools in England. Data from these tests was analyzed to support the suggestion that scoring formulas should…
The relationship between the National League for Nursing (NLN) achievement test scores and performance on the State Board Test Pool Examination (SBTPE) was studied with 166 graduates of a diploma degree school of nursing between 1976 and 1978. It was found that NLN achievement test scores had a highly significant correlation with SBTPE results.…
Cornwell, Christopher; Mustard, David B.; Van Parys, Jessica
Using data from the 1998-99 ECLS-K cohort, we show that the grades awarded by teachers are not aligned with test scores. Girls in every racial category outperform boys on reading tests, while boys score at least as well on math and science tests as girls. However, boys in all racial categories across all subject areas are not represented in…
An earlier Digest described the shortcomings of three methods commonly used to summarize changes in test scores. This Digest describes two less commonly used approaches for examining changes in test scores, those of Standardized Growth Estimates and Effect Sizes. Aspects of these two approaches are combined and applied to the Iowa Test of Basic…
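The Digest's exact formulas are not given in this excerpt. A minimal sketch of one common effect-size definition for a change in test scores, assuming the mean gain is divided by the pooled standard deviation of the two testings (other choices, such as the baseline SD, are also used); standardized growth estimates are not reproduced, and the data are hypothetical.

```python
import statistics


def pooled_sd(a, b):
    """Pooled sample standard deviation of two groups of scores."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5


def effect_size(before, after):
    """Standardized mean change: (mean after - mean before) / pooled SD."""
    return (statistics.mean(after) - statistics.mean(before)) / pooled_sd(before, after)


# Hypothetical scores from two testings.
print(effect_size([10, 12, 14], [13, 15, 17]))  # 1.5
```

Expressing the gain in SD units lets changes on tests with different score scales be compared, which is the usual motivation for effect sizes in this setting.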
Papanastasiou, Elena C.; Reckase, Mark D.
Because of the increased popularity of computerized adaptive testing (CAT), many admissions tests, as well as certification and licensure examinations, have been transformed from their paper-and-pencil versions to computerized adaptive versions. A major difference between paper-and-pencil tests and CAT from an examinee's point of view is that in…
Ven, A. H. G. S. van der
A more generalized error model for time-limit tests is developed. Model estimates are derived for right-attempted and wrong-attempted correlations both within the same test and between different tests. A comparison is made between observed correlations and their model counterparts and a fair agreement is found between observed and expected…
Gavett, Brandon E
The base rates of abnormal test scores in cognitively normal samples have been a focus of recent research. The goal of the current study is to illustrate how Bayes' theorem uses these base rates--along with the same base rates in cognitively impaired samples and prevalence rates of cognitive impairment--to yield probability values that are more useful for making judgments about the absence or presence of cognitive impairment. Correlation matrices, means, and standard deviations were obtained from the Wechsler Memory Scale--4th Edition (WMS-IV) Technical and Interpretive Manual and used in Monte Carlo simulations to estimate the base rates of abnormal test scores in the standardization and special groups (mixed clinical) samples. Bayes' theorem was applied to these estimates to identify probabilities of normal cognition based on the number of abnormal test scores observed. Abnormal scores were common in the standardization sample (65.4% scoring below a scaled score of 7 on at least one subtest) and more common in the mixed clinical sample (85.6% scoring below a scaled score of 7 on at least one subtest). Probabilities varied according to the number of abnormal test scores, base rates of normal cognition, and cutoff scores. The results suggest that interpretation of base rates obtained from cognitively healthy samples must also account for data from cognitively impaired samples. Bayes' theorem can help neuropsychologists answer questions about the probability that an individual examinee is cognitively healthy based on the number of abnormal test scores observed.
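Using the two base rates reported above (65.4% of the standardization sample and 85.6% of the mixed clinical sample scoring below a scaled score of 7 on at least one subtest), Bayes' theorem gives the probability that an examinee with at least one such score is cognitively normal. A minimal sketch; the 20% prevalence of impairment is a hypothetical value, and treating the mixed clinical sample as the impaired group follows the abstract's framing rather than a clinical rule.

```python
def prob_normal_given_finding(p_finding_if_normal, p_finding_if_impaired,
                              prevalence_impaired):
    """Posterior P(normal | finding) by Bayes' theorem.

    finding: e.g. "at least one subtest below a scaled score of 7".
    """
    p_normal = 1.0 - prevalence_impaired
    numerator = p_finding_if_normal * p_normal
    denominator = numerator + p_finding_if_impaired * prevalence_impaired
    return numerator / denominator


# Base rates from the abstract; prevalence of impairment is hypothetical.
p = prob_normal_given_finding(0.654, 0.856, prevalence_impaired=0.20)
print(round(p, 3))  # about 0.75
```

Even though abnormal scores are more common in the clinical sample, at a 20% prevalence a single abnormal score still leaves roughly a three-in-four chance of normal cognition, which is the kind of judgment the abstract argues base rates alone cannot support.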
Williams, Thomas O; Eaves, Ronald C; Woods-Groves, Suzanne; Mariano, Gina
The test-retest stability of the Slosson Full-Range Intelligence Test by Algozzine, Eaves, Mann, and Vance was investigated with test scores from a sample of 103 students. With a mean interval of 13.7 mo. and different examiners for each of the two test administrations, the test-retest reliability coefficients for the Full-Range IQ, Verbal Reasoning, Abstract Reasoning, Quantitative Reasoning, and Memory were .93, .85, .80, .80, and .83, respectively. Mean differences from the test-retest scores were not statistically significantly different for any of the scales. Results suggest that Slosson scores are stable over time even when different examiners administer the test.
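Test-retest coefficients like these are Pearson correlations between scores from the two administrations, and the mean-difference check compares the administrations' means. A minimal sketch with hypothetical scores; the significance test on the mean difference that the study reports is omitted.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_difference(first, second):
    """Mean of the retest minus mean of the first test."""
    return sum(second) / len(second) - sum(first) / len(first)


# Hypothetical IQ scores for four students at test and retest.
test = [90, 100, 110, 120]
retest = [92, 101, 108, 123]
print(round(pearson_r(test, retest), 3))  # 0.986
print(mean_difference(test, retest))      # 1.0
```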
Lee, Yi-Hsuan; von Davier, Alina A
Maintaining a stable score scale over time is critical for all standardized educational assessments. Traditional quality control tools and approaches for assessing scale drift either require special equating designs, or may be too time-consuming to be considered on a regular basis with an operational test that has a short time window between an administration and its score reporting. Thus, the traditional methods are not sufficient to catch unusual testing outcomes in a timely manner. This paper presents a new approach for score monitoring and assessment of scale drift. It involves quality control charts, model-based approaches, and time series techniques to accommodate the following needs of monitoring scale scores: continuous monitoring, adjustment of customary variations, identification of abrupt shifts, and assessment of autocorrelation. Performance of the methodologies is evaluated using manipulated data based on real responses from 71 administrations of a large-scale high-stakes language assessment.
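The paper combines control charts, model-based approaches, and time series techniques; only the control-chart idea is sketched here, using a standard EWMA chart on mean scale scores per administration (the paper's own chart choice is not stated in the abstract). The EWMA is z_t = lambda * x_t + (1 - lambda) * z_{t-1} with time-varying limits target ± L * sigma * sqrt(lambda / (2 - lambda) * (1 - (1 - lambda)^(2t))). The target, sigma, lambda = 0.2, L = 3, and the data are hypothetical.

```python
import math


def ewma_chart(values, target_mean, sigma, lam=0.2, width=3.0):
    """EWMA control chart: one (ewma, lower, upper, out_of_control) row per value."""
    z = target_mean
    rows = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        half_width = width * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        lower, upper = target_mean - half_width, target_mean + half_width
        rows.append((z, lower, upper, not lower <= z <= upper))
    return rows


# Hypothetical mean scale scores: stable at 500, then an abrupt shift to 520.
means = [500] * 5 + [520] * 5
flags = [row[3] for row in ewma_chart(means, target_mean=500, sigma=5)]
print(flags)  # first five False; the shift is flagged from the 7th administration on
```

Smaller lambda values make the chart more sensitive to small persistent drifts and less to single unusual administrations; a Shewhart chart (lambda = 1) is the opposite extreme.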
Ahn, Shin; Lee, Hyeji; Choi, Wookjin; Ahn, Ryeok; Hong, Jung-Suk; Sohn, Chang Hwan; Seo, Dong Woo; Lee, Yoon-Seon; Lim, Kyung Soo; Kim, Won Young
Objective: We aimed to evaluate the accuracy of the heel drop test in patients with suspected appendicitis and to develop a new clinical score, incorporating the heel drop test and other parameters, for the diagnosis of this condition. Methods: We performed a prospective observational study of adult patients with suspected appendicitis at two academic urban emergency departments between January and August 2015. The predictive characteristics of each parameter, along with the heel drop test results, were calculated. A composite score was generated by logistic regression analysis, and its performance was compared with that of the Alvarado score. Results: Of the 292 enrolled patients, 165 (56.5%) had acute appendicitis. The heel drop test had a higher predictive value than rebound tenderness. The variables (and points) included in the new MESH score were pain migration (2), elevated white blood cell (WBC) count >10,000/μL (3), shift to left (2), and positive heel drop test (3). The MESH score had a higher AUC than the Alvarado score (0.805 vs. 0.701). Scores of 5 and 8 were chosen as cut-off values; a MESH score ≥5 compared with an Alvarado score ≥5, and a MESH score ≥8 compared with an Alvarado score ≥7, showed better performance in diagnosing appendicitis. Conclusion: MESH (migration, elevated WBC, shift to left, and heel drop test) is a simple clinical scoring system for assessing patients with suspected appendicitis and is more accurate than the Alvarado score. Further validation studies are needed. PMID:27723842
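The MESH point values can be summed directly. In this sketch the points come from the abstract, while the function name and argument encoding (booleans, WBC count in cells/μL) are hypothetical. The maximum possible score is 10, and the abstract evaluates thresholds of ≥5 and ≥8.

```python
def mesh_score(pain_migration, wbc_per_ul, left_shift, heel_drop_positive):
    """MESH score: migration (2), WBC > 10,000/uL (3), left shift (2), heel drop (3)."""
    score = 0
    if pain_migration:
        score += 2
    if wbc_per_ul > 10_000:  # strictly greater than 10,000/uL
        score += 3
    if left_shift:
        score += 2
    if heel_drop_positive:
        score += 3
    return score


# Hypothetical patient: migrated pain, WBC 12,500/uL, no left shift, positive heel drop.
s = mesh_score(True, 12_500, False, True)
print(s, s >= 5, s >= 8)  # 8 True True
```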
Jancarík, Antonín; Kostelecká, Yvona
Electronic testing has become a regular part of online courses. Most learning management systems offer a wide range of tools that can be used in electronic tests. With respect to time demands, the most efficient tools are those that allow automatic assessment. The presented paper focuses on one of these tools: matching questions in which one…
Simner, Marvin L.
The Printing Performance School Readiness Test is an empirically derived instrument designed to aid in the early identification of preschool children who are at risk for school failure. The test is based on the outcome of a research program dealing with various aspects of children's printing that involved over 400 normal, non-repeating, native…
Zou, Xiao-Ling; Chen, Yan-Min
The effects of computer-based and paper-based test media on the writing scores and cognitive writing processes of EFL test-takers with different levels of computer familiarity have been explored comprehensively, both from the learners' perspective and on the basis of related theory and practice. The results indicate significant differences in test scores among the…
King, Molly Elizabeth
The purpose of this quantitative, causal-comparative study was to compare the effect elementary music and visual arts lessons had on third through sixth grade standardized mathematics test scores. Inferential statistics were used to compare the differences between test scores of students who took in-school, elementary, music instruction during the…
Ebuoh, Casmir N.; Ezeudu, S. A.
The study investigated the effects of scoring by section, the use of independent scorers, and conventional scoring patterns on scorer reliability in Biology essay tests. The literature review revealed that the conventional pattern of scoring all items at one time in essay tests had been criticized as unreliable. The study was a true experimental study…
Schachter, Steven; And Others
Examined relative utility of two scoring systems for Modified Version of Bender-Gestalt Test in predicting performance on Developmental Test of Visual-Motor Integration. Findings from 53 kindergarten and 47 first grade students indicated that Qualitative Scoring System was significantly better predictor of visual-motor integration skills than…
A substantial body of evidence has shown large academic test score gaps between black and white students in early childhood. These gaps remain, and probably grow, as students progress through school. Many researchers have sought to explain these persistent test score gaps, and particularly, to understand the role of students' socio-economic status…
We apply a quantile version of the Oaxaca-Blinder decomposition to estimate the counterfactual distribution of the test scores of Black students. In the Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 (ECLS-K), we find that the gap initially appears only at the top of the distribution of test scores. As children age, however,…
This paper assesses the magnitude of the non-indigenous/indigenous test-score gap for third-year and fourth-year primary school pupils in Peru, in relation to the main family, school and peer inputs contributing to the test-score gap using the estimation method of feasible generalized least squares. The article then decomposes the gap into its…
May, Deborah C.; Welch, Edward L.
Examined the relationship between early school retention as a result of preschool and kindergarten developmental testing and children's later academic achievement (N=223). Results showed children who scored as immature on the Gesell Screening Test and who were retained a year had the lowest scores on all measures. (JAC)
Jones, Maryann Clementi
A study determined the relationship between life stress and reading comprehension test scores on the Iowa Tests of Basic Skills. Subjects, 41 middle-school students attending Lincoln School in Garwood, New Jersey, were surveyed about the amount of life stress in their lives. In addition, the Iowa scores for reading comprehension were…
Xi, Xiaoming; Mollaun, Pam
We investigated the scoring of the Speaking section of the Test of English as a Foreign Language™ Internet-based test (TOEFL iBT®) by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…
Silles, Mary A.
This article, using longitudinal data from the National Child Development Study, presents new evidence on the effects of family size and birth order on test scores and behavioral development at ages 7, 11, and 16. Sibling size is shown to have an adverse causal effect on test scores and behavioral development. For any given family size, first-borns…
Pellicer-Sanchez, Ana; Schmitt, Norbert
Despite a number of research studies investigating the Yes-No vocabulary test format, one main question remains unanswered: What is the best scoring procedure to adjust for testee overestimation of vocabulary knowledge? Different scoring methodologies have been proposed based on the inclusion and selection of nonwords in the test. However, there…
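One simple adjustment discussed in the Yes-No literature subtracts the false-alarm rate from the hit rate. The false-alarm rate is the proportion of nonwords the testee claims to know. The hit rate is the proportion of real words the testee claims to know. The sketch below illustrates that idea only. It is not necessarily one of the procedures compared in this study, and the function name and inputs are hypothetical.

```python
def hit_minus_false_alarm(real_word_yes, nonword_yes):
    """Score a Yes-No vocabulary test as hit rate minus false-alarm rate.

    Hypothetical helper. Inputs are lists of booleans:
      real_word_yes: True where the testee answered "yes" to a real word.
      nonword_yes:   True where the testee answered "yes" to a nonword.
    """
    hit_rate = sum(real_word_yes) / len(real_word_yes)
    false_alarm_rate = sum(nonword_yes) / len(nonword_yes)
    # Saying "yes" to nonwords signals overestimation, so it lowers the score.
    return hit_rate - false_alarm_rate


# Hypothetical testee: "yes" to 45 of 60 real words and 2 of 20 nonwords.
score = hit_minus_false_alarm([True] * 45 + [False] * 15,
                              [True] * 2 + [False] * 18)
```

Here the adjusted score is about 0.75 − 0.10 = 0.65. The raw hit rate alone would credit the testee with 0.75. The studies in this area differ mainly in how strongly false alarms should be penalized, which this linear subtraction fixes at a 1:1 rate.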
Berends, Mark; Penaloza, Roberto V.
Background/Context: Although there has been progress in closing the test score gaps among student groups over past decades, that progress has stalled. Many researchers have speculated why the test score gaps closed between the early 1970s and the early 1990s, but only a few have been able to empirically study how changes in school factors and…
Mertler, Craig A.
This book is designed to help K-12 teachers and administrators understand the nature of standardized tests and, in particular, the scores that result from them. This useful manual helps teachers develop the skills necessary to incorporate these test scores into various types of instructional decision making, a process known as "data-driven…
Cascallar, Alicia S.; Dorans, Neil J.
This study compares two commonly used methods (concordance and prediction) for establishing linkages between scores from tests of similar content given in different languages. Score linkages between the Verbal and Math sections of the SAT I and the corresponding sections of the Spanish-language admissions test, the Prueba de Aptitud Academica (PAA),…
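The difference between concordance and prediction can be seen in their simplest linear forms. Operational linkages often use equipercentile rather than linear methods, so this is a simplification and not the study's procedure. Linear (mean-sigma) concordance maps a score on one test to the score with the same z-score on the other. It uses the slope σ_y/σ_x and is symmetric: linking y back to x undoes it. Regression prediction uses the slope r·σ_y/σ_x, so it pulls predictions toward the mean and is not symmetric. The function names and paired scores below are hypothetical.

```python
from statistics import fmean, pstdev


def linear_concordance(x, y):
    """Map an x-score to the y-score with the same z-score (mean-sigma linking)."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    return lambda score: my + (sy / sx) * (score - mx)


def regression_prediction(x, y):
    """Least-squares prediction of y from x; slope equals r * sy / sx."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return lambda score: my + slope * (score - mx)


# Hypothetical paired scores for the same examinees on tests X and Y.
x_scores = [400, 500, 600, 700]
y_scores = [420, 480, 620, 680]

conc_xy = linear_concordance(x_scores, y_scores)
conc_yx = linear_concordance(y_scores, x_scores)
pred_xy = regression_prediction(x_scores, y_scores)
pred_yx = regression_prediction(y_scores, x_scores)
```

With these toy data, an X score of 700 links to about 690 under concordance but is predicted as 688 by regression, which is closer to the mean of 550. Applying `conc_yx` to the concordant score recovers 700. Applying `pred_yx` to the predicted score does not. This asymmetry is why concordance is usually preferred when scores are meant to be treated as interchangeable.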
Increasing standardized test scores in reading and math is a high priority for the California Department of Education, which must meet requirements mandated by the No Child Left Behind (NCLB) Act of 2001. More research is needed to understand the best ways to improve test scores to meet the concerns of the NCLB Act. The purpose of the study was to evaluate…
Correlational evidence suggests that high school GPA is better than admission test scores in predicting first-year college GPA, although test scores have incremental predictive validity. The usefulness of a selection variable in making admission decisions depends in part on its predictive validity, but also on institutions' selectivity and…
Lockwood, J. R.; McCaffrey, Daniel F.
A common strategy for estimating treatment effects in observational studies using individual student-level data is analysis of covariance (ANCOVA) or hierarchical variants of it, in which outcomes (often standardized test scores) are regressed on pretreatment test scores, other student characteristics, and treatment group indicators. Measurement…
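The abstract breaks off at "Measurement…". A well-known general point in this setting is that measurement error in the pretest covariate can bias the ANCOVA treatment estimate when treated and untreated students differ in true prior achievement. The simulation below illustrates that point only and does not reproduce the paper's analysis or any correction it proposes. Every parameter is a hypothetical choice: the selection rule, the error size, the sample size, and a true treatment effect of zero.

```python
import random
from statistics import fmean


def ancova_treatment_effect(pretest, treated, outcome):
    """OLS coefficient on `treated` in outcome ~ 1 + pretest + treated."""
    mx, mz, my = fmean(pretest), fmean(treated), fmean(outcome)
    dx = [v - mx for v in pretest]
    dz = [v - mz for v in treated]
    dy = [v - my for v in outcome]
    sxx = sum(a * a for a in dx)
    szz = sum(b * b for b in dz)
    sxz = sum(a * b for a, b in zip(dx, dz))
    sxy = sum(a * c for a, c in zip(dx, dy))
    szy = sum(b * c for b, c in zip(dz, dy))
    # Closed-form two-regressor OLS on centered variables.
    return (sxx * szy - sxz * sxy) / (sxx * szz - sxz ** 2)


def simulate(n=20000, error_sd=1.0, seed=0):
    """Return (estimate using true ability, estimate using noisy pretest)."""
    rng = random.Random(seed)
    ability, treated, pretest, outcome = [], [], [], []
    for _ in range(n):
        t = rng.gauss(0, 1)
        # Hypothetical selection: higher-ability students are more often treated.
        z = 1.0 if t + rng.gauss(0, 1) > 0 else 0.0
        ability.append(t)
        treated.append(z)
        pretest.append(t + rng.gauss(0, error_sd))  # pretest measured with error
        outcome.append(t + rng.gauss(0, 0.5))       # true treatment effect is zero
    return (ancova_treatment_effect(ability, treated, outcome),
            ancova_treatment_effect(pretest, treated, outcome))


est_true, est_noisy = simulate()
```

Controlling for true ability gives an estimate near zero. Controlling for the noisy pretest (reliability 0.5 here) leaves part of the groups' ability difference unadjusted. The result is a spurious positive "effect" of roughly 0.6 to 0.7 in this setup.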
Müller, Thomas; Meisel, Margareta; Russ, Herrmann; Przuntek, Horst
Farnsworth-Munsell 100 Hue test (FMT) error scores and peg insertion abilities differ significantly between Parkinson's disease (PD) patients and controls. Both tasks require voluntary movements. The objective of this study was to demonstrate a relation between FMT error scores and peg insertion outcomes. We performed both tasks successively in 28 previously untreated PD patients. The FMT error score was significantly lower (p=0.016) in patients with better peg insertion outcomes, and peg insertion results and FMT error scores were significantly correlated (Spearman R=0.47, p=0.012). Motor impairment influences FMT error scores in PD patients.