standardized test score: Topics by Science.gov

Sample records for standardized test score

Standard Errors and Confidence Intervals of Norm Statistics for Educational and Psychological Tests.

PubMed

Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas

2016-11-14

Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.
How Accurate Is a Test Score?

ERIC Educational Resources Information Center

Doppelt, Jerome E.

1956-01-01

The standard error of measurement as a means for estimating the margin of error that should be allowed for in test scores is discussed. The true score measures the performance that is characteristic of the person tested; the variations, plus and minus, around the true score describe a characteristic of the test. When the standard deviation is used…
The Impact of Inclusion and Resource Instruction on Standardized Test Scores of Special Education Students

ERIC Educational Resources Information Center

Derico, Vontrice L.

2017-01-01

The purpose of the proposed quasi-experimental quantitative study was to determine if students who were taught in the inclusive setting yielded higher standardized test scores compared to students who were taught in the resource setting. The researcher analyzed the standardized test scores, in the areas of Language Arts, Reading, and Mathematics…
Difficulties Using Standardized Tests to Identify the Receptive Expressive Gap in Bilingual Children's Vocabularies.

PubMed

Gibson, Todd A; Oller, D Kimbrough; Jarmulowicz, Linda

2018-03-01

Receptive standardized vocabulary scores have been found to be much higher than expressive standardized vocabulary scores in children with Spanish as L1, learning L2 (English) in school (Gibson et al., 2012). Here we present evidence suggesting the receptive-expressive gap may be harder to evaluate than previously thought because widely-used standardized tests may not offer comparable normed scores. Furthermore monolingual Spanish-speaking children tested in Mexico and monolingual English-speaking children in the US showed other, yet different statistically significant discrepancies between receptive and expressive scores. Results suggest comparisons across widely used standardized tests in attempts to assess a receptive-expressive gap are precarious.
The Truth about Scores Children Achieve on Tests.

ERIC Educational Resources Information Center

Brown, Jonathan R.

1989-01-01

The importance of using the standard error of measurement (SEm) in determining reliability in test scores is emphasized. The SEm is compared to the hypothetical true score for standardized tests, and procedures for calculation of the SEm are explained. (JDD)
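The abstract above does not reproduce its calculation procedure, so the following is only a minimal sketch of the standard classical-test-theory formula, SEM = SD × sqrt(1 − reliability). The function names and the example values (a scale with SD 15 and reliability 0.91) are illustrative assumptions, not figures from the record.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate band around an observed score: observed +/- z * SEM."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


if __name__ == "__main__":
    # Hypothetical test: SD of 15 points, reliability of 0.91.
    print(standard_error_of_measurement(15.0, 0.91))
    print(score_band(100.0, 15.0, 0.91))
```

The band is symmetric around the observed score, which is the usual simplification; it treats the SEM as constant across the score range.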
Low aerobic fitness and obesity are associated with lower standardized test scores in children.

PubMed

Roberts, Christian K; Freed, Benjamin; McCarthy, William J

2010-05-01

To investigate whether aerobic fitness and obesity in school children are associated with standardized test performance. Ethnically diverse (n = 1989) 5th, 7th, and 9th graders attending California schools comprised the sample. Aerobic fitness was determined by a 1-mile run/walk test; body mass index (BMI) was obtained from state-mandated measurements. California standardized test scores were obtained from the school district. Students whose mile run/walk times exceeded California Fitnessgram standards or whose BMI exceeded Centers for Disease Control sex- and age-specific body weight standards scored lower on California standardized math, reading, and language tests than students with desirable BMI status or fitness level, even after controlling for parent education among other covariates. Ethnic differences in standardized test scores were consistent with ethnic differences in obesity status and aerobic fitness. BMI-for-age was no longer a significant multivariate predictor when covariates included fitness level. Low aerobic fitness is common among youth and varies among ethnic groups, and aerobic fitness level predicts performance on standardized tests across ethnic groups. More research is needed to uncover the physiological mechanisms by which aerobic fitness may contribute to performance on standardized academic tests.
The Impact of Scholastic Instrumental Music and Scholastic Chess Study on the Standardized Test Scores of Students in Grades Three, Four, and Five

ERIC Educational Resources Information Center

Martinez, Edwin E.

2012-01-01

This study examines the impact of instrumental music study and group chess lessons on the standardized test scores of suburban elementary public school students (grades three through five) in Levittown, New York. The study divides the students into the following groups and compares the standardized test scores of each: a) instrumental music…
Assessing Growth in Young Children: A Comparison of Raw, Age-Equivalent, and Standard Scores Using the Peabody Picture Vocabulary Test

ERIC Educational Resources Information Center

Sullivan, Jeremy R.; Winter, Suzanne M.; Sass, Daniel A.; Svenkerud, Nicole

2014-01-01

Many tests provide users with several different types of scores to facilitate interpretation and description of students' performance. Common examples include raw scores, age- and grade-equivalent scores, and standard scores. However, when used within the context of assessing growth among young children, these scores should not be interchangeable…
Towards reporting standards for neuropsychological study results: A proposal to minimize communication errors with standardized qualitative descriptors for normalized test scores.

PubMed

Schoenberg, Mike R; Rum, Ruba S

2017-11-01

Rapid, clear and efficient communication of neuropsychological results is essential to benefit patient care. Errors in communication are a leading cause of medical errors; nevertheless, there remains a lack of consistency in how neuropsychological scores are communicated. A major limitation in the communication of neuropsychological results is the inconsistent use of qualitative descriptors for standardized test scores and the use of vague terminology. A PubMed search from 1 Jan 2007 to 1 Aug 2016 was conducted to identify guidelines or consensus statements for the description and reporting of qualitative terms to communicate neuropsychological test scores. The review found the use of confusing and overlapping terms to describe various ranges of percentile standardized test scores. In response, we propose a simplified set of qualitative descriptors for normalized test scores (Q-Simple) as a means to reduce errors in communicating test results. The Q-Simple qualitative terms are: 'very superior', 'superior', 'high average', 'average', 'low average', 'borderline' and 'abnormal/impaired'. A case example illustrates the proposed Q-Simple qualitative classification system to communicate neuropsychological results for neurosurgical planning. The Q-Simple qualitative descriptor system is intended as a means to improve and standardize communication of standardized neuropsychological test scores. Research is needed to further evaluate neuropsychological communication errors. Conveying the clinical implications of neuropsychological results in a manner that minimizes risk for communication errors is a quintessential component of evidence-based practice. Copyright © 2017 Elsevier B.V. All rights reserved.
Verification of learner’s differences by team-based learning in biochemistry classes

PubMed Central

2017-01-01

Purpose: We tested the effect of team-based learning (TBL) on medical education through the second-year premedical students’ TBL scores in biochemistry classes over 5 years. Methods: We analyzed the results based on test scores before and after the students’ debate. The groups of students for statistical analysis were divided as follows: group 1 comprised the top-ranked students, group 3 comprised the low-ranked students, and group 2 comprised the medium-ranked students. Group T comprised 382 students (the total number of students in groups 1, 2, and 3). To calibrate the difficulty of the test, original scores were converted into standardized scores. We determined the differences between the tests using Student t-test, and the relationship between scores before and after the TBL using linear regression tests. Results: Although there was a decrease in the lowest score, groups T and 3 showed a significant increase in both original and standardized scores; there was also an increase in the standardized score of group 3. There was a positive correlation between the pre- and the post-debate scores in groups T and 2. The beta values of the pre-debate scores and “the changes between the pre- and post-debate scores” were statistically significant in both original and standardized scores. Conclusion: TBL is one of the educational methods for helping students improve their grades, particularly those of low-ranked students. PMID:29207457
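The record above says that original scores were converted into standardized scores but does not say which transformation was used. As a hedged illustration only, the sketch below assumes the common z-score form, (score − mean) / SD, using the sample standard deviation; the function name and the sample data are hypothetical.

```python
from statistics import mean, stdev


def standardize(scores):
    """Convert raw scores to z-scores: (x - mean) / sample SD.

    Assumption: z-scoring with the sample SD; the study may have used a
    different standardization.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    m = mean(scores)
    s = stdev(scores)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(x - m) / s for x in scores]


if __name__ == "__main__":
    # Hypothetical raw scores from one test sitting.
    print(standardize([55, 70, 85]))
```

Standardizing each test separately puts an easy test and a hard test on a common scale, which is the calibration role the abstract describes.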
Comparing the Effects of Elementary Music and Visual Arts Lessons on Standardized Mathematics Test Scores

ERIC Educational Resources Information Center

King, Molly Elizabeth

2016-01-01

The purpose of this quantitative, causal-comparative study was to compare the effect elementary music and visual arts lessons had on third through sixth grade standardized mathematics test scores. Inferential statistics were used to compare the differences between test scores of students who took in-school, elementary, music instruction during the…
A Brief Look at: Test Scores and the Standard Error of Measurement. E&R Report No. 10.13

ERIC Educational Resources Information Center

Holdzkom, David; Sumner, Brian; McMillen, Brad

2010-01-01

In the context of standardized testing, the standard error of measurement (SEM) is a measure of the factors other than the student's actual knowledge of the tested material that may affect the student's test score. Such factors may include distractions in the testing environment, fatigue, hunger, or even luck. This means that a student's observed…
Impact of a standardized test package on exit examination scores and NCLEX-RN outcomes.

PubMed

Homard, Catherine M

2013-03-01

The purpose of this ex post facto correlational study was to compare exit examination scores and NCLEX-RN® pass rates of baccalaureate nursing students who differed in level of participation in a standardized test package. Three cohort groups emerged as a standardized test package was introduced: (a) students who did not participate in a standardized test package; (b) students with two semesters of a standardized test package; and (c) students with four semesters of a standardized test package. Benner's novice-to-expert theory framed the study in the belief that students best acquire knowledge and skills through practice and reflection. Students participating in four semesters of a standardized test package demonstrated higher exit examination scores and NCLEX-RN pass rates compared with students who did not participate in this package. This study's results could inform nurse educators about strategies to facilitate nursing student success on exit examinations and the NCLEX-RN. Copyright 2013, SLACK Incorporated.
Kernel Equating Under the Non-Equivalent Groups With Covariates Design

PubMed Central

Bränberg, Kenny

2015-01-01

When equating two tests, the traditional approach is to use common test takers and/or common items. Here, the idea is to use variables correlated with the test scores (e.g., school grades and other test scores) as a substitute for common items in a non-equivalent groups with covariates (NEC) design. This is performed in the framework of kernel equating and with an extension of the method developed for post-stratification equating in the non-equivalent groups with anchor test design. Real data from a college admissions test were used to illustrate the use of the design. The equated scores from the NEC design were compared with equated scores from the equivalent group (EG) design, that is, equating with no covariates as well as with equated scores when a constructed anchor test was used. The results indicate that the NEC design can produce lower standard errors compared with an EG design. When covariates were used together with an anchor test, the smallest standard errors were obtained over a large range of test scores. The results obtained, that an EG design equating can be improved by adjusting for differences in test score distributions caused by differences in the distribution of covariates, are useful in practice because not all standardized tests have anchor tests. PMID:29881012
Kernel Equating Under the Non-Equivalent Groups With Covariates Design.

PubMed

Wiberg, Marie; Bränberg, Kenny

2015-07-01

When equating two tests, the traditional approach is to use common test takers and/or common items. Here, the idea is to use variables correlated with the test scores (e.g., school grades and other test scores) as a substitute for common items in a non-equivalent groups with covariates (NEC) design. This is performed in the framework of kernel equating and with an extension of the method developed for post-stratification equating in the non-equivalent groups with anchor test design. Real data from a college admissions test were used to illustrate the use of the design. The equated scores from the NEC design were compared with equated scores from the equivalent group (EG) design, that is, equating with no covariates as well as with equated scores when a constructed anchor test was used. The results indicate that the NEC design can produce lower standard errors compared with an EG design. When covariates were used together with an anchor test, the smallest standard errors were obtained over a large range of test scores. The results obtained, that an EG design equating can be improved by adjusting for differences in test score distributions caused by differences in the distribution of covariates, are useful in practice because not all standardized tests have anchor tests.
Comparing Standard Deviation Effects across Contexts

ERIC Educational Resources Information Center

Ost, Ben; Gangopadhyaya, Anuj; Schiman, Jeffrey C.

2017-01-01

Studies using test scores as the dependent variable often report point estimates in student standard deviation units. We note that a standard deviation is not a standard unit of measurement since the distribution of test scores can vary across contexts. As such, researchers should be cautious when interpreting differences in the numerical size of…
Consistency of Standard Setting in an Augmented State Testing System

ERIC Educational Resources Information Center

Lissitz, Robert W.; Wei, Hua

2008-01-01

In this article we address the issue of consistency in standard setting in the context of an augmented state testing program. Information gained from the external NRT scores is used to help make an informed decision on the determination of cut scores on the state test. The consistency of cut scores on the CRT across grades is maintained by forcing…
Proficiency Standards and Cut-Scores for Language Proficiency Tests.

ERIC Educational Resources Information Center

Moy, Raymond H.

1984-01-01

Discusses the problems associated with "grading on a curve," the approach often used for standard setting on language proficiency tests. Proposes four main steps presented in the setting of a non-arbitrary cut-score. These steps not only establish a proficiency standard checked by external criteria, but also check to see that the test covers the…
Comparison of Standardized Test Scores from Traditional Classrooms and Those Using Problem-Based Learning

ERIC Educational Resources Information Center

Needham, Martha Elaine

2010-01-01

This research compares differences between standardized test scores in problem-based learning (PBL) classrooms and a traditional classroom for 6th grade students using a mixed-method, quasi-experimental and qualitative design. The research shows that problem-based learning is as effective as traditional teaching methods on standardized tests. The…
Improved auscultation skills in paramedic students using a modified stethoscope.

PubMed

Simon, Erin L; Lecat, Paul J; Haller, Nairmeen A; Williams, Carolyn J; Martin, Scott W; Carney, John A; Pakiela, John A

2012-12-01

The Ventriloscope® (Lecat's SimplySim, Tallmadge, OH) is a modified stethoscope used as a simulation training device for auscultation. To test the effectiveness of the Ventriloscope as a training device in teaching heart and lung auscultatory findings to paramedic students. A prospective, single-hospital study conducted in a paramedic-teaching program. The standard teaching group learned heart and lung sounds via audiocassette recordings and lecture, whereas the intervention group utilized the modified stethoscope in conjunction with patient volunteers. Study subjects took a pre-test, post-test, and a follow-up test to measure recognition of heart and lung sounds. The intervention group included 22 paramedic students and the standard group included 18 paramedic students. Pre-test scores did not differ using two-sample t-tests (standard group: t [16]=-1.63, p=0.12) and (intervention group: t [20]=-1.17, p=0.26). Improvement in pre-test to post-test scores was noted within each group (standard: t [17]=2.43, p=0.03; intervention: t [21]=4.81, p<0.0001). Follow-up scores for the standard group were not different from pre-test scores of 16.06 (t [17]=0.94, p=0.36). However, follow-up scores for the intervention group significantly improved from their respective pre-test score of 16.05 (t [21]=2.63, p=0.02). Simulation training using a modified stethoscope in conjunction with standardized patients allows for realistic learning of heart and lung sounds. This technique of simulation training achieved proficiency and better retention of heart and lung sounds in a safe teaching environment. Copyright © 2012 Elsevier Inc. All rights reserved.

The academic penalty for gaining weight: a longitudinal, change-in-change analysis of BMI and perceived academic ability in middle school students.

PubMed

Kenney, E L; Gortmaker, S L; Davison, K K; Bryn Austin, S

2015-09-01

Worse educational outcomes for obese children regardless of academic ability may begin early in the life course. This study tested whether an increase in children's relative weight predicted lower teacher- and child-perceived academic ability even after adjusting for standardized test scores. Three thousand three hundred and sixty-two children participating in the Early Childhood Longitudinal Study-Kindergarten Cohort were studied longitudinally from fifth to eighth grade. Heights, weights, standardized test scores in maths and reading, and teacher and self-ratings of ability in maths and reading were measured at each wave. Longitudinal, within-child linear regression models estimated the impact of a change in body mass index (BMI) z-score on change in normalized teacher and student ratings of ability in reading and maths, adjusting for test score. A change in BMI z-score from fifth to eighth grade was not independently associated with a change in standardized test scores. However, adjusting for standardized test scores, an increasing BMI z-score was associated with significant reductions in teacher's perceptions of girls' ability in reading (-0.12, 95% confidence interval (CI): -0.23, -0.03, P=0.03) and boys' ability in math (-0.30, 95% CI: -0.43, -0.17, P<0.001). Among children who were overweight at fifth grade and increased in BMI z-score, there were even larger reductions in teacher ratings for boys' reading ability (-0.37, 95% CI: -0.71, -0.03, P=0.03) and in girls' self-ratings of maths ability (-0.47, 95% CI: -0.83, -0.11, P=0.01). From fifth to eighth grade, increase in BMI z-score was significantly associated with worsening teacher perceptions of academic ability for both boys and girls, regardless of objectively measured ability (standardized test scores). Future research should examine potential interventions to reduce bias and promote positive school climate.
A Comparison of Three Methods for Computing Scale Score Conditional Standard Errors of Measurement. ACT Research Report Series, 2013 (7)

ERIC Educational Resources Information Center

Woodruff, David; Traynor, Anne; Cui, Zhongmin; Fang, Yu

2013-01-01

Professional standards for educational testing recommend that both the overall standard error of measurement and the conditional standard error of measurement (CSEM) be computed on the score scale used to report scores to examinees. Several methods have been developed to compute scale score CSEMs. This paper compares three methods, based on…
The Influence of Foreign Language Learning during Early Childhood on Standardized Test Scores

ERIC Educational Resources Information Center

Shaw, Tommetta

2010-01-01

Increasing standardized test scores in reading and math is of high importance to the California Department of Education to meet requirements mandated by the No Child Left Behind (NCLB) Act of 2001. More research is needed to understand the best ways to improve test scores to meet concerns of the NCLB Act. The purpose of the study was to evaluate…
Cognitive skills, student achievement tests, and schools.

PubMed

Finn, Amy S; Kraft, Matthew A; West, Martin R; Leonard, Julia A; Bish, Crystal E; Martin, Rebecca E; Sheridan, Margaret A; Gabrieli, Christopher F O; Gabrieli, John D E

2014-03-01

Cognitive skills predict academic performance, so schools that improve academic performance might also improve cognitive skills. To investigate the impact schools have on both academic performance and cognitive skills, we related standardized achievement-test scores to measures of cognitive skills in a large sample (N = 1,367) of eighth-grade students attending traditional, exam, and charter public schools. Test scores and gains in test scores over time correlated with measures of cognitive skills. Despite wide variation in test scores across schools, differences in cognitive skills across schools were negligible after we controlled for fourth-grade test scores. Random offers of enrollment to oversubscribed charter schools resulted in positive impacts of such school attendance on math achievement but had no impact on cognitive skills. These findings suggest that schools that improve standardized achievement-test scores do so primarily through channels other than improving cognitive skills.
IQ Scores Should Be Corrected for the Flynn Effect in High-Stakes Decisions

ERIC Educational Resources Information Center

Fletcher, Jack M.; Stuebing, Karla K.; Hughes, Lisa C.

2010-01-01

IQ test scores should be corrected for high stakes decisions that employ these assessments, including capital offense cases. If scores are not corrected, then diagnostic standards must change with each generation. Arguments against corrections, based on standards of practice, information present and absent in test manuals, and related issues,…
Do School-Based Tutoring Programs Significantly Improve Student Performance on Standardized Tests?

ERIC Educational Resources Information Center

Rothman, Terri; Henderson, Mary

2011-01-01

This study used a pre-post, nonequivalent control group design to examine the impact of an in-district, after-school tutoring program on eighth grade students' standardized test scores in language arts and mathematics. Students who had scored in the near-passing range on either the language arts or mathematics aspect of a standardized test at the…
Understanding the Role of "SES," Ethnicity, and Discipline Infractions in Students' Standardized Test Scores

ERIC Educational Resources Information Center

Koca, Fatih

2017-01-01

The goal of the current study is to examine the impact of students' socioeconomic status, ethnicity, and discipline infractions on their standardized test scores in Indiana, USA. Data for this study were extracted from the Indiana Department of Education. ISTEP is a criterion-referenced standardized test. It consists of items that assess a student's…
What "No Child Left Behind" Leaves behind: The Roles of IQ and Self-Control in Predicting Standardized Achievement Test Scores and Report Card Grades

ERIC Educational Resources Information Center

Duckworth, Angela L.; Quinn, Patrick D.; Tsukayama, Eli

2012-01-01

The increasing prominence of standardized testing to assess student learning motivated the current investigation. We propose that standardized achievement test scores assess competencies determined more by intelligence than by self-control, whereas report card grades assess competencies determined more by self-control than by intelligence. In…
The impact of testing accommodations on MCAT scores: descriptive results.

PubMed

Julian, Ellen R; Ingersoll, Deborah J; Etienne, Patricia M; Hilger, Anthony E

2004-04-01

Medical College Admission Test (MCAT) examinees with disabilities who receive accommodations receive flagged scores indicating nonstandard administration. This report compares the performance of MCAT examinees who received accommodations with that of standard examinees. Aggregate history records of all 1994-2000 MCAT examinees were identified as flagged (2,401) or standard (297,880), then further sorted by race/ethnicity (broadly identified as underrepresented minority and non-URM, at the time of testing) and gender. Those with flagged scores were also classified by disability (LD = learning disability, ADHD = attention deficit hyperactivity disorder, LD/ADHD = learning disability and attention deficit hyperactivity disorder, and Other = other disability) and type of accommodation. Mean MCAT scores were calculated for all groups. A group of 866 examinees took the MCAT first as a standard administration and subsequently with accommodations. In a separate analysis, their two sets of scores were compared. Less than 1% of examinees (2,401) had accommodations; of these, 55% were LD, 17% ADHD, 5% LD/ADHD, and 23% Other. Extended time was the most frequently provided accommodation. Mean flagged scores slightly exceeded mean standard scores on all MCAT sections. Examinees who retook the MCAT with accommodations after a standard administration increased their scores by six points, quadrupling the average gain of a Standard-Standard retest cohort from another study. The small but statistically significantly higher flagged scores may reflect either appropriate compensation or overly generous accommodations. Extended time had a positive impact on the scores of those who retested with this accommodation. The validity of the flagged MCAT in predicting success in medical school is not known, and further investigation is underway.
Conditional standard errors of measurement for composite scores on the Wechsler Preschool and Primary Scale of Intelligence-Third Edition.

PubMed

Price, Larry R; Raju, Nambury; Lurie, Anna; Wilkins, Charles; Zhu, Jianjun

2006-02-01

A specific recommendation of the 1999 Standards for Educational and Psychological Testing by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education is that test publishers report estimates of the conditional standard error of measurement (SEM). Procedures for calculating the conditional (score-level) SEM based on raw scores are well documented; however, few procedures have been developed for estimating the conditional SEM of subtest or composite scale scores resulting from a nonlinear transformation. Item response theory provided the psychometric foundation to derive the conditional standard errors of measurement and confidence intervals for composite scores on the Wechsler Preschool and Primary Scale of Intelligence-Third Edition.
Conditional Standard Errors of Measurement for Composite Scores Using IRT

ERIC Educational Resources Information Center

Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan

2012-01-01

Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
How Much Do Test Scores Vary among School Districts? New Estimates Using Population Data, 2009-2015. CEPA Working Paper No. 17-02

ERIC Educational Resources Information Center

Fahle, Erin M.; Reardon, Sean F.

2017-01-01

This paper provides the first population-based evidence on how much standardized test scores vary among public school districts within each state and how segregation explains that variation. Using roughly 300 million standardized test score records in math and ELA for grades 3 through 8 from every U.S. public school district during the 2008-09 to…
The value of Bayes' theorem for interpreting abnormal test scores in cognitively healthy and clinical samples.

PubMed

Gavett, Brandon E

2015-03-01

The base rates of abnormal test scores in cognitively normal samples have been a focus of recent research. The goal of the current study is to illustrate how Bayes' theorem uses these base rates--along with the same base rates in cognitively impaired samples and prevalence rates of cognitive impairment--to yield probability values that are more useful for making judgments about the absence or presence of cognitive impairment. Correlation matrices, means, and standard deviations were obtained from the Wechsler Memory Scale--4th Edition (WMS-IV) Technical and Interpretive Manual and used in Monte Carlo simulations to estimate the base rates of abnormal test scores in the standardization and special groups (mixed clinical) samples. Bayes' theorem was applied to these estimates to identify probabilities of normal cognition based on the number of abnormal test scores observed. Abnormal scores were common in the standardization sample (65.4% scoring below a scaled score of 7 on at least one subtest) and more common in the mixed clinical sample (85.6% scoring below a scaled score of 7 on at least one subtest). Probabilities varied according to the number of abnormal test scores, base rates of normal cognition, and cutoff scores. The results suggest that interpretation of base rates obtained from cognitively healthy samples must also account for data from cognitively impaired samples. Bayes' theorem can help neuropsychologists answer questions about the probability that an individual examinee is cognitively healthy based on the number of abnormal test scores observed.
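The Bayes' theorem step described above can be sketched as follows. The two likelihoods are the base rates quoted in the abstract (65.4% of the standardization sample and 85.6% of the mixed clinical sample scoring below a scaled score of 7 on at least one subtest). The 50% prior for normal cognition is a purely hypothetical prevalence chosen for illustration, and the function name is an assumption rather than anything from the study.

```python
def posterior_normal(p_obs_given_normal: float,
                     p_obs_given_impaired: float,
                     prior_normal: float) -> float:
    """P(normal | observation) via Bayes' theorem for two hypotheses."""
    for p in (p_obs_given_normal, p_obs_given_impaired, prior_normal):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    prior_impaired = 1.0 - prior_normal
    numerator = p_obs_given_normal * prior_normal
    evidence = numerator + p_obs_given_impaired * prior_impaired
    if evidence == 0.0:
        raise ValueError("observation has zero probability under both hypotheses")
    return numerator / evidence


if __name__ == "__main__":
    # Base rates from the abstract; the prior of 0.5 is hypothetical.
    print(posterior_normal(0.654, 0.856, prior_normal=0.5))
```

With these inputs the posterior probability of normal cognition falls only modestly below the prior, which illustrates the abstract's point that a single abnormal score is weak evidence when abnormal scores are common in healthy samples.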
Predictors of medical school clerkship performance: a multispecialty longitudinal analysis of standardized examination scores and clinical assessments.

PubMed

Casey, Petra M; Palmer, Brian A; Thompson, Geoffrey B; Laack, Torrey A; Thomas, Matthew R; Hartz, Martha F; Jensen, Jani R; Sandefur, Benjamin J; Hammack, Julie E; Swanson, Jerry W; Sheeler, Robert D; Grande, Joseph P

2016-04-27

Evidence suggests that poor performance on standardized tests before and early in medical school is associated with poor performance on standardized tests later in medical school and beyond. This study aimed to explore relationships between standardized examination scores (before and during medical school) with test and clinical performance across all core clinical clerkships. We evaluated characteristics of 435 students at Mayo Medical School (MMS) who matriculated 2000-2009 and for whom undergraduate grade point average, Medical College Admission Test (MCAT), medical school standardized tests (United States Medical Licensing Examination [USMLE] 1 and 2; National Board of Medical Examiners [NBME] subject examination), and faculty assessments were available. We assessed the correlation between scores and assessments and determined USMLE 1 cutoffs predictive of poor performance (≤10th percentile) on the NBME examinations. We also compared the mean faculty assessment scores of MMS students vs visiting students, and for the NBME, we determined the percentage of MMS students who scored at or below the tenth percentile of first-time national examinees. MCAT scores correlated robustly with USMLE 1 and 2, and USMLE 1 and 2 independently predicted NBME scores in all clerkships. USMLE 1 cutoffs corresponding to poor NBME performance ranged from 220 to 223. USMLE 1 scores were similar among MMS and visiting students. For most academic years and clerkships, NBME scores were similar for MMS students vs all first-time examinees. MCAT, USMLE 1 and 2, and subsequent clinical performance parameters were correlated with NBME scores across all core clerkships. Even more interestingly, faculty assessments correlated with NBME scores, affirming patient care as examination preparation. USMLE 1 scores identified students at risk of poor performance on NBME subject examinations, facilitating and supporting implementation of remediation before the clinical years. MMS students were representative of medical students across the nation.
Technical analysis of the Slosson Written Expression Test.

PubMed

Erford, Bradley T; Hofler, Donald B

2004-06-01

The Slosson Written Expression Test was designed to assess students ages 8-17 years at risk for difficulties in written expression. Scores from three independent samples were used to evaluate the test's reliability and validity for measuring students' written expression. Test-retest reliability of the SWET subscales ranged from .80 to .94 (n = 151), and .95 for the Written Expression Total Standard Scores. The median alternate-form reliability for students' Written Expression Total Standard Scores was .81 across the three forms. Scores on the Slosson test yielded concurrent validity coefficients (n = 143) of .60 with scores from the Woodcock-Johnson: Tests of Achievement-Third Edition Broad Written Language Domain and .49 with scores on the Test of Written Language-Third Edition Spontaneous Writing Quotient. Exploratory factor analytic procedures suggested the Slosson test is comprised of two dimensions, Writing Mechanics and Writing Maturity (47.1% and 20.1% variance accounted for, respectively). In general, the Slosson Written Expression Test presents with sufficient technical characteristics to be considered a useful written expression screening test.
The Relationship between Deductive Reasoning Ability, Test Anxiety, and Standardized Test Scores in a Latino Sample

ERIC Educational Resources Information Center

Rich, John D., Jr.; Fullard, William; Overton, Willis

2011-01-01

One hundred and twelve Latino students from Philadelphia participated in this study, which examined the development of deductive reasoning across adolescence, and the relation of reasoning to test anxiety and standardized test scores. As predicted, eleventh and ninth graders demonstrated significantly more advanced reasoning than seventh graders.…
Utilizing the Six Realms of Meaning in Improving Campus Standardized Test Scores through Team Teaching and Strategic Planning

ERIC Educational Resources Information Center

Stevenson, Rosnisha D.; Kritsonis, William Allan

2009-01-01

This article will seek to utilize Dr. William Allan Kritsonis' book "Ways of Knowing Through the Realms of Meaning" (2007) as a framework to improve a campus's standardized test scores, more specifically, their TAKS (Texas Assessment of Knowledge and Skills) scores. Many campuses have an improvement plan, also known as a Campus…
Ethnic identity, school connectedness, and achievement in standardized tests among Mexican-origin youth.

PubMed

Santos, Carlos E; Collins, Mary Ann

2016-07-01

The aim of this study was to investigate the association between school connectedness and performance in standardized test scores and whether this association was moderated by ethnic private regard. The study combines self-report data with school district reported data on standardized test scores in reading and math and free and reduced lunch status. Participants included 436 Mexican-origin youth attending a middle school in a southwestern U.S. state. Participants were on average 12.34 years of age (SD = .95) and 51.8% female and 48.2% male. After controlling for age, gender, free and reduced lunch status, and generational status, school connectedness and ethnic private regard were both positive predictors of standardized test scores in reading and math. Results also revealed a significant interaction between school connectedness and ethnic private regard in predicting standardized test scores in reading, such that participants who were low on ethnic private regard and low on school connectedness reported lower levels of achievement compared to participants who were low on ethnic private regard but high on school connectedness. At high levels of ethnic private regard, high or low levels of school connectedness were not associated with higher or lower standardized test scores in reading. The findings in this study provide support for the protective role that ethnic private regard plays in the educational experiences of Mexican-origin youth and highlight how the local school context may play a role in shaping this finding.
Standardized Testing Practices: Effect on Graduation and NCLEX® Pass Rates.

PubMed

Randolph, Pamela K

The use of standardized testing in pre-licensure nursing programs has been accompanied by conflicting reports of effective practices. The purpose of this project was to describe standardized testing practices in one state's nursing programs and discover if the use of a cut score or oversight of remediation had any effect on (a) first time NCLEX® pass rates, (b) on-time graduation (OTG) or (c) the combination of (a) and (b). Administrators of 38 nursing programs in one Southwest state were sent surveys; surveys were returned by 34 programs (89%). Survey responses were compared to each program's NCLEX pass rate and on-time graduation rate; t-tests were conducted for significant differences associated with a required minimum score (cut score) and oversight of remediation. There were no significant differences in NCLEX pass or on-time graduation rates related to establishment of a cut score. There was a significant difference when the NCLEX pass rate and on-time graduation rate were combined (Outcome Index "OI") with significantly higher program outcomes (P = .02) for programs without cut scores. There were no differences associated with faculty oversight of remediation. The results of this study do not support establishment of a cut score when implementing standardized testing.
The Relationship between Mathematics Achievement and Socio-Economic Status

ERIC Educational Resources Information Center

Hernandez, Marilys

2014-01-01

This study investigated the relationship between the mathematics scores of public middle school students in Miami-Dade County on Florida's standardized test, the Florida Comprehensive Assessment Test (FCAT) 2.0, and students' socio-economic status. The study found that SES had a strong correlation with the standardized test mathematics scores (r =…

Do Test Scores Buy Happiness?

ERIC Educational Resources Information Center

McCluskey, Neal

2017-01-01

Since at least the enactment of No Child Left Behind in 2002, standardized test scores have served as the primary measures of public school effectiveness. Yet, such scores fail to measure the ultimate goal of education: maximizing happiness. This exploratory analysis assesses nation level associations between test scores and happiness, controlling…
Testing Intelligently Includes Double-Checking Wechsler IQ Scores

ERIC Educational Resources Information Center

Kuentzel, Jeffrey G.; Hetterscheidt, Lesley A.; Barnett, Douglas

2011-01-01

The rigors of standardized testing make for numerous opportunities for examiner error, including simple computational mistakes in scoring. Although experts recommend that test scoring be double-checked, the extent to which independent double-checking would reduce scoring errors is not known. A double-checking procedure was established at a…
A Comparison of Standardized Achievement Test Scores on Right and Left Brain Dominant Fourth-Grade Students.

ERIC Educational Resources Information Center

Bell, Michael L.; Roubinek, Darrell L.

1989-01-01

Compares fourth-graders' subtest scores on the Stanford Achievement Test (SAT), the Iowa Test of Basic Skills (ITBS), and the Metropolitan Achievement Test (MAT). Finds right-brain dominant students scored better on four SAT subtests, and left-brain dominant students scored better on four ITBS subtests and two MAT subtests. (NH)
Adults with poor reading skills: How lexical knowledge interacts with scores on standardized reading comprehension tests

PubMed Central

McKoon, Gail; Ratcliff, Roger

2016-01-01

Millions of adults in the United States lack the necessary literacy skills for most living wage jobs. For students from adult learning classes, we used a lexical decision task to measure their knowledge of words and we used a decision-making model (Ratcliff’s, 1978, diffusion model) to abstract the mechanisms underlying their performance from their RTs and accuracy. We also collected scores for each participant on standardized IQ tests and standardized reading tests used commonly in the education literature. We found significant correlations between the model’s estimates of the strengths with which words are represented in memory and scores for some of the standardized tests but not others. The findings point to the feasibility and utility of combining a test of word knowledge, lexical decision, that is well-established in psycholinguistic research, a decision-making model that supplies information about underlying mechanisms, and standardized tests. The goal for future research is to use this combination of approaches to understand better how basic processes relate to standardized tests with the eventual aim of understanding what these tests are measuring and what the specific difficulties are for individual, low-literacy adults. PMID:26550803
The Score-Boosting Game.

ERIC Educational Resources Information Center

Popham, W. James

2000-01-01

Teachers everywhere are playing the score-boosting game to raise scores on mandated standardized achievement tests, although five nationally recognized assessments compare student performance instead of measuring classroom learning. Since curriculum standards are often vague and misaligned with assessments, teachers sprinkle instruction with…
Cognitive Skills, Student Achievement Tests, and Schools

PubMed Central

Finn, Amy S.; Kraft, Matthew A.; West, Martin R.; Leonard, Julia A.; Bish, Crystal E.; Martin, Rebecca E.; Sheridan, Margaret A.; Gabrieli, Christopher F. O.; Gabrieli, John D. E.

2014-01-01

Cognitive skills predict academic performance, so schools that improve academic performance might also improve cognitive skills. To investigate the impact schools have on both academic performance and cognitive skills, we related standardized achievement test scores to measures of cognitive skills in a large sample (N=1,367) of 8th-grade students attending traditional, exam, and charter public schools. Test scores and gains in test scores over time correlated with measures of cognitive skills. Despite wide variation in test scores across schools, differences in cognitive skills across schools were negligible after controlling for 4th-grade test scores. Random offers of enrollment to over-subscribed charter schools resulted in positive impacts of such school attendance on math achievement, but had no impact on cognitive skills. These findings suggest that schools that improve standardized achievement tests do so primarily through channels other than cognitive skills. PMID:24434238
Examining the Validity of GED® Tests Scores with Scheduling and Setting Accommodations. GED Testing Service Research Studies, 2004-1

ERIC Educational Resources Information Center

George-Ezzelle, Carol E.; Skaggs, Gary

2004-01-01

Current testing standards call for test developers to provide evidence that testing procedures and test scores, and the inferences made based on the test scores, show evidence of validity and are comparable across subpopulations (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on…
The Perceptions of Standardized Tests, Academic Self-Efficacy, and Academic Performance of African American Graduate Students: a Correlational and Comparative Analysis

ERIC Educational Resources Information Center

Marrah, Arleezah K.

2012-01-01

The academic performance of African American students continues to be a concern for educators, researchers, and most importantly their community. This issue is particularly prevalent in the standardized test scores of African American students where they score on average one or more standard deviations below their Caucasian and Asian American…
The Impact of Stability Balls, Activity Breaks, and a Sedentary Classroom on Standardized Math Scores

ERIC Educational Resources Information Center

Mead, Tim; Scibora, Lesley

2016-01-01

The purpose of the study was to determine if standardized math test scores improve by administering different types of exercise during math instruction. Three sixth grade classes were assessed on the Measures of Academic Progress (MAP) and the Minnesota Comprehensive Assessment (MCA) standardized math tests during the 2012 and 2013 academic year.…
Standardized Testing of Special Education Students: A Comparison of Service Type and Test Scores

ERIC Educational Resources Information Center

Hogan-Young, Christine

2013-01-01

The purpose of this study was to determine if there was a difference in Tennessee Comprehensive Assessment Program Modified Academic Achievement Standards (TCAP MAAS) achievement test scores for special education students who receive their instruction in the resource classroom or in an inclusion classroom. The study involved third, fourth, and…
Standard Error Estimation of 3PL IRT True Score Equating with an MCMC Method

ERIC Educational Resources Information Center

Liu, Yuming; Schulz, E. Matthew; Yu, Lei

2008-01-01

A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
The Uses and Misuses of Test Scores: Technical Assistance Perspective.

ERIC Educational Resources Information Center

Echternacht, Gary

The uses and misuses of standardized test results used for program evaluation as seen by a staff member of an Elementary Secondary Education Act (ESEA) Title I Technical Assistance Center are described. In ESEA Title I, test scores are used to select students for the program. Although federal requirements do not require using standardized test…
Effects of handcuffs on neuropsychological testing: Implications for criminal forensic evaluations.

PubMed

Biddle, Christine M; Fazio, Rachel L; Dyshniku, Fiona; Denney, Robert L

2018-01-01

Neuropsychological evaluations are increasingly performed in forensic contexts, including in criminal settings where security sometimes cannot be compromised to facilitate evaluation according to standardized procedures. Interpretation of nonstandardized assessment results poses significant challenges for the neuropsychologist. Research is limited in regard to the validation of neuropsychological test accommodation and modification practices that deviate from standard test administration; there is no published research regarding the effects of hand restraints upon neuropsychological evaluation results. This study provides preliminary results regarding the impact of restraints on motor functioning and common neuropsychological tests with a motor component. When restrained, performance on nearly all tests utilized was significantly impacted, including Trail Making Test A/B, a coding test, and several tests of motor functioning. Significant performance decline was observed in both raw scores and normative scores. Regression models are also provided in order to help forensic neuropsychologists adjust for the effect of hand restraints on raw scores of these tests, as the hand restraints also resulted in significant differences in normative scores; in the most striking case there was nearly a full standard deviation of discrepancy.
An Argument against Using Standardized Test Scores for Placement of International Undergraduate Students in English as a Second Language (ESL) Courses

ERIC Educational Resources Information Center

Kokhan, Kateryna

2013-01-01

Development and administration of institutional ESL placement tests require a great deal of financial and human resources. Due to a steady increase in the number of international students studying in the United States, some US universities have started to consider using standardized test scores for ESL placement. The English Placement Test (EPT)…
Development of an Itemwise Efficiency Scoring Method: Concurrent, Convergent, Discriminant, and Neuroimaging-Based Predictive Validity Assessed in a Large Community Sample

PubMed Central

Moore, Tyler M.; Reise, Steven P.; Roalf, David R.; Satterthwaite, Theodore D.; Davatzikos, Christos; Bilker, Warren B.; Port, Allison M.; Jackson, Chad T.; Ruparel, Kosha; Savitt, Adam P.; Baron, Robert B.; Gur, Raquel E.; Gur, Ruben C.

2016-01-01

Traditional “paper-and-pencil” testing is imprecise in measuring speed and hence limited in assessing performance efficiency, but computerized testing permits precision in measuring itemwise response time. We present a method of scoring performance efficiency (combining information from accuracy and speed) at the item level. Using a community sample of 9,498 youths age 8-21, we calculated item-level efficiency scores on four neurocognitive tests, and compared the concurrent, convergent, discriminant, and predictive validity of these scores to simple averaging of standardized speed and accuracy-summed scores. Concurrent validity was measured by the scores' abilities to distinguish men from women and their correlations with age; convergent and discriminant validity were measured by correlations with other scores inside and outside of their neurocognitive domains; predictive validity was measured by correlations with brain volume in regions associated with the specific neurocognitive abilities. Results provide support for the ability of itemwise efficiency scoring to detect signals as strong as those detected by standard efficiency scoring methods. We find no evidence of superior validity of the itemwise scores over traditional scores, but point out several advantages of the former. The itemwise efficiency scoring method shows promise as an alternative to standard efficiency scoring methods, with overall moderate support from tests of four different types of validity. This method allows the use of existing item analysis methods and provides the convenient ability to adjust the overall emphasis of accuracy versus speed in the efficiency score, thus adjusting the scoring to the real-world demands the test is aiming to fulfill. PMID:26866796
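A rough sketch of the comparison method named in the abstract above, "simple averaging of standardized speed and accuracy-summed scores". It does not implement the authors' itemwise method, whose details are not given here. The sign convention (shorter response time counts as higher speed), the use of the population standard deviation, and the names `zscores` and `standard_efficiency` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize values with zero variance")
    return [(v - m) / s for v in values]


def standard_efficiency(accuracy_sums, response_times):
    """Average standardized accuracy with standardized speed, one score per person.

    Speed is taken as the negated standardized response time, so that
    faster responders receive higher speed scores (an assumed convention).
    """
    z_accuracy = zscores(accuracy_sums)
    z_speed = [-z for z in zscores(response_times)]
    return [(a + s) / 2 for a, s in zip(z_accuracy, z_speed)]


# Made-up data for three test takers: summed accuracy and response time in ms.
scores = standard_efficiency([10, 20, 30], [900, 700, 500])
print([round(x, 3) for x in scores])
```

Weighting the two components unequally instead of averaging them would be one way to shift the emphasis between accuracy and speed, which the abstract mentions as an adjustable property of the itemwise method.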
Using Reading Rate and Comprehension CBM to Predict High-Stakes Achievement

ERIC Educational Resources Information Center

Miller, Kelli Caldwell; Bell, Sherry Mee; McCallum, R. Steve

2015-01-01

Because of the increased emphasis on standardized testing results, scores from a high-stakes, end-of-year test (Tennessee Comprehensive Assessment Program [TCAP] Reading Composite) were used as the standard against which scores from a group-administered, curriculum-based measure (CBM), Monitoring Instructional Responsiveness: Reading (MIR:R), were…
Background Variables, Levels of Aggregation, and Standardized Test Scores

ERIC Educational Resources Information Center

Paulson, Sharon E.; Marchant, Gregory J.

2009-01-01

This article examines the role of student demographic characteristics in standardized achievement test scores at both the individual level and aggregated at the state, district, school levels. For several data sets, the majority of the variance among states, districts, and schools was related to demographic characteristics. Where these background…
Student Laptop Use and Scores on Standardized Tests

ERIC Educational Resources Information Center

Kposowa, Augustine J.; Valdez, Amanda D.

2013-01-01

Objectives: The primary objective of the study was to investigate the relationship between ubiquitous laptop use and academic achievement. It was hypothesized that students with ubiquitous laptops would score on average higher on standardized tests than those without such computers. Methods: Data were obtained from two sources. First, demographic…
Evaluating the Rank-Ordering Method for Standard Maintaining

ERIC Educational Resources Information Center

Bramley, Tom; Gill, Tim

2010-01-01

The rank-ordering method for standard maintaining was designed for the purpose of mapping a known cut-score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees' work (scripts). It is a novel application of an old…

Using Heteroskedastic Ordered Probit Models to Recover Moments of Continuous Test Score Distributions from Coarsened Data

ERIC Educational Resources Information Center

Reardon, Sean F.; Shear, Benjamin R.; Castellano, Katherine E.; Ho, Andrew D.

2017-01-01

Test score distributions of schools or demographic groups are often summarized by frequencies of students scoring in a small number of ordered proficiency categories. We show that heteroskedastic ordered probit (HETOP) models can be used to estimate means and standard deviations of multiple groups' test score distributions from such data. Because…
Measuring the Outcome of At-Risk Students on Biology Standardized Tests When Using Different Instructional Strategies

NASA Astrophysics Data System (ADS)

Burns, Dana

Over the last two decades, online education has become a popular concept in universities as well as K-12 education. This generation of students has grown up using technology and has shown interest in incorporating technology into their learning. The idea of using technology in the classroom to enhance student learning and create higher achievement has become necessary for administrators, teachers, and policymakers. Although online education is a popular topic, there has been minimal research on the effectiveness of online and blended learning strategies compared to student learning in a traditional K-12 classroom setting. The purpose of this study was to investigate differences in standardized test scores from the Biology End of Course exam when at-risk students completed the course using three different educational models: online format, blended learning, and traditional face-to-face learning. Data was collected from over 1,000 students over a five-year period. Correlation analysis of standardized test scores of eighth grade students was used to define students as "at-risk" for failing high school courses. The results indicated a high correlation between eighth grade standardized test scores and Biology End of Course exam scores. These students were deemed "at-risk" for failing high school courses. Standardized test scores were measured for the at-risk students when those students completed Biology in the different models of learning. Results indicated significant differences existed among the learning models. Students had the highest test scores when completing Biology in the traditional face-to-face model. Further evaluation of subgroup populations indicated statistical differences in learning models for African-American populations, female students, and for male students.
49 CFR 383.131 - Test procedures.

Code of Federal Regulations, 2010 CFR

2010-10-01

... provide standardized scoring sheets for the skills tests, as well as standardized driving instructions for... 49 Transportation 5 2010-10-01 2010-10-01 false Test procedures. 383.131 Section 383.131... STANDARDS; REQUIREMENTS AND PENALTIES Tests § 383.131 Test procedures. (a) Driver information manuals...
Associations Between United States Medical Licensing Examination (USMLE) and Internal Medicine In-Training Examination (IM-ITE) Scores

PubMed Central

McDonald, Furman S.; Zeger, Scott L.; Kolars, Joseph C.

2008-01-01

Background: Little is known about the associations of previous standardized examination scores with scores on subsequent standardized examinations used to assess medical knowledge in internal medicine residencies. Objective: To examine associations of previous standardized test scores on subsequent standardized test scores. Design: Retrospective cohort study. Participants: One hundred ninety-five internal medicine residents. Methods: Bivariate associations of United States Medical Licensing Examination (USMLE) Steps and Internal Medicine In-Training Examination (IM-ITE) scores were determined. Random effects analysis adjusting for repeated administrations of the IM-ITE and other variables known or hypothesized to affect IM-ITE score allowed for discrimination of associations of individual USMLE Step scores on IM-ITE scores. Results: In bivariate associations, USMLE scores explained 17% to 27% of the variance in IM-ITE scores, and previous IM-ITE scores explained 66% of the variance in subsequent IM-ITE scores. Regression coefficients (95% CI) for adjusted associations of each USMLE Step with IM-ITE scores were USMLE-1 0.19 (0.12, 0.27), USMLE-2 0.23 (0.17, 0.30), and USMLE-3 0.19 (0.09, 0.29). Conclusions: No single USMLE Step is more strongly associated with IM-ITE scores than the others. Because previous IM-ITE scores are strongly associated with subsequent IM-ITE scores, appropriate modeling, such as random effects methods, should be used to account for previous IM-ITE administrations in studies for which IM-ITE score is an outcome. PMID:18612735
Associations between United States Medical Licensing Examination (USMLE) and Internal Medicine In-Training Examination (IM-ITE) scores.

PubMed

McDonald, Furman S; Zeger, Scott L; Kolars, Joseph C

2008-07-01

Background: Little is known about the associations of previous standardized examination scores with scores on subsequent standardized examinations used to assess medical knowledge in internal medicine residencies. Objective: To examine associations of previous standardized test scores on subsequent standardized test scores. Design: Retrospective cohort study. Participants: One hundred ninety-five internal medicine residents. Methods: Bivariate associations of United States Medical Licensing Examination (USMLE) Steps and Internal Medicine In-Training Examination (IM-ITE) scores were determined. Random effects analysis adjusting for repeated administrations of the IM-ITE and other variables known or hypothesized to affect IM-ITE score allowed for discrimination of associations of individual USMLE Step scores on IM-ITE scores. Results: In bivariate associations, USMLE scores explained 17% to 27% of the variance in IM-ITE scores, and previous IM-ITE scores explained 66% of the variance in subsequent IM-ITE scores. Regression coefficients (95% CI) for adjusted associations of each USMLE Step with IM-ITE scores were USMLE-1 0.19 (0.12, 0.27), USMLE-2 0.23 (0.17, 0.30), and USMLE-3 0.19 (0.09, 0.29). Conclusions: No single USMLE Step is more strongly associated with IM-ITE scores than the others. Because previous IM-ITE scores are strongly associated with subsequent IM-ITE scores, appropriate modeling, such as random effects methods, should be used to account for previous IM-ITE administrations in studies for which IM-ITE score is an outcome.
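As a reading aid for the bivariate results above: with a single predictor, the proportion of variance explained equals the squared Pearson correlation, so the reported percentages can be converted to correlation magnitudes. The sketch below does only that arithmetic. The sign of each correlation is not recoverable from the variance figure, and the function name is a hypothetical one chosen for this illustration.

```python
import math


def correlation_magnitude(variance_explained):
    """Return |r| for a bivariate association, given R^2 as a proportion."""
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("variance_explained must lie between 0 and 1")
    return math.sqrt(variance_explained)


# Proportions of variance reported in the abstract.
reported = {
    "USMLE -> IM-ITE (lower bound)": 0.17,
    "USMLE -> IM-ITE (upper bound)": 0.27,
    "previous IM-ITE -> subsequent IM-ITE": 0.66,
}

for label, r_squared in reported.items():
    print(f"{label}: R^2 = {r_squared:.2f}, |r| = {correlation_magnitude(r_squared):.2f}")
```

Under this conversion, the USMLE associations correspond to correlations of roughly 0.41 to 0.52 in magnitude, and the association between successive IM-ITE scores to roughly 0.81.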
The Effect of Four Intervention Programs on Standardized Test Scores by Gender

ERIC Educational Resources Information Center

Cryder, Rebecca E.

2012-01-01

This quantitative correlational study involved the analysis, by gender, of the effect of four intervention programs at an Arizona middle school as seen on Arizona's Instrument to Measure Standards (AIMS) test scores. These four intervention programs included: Advancement Via Individual Determination (AVID), a planner stamping system, a World…
Self-Esteem, Locus of Control, and Student Achievement.

ERIC Educational Resources Information Center

Sterbin, Allan; Rakow, Ernest

The direct effects of locus of control and self-esteem on standardized test scores were studied. The relationships among the standardized test scores and measures of locus of control and self-esteem for 12,260 students from the National Education Longitudinal Study 1994 database were examined, using the same definition of locus of control and…
Linking School Goals and Learning Standards to Teacher Evaluation and Compensation.

ERIC Educational Resources Information Center

Mathis, William J.

It is possible to tie teacher compensation to professional growth, without reference to standardized test scores. Tying pay to students' achievement scores does not account for the different levels of students, and teacher testing does not separate good teachers from bad. In Rutland Northeast, Vermont, each school has its own locally elected…
Improving IQ measurement in intellectual disabilities using true deviation from population norms

PubMed Central

2014-01-01

Background: Intellectual disability (ID) is characterized by global cognitive deficits, yet the very IQ tests used to assess ID have limited range and precision in this population, especially for more impaired individuals. Methods: We describe the development and validation of a method of raw z-score transformation (based on general population norms) that ameliorates floor effects and improves the precision of IQ measurement in ID using the Stanford Binet 5 (SB5) in fragile X syndrome (FXS; n = 106), the leading inherited cause of ID, and in individuals with idiopathic autism spectrum disorder (ASD; n = 205). We compared the distributional characteristics and Q-Q plots from the standardized scores with the deviation z-scores. Additionally, we examined the relationship between both scoring methods and multiple criterion measures. Results: We found evidence that substantial and meaningful variation in cognitive ability on standardized IQ tests among individuals with ID is lost when converting raw scores to standardized scaled, index and IQ scores. Use of the deviation z-score method rectifies this problem, and accounts for significant additional variance in criterion validation measures, above and beyond the usual IQ scores. Additionally, individual and group-level cognitive strengths and weaknesses are recovered using deviation scores. Conclusion: Traditional methods for generating IQ scores in lower functioning individuals with ID are inaccurate and inadequate, leading to erroneously flat profiles. However, assessment of cognitive abilities is substantially improved by measuring true deviation in performance from standardization sample norms. This work has important implications for standardized test development, clinical assessment, and research for which IQ is an important measure of interest in individuals with neurodevelopmental disorders and other forms of cognitive impairment. PMID:26491488
Improving IQ measurement in intellectual disabilities using true deviation from population norms.

PubMed

Sansone, Stephanie M; Schneider, Andrea; Bickel, Erika; Berry-Kravis, Elizabeth; Prescott, Christina; Hessl, David

2014-01-01

Intellectual disability (ID) is characterized by global cognitive deficits, yet the very IQ tests used to assess ID have limited range and precision in this population, especially for more impaired individuals. We describe the development and validation of a method of raw z-score transformation (based on general population norms) that ameliorates floor effects and improves the precision of IQ measurement in ID using the Stanford Binet 5 (SB5) in fragile X syndrome (FXS; n = 106), the leading inherited cause of ID, and in individuals with idiopathic autism spectrum disorder (ASD; n = 205). We compared the distributional characteristics and Q-Q plots from the standardized scores with the deviation z-scores. Additionally, we examined the relationship between both scoring methods and multiple criterion measures. We found evidence that substantial and meaningful variation in cognitive ability on standardized IQ tests among individuals with ID is lost when converting raw scores to standardized scaled, index and IQ scores. Use of the deviation z-score method rectifies this problem, and accounts for significant additional variance in criterion validation measures, above and beyond the usual IQ scores. Additionally, individual and group-level cognitive strengths and weaknesses are recovered using deviation scores. Traditional methods for generating IQ scores in lower functioning individuals with ID are inaccurate and inadequate, leading to erroneously flat profiles. However, assessment of cognitive abilities is substantially improved by measuring true deviation in performance from standardization sample norms. This work has important implications for standardized test development, clinical assessment, and research for which IQ is an important measure of interest in individuals with neurodevelopmental disorders and other forms of cognitive impairment.
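The floor effect described in the abstract above can be illustrated with a short sketch. This is not the authors' procedure: it assumes the deviation score is an ordinary z-score of a raw score against general-population norms, and the norm mean, norm SD, and the standard-score floor of 40 are all hypothetical values chosen for illustration, not SB5 norms.

```python
def deviation_z(raw_score, norm_mean, norm_sd):
    """Return how many norm SDs a raw score lies from the norm mean."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw_score - norm_mean) / norm_sd


def to_standard_score(z, mean=100.0, sd=15.0, floor=40.0):
    """Map z to an IQ-style metric, truncated at a floor (hypothetical value)."""
    return max(floor, mean + sd * z)


# Hypothetical norms for one age band; not taken from any published test.
NORM_MEAN, NORM_SD = 30.0, 5.0

for raw in (4, 8):
    z = deviation_z(raw, NORM_MEAN, NORM_SD)
    # Both raw scores hit the same floored standard score,
    # while the z-scores still separate the two examinees.
    print(raw, round(z, 2), to_standard_score(z))
```

Under these assumed norms, two different raw scores collapse to the same floored standard score, while the z-scores keep them apart.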
The Relationship between Mean Square Differences and Standard Error of Measurement: Comment on Barchard (2012)

ERIC Educational Resources Information Center

Pan, Tianshu; Yin, Yue

2012-01-01

In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)[superscript 2] and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First,…
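The relationship named in the abstract above can be written out under textbook classical test theory assumptions. The notation here is ours, not necessarily that of Barchard or of the comment: errors are assumed to have mean zero and to be uncorrelated with each other and with both true scores.

```latex
% Observed score = true score + error, for each of the two tests
X_1 = T_1 + E_1, \qquad X_2 = T_2 + E_2

% Mean square difference; the cross terms vanish under the assumptions above
\mathrm{MSD} = E\!\left[(X_1 - X_2)^2\right]
             = E\!\left[(T_1 - T_2)^2\right] + \sigma_{E_1}^2 + \sigma_{E_2}^2

% Parallel tests: T_1 = T_2 and \sigma_{E_1} = \sigma_{E_2} = \mathrm{SEM}
\mathrm{MSD} = 2\,\mathrm{SEM}^2
```

When the tests are not parallel, the true-score term can be positive. This is consistent with the Barchard conclusion as summarized in the abstract.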
Ethnic differences in the Goodenough-Harris draw-a-man and draw-a-woman tests.

PubMed

Dugdale, A E; Chen, S T

1979-11-01

The draw-a-man (DAM) and draw-a-woman (DAW) tests were given to 307 schoolchildren in Petaling Jaya, Malaysia. The children were ethnically Malay, Chinese, or Indian (Tamil), and all came from lower socioeconomic groups. The standard scores of the Chinese children averaged 118 in the DAM and 112 in the DAW tests. These scores were significantly better than the American standards. Malay children scored significantly lower than Chinese, and Tamil children scored lower again. The nutritional status of the children had no influence on the scores. Chinese and Tamil children scored better in the DAM than the DAW, while in Malay boys the reverse was true. Malay children tended to emphasise clothing in the DAM, but Chinese and Tamil children scored better on items relating to facial features and body proportions. The Goodenough-Harris draw-a-person tests are obviously not culture-free, but the causes of ethnic differences have not been elucidated.
An approach to analyzing a single subject's scores obtained in a standardized test with application to the Aachen Aphasia Test (AAT).

PubMed

Willmes, K

1985-08-01

Methods for the analysis of a single subject's test profile(s) proposed by Huber (1973) are applied to the Aachen Aphasia Test (AAT). The procedures are based on the classical test theory model (Lord & Novick, 1968) and are suited for any (achievement) test with standard norms from a large standardization sample and satisfactory reliability estimates. Two test profiles of a Wernicke's aphasic, obtained before and after a 3-month period of speech therapy, are analyzed using inferential comparisons between (groups of) subtest scores on one test application and between two test administrations for single (groups of) subtests. For each of these comparisons, the two aspects of (i) significant (reliable) differences in performance beyond measurement error and (ii) the diagnostic validity of that difference in the reference population of aphasic patients are assessed. Significant differences between standardized subtest scores and a remarkably better preserved reading and writing ability could be found for both test administrations using the multiple test procedure of Holm (1979). Comparison of both profiles revealed an overall increase in performance for each subtest as well as changes in level of performance relations between pairs of subtests.
Kindergarten Predictors of Math Learning Disability

PubMed Central

Mazzocco, Michèle M. M.; Thompson, Richard E.

2009-01-01

The aim of the present study was to address how to effectively predict mathematics learning disability (MLD). Specifically, we addressed whether cognitive data obtained during kindergarten can effectively predict which children will have MLD in third grade, whether an abbreviated test battery could be as effective as a standard psychoeducational assessment at predicting MLD, and whether the abbreviated battery corresponded to the literature on MLD characteristics. Participants were 226 children who enrolled in a 4-year prospective longitudinal study during kindergarten. We administered measures of mathematics achievement, formal and informal mathematics ability, visual-spatial reasoning, and rapid automatized naming and examined which test scores and test items from kindergarten best predicted MLD at grades 2 and 3. Statistical models using standardized scores from the entire test battery correctly classified ~80–83 percent of the participants as having, or not having, MLD. Regression models using scores from only individual test items were less predictive than models containing the standard scores, except for models using a specific subset of test items that dealt with reading numerals, number constancy, magnitude judgments of one-digit numbers, or mental addition of one-digit numbers. These models were as accurate in predicting MLD as was the model including the entire set of standard scores from the battery of tests examined. Our findings indicate that it is possible to effectively predict which kindergartners are at risk for MLD, and thus the findings have implications for early screening of MLD. PMID:20084182
Diagnostic Accuracy, Sensitivity, and Specificity of Executive Function Tests in Moderate Traumatic Brain Injury in Ghana.

PubMed

Adjorlolo, Samuel

2018-06-01

The sociocultural differences between Western and sub-Saharan African countries make it imperative to standardize neuropsychological tests in the latter. However, Western-normed tests are frequently administered in sub-Saharan Africa because of challenges hampering standardization efforts. Yet a salient topical issue in the cross-cultural neuropsychology literature relates to the utility of Western-normed neuropsychological tests in minority groups, non-Caucasians, and by extension Ghanaians. Consequently, this study investigates the diagnostic accuracy, sensitivity, and specificity of executive function (EF) tests (The Stroop Test, Trail Making Test, and Controlled Oral Word Association Test), and a Revised Quick Cognitive Screening Test (RQCST) in a sample of 50 patients diagnosed with moderate traumatic brain injury and 50 healthy controls in Ghana. The EF test scores showed good diagnostic accuracy, with area under the curve (AUC) values of the Trail Making Test scores ranging from .746 to .902. With respect to the Stroop Test scores, the AUC values ranged from .793 to .898, while the Controlled Oral Word Association Test had an AUC value of .787. The RQCST scores discriminated between the groups, with AUC values ranging from .674 to .912. The AUC values of the composite EF score and a neuropsychological score created from EF and RQCST scores were .936 and .942, respectively. Additionally, the Stroop Test, Trail Making Test, EF composite score, and RQCST scores showed good to excellent sensitivities and specificities. In general, this study has shown that commonly used EF tests in Western countries have diagnostic accuracy, sensitivity, and specificity when administered in Ghanaian samples. The findings and implications of the study are discussed.
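For readers unfamiliar with the three statistics reported above, the following sketch shows how they are typically computed from two groups of scores. The scores and the cutoff are invented for illustration and are unrelated to the study's data. The sketch assumes higher scores indicate impairment, as with a timed task.

```python
def sensitivity_specificity(patient_scores, control_scores, cutoff):
    """Classify score >= cutoff as impaired; return (sensitivity, specificity)."""
    true_pos = sum(s >= cutoff for s in patient_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(patient_scores), true_neg / len(control_scores)


def auc(patient_scores, control_scores):
    """AUC as the chance a random patient outscores a random control (ties count 0.5)."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical completion times in seconds (higher = worse performance).
patients = [95, 120, 88, 140, 101]
controls = [60, 72, 95, 81, 66]

print(sensitivity_specificity(patients, controls, cutoff=90))
print(auc(patients, controls))
```

The pairwise definition of AUC used here is equivalent to the area under the empirical ROC curve. It avoids choosing any single cutoff, which is why AUC is reported alongside cutoff-specific sensitivity and specificity.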
Posturography and locomotor tests of dynamic balance after long-duration spaceflight.

PubMed

Cohen, Helen S; Kimball, Kay T; Mulavara, Ajitkumar P; Bloomberg, Jacob J; Paloski, William H

2012-01-01

The currently approved objective clinical measure of standing balance in astronauts after space flight is the Sensory Organization Test battery of computerized dynamic posturography. No tests of walking balance are currently approved for standard clinical testing of astronauts. This study determined the sensitivity and specificity of standing and walking balance tests for astronauts before and after long-duration space flight. Astronauts were tested on an obstacle avoidance test known as the Functional Mobility Test (FMT) and on the Sensory Organization Test using sway-referenced support surface motion with eyes closed (SOT 5) before and six months after (n=15) space flight on the International Space Station. They were tested two to seven days after landing. Scores on SOT tests decreased and scores on FMT increased significantly from pre- to post-flight. In other words, post-flight scores were worse than pre-flight scores. SOT and FMT scores were not significantly related. ROC analyses indicated supra-clinical cut-points for SOT 5 and for FMT. The standard clinical cut-point for SOT 5 had low sensitivity to post-flight astronauts. Higher cut-points increased sensitivity to post-flight astronauts but decreased specificity to pre-flight astronauts. Using an FMT cut-point that was moderately highly sensitive and highly specific plus SOT 5 at the standard clinical cut-point was no more sensitive than SOT 5, alone. FMT plus SOT 5 at higher cut-points was more specific and more sensitive. The total correctly classified was highest for FMT, alone, and for FMT plus SOT 5 at the highest cut-point. These findings indicate that standard clinical comparisons are not useful for identifying problems. Testing both standing and walking balance will be more likely to identify balance deficits.
Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS(®)) in acute coronary syndrome patients: differential functioning of items and test.

PubMed

Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E

2015-08-01

The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
The effect of human immunodeficiency virus type 1 antibody status on military applicant aptitude test scores.

PubMed

Arday, D R; Brundage, J F; Gardner, L I; Goldenbaum, M; Wann, F; Wright, S

1991-06-15

The authors conducted a population-based study to attempt to estimate the effect of human immunodeficiency virus type 1 (HIV-1) seropositivity on Armed Services Vocational Aptitude Battery test scores in otherwise healthy individuals with early HIV-1 infection. The Armed Services Vocational Aptitude Battery is a 10-test written multiple aptitude battery administered to all civilian applicants for military enlistment prior to serologic screening for HIV-1 antibodies. A total of 975,489 induction testing records containing both Armed Services Vocational Aptitude Battery and HIV-1 results from October 1985 through March 1987 were examined. An analysis data set (n = 7,698) was constructed by choosing five controls for each of the 1,283 HIV-1-positive cases, matched on five-digit ZIP code, and a multiple linear regression analysis was performed to control for demographic and other factors that might influence test scores. Years of education was the strongest predictor of test scores, raising an applicant's score on a composite test nearly 0.16 standard deviation per year. The HIV-1-positive effect on the composite score was -0.09 standard deviation (99% confidence interval -0.17 to -0.02). Separate regressions on each component test within the battery showed HIV-1 effects between -0.39 and +0.06 standard deviation. The two Armed Services Vocational Aptitude Battery component tests felt a priori to be the most sensitive to HIV-1-positive status showed the least decrease with seropositivity. Much of the variability in test scores was not predicted by either HIV-1 serostatus or the demographic and other factors included in the model. There appeared to be little evidence of a strong HIV-1 effect.
The Use of Tests in Admissions to Higher Education.

ERIC Educational Resources Information Center

Fruen, Mary

1978-01-01

There are both strengths and weaknesses of using standardized test scores as a criterion for admission to institutions of higher education. The relative importance of scores is dependent on the institution's degree of selectivity. In general, decision processes and admissions criteria are not well defined. Advantages of test scores include: use of…
A Bad Idea: National Standards Based on Test Scores

ERIC Educational Resources Information Center

Baker, Keith

2010-01-01

The justification for national standards is that test scores predict a nation's future economic success. There is no evidence that supports this assumption. There is evidence that it is wrong. For more than half a century, reformers have been trying to fix our schools with little success. The obvious conclusion is that something that can't be…

Comprehensive School Reform and Standardized Test Scores in Illinois Elementary and Middle Schools

ERIC Educational Resources Information Center

McEnroe, James D.

2010-01-01

The study examined the effects of the federally funded Comprehensive School Reform (CSR) program on student performance on mandated standardized tests. The study focused on the mathematics and reading scores of Illinois public elementary and middle and junior high school students. The federal CSR program provided Illinois schools with an annual…
A Regression Model Approach to First-Year Honors Program Admissions Serving a High-Minority Population

ERIC Educational Resources Information Center

Rhea, David M.

2017-01-01

Many honors programs make admissions decisions based on student high school GPA and a standardized test score. However, McKay argued that standardized test scores can be a barrier to honors program participation, particularly for minority students. Minority students, particularly Hispanic and African American students, are apt to have lower…
End of Course Grades and Standardized Test Scores: Are Grades Predictive of Student Achievement?

ERIC Educational Resources Information Center

Ricketts, Christine R.

2010-01-01

This study examined the extent to which end-of-course grades are predictive of Virginia Standards of Learning test scores in nine high school content areas. It also analyzed the impact of the variables school cluster attended, gender, ethnicity, disability status, Limited English Proficiency status, and socioeconomic status on the relationship…
Integrating GIS in the Middle School Curriculum: Impacts on Diverse Students' Standardized Test Scores

ERIC Educational Resources Information Center

Goldstein, Donna; Alibrandi, Marsha

2013-01-01

This case study conducted with 1,425 middle school students in Palm Beach County, Florida, included a treatment group receiving GIS instruction (256) and a control group without GIS instruction (1,169). Quantitative analyses on standardized test scores indicated that inclusion of GIS in middle school curriculum had a significant effect on student…
Falling Behind: New Evidence on the Black-White Achievement Gap

ERIC Educational Resources Information Center

Levitt, Steven D.; Fryer, Roland G.

2004-01-01

On average, black students typically score one standard deviation below white students on standardized tests--roughly the difference in performance between the average 4th grader and the average 8th grader. Historically, what has come to be known as the black-white test-score gap has emerged before children enter kindergarten and has tended to…
The Fight's Not Always Fixed: Using Literary Response to Transcend Standardized Test Scores

ERIC Educational Resources Information Center

Avila, JuliAnna

2012-01-01

In 2004, the National Endowment for the Arts (NEA) concluded that "literature reading is fading as a meaningful activity, especially among younger people." How can educators continue to teach students about the power of literary response when the priority is for them to achieve proficiency on standardized tests, whose scores can only be narrowly…
A Program Evaluation of a Seminar Program

ERIC Educational Resources Information Center

Lyons, Robert Dale, Jr.

2013-01-01

When students do not score well on standardized tests, their school can suffer. In an attempt to improve standardized test scores, a district placed students into a program called Seminar to help them work on weak areas of content through personalized instruction. The purpose of this project study was to assess if the Seminar program had a…
Does Television Rot Your Brain? New Evidence from the Coleman Study. NBER Working Paper No. 12021

ERIC Educational Resources Information Center

Gentzkow, Matthew; Shapiro, Jesse M.

2006-01-01

We use heterogeneity in the timing of television's introduction to different local markets to identify the effect of preschool television exposure on standardized test scores later in life. Our preferred point estimate indicates that an additional year of preschool television exposure raises average test scores by about .02 standard deviations. We…
Development of a reliable simulation-based test for diagnostic abdominal ultrasound with a pass/fail standard usable for mastery learning.

PubMed

Østergaard, Mia L; Nielsen, Kristina R; Albrecht-Beste, Elisabeth; Konge, Lars; Nielsen, Michael B

2018-01-01

This study aimed to develop a test with validity evidence for abdominal diagnostic ultrasound with a pass/fail-standard to facilitate mastery learning. The simulator had 150 real-life patient abdominal scans of which 15 cases with 44 findings were selected, representing level 1 from The European Federation of Societies for Ultrasound in Medicine and Biology. Four groups of experience levels were constructed: Novices (medical students), trainees (first-year radiology residents), intermediates (third- to fourth-year radiology residents) and advanced (physicians with ultrasound fellowship). Participants were tested in a standardized setup and scored by two blinded reviewers prior to an item analysis. The item analysis excluded 14 diagnoses. Both internal consistency (Cronbach's alpha 0.96) and inter-rater reliability (0.99) were good and there were statistically significant differences (p < 0.001) between all four groups, except the intermediate and advanced groups (p = 1.0). There was a statistically significant correlation between experience and test scores (Pearson's r = 0.82, p < 0.001). The pass/fail-standard failed all novices (no false positives) and passed all advanced (no false negatives). All intermediate participants and six out of 14 trainees passed. We developed a test for diagnostic abdominal ultrasound with solid validity evidence and a pass/fail-standard without any false-positive or false-negative scores.
• Ultrasound training can benefit from competency-based education based on reliable tests.
• This simulation-based test can differentiate between competency levels of ultrasound examiners.
• This test is suitable for competency-based education, e.g. mastery learning.
• We provide a pass/fail standard without false-negative or false-positive scores.
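The internal-consistency figure quoted above is Cronbach's alpha. As a rough illustration of how that coefficient is computed, here is a minimal sketch using only the Python standard library. The small 0/1 response matrix is invented and has no connection to the study's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one row per examinee, each row a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))                # one tuple per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: 4 examinees x 3 findings, 1 = finding identified.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(responses))
```

The formula compares the sum of the individual item variances with the variance of the total score. The more the items covary, the larger the total-score variance is relative to that sum, and the closer alpha gets to 1.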
Factors Associated With Negative Attitudes Toward Speaking in Preschool-Age Children Who Do and Do Not Stutter.

PubMed

Groner, Stephen; Walden, Tedra; Jones, Robin

2016-01-01

This study explored relations between the negativity of children's speech-related attitudes as measured by the Communication Attitude Test for Preschool and Kindergarten Children Who Stutter (KiddyCAT; Vanryckeghem & Brutten, 2007) and (a) age; (b) caregiver reports of stuttering and its social consequences; (c) types of disfluencies; and (d) standardized speech, vocabulary, and language scores. Participants were 46 preschool-age children who stutter (CWS; 12 females, 34 males) and 66 preschool-age children who do not stutter (CWNS; 35 females, 31 males). After a conversation, children completed standardized tests and the KiddyCAT while their caregivers completed scales on observed stuttering behaviors and their consequences. The KiddyCAT scores of both the CWS and the CWNS were significantly negatively correlated with age. Both groups' KiddyCAT scores increased with higher scores on the Speech Fluency Rating Scale of the Test of Childhood Stuttering (Gillam, Logan, & Pearson, 2009). Repetitions were a significant contributor to the CWNS's KiddyCAT scores, but no specific disfluency significantly contributed to the CWS's KiddyCAT scores. Greater articulation errors were associated with higher KiddyCAT scores in the CWNS. No standardized test scores were associated with KiddyCAT scores in the CWS. Attitudes that speech is difficult are not associated with similar aspects of communication for CWS and CWNS. Age significantly contributed to negative speech attitudes for CWS, whereas age, repetitions, and articulation errors contributed to negative speech attitudes for CWNS.
A Practical Method for Identifying Significant Change Scores

ERIC Educational Resources Information Center

Cascio, Wayne F.; Kurtines, William M.

1977-01-01

A test of significance for identifying individuals who are most influenced by an experimental treatment as measured by pre-post test change score is presented. The technique requires true difference scores, the reliability of obtained differences, and their standard error of measurement. (Author/JKS)
The effect of instructional methodology on high school students natural sciences standardized tests scores

NASA Astrophysics Data System (ADS)

Powell, P. E.

Educators have recently come to consider inquiry-based instruction as a more effective method of instruction than didactic instruction. Experience-based learning theory suggests that student performance is linked to teaching method. However, research is limited on inquiry teaching and its effectiveness in preparing students to perform well on standardized tests. The purpose of the study was to investigate whether one of these two teaching methodologies was more effective in increasing student performance on standardized science tests. The quasi-experimental quantitative study comprised two stages. Stage 1 used a survey to identify teaching methods of a convenience sample of 57 teacher participants and determined level of inquiry used in instruction to place participants into instructional groups (the independent variable). Stage 2 used analysis of covariance (ANCOVA) to compare posttest scores on a standardized exam by teaching method. Additional analyses were conducted to examine the differences in science achievement by ethnicity, gender, and socioeconomic status by teaching methodology. Results demonstrated a statistically significant gain in test scores when taught using inquiry-based instruction. Subpopulation analyses indicated all groups showed improved mean standardized test scores except African American students. The findings benefit teachers and students by presenting data supporting a method of content delivery that increases teacher efficacy and produces students with a greater cognition of science content that meets the school's mission and goals.
Sensitivity and specificity of a digit symbol recognition trial in the identification of response bias.

PubMed

Kim, Nancy; Boone, Kyle B; Victor, Tara; Lu, Po; Keatinge, Carolyn; Mitchell, Cary

2010-08-01

Recently published practice standards recommend that multiple effort indicators be interspersed throughout neuropsychological evaluations to assess for response bias, which is most efficiently accomplished through use of effort indicators from standard cognitive tests already included in test batteries. The present study examined the utility of a timed recognition trial added to standard administration of the WAIS-III Digit Symbol subtest in a large sample of "real world" noncredible patients (n=82) as compared with credible neuropsychology clinic patients (n=89). Scores from the recognition trial were more sensitive in identifying poor effort than were standard Digit Symbol scores, and use of an equation incorporating Digit Symbol Age-Corrected Scaled Scores plus accuracy and time scores from the recognition trial was associated with nearly 80% sensitivity at 88.7% specificity. Thus, inclusion of a brief recognition trial to Digit Symbol administration has the potential to provide accurate assessment of response bias.
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

ERIC Educational Resources Information Center

Lee, Yi-Hsuan; Zhang, Jinming

2017-01-01

Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Utilization of standardized patients to evaluate clinical and interpersonal skills of surgical residents.

PubMed

Hassett, James M; Zinnerstrom, Karen; Nawotniak, Ruth H; Schimpfhauser, Frank; Dayton, Merril T

2006-10-01

This project was designed to determine the growth of interpersonal skills during the first year of a surgical residency. All categorical surgical residents were given a clinical skills examination of abdominal pain using standardized patients during their orientation (T1). The categorical residents were retested after 11 months (T2). The assessment tool was based on a 12-item modified version of the 5-point Likert Interpersonal Scale (IP) used on the National Board of Medical Examiners prototype Clinical Skills Examination and a 24-item, done-or-not-done, history-taking checklist. Residents' self-evaluation scores were compared to standardized patients' assessment scores. Data were analyzed using the Pearson correlation coefficient, Wilcoxon signed rank test, Student t test, and Cronbach alpha. Thirty-eight categorical residents were evaluated at T1 and T2. At T1, in the history-taking exercise, the scores of the standardized patients and residents correlated (Pearson = .541, P = .000). In the interpersonal skills exercise, the scores of the standardized patients and residents did not correlate (Pearson = -0.238, P = .150). At T2, there was a significant improvement in the residents' self-evaluation scores in both the history-taking exercise (t = -3.280, P = .002) and the interpersonal skills exercise (t = 2.506, P = 0.017). In the history-taking exercise, the standardized patients' assessment scores correlated with the residents' self-evaluation scores (Pearson = 0.561, P = .000). In the interpersonal skills exercise, the standardized patients' assessment scores did not correlate with the residents' self-evaluation scores (Pearson = 0.078, P = .646). Surgical residents demonstrate a consistently low level of self-awareness regarding their interpersonal skills. Observed improvement in resident self-evaluation may be a function of growth in self-confidence.
Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

ERIC Educational Resources Information Center

Andersson, Björn

2016-01-01

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Comparing Standardized Test Scores among Arts-Integrated and Non-Arts Integrated Schools in Central Mississippi

ERIC Educational Resources Information Center

Dean, Darlene

2014-01-01

The topic of arts integration creates continuing dialog among educators and arts advocates. This study examined the degree to which student achievement was affected when arts education is limited or eliminated from schools to meet the mandates of NCLB (2001) legislation. Standardized test scores from 12 schools in Central Mississippi were used to…
Elementary School Teachers' Experiences in a Professional Learning Community Addressing the Gender Gap

ERIC Educational Resources Information Center

Boyles, Glenda F.

2011-01-01

International research and academic performance measures indicate that boys are scoring lower on standardized tests than girls in reading and writing. At the time of this study, boys had lower standardized test scores than girls in an elementary school in a southeastern state in the United States. The purpose of this qualitative case study was to…
Co-Educational Tutorial Classes and Their Significance on Gendered Test Scores of Wollo University Students: A Before-After Analyses

ERIC Educational Resources Information Center

Gidey, Mu'uz

2015-01-01

This action research is carried out in a practical class room setting to devise an innovative way of administering tutorial classes to improve students' learning competence with particular reference to gendered test scores. A before-after test score analyses of mean and standard deviations along with t-statistical tests of hypotheses of second…
Development and field test of psychophysical tests for DWI arrest

DOT National Transportation Integrated Search

1981-03-01

Administration and scoring procedures were standardized for a sobriety test battery consisting of the walk-and-turn test, the one leg stand test, and horizontal gaze nystagmus. The effectiveness of the standardized battery was then evaluated in the l...

What No Child Left Behind Leaves Behind: The Roles of IQ and Self-Control in Predicting Standardized Achievement Test Scores and Report Card Grades

PubMed Central

Duckworth, Angela L.; Quinn, Patrick D.; Tsukayama, Eli

2013-01-01

The increasing prominence of standardized testing to assess student learning motivated the current investigation. We propose that standardized achievement test scores assess competencies determined more by intelligence than by self-control, whereas report card grades assess competencies determined more by self-control than by intelligence. In particular, we suggest that intelligence helps students learn and solve problems independent of formal instruction, whereas self-control helps students study, complete homework, and behave positively in the classroom. Two longitudinal, prospective studies of middle school students support predictions from this model. In both samples, IQ predicted changes in standardized achievement test scores over time better than did self-control, whereas self-control predicted changes in report card grades over time better than did IQ. As expected, the effect of self-control on changes in report card grades was mediated in Study 2 by teacher ratings of homework completion and classroom conduct. In a third study, ratings of middle school teachers about the content and purpose of standardized achievement tests and report card grades were consistent with the proposed model. Implications for pedagogy and public policy are discussed. PMID:24072936
Middle school science grades: Can they be used to forecast performance on standardized tests?

NASA Astrophysics Data System (ADS)

Hubbard, Gary L.

2007-12-01

The purpose of this study was to determine if classroom science grades could be used to forecast standardized testing readiness for the Florida Comprehensive Assessment Test (FCAT). Participants for this study consisted of 647 eighth grade students assigned to a public middle school in Florida. Using annual classroom science grades and the corresponding year's FCAT Science scale scores for each student, scatter plot graphs and Pearson product-moment correlations were used to determine their relationships. Correlation strengths were determined for several segmented student populations. First, the grade and FCAT score relationship for the entire middle school population was calculated and, then, the relationship between grades and FCAT scores for students grouped by their individual assigned science teacher was determined. Next, a second look at students grouped as above was conducted, this time focusing only on students with unacceptable FCAT scores (levels 1 and 2). The correlation between grades and FCAT scores for the entire middle school was moderate and ranged from high to weak for students assigned to individual science teachers. The relationship of grades and FCAT scores for middle school students that scored at levels 1 and 2 was weakly correlated and ranged from moderate to weak for students as they were assigned to their science teachers. Generally, classroom grades were found to be inefficient predictors for standardized testing readiness for students assigned to this middle school.
Maintaining Equivalent Cut Scores for Small Sample Test Forms

ERIC Educational Resources Information Center

Dwyer, Andrew C.

2016-01-01

This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Grade Equivalents: We Report Them, You Should Too.

ERIC Educational Resources Information Center

Ligon, Glynn; Battaile, Richard

In certain situations, grade equivalent scores are the most appropriate statistic available for reporting achievement test data. It is noted that testing practitioners have found that raw scores, normal curve equivalents, stanines, and standard scores are very useful. However, it is best to convert to either grade equivalents or percentiles before…
The relationship between standards-based reporting systems and third-grade mathematics and science achievement

NASA Astrophysics Data System (ADS)

Prejean-Harris, Rose M.

Over the last decade, accountability has been the driving force for many changes in education in the United States. One major educational reform effort is the standards-based movement with a focus on combining a number of processes that involve aligning curriculum, instruction, assessment and feedback to specific standards that are measurable and indicative of student achievement. The purpose of this study is to determine if the type of report card is a possible predictor of third grade student achievement on standardized tests in mathematics and science for the 2012 Criterion-Referenced Competency Test (CRCT). The results of this study concluded that the difference in test scores in mathematics and science for students in the traditional report card group was not statistically significant when compared to the scores of students in the standards-based report card group when controlling for poverty level, school locale, and school district. However, students in the traditional report card group scored an average of 1.01 points higher in mathematics and 2.27 points higher in science than students in the standards-based report card group.
Inter-rater Agreement on Final Competency Testing Utilizing Standardized Patients.

PubMed

Bowman, Dixie H; Ferber, Kyle L; Sima, Adam P

2016-01-01

The purpose of this study was to determine whether licensed physical therapists (n=8) serving as standardized patients (SPs) for practical examinations evaluate physical therapy students (n=51) equivalently to the physical therapy course instructor (n=1). The SPs completed the same assessment based on the evaluation criteria as did the instructor. The scores for the practical examination, answers to three questions, and the documentation note were summarized separately for the SP and the instructor by means and standard deviations. A paired t-test and an intraclass correlation coefficient (ICC) for each aspect of the score were calculated. ICC(1,1) values were reported along with corresponding 95% confidence intervals. The instructor had significantly higher scores for the practical exam and the overall score compared to the ratings from the SPs. No differences were observed between the instructor and SP scores on the three answers to the questions and documentation note scores. Based on the ICC values identified in this study, a physical therapist serving as an SP may not be an adequate replacement for an instructor when it comes to grading physical therapy students on all aspects of their competency tests.
Factor Structure of the Comprehensive Trail Making Test in Children and Adolescents with Brain Dysfunction

ERIC Educational Resources Information Center

Allen, Daniel N.; Thaler, Nicholas S.; Barchard, Kimberly A.; Vertinski, Mary; Mayfield, Joan

2012-01-01

The Comprehensive Trail Making Test (CTMT) is a relatively new version of the Trail Making Test that has a number of appealing features, including a large normative sample that allows raw scores to be converted to standard "T" scores adjusted for age. Preliminary validity information suggests that CTMT scores are sensitive to brain…
An Investigation of Indicators of Success in Graduates of a Progressive, Urban, Public High School

ERIC Educational Resources Information Center

Kunkel, Christine D.

2016-01-01

Using standardized test scores to measure success in schools is a controversial topic in education today. Many feel that test scores are not a valid indicator of success, or are being overused to the detriment of the curriculum. But if not test scores, then what is the alternative? This study examines potential alternatives, or more authentic…
First-Grade Spelling Scores within the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) Screening: An Exploratory Study

ERIC Educational Resources Information Center

Munger, Kristen A.; Murray, Maria S.

2017-01-01

The purpose of this study was to examine the validity evidence of first-grade spelling scores from a standardized test of nonsense word spellings and their potential value within universal literacy screening. Spelling scores from the Test of Phonological Awareness: Second Edition PLUS for 47 first-grade children were scored using a standardized…
The Ability of Standardized Test Instruments to Differentiate Membership in Different Vocational-Technical Curricula. Project MINI-SCORE, Final Technical Report.

ERIC Educational Resources Information Center

Pucel, David J.; And Others

Using post-secondary vocational education students as the populations, these two sub-studies of the Project MINI-SCORE sought to determine the extent to which pre-enrollment standardized test data can be used to predict vocational success. For the purpose of the study, vocational success was defined either as successful graduation or successful…
The Impact of Year-Round Education on Fifth Grade African American Reading Achievement Scores in an Urban Illinois School

ERIC Educational Resources Information Center

Merrill, Carolyn Ann

2012-01-01

The purpose of this quantitative, causal-comparative study was to determine the impact of the year-round education school calendar on the standardized test performance of fifth grade African American students, as measured by the Illinois Standards Achievement Test (ISAT) in reading. The ISAT reading scores from two year-round education (YRE)…
Counting-backward test for executive function in idiopathic normal pressure hydrocephalus.

PubMed

Kanno, S; Saito, M; Hayashi, A; Uchiyama, M; Hiraoka, K; Nishio, Y; Hisanaga, K; Mori, E

2012-10-01

The aim of this study was to develop and validate a bedside test for executive function in patients with idiopathic normal pressure hydrocephalus (INPH). Twenty consecutive patients with INPH and 20 patients with Alzheimer's disease (AD) were enrolled in this study. We developed the counting-backward test for evaluating executive function in patients with INPH. Two indices that are considered to be reflective of the attention deficits and response suppression underlying executive dysfunction in INPH were calculated: the first-error score and the reverse-effect index. Performance on both the counting-backward test and standard neuropsychological tests for executive function was assessed in INPH and AD patients. The first-error score, reverse-effect index and the scores from the standard neuropsychological tests for executive function were significantly lower for individuals in the INPH group than in the AD group. The two indices for the counting-backward test in the INPH group were strongly correlated with the total scores for Frontal Assessment Battery and Phonemic Verbal Fluency. The first-error score was also significantly correlated with the error rate of the Stroop colour-word test and the score of the go/no-go test. In addition, we found that the first-error score highly distinguished patients with INPH from those with AD using these tests. The counting-backward test is useful for evaluating executive dysfunction in INPH and for differentiating between INPH and AD patients. In particular, the first-error score may reflect deficits in the response suppression related to executive dysfunction in INPH. © 2012 John Wiley & Sons A/S.
Estimating Achievement Gaps from Test Scores Reported in Ordinal "Proficiency" Categories

ERIC Educational Resources Information Center

Ho, Andrew D.; Reardon, Sean F.

2012-01-01

Test scores are commonly reported in a small number of ordered categories. Examples of such reporting include state accountability testing, Advanced Placement tests, and English proficiency tests. This paper introduces and evaluates methods for estimating achievement gaps on a familiar standard-deviation-unit metric using data from these ordered…
Proficiency Standards and Cut-Scores for Language Proficiency Tests.

ERIC Educational Resources Information Center

Moy, Raymond H.

The problem of standard setting on language proficiency tests is often approached by the use of norms derived from the group being tested, a process commonly known as "grading on the curve." One particular problem with this ad hoc method of standard setting is that it will usually result in a fluctuating standard dependent on the particular group…
The Standardized Growth Expectation: Implications for Educational Evaluation.

ERIC Educational Resources Information Center

Stenner, A. Jackson; And Others

Three assumptions underlying the use of norm referenced tests are examined: (1) that expressing treatment effects in a standard score metric permits aggregation of effects across grades; (2) commonly used standardized tests are sufficiently comparable to permit aggregation of results across tests; and (3) the summer loss of achievement observed in…
Research Says…/High-Stakes Testing Narrows the Curriculum

ERIC Educational Resources Information Center

David, Jane L.

2011-01-01

The current rationale for standards-based reform goes like this: If standards are demanding and tests accurately measure achievement of those standards, then curriculum and instruction will become richer and more rigorous. By attaching serious consequences to schools that fail to increase test scores, U.S. policymakers believe that educators will…
Association between the Medical College Admission Test scores and Alpha Omega Alpha Medical Honors Society membership.

PubMed

Gauer, Jacqueline L; Jackson, J Brooks

2017-01-01

Medical schools worldwide are faced with the challenge of selecting from among many qualified applicants. One factor that might help admissions committees identify future exceptional medical students is scores on standardized entrance exams. The purpose of this study was to determine the association between scores on the most commonly used standardized medical school entrance exam in the USA, the Medical College Admission Test (MCAT), and election to the US medical honors society, Alpha Omega Alpha (AOA). MCAT scores and AOA membership data were analyzed for all the students pursuing Doctor of Medicine degrees at the University of Minnesota Medical School and who graduated between 2012-2016 (n=1,309). An independent-samples t-test found a significant difference (t=6.132, p<0.001) in MCAT scores between those who were elected to AOA (n=179) and those who were not (n=1,130). On average, students who were elected to AOA had composite MCAT scores of 1.65 points higher than those who were not. Percentages of students elected to AOA gradually but inconsistently increased with MCAT score. No student who scored <27 on the MCAT was elected to AOA. Among students with MCAT scores at the 99th percentile or above (scores of ≥38), 13 of 48 (27.1%) were elected to AOA. Election to AOA during medical school was significantly associated with higher MCAT scores. Admissions committees should carefully consider the role of standardized entrance exam scores, in the context of a holistic review, when selecting for exceptional medical students.
Association between the Medical College Admission Test scores and Alpha Omega Alpha Medical Honors Society membership

PubMed Central

Gauer, Jacqueline L; Jackson, J Brooks

2017-01-01

Introduction: Medical schools worldwide are faced with the challenge of selecting from among many qualified applicants. One factor that might help admissions committees identify future exceptional medical students is scores on standardized entrance exams. The purpose of this study was to determine the association between scores on the most commonly used standardized medical school entrance exam in the USA, the Medical College Admission Test (MCAT), and election to the US medical honors society, Alpha Omega Alpha (AOA). Method: MCAT scores and AOA membership data were analyzed for all the students pursuing Doctor of Medicine degrees at the University of Minnesota Medical School and who graduated between 2012–2016 (n=1,309). Results: An independent-samples t-test found a significant difference (t=6.132, p<0.001) in MCAT scores between those who were elected to AOA (n=179) and those who were not (n=1,130). On average, students who were elected to AOA had composite MCAT scores of 1.65 points higher than those who were not. Percentages of students elected to AOA gradually but inconsistently increased with MCAT score. No student who scored <27 on the MCAT was elected to AOA. Among students with MCAT scores at the 99th percentile or above (scores of ≥38), 13 of 48 (27.1%) were elected to AOA. Discussion: Election to AOA during medical school was significantly associated with higher MCAT scores. Admissions committees should carefully consider the role of standardized entrance exam scores, in the context of a holistic review, when selecting for exceptional medical students. PMID:28979178
Effective communication of molecular genetic test results to primary care providers.

PubMed

Scheuner, Maren T; Edelen, Maria Orlando; Hilborne, Lee H; Lubin, Ira M

2013-06-01

We evaluated a template for molecular genetic test reports that was developed as a strategy to reduce communication errors between the laboratory and ordering clinician. We surveyed 1,600 primary care physicians to assess satisfaction, ease of use, and effectiveness of genetic test reports developed using our template and reports developed by clinical laboratories. Mean score differences of responses between the reports were compared using t-tests. Two-way analysis of variance evaluated the effect of template versus standard reports and the influence of physician characteristics. There were 396 (24%) respondents. Template reports had higher scores than the standard reports for each survey item. The gender and specialty of the physician did not influence scores; however, younger physicians gave higher scores regardless of report type. There was significant interaction between report type and whether physicians ordered or reviewed any genetic tests (none versus at least one) in the past year, P = 0.005. For each survey item assessing satisfaction, ease of use, and effectiveness, physicians gave higher ratings to genetic test reports developed with the template than standard reports used by clinical laboratories. Physicians least familiar with genetic test reports, and possibly having the greatest need for better communication, were best served by the template reports.
Modeling Floor Effects in Standardized Vocabulary Test Scores in a Sample of Low SES Hispanic Preschool Children under the Multilevel Structural Equation Modeling Framework.

PubMed

Zhu, Leina; Gonzalez, Jorge

2017-01-01

Researchers and practitioners often use standardized vocabulary tests such as the Peabody Picture Vocabulary Test-4 (PPVT-4; Dunn and Dunn, 2007) and its companion, the Expressive Vocabulary Test-2 (EVT-2; Williams, 2007), to assess English vocabulary skills as an indicator of children's school readiness. Despite their psychometric excellence in the norm sample, issues arise when standardized vocabulary tests are used to assess children who are from culturally, linguistically and ethnically diverse backgrounds (e.g., Spanish-speaking English language learners) or who are delayed in some manner. One of the biggest challenges is establishing the appropriateness of these measures with non-English or non-standard English speaking children, as they often score one to two standard deviations below expected levels (e.g., Lonigan et al., 2013). This study re-examines the issues in analyzing the PPVT-4 and EVT-2 scores in a sample of 4-to-5-year-old low SES Hispanic preschool children who were part of a larger randomized clinical trial on the effects of a supplemental English shared-reading vocabulary curriculum (Pollard-Durodola et al., 2016). It was found that the data exhibited strong floor effects and that the presence of floor effects made it difficult to differentiate the intervention group and the control group on their vocabulary growth in the intervention. A simulation study is then presented under the multilevel structural equation modeling (MSEM) framework, and results revealed that in regular multilevel data analysis, ignoring floor effects in the outcome variables led to biased results in parameter estimates, standard error estimates, and significance tests. Our findings suggest caution in analyzing and interpreting scores of ethnically and culturally diverse children on standardized vocabulary tests (e.g., floor effects). It is recommended that appropriate analytical methods that take into account floor effects in outcome variables be considered.

Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests

ERIC Educational Resources Information Center

Kolen, Michael J.; Lee, Won-Chan

2011-01-01

This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
The Relative Influence of Faculty Mobility on NJ HSPA Scores

ERIC Educational Resources Information Center

Graziano, Dana

2013-01-01

In this study, the researcher examined the strength and direction of relationships between New Jersey School Report Card Variables, in particular Faculty Mobility, and 2009-2010 New Jersey High School Proficiency Assessment (HSPA) Math and Language Arts Literacy test scores. Variables found to have an influence on standardized test scores in the…
A Case Study of Rural New Mexico K-12 Teachers' Perceptions of Standardized Testing

ERIC Educational Resources Information Center

Hite-Pope, Kim

2017-01-01

The purpose of this paper was to examine K-12 teachers' classroom experiences with standardized testing in rural New Mexico schools. Standardized tests have significantly changed the landscape of education with the use of students' test scores as a determining factor for advancement or failure for teachers (Simpson, Lacava, & Graner, 2013).…
Standard intelligence tests are valid instruments for measuring the intellectual potential of urban children: comments on pitfalls in the measurement of intelligence.

PubMed

Sattler, J M

1979-05-01

Hardy, Welcher, Mellitis, and Kagan altered standard WISC administrative and scoring procedures and, from the resulting higher subtest scores, concluded that IQs based on standardized tests are inappropriate measures for inner-city children. Careful examination of their study reveals many methodological inadequacies and problematic interpretations. Three of these are as follows: (a) failure to use any external criterion to evaluate the validity of their testing-of-limits procedures; (b) the possibility of examiner and investigator bias; and (c) lack of any comparison group that might demonstrate that poor children would be helped more than others by the probes recommended. Their report creates misleading doubts about existing intelligence tests and does a disservice to inner-city children who need the benefits of the judicious use of diagnostic procedures, which include standardized intelligence tests. Consequently, their assertion concerning the inappropriateness of standardized test results for inner-city children is not only premature and misleading, but it is unwarranted as well.
End-of-Course Multiple-Choice Test Results, 2008-09. Measuring Up. E&R Report No. 10.04

ERIC Educational Resources Information Center

McMillen, Brad

2010-01-01

End-of-Course (EOC) tests are given statewide in 10 courses typically taken in high school. Results for 2008-09 (and prior years, where available) are reported in terms of both average scale scores and the percentage of students scoring proficient. After the recent introduction of new EOC tests with higher standards, scores in WCPSS have begun to…
The impact of using standardized patients in psychiatric cases on the levels of motivation and perceived learning of the nursing students.

PubMed

Sarikoc, Gamze; Ozcan, Celale Tangul; Elcin, Melih

2017-04-01

The use of standardized patients is not very common in psychiatric nursing education and there has been no study conducted in Turkey. This study evaluated the impact of using standardized patients in psychiatric cases on the levels of motivation and perceived learning of the nursing students. This manuscript addressed the quantitative aspect of a doctoral thesis study in which both quantitative and qualitative methods were used. A pre-test and post-test were employed in the quantitative analysis in a randomized and controlled study design. The motivation scores, and interim and post-test scores for perceived learning were higher in the experimental group compared to pre-test scores and the scores of the control group. The students in the experimental group reported that they felt more competent about practical training in clinical psychiatry, as well as in performing interviews with patients having mental problems, and reported less anxiety about performing an interview when compared to students in the control group. It is considered that the inclusion of standardized patient methodology in the nursing education curriculum in order to improve the knowledge level and skills of students would be beneficial in the training of mental health nurses. Copyright © 2017 Elsevier Ltd. All rights reserved.
Rocking at 81 and Rolling at 34: ROC Cut-Off Scores for the Negative Acts Questionnaire–Revised in Serbia

PubMed Central

Petrović, Ivana B.; Vukelić, Milica; Čizmić, Svetlana

2017-01-01

Researchers are still searching for ways to identify different categories of employees according to their exposure to negative acts and psychological experience of workplace bullying. We followed Notelaers and Einarsen’s application of the ROC analysis to determine the NAQ-R cut-off scores applying a “lower” and “higher” threshold. The main goal of this research was to develop and test different gold standards of personal and organizational relevance in determining the NAQ-R cut-off scores in the specific cultural and economic context of Serbia. Apart from combining self-labeling as a victim with self-perceived health, the objectives were to test the gold standards developed as a combination of self-labeling with life satisfaction, self-labeling with intention to leave and a complex gold standard based on self-labeling, self-perceived health, life satisfaction and intention to leave taken together. The ROC analysis on Serbian workforce data supports the application of different gold standards. For identifying employees in a preliminary stage of bullying, the most applicable was the gold standard based on self-labeling and intention to leave (score 34 and higher). The most accurate identification of victims could be based on the most complex gold standard (score 81 and higher). This research encourages further investigation of gold standards in different cultures. PMID:28119652
Validity and Reliability of Baseline Testing in a Standardized Environment.

PubMed

Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur

2017-08-11

The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion based on comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate and a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Three hundred sixty-one Division 1 cohort-matched collegiate athletes' baseline data were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press. All rights reserved.
Long-term language levels and reading skills in Mandarin-speaking prelingually deaf children with cochlear implants.

PubMed

Wu, Che-Ming; Chen, Yen-An; Chan, Kai-Chieh; Lee, Li-Ang; Hsu, Kuang-Hung; Lin, Bao-Guey; Liu, Tien-Chen

2011-01-01

The aim of this study was to document receptive and expressive language levels and reading skills achieved by Mandarin-speaking children who had received cochlear implants (CIs) and used them for 4.75-7.42 years. The effects of possible associated factors were also analyzed. Standardized Mandarin language and reading tests were administered to 39 prelingually deaf children with Nucleus 24 devices. The Mandarin Chinese version of the Peabody Picture Vocabulary Test was used to assess their receptive vocabulary knowledge and the Revised Primary School Language Assessment Test for their receptive and expressive language skills. The Graded Chinese Character Recognition Test was used to test their written word recognition ability and the Reading Comprehension Test for their reading comprehension ability. Raw scores from both language and reading measurements were compared to normative data of normal-hearing children to obtain standard scores. The results showed that the mean standard score for receptive vocabulary measurement and the mean T scores for the receptive language, expressive language and total language measurement were all in the low-average range in comparison to the normative sample. In contrast, the mean T scores for word and text reading comprehension were almost the same as for their age-matched hearing counterparts. Among all children with CIs, 75.7% scored within or above the normal range of their age-matched hearing peers on receptive vocabulary measurement. For total language, Chinese word recognition and reading scores, 71.8, 77 and 82% of children with CIs were age appropriate, respectively. A strong correlation was found between language and reading skills. Age at implantation and sentence perception scores account for 37% of variance for total language outcome. Sentence perception scores and preimplantation residual hearing were revealed to be associated with the outcome of reading comprehension. We concluded that by using standard tests, the language development and reading skill of Mandarin-speaking children who use CIs from a young age appear to fall within the normal range of their hearing age mates, at least after 4.8-7.4 years of experience. However, to fully evaluate the fine linguistic skills of these subjects, a more detailed study and longer follow-up period are needed. Copyright © 2010 S. Karger AG, Basel.
A comparison of intellectual assessments over video conferencing and in-person for individuals with ID: preliminary data.

PubMed

Temple, V; Drummond, C; Valiquette, S; Jozsvai, E

2010-06-01

Video conferencing (VC) technology has great potential to increase accessibility to healthcare services for those living in rural or underserved communities. Previous studies have had some success in validating a small number of psychological tests for VC administration; however, VC has not been investigated for use with persons with intellectual disabilities (ID). A comparison of test results for two well known and widely used assessment instruments was undertaken to establish if scores for VC administration would differ significantly from in-person assessments. Nineteen individuals with ID aged 23-63 were assessed once in-person and once over VC using the Wechsler Abbreviated Scale of Intelligence (WASI) and the Beery-Buktenica Test of Visual-Motor Integration (VMI). Highly similar results were found for test scores. Full-scale IQ on the WASI and standard scores for the VMI were found to be very stable across the two administration conditions, with a mean difference of less than one IQ point/standard score. Video conferencing administration does not appear to alter test results significantly for overall score on a brief intelligence test or a test of visual-motor integration.
Rehearsal significantly improves immediate and delayed recall on the Rey Auditory Verbal Learning Test.

PubMed

Hessen, Erik

2011-10-01

A repeated observation during memory assessment with the Rey Auditory Verbal Learning Test (RAVLT) is that patients who spontaneously employ a memory rehearsal strategy by repeating the word list more than once achieve better scores than patients who only repeat the word list once. This observation led to concern about the ability of the standard test procedure of RAVLT and similar tests in eliciting the best possible recall scores. The purpose of the present study was to test the hypothesis that a rehearsal recall strategy of repeating the word list more than once would result in improved scores of recall on the RAVLT. We report on differences in outcome after standard administration and after experimental administration on Immediate and Delayed Recall measures from the RAVLT of 50 patients. The experimental administration resulted in significantly improved scores for all the variables employed. Additionally, it was found that patients who failed effort screening showed significantly poorer improvement on Delayed Recall compared with those who passed the effort screening. The general clear improvement both in raw scores and T-scores demonstrates that recall performance can be significantly influenced by the strategy of the patient or by small variations in instructions by the examiner.
Intelligence--Individually Administered, Grades 4-6. Annotated Bibliography of Tests.

ERIC Educational Resources Information Center

Educational Testing Service, Princeton, NJ. Test Collection.

Among the individually administered 34 intelligence tests described in this bibliography are those for deaf persons, Spanish speakers, and other special populations. Tests requiring nonverbal responses are included. Most of the tests described in this bibliography provide I.Q. scores which are standard scores, with a mean of 100 and standard…
Intelligence--Individually Administered, Preschool-Grade 3. Annotated Bibliography of Tests.

ERIC Educational Resources Information Center

Educational Testing Service, Princeton, NJ. Test Collection.

Among the individually administered 26 intelligence tests described in this bibliography are those for deaf persons, Spanish speakers, and other special populations. Tests requiring nonverbal responses are included. Most of the tests described in this bibliography provide I.Q. scores which are standard scores, with a mean of 100 and standard…
New Tests Put States on Spot

ERIC Educational Resources Information Center

Ujifusa, Andrew

2012-01-01

As states begin to demand more rigor on their high-stakes tests--and the tests evolve to incorporate revised academic standards--many officials are gambling that an initial wave of lower scores will give way to greater student achievement in the future. Changes to statewide tests and subsequent plummeting scores sparked controversy and emergency…
42 CFR 493.859 - Standard; ABO group and D (Rho) typing.

Code of Federal Regulations, 2013 CFR

2013-10-01

... attain a score of at least 100 percent of acceptable responses for each analyte or test in each testing event is unsatisfactory analyte performance for the testing event. (b) Failure to attain an overall.... (2) For any unacceptable analyte or unsatisfactory testing event score, remedial action must be taken...
42 CFR 493.859 - Standard; ABO group and D (Rho) typing.

Code of Federal Regulations, 2012 CFR

2012-10-01

... attain a score of at least 100 percent of acceptable responses for each analyte or test in each testing event is unsatisfactory analyte performance for the testing event. (b) Failure to attain an overall.... (2) For any unacceptable analyte or unsatisfactory testing event score, remedial action must be taken...
42 CFR 493.859 - Standard; ABO group and D (Rho) typing.

Code of Federal Regulations, 2014 CFR

2014-10-01

... attain a score of at least 100 percent of acceptable responses for each analyte or test in each testing event is unsatisfactory analyte performance for the testing event. (b) Failure to attain an overall.... (2) For any unacceptable analyte or unsatisfactory testing event score, remedial action must be taken...
Standardized Tests and Other Criteria in Admissions Decisions: A Classroom Activity

ERIC Educational Resources Information Center

Pawlow, Laura A.

2010-01-01

This exercise aims to provide a hands-on, role-playing activity that requires students to evaluate the strengths and limitations of standardized tests in making admission decisions. Small groups pretend to be an admissions committee and review fictitious student applications containing both standardized test scores and other information admissions…
49 CFR 383.131 - Test manuals.

Code of Federal Regulations, 2012 CFR

2012-10-01

...; (ix) Causes for automatic failure of skills tests; (x) Standardized scoring sheets for the skills tests; and (xi) Standardized driving instructions for the applicants. (2) A State may include any... 49 Transportation 5 2012-10-01 2012-10-01 false Test manuals. 383.131 Section 383.131...
The Development and Structure of Professional Examinations Planned for National Use.

ERIC Educational Resources Information Center

Hecht, James T.

The process typically employed by testing services in developing professional tests for national use is described: (1) determination of professional standards; (2) development of test specifications; (3) test construction; (4) test registration and administration; and (5) scoring, analysis, and reporting. To determine professional standards, input…

The relationship between selected standardized test scores and performance in advanced placement math and science exams: Analyzing the differential effectiveness of scores for course identification and placement

NASA Astrophysics Data System (ADS)

Urbina, Josue N.

There is a national need to increase the STEM-related workforce. Factors leading towards STEM careers include the number of advanced high school mathematics and science courses students complete. Florida's enrollment patterns in STEM-related Advanced Placement (AP) courses, however, reveal that only a small percentage of students enroll in these classes. Therefore, screening tools are needed to find more students for these courses who are academically ready yet have not been identified. The purpose of this study was to investigate the extent to which scores from a national standardized test, the Preliminary Scholastic Assessment Test/National Merit Scholarship Qualifying Test (PSAT/NMSQT), in conjunction with and compared to a state-mandated standardized test, the Florida Comprehensive Assessment Test (FCAT), are related to selected AP exam performance in Seminole County Public Schools. An ex post facto correlational study was conducted using 6,189 student records from the 2010-2012 academic years. Multiple regression analyses using simultaneous Full Model testing showed differential moderate to strong relationships between scores in eight of the nine AP courses (i.e., Biology, Environmental Science, Chemistry, Physics B, Physics C Electrical, Physics C Mechanical, Statistics, Calculus AB and BC) examined. For example, the significant unique contribution to overall variance in AP scores was a linear combination of PSAT Math (M), Critical Reading (CR) and FCAT Reading (R) for Biology and Environmental Science. Moderate relationships for Chemistry included a linear combination of PSAT M, W (Writing) and FCAT M; a combination of FCAT M and PSAT M was most significantly associated with Calculus AB performance. These findings have implications for both research and practice. FCAT scores, in conjunction with PSAT scores, can potentially be used for specific STEM-related AP courses, as part of a systematic approach towards AP course identification and placement.
For courses with moderate to strong relationships, validation studies and development of expectancy tables, which estimate the probability of successful performance on these AP exams, are recommended. Also, findings established a need to examine other related research issues including, but not limited to, extensive longitudinal studies and analyses of other available or prospective standardized test scores.
The Role of Social-Emotional and Social Network Factors in the Relationship Between Academic Achievement and Risky Behaviors.

PubMed

Wong, Mitchell D; Strom, Danielle; Guerrero, Lourdes R; Chung, Paul J; Lopez, Desiree; Arellano, Katherine; Dudovitz, Rebecca N

2017-08-01

We examined whether standardized test scores and grades are related to risky behaviors among low-income minority adolescents and whether social networks and social-emotional factors explained those relationships. We analyzed data from 929 high school students exposed by natural experiment to high- or low-performing academic environments in Los Angeles. We collected information on grade point average (GPA), substance use, sexual behaviors, participation in fights, and carrying a weapon from face-to-face interviews and obtained California math and English standardized test results. Logistic regression and mediation analyses were used to examine the relationship between achievement and risky behaviors. Better GPA and California standardized test scores were strongly associated with lower rates of substance use, high-risk sexual behaviors, and fighting. The unadjusted relative odds of monthly binge drinking was 0.72 (95% confidence interval, 0.56-0.93) for 1 SD increase in standardized test scores and 0.46 (95% confidence interval, 0.29-0.74) for GPA of B- or higher compared with C+ or lower. Most associations disappeared after controlling for social-emotional and social network factors. Averaged across the risky behaviors, mediation analysis revealed social-emotional factors accounted for 33% of the relationship between test scores and risky behaviors and 43% of the relationship between GPA with risky behaviors. Social network characteristics accounted for 31% and 38% of the relationship between behaviors with test scores and GPA, respectively. Demographic factors, parenting, and school characteristics were less important explanatory factors. Social-emotional factors and social network characteristics were the strongest explanatory factors of the achievement-risky behavior relationship and might be important to understanding the relationship between academic achievement and risky behaviors. Published by Elsevier Inc.
Simple exercise test score versus cardiac stress test for the prediction of coronary artery disease in patients with type 2 diabetes.

PubMed

Pikto-Pietkiewicz, Witold; Przewłocka, Monika; Chybowska, Barbara; Cyciwa, Alona; Pasierski, Tomasz

2014-01-01

Type 2 diabetes markedly increases the risk of coronary heart disease (CHD), and screening for CHD is suggested by the guidelines. The aim of the study was to compare the diagnostic usefulness of the simple exercise test score, incorporating the clinical data and cardiac stress test results, with the standard stress test in patients with type 2 diabetes. A total of 62 consecutive patients (aged 65.4 ±8.5 years; 32 men) with type 2 diabetes and clinical symptoms suggesting CHD underwent a stress test followed by coronary angiography. The simple score was calculated for all patients. Significant coronary stenosis was observed in 41 patients (66.1%). Stress test results were positive in 36 patients (58.1%). The mean simple score was high (65.5 ±14.3 points). A positive linear relationship was observed between the score and the prevalence of CHD (R² = 0.19; P <0.001) as well as its severity (R² = 0.23; P <0.001). The area under the receiver-operating characteristic curve for the simple score was 0.74 (95% confidence interval [CI], 0.62-0.86). At the original cut-off value of 60 points, the score had a similar prognostic value to that of the standard stress test. However, in a multivariate analysis, only the simple score (odds ratio [OR], 1.46; 95% CI, 1.11-1.94; P <0.01 for an increase in the score by 1 point) and male sex (OR, 1.57; 95% CI, 1.24-1.98; P <0.001) remained independent predictors of CHD. In patients with type 2 diabetes, the simple score correlated with the prevalence and severity of CHD. However, the cut-off value of 60 points was inadequate in the population of diabetic patients with high risk of CHD. The simple score used instead of or together with the stress test was a better predictor of CHD than the stress test alone.
Predicting Achievement in Grades Three through Ten Using the Metropolitan Readiness Test.

ERIC Educational Resources Information Center

Weller, L. David; And Others

1992-01-01

Assessed correlations between 415 first graders' scores on the Metropolitan Readiness Test (MRT), and their scores on standardized achievement tests in mathematics and reading in grades 3, 6, 9, and 10. Concluded that the MRT has potential for contributing to readiness decisions in early grades. (MM)
Relationships of Declining Test Scores and Grade Inflation.

ERIC Educational Resources Information Center

Bellott, Fred K.

The relationship between declining scores on national standardized tests and grade inflation is explored. Grade inflation refers to the indicated measure of evaluation of student performance having higher placement than is usual based on the performances. Data for this study were taken from the American College Testing (ACT) Program Class Profile…
Use of Standardized Test Scores to Predict Success in a Computer Applications Course

ERIC Educational Resources Information Center

Harris, Robert V.; King, Stephanie B.

2016-01-01

The purpose of this study was to see if a relationship existed between American College Testing (ACT) scores (i.e., English, reading, mathematics, science reasoning, and composite) and student success in a computer applications course at a Mississippi community college. The study showed that while the ACT scores were excellent predictors of…
Universality, correlations, and rankings in the Brazilian universities national admission examinations

NASA Astrophysics Data System (ADS)

da Silva, Roberto; Lamb, Luis C.; Barbosa, Marcia C.

2016-09-01

We analyze the scores obtained by students who have taken the ENEM examination, The Brazilian High School National Examination which is used in the admission process at Brazilian universities. The average high schools scores from different disciplines are compared through the Pearson correlation coefficient. The results show a very large correlation between the performance in the different school subjects. Even though the students' scores in the ENEM form a Gaussian due to the standardization, we show that the high schools' scores form a bimodal distribution that cannot be used to evaluate and compare students performance over time. We also show that this high schools distribution reflects the correlation between school performance and the economic level (based on the average family income) of the students. The ENEM scores are compared with a Brazilian non standardized exam, the entrance examination from the Universidade Federal do Rio Grande do Sul. The analysis of the performance of the same individuals in both tests shows that the two tests not only select different abilities, but also lead to the admission of different sets of individuals. Our results indicate that standardized tests might be an interesting tool to compare performance of individuals over the years, but not of institutions.
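The abstract above compares school-level average scores across disciplines with the Pearson correlation coefficient. A minimal sketch of that statistic follows; the function name `pearson_r` and the school averages are hypothetical illustrations, not data or code from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical average scores of five schools in two disciplines.
math_means = [480.0, 510.0, 545.0, 600.0, 650.0]
language_means = [500.0, 515.0, 560.0, 590.0, 640.0]
print(f"r = {pearson_r(math_means, language_means):.3f}")
```

Values of r close to 1 would correspond to the "very large correlation" between subjects that the abstract reports at the school level.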
Specificity and false positive rates of the Test of Memory Malingering, Rey 15-item Test, and Rey Word Recognition Test among forensic inpatients with intellectual disabilities.

PubMed

Love, Christopher M; Glassmire, David M; Zanolini, Shanna Jordan; Wolf, Amanda

2014-10-01

This study evaluated the specificity and false positive (FP) rates of the Rey 15-Item Test (FIT), Word Recognition Test (WRT), and Test of Memory Malingering (TOMM) in a sample of 21 forensic inpatients with mild intellectual disability (ID). The FIT demonstrated an FP rate of 23.8% with the standard quantitative cutoff score. Certain qualitative error types on the FIT showed promise and had low FP rates. The WRT obtained an FP rate of 0.0% with previously reported cutoff scores. Finally, the TOMM demonstrated low FP rates of 4.8% and 0.0% on Trial 2 and the Retention Trial, respectively, when applying the standard cutoff score. FP rates are reported for a range of cutoff scores and compared with published research on individuals diagnosed with ID. Results indicated that although the quantitative variables on the FIT had unacceptably high FP rates, the TOMM and WRT had low FP rates, increasing the confidence clinicians can place in scores reflecting poor effort on these measures during ID evaluations. © The Author(s) 2014.
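The false positive (FP) rates in the abstract above can be related to specificity with simple arithmetic: in a sample made up entirely of genuine (non-malingering) examinees, specificity is one minus the FP rate. The sketch below assumes the reported percentages correspond to 5, 1, and 0 flagged examinees out of 21; those counts are inferred from the percentages and are not stated in the abstract.

```python
def false_positive_rate(false_positives: int, genuine_cases: int) -> float:
    """Share of genuine examinees wrongly flagged by a cutoff score."""
    return false_positives / genuine_cases


def specificity(false_positives: int, genuine_cases: int) -> float:
    """Share of genuine examinees correctly not flagged (1 - FP rate)."""
    return 1.0 - false_positive_rate(false_positives, genuine_cases)


SAMPLE_SIZE = 21  # reported in the abstract

# Hypothetical counts inferred from the reported percentages.
inferred_counts = {"FIT": 5, "TOMM Trial 2": 1, "WRT": 0}

for measure, flagged in inferred_counts.items():
    fp = false_positive_rate(flagged, SAMPLE_SIZE)
    sp = specificity(flagged, SAMPLE_SIZE)
    print(f"{measure}: FP rate {fp:.1%}, specificity {sp:.1%}")
```

With a sample this small, each additional flagged examinee moves the FP rate by almost five percentage points, which is worth keeping in mind when comparing cutoffs.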
A new IRT-based standard setting method: application to eCat-listening.

PubMed

García, Pablo Eduardo; Abad, Francisco José; Olea, Julio; Aguado, David

2013-01-01

Criterion-referenced interpretations of tests are highly necessary, which usually involves the difficult task of establishing cut scores. Contrasting with other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. eCat-Listening, a computerized adaptive test for the evaluation of English Listening, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). The results showed a classification closely related to relevant external measures of the English language domain, according to the CEFR. It is concluded that the proposed method is a practical and valid standard setting alternative for IRT-based tests interpretations.
Special Education Students: To Be (Tested) or Not To Be (Tested)? That's a Good Question.

ERIC Educational Resources Information Center

Wilkinson, L. David; Matter, M. Kevin

This paper outlines how the Austin Independent School District (Texas) tried to deal with the following questions: (1) Should special education students be included in the administration of standardized tests?; (2) Should their scores be included or excluded in the reporting of test results?; and (3) What are the evidences that test scores reflect…
Effects of Extended Time on the SAT® I: Reasoning Test Score Growth for Students with Learning Disabilities. Research Report No. 1998-7

ERIC Educational Resources Information Center

Camara, Wayne J.; Copeland, Tina; Rothschild, Brian

2005-01-01

Tests administered with accommodations to persons with disabilities have been considered nonequivalent to tests administered under standardized conditions to nondisabled test takers. This study examined the score change patterns for learning disabled students completing extended-time administrations of the SAT I: Reasoning Test in comparison to…
Do Standardized Tests Penalize Deep-Thinking, Creative, or Conscientious Students?: Some Personality Correlates of Graduate Record Examinations Test Scores

ERIC Educational Resources Information Center

Powers, Donald E.; Kaufman, James C.

2004-01-01

The objective of the study reported here was to explore the relationship of Graduate Record Examinations (GRE) General Test scores to selected personality traits--conscientiousness, rationality, ingenuity, quickness, creativity, and depth. A sample of 342 GRE test takers completed short personality inventory scales for each trait. Analyses…
Effect of implementing instructional videos in a physical examination course: an alternative paradigm for chiropractic physical examination teaching.

PubMed

Zhang, Niu; Chawla, Sudeep

2012-01-01

This study examined the effect of implementing instructional video in ophthalmic physical examination teaching on chiropractic students' laboratory physical examination skills and written test results. Instructional video clips of ophthalmic physical examination, consisting of both standard procedures and common mistakes, were created and used for laboratory teaching. The video clips were also available for student review after class. Students' laboratory skills and written test results were analyzed and compared using one-way analysis of variance (ANOVA) and post hoc multiple comparison tests among three study cohorts: the comparison cohort who did not utilize the instructional videos as a tool, the standard video cohort who viewed only the standard procedure of video clips, and the mistake-referenced video cohort who viewed video clips containing both standard procedure and common mistakes. One-way ANOVA suggested a significant difference of lab results among the three cohorts. Post hoc multiple comparisons further revealed that the mean scores of both video cohorts were significantly higher than that of the comparison cohort (p < .001). There was, however, no significant difference of the mean scores between the two video cohorts (p > .05). However, the percentage of students having a perfect score was the highest in the mistake-referenced video cohort. There was no significant difference of written test scores among all three cohorts (p > .05). The instructional video of the standard procedure improves chiropractic students' ophthalmic physical examination skills, which may be further enhanced by implementing a mistake-referenced instructional video.
Performance of non-neurological older adults on the Wisconsin Card Sorting Test and the Stroop Color-Word Test: normal variability or cognitive impairment?

PubMed

Gunner, Jessica H; Miele, Andrea S; Lynch, Julie K; McCaffrey, Robert J

2012-06-01

There is currently no standard criterion for determining abnormal test scores in neuropsychology; thus, a number of different criteria are commonly used. We investigated base rates of abnormal scores in healthy older adults using raw and T-scores from indices of the Wisconsin Card Sorting Test and Stroop Color-Word Test. Abnormal scores were examined cumulatively at seven cutoffs including >1.0, >1.5, >2.0, >2.5, and >3.0 standard deviations (SD) from the mean as well as those below the 10th and 5th percentiles. In addition, the number of abnormal scores at each of the seven cutoffs was also examined. Results showed when considering raw scores, ∼15% of individuals obtained scores >1.0 SD from the mean, around 10% were less than the 10th percentile, and 5% fell >1.5 SD or <5th percentile from the mean. Using T-scores, approximately 15%-20% and 5%-10% of scores were >1.0 and >1.5 SD from the mean, respectively. Roughly 15% and 5% fell at the <10th and <5th percentiles, respectively. Both raw and T-scores >2.0 SD from the mean were infrequent. Although the presence of a single abnormal score at 1.0 and 1.5 SD from the mean or at the 10th and 5th percentiles was not unusual, the presence of ≥2 abnormal scores using any criteria was uncommon. Consideration of base rate data regarding the percentage of healthy individuals scoring in the abnormal range should help avoid classifying normal variability as neuropsychological impairment.
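As a point of comparison for the base rates above, the sketch below computes the share of scores expected beyond each SD cutoff if scores were exactly normally distributed and impairment were counted in one direction only. Both conditions are assumptions made for illustration, not claims from the study. The T-score conversion uses the conventional mean of 50 and SD of 10.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def t_score(z: float) -> float:
    """Convert a z-score to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


# SD cutoffs listed in the abstract, taken here as distance below the mean.
for sd_cutoff in (1.0, 1.5, 2.0, 2.5, 3.0):
    expected_share = normal_cdf(-sd_cutoff)
    print(
        f">{sd_cutoff:.1f} SD below the mean "
        f"(T < {t_score(-sd_cutoff):.0f}): {expected_share:.1%} expected"
    )
```

Under these assumptions about 15.9% of scores fall more than 1.0 SD below the mean and about 6.7% fall more than 1.5 SD below it, which is broadly in line with the single-score base rates the abstract reports.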
Which Test? Whose Scores? Comparing Standardized Critical Thinking Tests

ERIC Educational Resources Information Center

Hatcher, Donald L.

2011-01-01

In this article, after describing one approach for teaching critical thinking (CT) that was in place at Baker University from 1990 to 2008, the author describes their experience assessing CT using three standardized exams and shows why the choice of a standardized CT test can be problematic and the results misleading. These results can be…
How Standardized Tests Shape--and Limit--Student Learning. A Policy Research Brief

ERIC Educational Resources Information Center

National Council of Teachers of English, 2014

2014-01-01

The term "standardized" tests is often heard along with "high-stakes." Standardized tests are administered, scored, and interpreted in a consistent way, so that the performances of large groups of students can be compared. They are not in themselves high-stakes, but they are often used for high-stakes purposes such as…
Resistance Training Increases the Variability of Strength Test Scores

DTIC Science & Technology

2009-06-08

standard deviations for pretest and posttest strength measurements. This information was recorded for every strength test used in a total of 377 samples...significant if the posttest standard deviation consistently was larger than the pretest standard deviation. This condition could be satisfied even if...the difference in the standard deviations was small. For example, the posttest standard deviation might be 1% larger than the pretest standard
Association between prenatal exposure to ambient diesel particulate matter and perchloroethylene with children's 3rd grade standardized test scores

PubMed Central

Stingone, Jeanette A.; McVeigh, Katharine H.; Claudio, Luz

2016-01-01

The objective of this research was to determine if prenatal exposure to two common urban air pollutants, diesel and perchloroethylene, affects children's 3rd grade standardized test scores in mathematics and English language arts (ELA). Exposure estimates consisted of annual average ambient concentrations of diesel particulate matter and perchloroethylene obtained from the Environmental Protection Agency's 1996 National Air Toxics Assessment for the residential census tract at birth. Outcome data consisted of linked birth and educational records for 201,559 singleton, non-anomalous children born between 1994-1998 who attended New York City public schools. Quantile regression models were used to estimate the effects of these exposures on multiple points within the continuous distribution of standardized test scores. Modified Poisson regression models were used to calculate risk ratios (RR) and 95% confidence intervals (CI) of failing to meet curricula standards, an indicator derived from test scores. Models were adjusted for a number of maternal, neighborhood and childhood factors. Results showed that math scores were approximately 6% of a standard deviation lower for children exposed to the highest levels of both pollutants as compared to children with low levels of both pollutants. Children exposed to high levels of both pollutants also had the largest risk of failing to meet math test standards when compared to children with low levels of exposure to the pollutants (RR 1.10 95%CI 1.07,1.12; RR high perchloroethylene only 1.03 95%CI 1.00,1.06; RR high diesel PM only 1.02 95%CI 0.99,1.06). There was no association observed between exposure to only one of the pollutants and failing to meet ELA standards. This study provides preliminary evidence of associations between prenatal exposure to urban air pollutants and lower academic outcomes.
Additionally, these findings suggest that individual pollutants may additively impact health and point to the need to study the collective effects of air pollutant mixtures. Key Words: air toxics, academic outcomes, urban health, tetrachloroethylene, air pollutant mixtures PMID:27058443
Association between prenatal exposure to ambient diesel particulate matter and perchloroethylene with children's 3rd grade standardized test scores.

PubMed

Stingone, Jeanette A; McVeigh, Katharine H; Claudio, Luz

2016-07-01

The objective of this research was to determine if prenatal exposure to two common urban air pollutants, diesel and perchloroethylene, affects children's 3rd grade standardized test scores in mathematics and English language arts (ELA). Exposure estimates consisted of annual average ambient concentrations of diesel particulate matter and perchloroethylene obtained from the Environmental Protection Agency's 1996 National Air Toxics Assessment for the residential census tract at birth. Outcome data consisted of linked birth and educational records for 201,559 singleton, non-anomalous children born between 1994 and 1998 who attended New York City public schools. Quantile regression models were used to estimate the effects of these exposures on multiple points within the continuous distribution of standardized test scores. Modified Poisson regression models were used to calculate risk ratios (RR) and 95% confidence intervals (CI) of failing to meet curricula standards, an indicator derived from test scores. Models were adjusted for a number of maternal, neighborhood and childhood factors. Results showed that math scores were approximately 6% of a standard deviation lower for children exposed to the highest levels of both pollutants as compared to children with low levels of both pollutants. Children exposed to high levels of both pollutants also had the largest risk of failing to meet math test standards when compared to children with low levels of exposure to the pollutants (RR 1.10 95%CI 1.07,1.12; RR high perchloroethylene only 1.03 95%CI 1.00,1.06; RR high diesel PM only 1.02 95%CI 0.99,1.06). There was no association observed between exposure to the pollutants and failing to meet ELA standards. This study provides preliminary evidence of associations between prenatal exposure to urban air pollutants and lower academic outcomes.
Additionally, these findings suggest that individual pollutants may additively impact health and point to the need to study the collective effects of air pollutant mixtures. Keywords: air toxics, academic outcomes, urban health, tetrachloroethylene, air pollutant mixtures. Copyright © 2016 Elsevier Inc. All rights reserved.
Improving Analysis: Dealing with Information Processing Errors

DTIC Science & Technology

2006-11-01

obviating this issue, psychological test data provides information that is normed and scored in a common standardized metric (e.g., a z score. A z score is a...to take these into account when interpreting psychological test information. Clinicians are not alone in their relative inability to outperform...1980); M. Snyder and B. Campbell, "Testing hypotheses about other people: The role of the hypothesis," Personality and Social Psychology Bulletin, No. 6

The Scorer Reliability of Self-Scored Interest Inventories.

ERIC Educational Resources Information Center

O'Shea, Arthur J.; Harrington, Thomas F.

1980-01-01

Describes the procedures the authors of the System for Career Decision-Making (CDM) followed in establishing client scoring reliability. Authors recommend that manuals of self-scored inventories provide data establishing scorer reliability, that scoring be supervised, and that APGA test standards deal directly with scorer reliability. (Author)
Is Test Security an Issue in a Multistation Clinical Assessment?--A Preliminary Study.

ERIC Educational Resources Information Center

Stillman, Paula L.; And Others

1991-01-01

A study investigated possible differences in standardized patient examination scores for three groups of undergraduate (n=176) and graduate (n=221) medical students assessed at different sites over two years. Results show no systematic change in scores over testing dates, suggesting no problems with breach of test security. (MSE)
Evaluation of "e-rater"® for the "Praxis I"® Writing Test. Research Report. ETS RR-15-03

ERIC Educational Resources Information Center

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.

2015-01-01

Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
The Weighted Airman Promotion System: Standardizing Test Scores

DTIC Science & Technology

2008-01-01

This document and trademark(s) contained herein are protected by law as indicated in a notice appearing later in this work. This electronic...The Weighted Airman Promotion System. Standardizing Test Scores...Rand Corporation, PO Box 2138, Santa Monica
The Relationship Between the Learning Style Perceptual Preferences of Urban Fourth Grade Children and the Acquisition of Selected Physical Science Concepts Through Learning Cycle Instructional Methodology.

NASA Astrophysics Data System (ADS)

Adams, Kenneth Mark

The purpose of this research was to investigate the relationship between the learning style perceptual preferences of fourth grade urban students and the attainment of selected physical science concepts for three simple machines as taught using learning cycle methodology. The sample included all fourth grade children from one urban elementary school (N = 91). The research design followed a quasi-experimental format with a single group, equivalent teacher demonstration and student investigation materials, and identical learning cycle instructional treatment. All subjects completed the Understanding Simple Machines Test (USMT) prior to instructional treatment, and at the conclusion of treatment to measure student concept attainment related to the pendulum, the lever and fulcrum, and the inclined plane. USMT pre and post-test scores, California Achievement Test (CAT-5) percentile scores, and Learning Style Inventory (LSI) standard scores for four perceptual elements for each subject were held in a double blind until completion of the USMT post-test. The hypothesis tested in this study was: Learning style perceptual preferences of fourth grade students as measured by the Dunn, Dunn, and Price Learning Style Inventory (LSI) are significant predictors of success in the acquisition of physical science concepts taught through use of the learning cycle. Analysis of pre and post USMT scores, 18.18 and 30.20 respectively, yielded a significant mean gain of +12.02. A controlled stepwise regression was employed to identify significant predictors of success on the USMT post-test from among USMT pre-test, four CAT-5 percentile scores, and four LSI perceptual standard scores. The CAT -5 Total Math and Total Reading accounted for 64.06% of the variance in the USMT post-test score. The only perceptual element to act as a significant predictor was the Kinesthetic standard score, accounting for 1.72% of the variance. 
The study revealed that learning cycle instruction does not appear to be sensitive to different perceptual preferences. Students with different preferences for auditory, visual, and tactile modalities, when learning, seem to benefit equally from learning cycle exposure. Increased use of a double blind for future learning styles research was recommended.
42 CFR 493.835 - Standard; Syphilis serology.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 42 Public Health 5 2013-10-01 2013-10-01 false Standard; Syphilis serology. 493.835 Section 493.835 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.835 Standard; Syphilis serology. (a) Failure to attain an overall testing event score...
42 CFR 493.835 - Standard; Syphilis serology.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 42 Public Health 5 2014-10-01 2014-10-01 false Standard; Syphilis serology. 493.835 Section 493.835 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.835 Standard; Syphilis serology. (a) Failure to attain an overall testing event score...
42 CFR 493.835 - Standard; Syphilis serology.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 42 Public Health 5 2012-10-01 2012-10-01 false Standard; Syphilis serology. 493.835 Section 493.835 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.835 Standard; Syphilis serology. (a) Failure to attain an overall testing event score...
Development of a Comprehensive Osteochondral Allograft MRI Scoring System (OCAMRISS) With Histopathologic, Micro–Computed Tomography, and Biomechanical Validation

PubMed Central

Pallante-Kichura, Andrea L.; Bae, Won C.; Du, Jiang; Statum, Sheronda; Wolfson, Tanya; Gamst, Anthony C.; Cory, Esther; Amiel, David; Bugbee, William D.; Sah, Robert L.; Chung, Christine B.

2014-01-01

Objective: To describe and apply a semiquantitative MRI scoring system for multifeature analysis of cartilage defect repair in the knee by osteochondral allografts and to correlate this scoring system with histopathologic, micro–computed tomography (µCT), and biomechanical reference standards using a goat repair model. Design: Fourteen adult goats had 2 osteochondral allografts implanted into each knee: one in the medial femoral condyle and one in the lateral trochlea. At 12 months, goats were euthanized and MRI was performed. Two blinded radiologists independently rated 9 primary features for each graft, including cartilage signal, fill, edge integration, surface congruity, calcified cartilage integrity, subchondral bone plate congruity, subchondral bone marrow signal, osseous integration, and presence of cystic changes. Four ancillary features of the joint were also evaluated, including opposing cartilage, meniscal tears, synovitis, and fat-pad scarring. Comparison was made with histologic and µCT reference standards as well as biomechanical measures. Interobserver agreement and agreement with reference standards were assessed. Cohen’s κ, Spearman’s correlation, and Kruskal-Wallis tests were used as appropriate. Results: There was substantial agreement (κ > 0.6, P < 0.001) for each MRI feature and with comparison against reference standards, except for cartilage edge integration (κ = 0.6). There was a strong positive correlation between MRI and reference standard scores (ρ = 0.86, P < 0.01). The osteochondral allograft MRI scoring system was sensitive to differences in outcomes between the types of allografts. Conclusions: We have described a comprehensive MRI scoring system for osteochondral allografts and have validated this scoring system with histopathologic and µCT reference standards as well as biomechanical indentation testing. PMID:24489999
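The interobserver agreement reported above is expressed as Cohen's κ, which corrects raw agreement for the agreement expected by chance. The sketch below computes the unweighted form for two raters. The abstract does not say whether a weighted variant was used, so the unweighted choice, the function name, and the ratings are all assumptions made for illustration.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater1)
    # Proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts1 = Counter(rater1)
    counts2 = Counter(rater2)
    expected = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical ratings of four grafts (1 = feature abnormal, 0 = normal).
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this toy case the raters agree on 75% of items, but half of that agreement is expected by chance alone, so κ is lower than the raw agreement rate.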
Enhancing the Interpretability of the Overall Results of an International Test of English-Language Proficiency

ERIC Educational Resources Information Center

Papageorgiou, Spiros; Morgan, Rick; Becker, Valerie

2015-01-01

The purpose of this study was to enhance the meaning of the scores of an English-language test by developing performance levels and descriptors for reporting overall test performance. The levels and descriptors were intended to accompany the total scale scores of TOEFL Junior® Standard, an international test of English as a second/foreign…
Test Scores and Stereotypes.

ERIC Educational Resources Information Center

Gose, Ben

1995-01-01

A psychologist's research suggests that black and female students may have lower standardized test scores and academic achievement because they have accepted stereotypes concerning their ability. Critics feel the researcher, Claude M. Steele, may be overlooking other factors. Steele has developed a program at Stanford University (California) to…
The relationship between clinical and standardized tests for hand-arm vibration syndrome.

PubMed

Poole, C J M; Mason, H; Harding, A-H

2016-06-01

Standardized laboratory tests are undertaken to assist the diagnosis and staging of hand-arm vibration syndrome (HAVS), but the strength of the relationship between the tests and clinical stages of HAVS is unknown. To assess the relationship between the results of thermal aesthesiometry (TA), vibrotactile (VT) thresholds and cold provocation (CP) tests with the modified Stockholm scales for HAVS and to determine whether the relationship is affected by finger skin temperature. Consecutive records of workers referred to a Tier 5 HAVS assessment centre from 2006 to 2015 were identified. The diagnosis and staging of cases was undertaken from the clinical information contained in the records. Cases with alternative or mixed diagnoses were excluded and staging performed according to the modified Stockholm scale without knowledge of the results of the standardized laboratory tests. A total of 279 cases of HAVS were analysed. Although there was a significant trend for sensorineural (SN) and vascular scores to increase with clinical stage (P < 0.01), there was no significant difference in scores between 2SN early and 2SN late or between 2SN late and 3SN. There was moderate correlation between the TA and VT scores and the clinical SN stages (r = 0.6). This correlation did not change when subjects were divided into those with a finger skin temperature <30 and >30°C. CP scores distributed bimodally and correlated poorly with clinical staging (r = 0.2). Standardized SN tests distinguish between the lower Stockholm stages, but not above 2SN early. This has implications for health surveillance and UK policy. © Crown copyright 2016.
Identifying and Evaluating External Validity Evidence for Passing Scores

ERIC Educational Resources Information Center

Davis-Becker, Susan L.; Buckendahl, Chad W.

2013-01-01

A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence of the results of the standard setting from (1)…
Avoidance temperament and social-evaluative threat in college students' math performance: a mediation model of math and test anxiety.

PubMed

Liew, Jeffrey; Lench, Heather C; Kao, Grace; Yeh, Yu-Chen; Kwok, Oi-man

2014-01-01

Standardized testing has become a common form of student evaluation with high stakes, and limited research exists on understanding the roles of students' personality traits and social-evaluative threat on their academic performance. This study examined the roles of avoidance temperament (i.e., fear and behavioral inhibition) and evaluative threat (i.e., fear of failure and being viewed as unintelligent) in standardized math test and course grades in college students. Undergraduate students (N=184) from a large public university were assessed on temperamental fear and behavioral inhibition. They were then given 15 minutes to complete a standardized math test. After the test, students provided data on evaluative threat and their math performance (scores on standardized college entrance exam and average grades in college math courses). Results indicate that avoidance temperament was linked to social-evaluative threat and low standardized math test scores. Furthermore, evaluative threat mediated the influence of avoidance temperament on both types of math performance. Results have educational and clinical implications, particularly for students at risk for test anxiety and underperformance. Interventions targeting emotion regulation and stress management skills may help individuals reduce their math and test anxieties.
My Stakes Well Done.

ERIC Educational Resources Information Center

Domenech, Daniel A.

2000-01-01

The question of validity, or how high-stakes tests are being used and interpreted, threatens to undermine the entire standards movement. Joint standards developed by three professional associations say decisions affecting students' life chances should not be based on test scores alone. Objectivity and teaching to tests are real concerns. (MLH)
Clinical predictors of older driver performance on a standardized road test.

PubMed

Classen, Sherrilene; Horgas, Ann; Awadzi, Kezia; Messinger-Rapport, Barbara; Shechtman, Orit; Joo, Yongsung

2008-10-01

To determine the relationship between clinical variables (demographics, cognitive testing, comorbidities, and medications) and failing a standardized road test in older adults. Analysis of on-the-road studies performed in optimal weather conditions, between January 1, 2005, and May 1, 2007. The standardized testing was held at the National Older Driver Research and Training Center (NODRTC), Florida, and included 127 adults aged 65 and older with current driver licenses, recruited by advertisement from the Gainesville, Florida, community. Measurements consisted of demographics, self-reported medications and medical conditions, cognitive testing including Trail Making Part B, global rating score (pass/fail), and driver maneuver score (0-273, with 273 indicating perfect driving or zero errors). A total of 127 older adults completed the protocol. Mean age was 74.8 years (SD = 6.3); 46.5% were female. Mean time for Trail Making Part B was 114.3 seconds (SD = 83). Among the 127 drivers, the mean Sum of Maneuvers Score was 238.9 (SD = 25.0) and 24 (19%) failed the driver test. Odds ratio estimates for failing the test included advanced age (6.7, 95% CI 2.2 to 19.8), presence of a neurological disease (2.8, 95% CI 1.2 to 6.5), and prolonged time to complete the Trail Making Part B cognitive test (2.5, 95% CI 1.0 to 5.9). Conversely, odds ratio estimates lowering the risk of failure included taking non-diabetic hormonal medications (e.g., thyroid and estrogen drugs; 0.3, 95% CI 0.09 to 0.7) and having a musculoskeletal diagnosis (0.3, 95% CI 0.1 to 0.7). To our knowledge, this is the first study to examine the medical predictors of failing a standardized road test. Advanced age and prolonged time on Trail Making Part B were the two major predictors of test failure and a lower Sum of Maneuvers Score. Our study also found that having a neurological diagnosis (primarily cerebrovascular and Parkinson's disease) predicted test failure.
Medications from the neurological class also predicted a lower Sum of Maneuvers Score. Further study needs to be done to explain the apparent protective effect of musculoskeletal conditions and hormonal medications.
Standardized Tests and School Curricula.

ERIC Educational Resources Information Center

Mehrens, William A.; Green, Donald Ross

This paper discusses the relationship of the content of nationally standardized and normed achievement tests and that of local school curricula and the effect that relationship has on the meanings and uses of the test scores. The following questions are considered: (1) whether tests have to match what is taught to be useful; (2) whether it is fair…
Standard Setting in Specific-Purpose Language Testing: What Can a Qualitative Study Add?

ERIC Educational Resources Information Center

Manias, Elizabeth; McNamara, Tim

2016-01-01

This paper explores the views of nursing and medical domain experts in considering the standards for a specific-purpose English language screening test, the Occupational English Test (OET), for professional registration for immigrant health professionals. Since individuals who score performances in the test setting are often language experts…
How Have State Level Standards-Based Tests Related to Norm-Referenced Tests in Alaska?.

ERIC Educational Resources Information Center

Fenton, Ray

This overview of the Alaska system for test development, scoring, and reporting explored differences and similarities between norm-referenced and standards-based tests. The current Alaska testing program is based on legislation passed in 1997 and 1998, and is designed to meet the requirements of the federal No Child Left Behind Legislation. In…
Beyond Testing: Seven Assessments of Students and Schools More Effective than Standardized Tests

ERIC Educational Resources Information Center

Meier, Deborah; Knoester, Matthew

2017-01-01

The authors of the book argue that a fundamentally complex problem--how to assess the knowledge of a child--cannot be reduced to a simple test score. "Beyond Testing" describes seven forms of assessment that are more effective than standardized test results: (1) student self-assessments, (2) direct teacher observations of students and…

Findings from the 2012 West Virginia Online Writing Scoring Comparability Study

ERIC Educational Resources Information Center

Hixson, Nate; Rhudy, Vaughn

2013-01-01

Student responses to the West Virginia Educational Standards Test (WESTEST) 2 Online Writing Assessment are scored by a computer-scoring engine. The scoring method is not widely understood among educators, and there exists a misperception that it is not comparable to hand scoring. To address these issues, the West Virginia Department of Education…
The Effects of Coaching on Standardized Admission Examinations. Staff Memorandum of the Boston Regional Office of the Federal Trade Commission.

ERIC Educational Resources Information Center

Federal Trade Commission, Washington, DC. Bureau of Consumer Protection.

A non-experimental design was used to determine if scores of students enrolled in specified major coaching schools were significantly higher than scores of comparable uncoached groups. Score increases at two Scholastic Aptitude Test (SAT) coaching schools and Law School Admission Test (LSAT) schools were compared. Over 1,400 SAT examinees and…
How Parents Can Help Kids Improve Test Scores: Taking the Stakes out of Literacy Testing

ERIC Educational Resources Information Center

Schneider, Steven

2006-01-01

In order to meet the goals of No Child Left Behind, standardized testing is preeminent as the sole indicator determining whether states all across America demonstrate adequate yearly progress regarding the improvement of student achievement in literacy education. This book will help teachers and parents raise children's scores on standardized…
Automated smartphone audiometry: Validation of a word recognition test app.

PubMed

Dewyer, Nicholas A; Jiradejvong, Patpong; Henderson Sabes, Jennifer; Limb, Charles J

2018-03-01

Develop and validate an automated smartphone word recognition test. Cross-sectional case-control diagnostic test comparison. An automated word recognition test was developed as an app for a smartphone with earphones. English-speaking adults with recent audiograms and various levels of hearing loss were recruited from an audiology clinic and were administered the smartphone word recognition test. Word recognition scores determined by the smartphone app and the gold standard speech audiometry test performed by an audiologist were compared. Test scores for 37 ears were analyzed. Word recognition scores determined by the smartphone app and audiologist testing were in agreement, with 86% of the data points within a clinically acceptable margin of error and a linear correlation value between test scores of 0.89. The WordRec automated smartphone app accurately determines word recognition scores. Level of Evidence: 3b. Laryngoscope, 128:707-712, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
Outcomes of a pilates-based intervention for individuals with lateral epicondylosis: A pilot study.

PubMed

Dale, Lucinda M; Mikuski, Connie; Miller, Jacqueline

2015-01-01

Core stability and flexibility, features of Pilates exercise, can reduce loads to the upper extremities. Reducing loads is essential to improve symptoms for individuals with lateral epicondylosis. Although Pilates exercise has gained popularity in healthy populations, it has not been studied for individuals with lateral epicondylosis. The purpose of this study was to determine if adding Pilates-based intervention to standard occupational therapy intervention improved outcomes as measured by the Patient-Rated Tennis Elbow Evaluation (PRTEE) more than standard intervention for individuals with lateral epicondylosis. Participants (N= 17) were randomized to the standard intervention group or Pilates-based intervention group. All participants received standard intervention. The Pilates-based intervention group additionally completed abdominal strengthening, postural correction, and flexibility. For both groups, paired t-tests showed significantly improved PRTEE scores, 38.1 for the Pilates-based intervention group, and 22.9 for the standard intervention group. Paired t-test showed significantly improved provocative grip strength and pain for both groups. Independent t-tests showed no significant difference between groups in improved scores of PRTEE, pain, and provocative grip. Although the Pilates-based intervention group showed greater improvement in PRTEE outcome, provocative grip, and pain, scores were not significantly better than those of the standard intervention group, warranting further research.
Perfectionism and Social Anxiety: Rethinking the Role of High Standards

PubMed Central

Shumaker, Erik A.; Rodebaugh, Thomas L.

2009-01-01

Some researchers contend that high standards are an essential component of social anxiety. We tested this hypothesis in two independent samples. The consistent finding across samples was that higher scores on measures of high standards from two perfectionism scales predicted lower scores for social anxiety measures. These findings suggest lower, not higher, standards are involved in social anxiety, but more research is needed to clarify the implications of perfectionism, particularly the maladaptive form, in the context of social anxiety. PMID:19447382
Building an Evaluation Scale using Item Response Theory.

PubMed

Lalor, John P; Wu, Hao; Yu, Hong

2016-11-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
Building an Evaluation Scale using Item Response Theory

PubMed Central

Lalor, John P.; Wu, Hao; Yu, Hong

2016-01-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
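The closing claim of the abstract above, that accuracy and IRT score can diverge, can be illustrated with a small sketch. It assumes a two-parameter logistic (2PL) model, which the abstract does not specify, and uses hypothetical item parameters and a simple grid search for the maximum-likelihood ability estimate. Two respondents answer the same number of items correctly but receive different ability estimates because they succeed on items of different difficulty and discrimination.

```python
import math

def p_correct(theta, discrimination, difficulty):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), r in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if r == 1 else math.log(1.0 - p)
    return total

def estimate_theta(items, responses):
    """Maximum-likelihood ability estimate by grid search over [-4, 4]."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical (discrimination, difficulty) pairs: two easy, weakly
# discriminating items followed by two harder, sharply discriminating ones.
ITEMS = [(0.5, -1.0), (0.5, 0.0), (2.0, 0.0), (2.0, 1.0)]

pattern_a = [1, 1, 0, 0]  # correct only on the easy items
pattern_b = [0, 0, 1, 1]  # correct only on the harder items

theta_a = estimate_theta(ITEMS, pattern_a)
theta_b = estimate_theta(ITEMS, pattern_b)
print(sum(pattern_a), sum(pattern_b))  # same raw accuracy
print(theta_a, theta_b)                # different ability estimates
```

Both patterns score 2 of 4, yet the second yields the higher ability estimate because the items answered correctly carry more information about ability.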
A job-related fitness test for the Dutch police.

PubMed

Strating, M; Bakker, R H; Dijkstra, G J; Lemmink, K A P M; Groothoff, J W

2010-06-01

The variety of tasks that characterize police work highlights the importance of being in good physical condition. To take a first step at standardizing the administration of a job-related test to assess a person's ability to perform the physical demands of the core tasks of police work. The principal research questions were: are test scores related to gender, age and function and are test scores related to body mass index (BMI) and the number of hours of physical exercise? Data of 6999 police officers, geographically spread over all parts of The Netherlands, who completed a physical competence test over a 1 year period were analysed. Women performed the test significantly more slowly than men. The mean test score was also related to age; the older a person the longer it took to complete the test. A higher BMI was associated with fewer hours of physical exercise a week and a slower test performance, both in women and men. The differences in individual test scores, based on gender and age, have implications for future strategy within the police force. From a viewpoint of 'same job, same standard' one has to accept that test-score differences may lead to the exclusion of certain staff. However, from a viewpoint of 'diversity as a business issue', one may have to accept that on average, both female and older police officers are physically less tailored to their jobs than their male and younger colleagues.
Prior experiences associated with residents' scores on a communication and interpersonal skill OSCE.

PubMed

Yudkowsky, Rachel; Downing, Steven M; Ommert, Dennis

2006-09-01

This exploratory study investigated whether prior task experience and comfort correlate with scores on an assessment of patient-centered communication. A six-station standardized patient exam assessed patient-centered communication of 79 PGY2-3 residents in Internal Medicine and Family Medicine. A survey provided information on prior experiences. t-tests, correlations, and multi-factorial ANOVA explored relationship between scores and experiences. Experience with a task predicted comfort but did not predict communication scores. Comfort was moderately correlated with communication scores for some tasks; residents who were less comfortable were indeed less skilled, but greater comfort did not predict higher scores. Female gender and medical school experiences with standardized patients along with training in patient-centered interviewing were associated with higher scores. Residents without standardized patient experiences in medical school were almost five times more likely to be rejected by patients. Task experience alone does not guarantee better communication, and may instill a false sense of confidence. Experiences with standardized patients during medical school, especially in combination with interviewing courses, may provide an element of "deliberate practice" and have a long-term impact on communication skills. The combination of didactic courses and practice with standardized patients may promote a patient-centered approach.
The woodcock reading mastery test: impact of normative changes.

PubMed

Pae, Hye Kyeong; Wise, Justin C; Cirino, Paul T; Sevcik, Rose A; Lovett, Maureen W; Wolf, Maryanne; Morris, Robin D

2005-09-01

This study examined the magnitude of differences in standard scores, convergent validity, and concurrent validity when an individual's performance was gauged using the revised and the normative update (Woodcock, 1998) editions of the Woodcock Reading Mastery Test in which the actual test items remained identical but norms have been updated. From three metropolitan areas, 899 first to third grade students referred by their teachers for a reading intervention program participated. Results showed the inverse Flynn effect, indicating systematic inflation averaging 5 to 9 standard score points, regardless of gender, IQ, city site, or ethnicity, when calculated using the updated norms. Inflation was greater at lower raw score levels. Implications for using the updated norms for identifying children with reading disabilities and changing norms during an ongoing study are discussed.
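The inflation described above arises because the same raw score is converted against a different reference distribution. The following minimal sketch assumes the common convention of standard scores with mean 100 and SD 15 and a simple linear conversion. Both norm tables are hypothetical and are not the Woodcock norms; real norm tables are not necessarily linear, and the abstract notes that the inflation varied with raw score level.

```python
def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a standard score relative to a norm group."""
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z

# Hypothetical norm groups: the updated reference sample performs lower on
# average, so an unchanged raw score looks better against it.
ORIGINAL_NORMS = {"norm_mean": 30.0, "norm_sd": 10.0}
UPDATED_NORMS = {"norm_mean": 26.0, "norm_sd": 10.0}

raw = 20
original = standard_score(raw, **ORIGINAL_NORMS)
updated = standard_score(raw, **UPDATED_NORMS)
print(original, updated, updated - original)
```

With these made-up values the identical raw score moves up by several standard score points under the updated norms, which is the direction of change the study reports.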
The Effect of School Poverty on Racial Gaps in Tests Scores: The Case of the Minnesota Basic Standards Tests

ERIC Educational Resources Information Center

Myers, Samuel L.; Kim, Hyeoneui; Mandala, Cheryl

2004-01-01

Data from the 1996, 1998 and 1999 Minnesota comprehensive statewide testing of eighth graders are used to analyze whether African American students perform worse than the white students who attend the poverty schools. The analyses conclude that the African American-White test score gap is attributed more to the racial discriminations and racial treatments…
EDUCATION AND PSYCHOLOGICAL TEST SCORES

PubMed Central

Pershad, Dwarka; Verma, S. K.

1980-01-01

Education, a long neglected variable affecting psychological test scores, is in search of reemphasis. Some evidence for this has accumulated on the psychological tests constructed and standardized here at the Department of Psychiatry, P.G.I., Chandigarh. Tentative norms prepared education-wise on WAIS-Verbal section, PGI-Memory Scale, Proverb and Similarity Tests, Psychoticism Questionnaire, and PGI MQN 2, for adults, in the age range of 16-50, are reported. The results showed marked differences in the mean scores of different educational categories and thus stressed the need for reporting norms separately for different educational levels. PMID:22064617
The effect of constructivist teaching strategies on science test scores of middle school students

NASA Astrophysics Data System (ADS)

Vaca, James L., Jr.

International studies show that the United States is lagging behind other industrialized countries in science proficiency. The studies revealed how American students showed little significant gain on standardized tests in science between 1995 and 2005. Little information is available regarding how reform in American teaching strategies in science could improve student performance on standardized testing. The purpose of this quasi-experimental quantitative study using a pretest/posttest control group design was to examine how the use of a hands-on, constructivist teaching approach with low achieving eighth grade science students affected student achievement on the 2007 Ohio Eighth Grade Science Achievement Test posttest (N = 76). The research question asked how using constructivist teaching strategies in the science classroom affected student performance on standardized tests. Two independent samples of 38 students each consisting of low achieving science students as identified by seventh grade science scores and scores on the Ohio Eighth Grade Science Half-Length Practice Test pretest were used. Four comparisons were made between the control group receiving traditional classroom instruction and the experimental group receiving constructivist instruction including: (a) pretest/posttest standard comparison, (b) comparison of the number of students who passed the posttest, (c) comparison of the six standards covered on the posttest, (d) posttest's sample means comparison. A Mann-Whitney U Test revealed that there was no significant difference between the independent sample distributions for the control group and the experimental group. These findings contribute to positive social change by investigating science teaching strategies that could be used in eighth grade science classes to improve student achievement in science.
An Empirical Investigation of Change in MCAT Scores upon Retest.

ERIC Educational Resources Information Center

Hynes, Kevin; Givner, Nathaniel

1980-01-01

An investigation of Medical College Admission Test (MCAT) retest scores indicates that limited retest improvement may result when initial scores are fairly low or below what might be predicted based on grade point averages. However, when initial scores approach the national, standardized MCAT mean, or are above what might be predicted, significant…
Graphical method for comparative statistical study of vaccine potency tests.

PubMed

Pay, T W; Hingley, P J

1984-03-01

Producers and consumers are interested in some of the intrinsic characteristics of vaccine potency assays for the comparative evaluation of suitable experimental design. A graphical method is developed which represents the precision of test results, the sensitivity of such results to changes in dosage, and the relevance of the results in the way they reflect the protection afforded in the host species. The graphs can be constructed from Producer's scores and Consumer's scores on each of the scales of test score, antigen dose and probability of protection against disease. A method for calculating these scores is suggested and illustrated for single and multiple component vaccines, for tests which do or do not employ a standard reference preparation, and for tests which employ quantitative or quantal systems of scoring.
NAEP Scores Put Spotlight on Standards: Flat Math Results Also Spur Calls for Teaching Reforms

ERIC Educational Resources Information Center

Cavanagh, Sean

2009-01-01

Fourth grade math scores stagnated for the first time in two decades on a prominent nationwide test, prompting calls for new efforts to improve teacher content knowledge and stirring discussion of the potential benefits of setting more-uniform academic standards across states. The results on the National Assessment of Educational Progress,…
Standard Error of Linear Observed-Score Equating for the NEAT Design with Nonnormally Distributed Data

ERIC Educational Resources Information Center

Zu, Jiyun; Yuan, Ke-Hai

2012-01-01

In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…
Brief Report: Gum Chewing Affects Standardized Math Scores in Adolescents

ERIC Educational Resources Information Center

Johnston, Craig A.; Tyler, Chermaine; Stansberry, Sandra A.; Moreno, Jennette P.; Foreyt, John P.

2012-01-01

Gum chewing has been shown to improve cognitive performance in adults; however, gum chewing has not been evaluated in children. This study examined the effects of gum chewing on standardized test scores and class grades of eighth grade math students. Math classes were randomized to a gum chewing (GC) condition that provided students with gum…
Teacher Empathy and Its Relationship to the Standardized Test Scores of Diverse Secondary English Students

ERIC Educational Resources Information Center

Bostic, Timothy B.

2014-01-01

The purpose of this research study was to ascertain whether there is a relationship between teachers' cognitive role taking aspect of empathy and the Virginia Standards of Learning (VSOL), English/Reading scores of their students. A correlational research design using hierarchical multiple regression was used to look for this relationship. In…

Critical Thinking: More than Test Scores

ERIC Educational Resources Information Center

Smith, Vernon G.; Szymanski, Antonia

2013-01-01

This article is for practicing or aspiring school administrators. The demand for excellence in public education has led to an emphasis on standardized test scores. This article explores the development of a professional enhancement program designed to prepare teachers to teach higher order thinking skills. Higher order thinking is the primary…
Opportunity to Learn: Investigating Possible Predictors for Pre-Course "Test Of Astronomy STandards" TOAST Scores

ERIC Educational Resources Information Center

Berryhill, Katie J.; Slater, Timothy F.

2017-01-01

As discipline-based astronomy education researchers become more interested in experimentally testing innovative teaching strategies to enhance learning in undergraduate introductory astronomy survey courses ("ASTRO 101"), scholars are placing increased attention toward better understanding factors impacting student gain scores on the…
The Michigan Context and Performance Report Card: Public Elementary & Middle Schools, 2013

ERIC Educational Resources Information Center

Spalding, Audrey

2013-01-01

The Michigan Context and Performance Report Card measures school performance by adjusting standardized test scores to account for student background. Comparing schools using unadjusted test scores ignores the significant relationship between academic performance and student socioeconomic background--a dynamic outside a school's control. The…
The Michigan Public High School Context and Performance Report Card

ERIC Educational Resources Information Center

Van Beek, Michael; Bowen, Daniel; Mills, Jonathan

2012-01-01

Assessing a high school's effectiveness is not straightforward. Comparing a school's standardized test scores to those of other schools is one approach to measuring effectiveness, but a major objection to this method is that students' test scores tend to be related to students' "socioeconomic" status--family household income, for…
Standards and Criteria. Paper #10 in Occasional Paper Series.

ERIC Educational Resources Information Center

Glass, Gene V.

The logical and psychological bases for setting cutting scores for criterion-referenced tests are examined; they are found to be intrinsically arbitrary and are often examples of misdirected precision and axiomatization. The term, criterion referenced, originally referred to a technique for making test scores meaningful by controlling the test…
Rising Stars: High School's Change Process Produces Higher Test Scores.

ERIC Educational Resources Information Center

McCown, Claire; Runnebaum, Robert

2001-01-01

Presents Bishop Ward High School (Kansas) as a case study that has seen great improvements in standardized testing results by changing its approach. States that realignment of curriculum, adjusting instructional strategies, and accommodating students with special needs are important aspects of raising assessment scores in high schools. (CJW)
Student assessment by objective structured examination in a neurology clerkship

PubMed Central

Adesoye, Taiwo; Smith, Sandy; Blood, Angela; Brorson, James R.

2012-01-01

Objectives: We evaluated the reliability and predictive ability of an objective structured clinical examination (OSCE) in the assessment of medical students at the completion of a neurology clerkship. Methods: We analyzed data from 195 third-year medical students who took the OSCE. For each student, the OSCE consisted of 2 standardized patient encounters. The scores obtained from each encounter were compared. Faculty clinical evaluations of each student for 2 clinical inpatient rotations were also compared. Hierarchical regression analysis was applied to test the ability of the averaged OSCE scores to predict standardized written examination scores and composite clinical scores. Results: Students' OSCE scores from the 2 standardized patient encounters were significantly correlated with each other (r = 0.347, p < 0.001), and the scores for all students were normally distributed. In contrast, students' faculty clinical evaluation scores from 2 different clinical inpatient rotations were uncorrelated, and scores were skewed toward the highest ratings. After accounting for clerkship order, better OSCE scores were predictive of better National Board of Medical Examiners standardized examination scores (ΔR² = 0.131, p < 0.001) and of better faculty clinical scores (ΔR² = 0.078, p < 0.001). Conclusions: Student assessment by an OSCE provides a reliable and predictive objective assessment of clinical performance in a neurology clerkship. PMID:22855865
Military Personnel: Army Needs to Focus on Cost-Effective Use of Financial Incentives and Quality Standards in Managing Force Growth

DTIC Science & Technology

2009-05-01

diplomas and who score in the upper half on the Armed Forces Qualification Test. The Army implemented some new programs to increase the market of...quality of its enlisted personnel, we analyzed data from OSD on educational credentials and aptitude test scores for these personnel, and we collected...recruits to have high-school diplomas and at least 60 percent to have scores in the upper half on the Armed Forces Qualification Test (AFQT). In fiscal
The Impact of Test-Taking Behaviors on WISC-IV Spanish Domain Scores in Its Standardization Sample

ERIC Educational Resources Information Center

Oakland, Thomas; Callueng, Carmelo; Harris, Josette G.

2012-01-01

The use of individually administered measures of intelligence and other cognitive abilities requires clinicians to monitor a client's test behaviors, given the need for a client to be engaged fully, attentive, and cooperative during the testing process. The use of standardized and norm-referenced measures of test-taking behaviors facilitates this…
Do classroom ventilation rates in California elementary schools influence standardized test scores? Results from a prospective study.

PubMed

Mendell, M J; Eliseeva, E A; Davies, M M; Lobscheid, A

2016-08-01

Limited evidence has associated lower ventilation rates (VRs) in schools with reduced student learning or achievement. We analyzed longitudinal data collected over two school years from 150 classrooms in 28 schools within three California school districts. We estimated daily classroom VRs from real-time indoor carbon dioxide measured by web-connected sensors. School districts provided individual-level scores on standard tests in Math and English, and classroom-level demographic data. Analyses assessing learning effects used two VR metrics: average VRs for 30 days prior to tests, and proportion of prior daily VRs above specified thresholds during the year. We estimated relationships between scores and VR metrics in multivariate models with generalized estimating equations. All school districts had median school-year VRs below the California VR standard. Most models showed some positive associations of VRs with test scores; however, estimates varied in magnitude and few 95% confidence intervals excluded the null. Combined-district models estimated statistically significant increases of 0.6 points (P = 0.01) on English tests for each 10% increase in prior 30-day VRs. Estimated increases in Math were of similar magnitude but not statistically significant. Findings suggest potential small positive associations between classroom VRs and learning. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
Clinical decision support tools: personal digital assistant versus online dietary supplement databases.

PubMed

Clauson, Kevin A; Polen, Hyla H; Peak, Amy S; Marsh, Wallace A; DiScala, Sandra L

2008-11-01

Clinical decision support tools (CDSTs) on personal digital assistants (PDAs) and online databases assist healthcare practitioners who make decisions about dietary supplements. To assess and compare the content of PDA dietary supplement databases and their online counterparts used as CDSTs. A total of 102 question-and-answer pairs were developed within 10 weighted categories of the most clinically relevant aspects of dietary supplement therapy. PDA versions of AltMedDex, Lexi-Natural, Natural Medicines Comprehensive Database, and Natural Standard and their online counterparts were assessed by scope (percent of correct answers present), completeness (3-point scale), ease of use, and a composite score integrating all 3 criteria. Descriptive statistics and inferential statistics, including a chi-square test, Scheffé's multiple comparison test, McNemar's test, and the Wilcoxon signed rank test were used to analyze data. The scope scores for PDA databases were: Natural Medicines Comprehensive Database 84.3%, Natural Standard 58.8%, Lexi-Natural 50.0%, and AltMedDex 36.3%, with Natural Medicines Comprehensive Database statistically superior (p < 0.01). Completeness scores were: Natural Medicines Comprehensive Database 78.4%, Natural Standard 51.0%, Lexi-Natural 43.5%, and AltMedDex 29.7%. Lexi-Natural was superior in ease of use (p < 0.01). Composite scores for PDA databases were: Natural Medicines Comprehensive Database 79.3, Natural Standard 53.0, Lexi-Natural 48.0, and AltMedDex 32.5, with Natural Medicines Comprehensive Database superior (p < 0.01). There was no difference between the scope for PDA and online database pairs with Lexi-Natural (50.0% and 53.9%, respectively) or Natural Medicines Comprehensive Database (84.3% and 84.3%, respectively) (p > 0.05), whereas differences existed for AltMedDex (36.3% vs 74.5%, respectively) and Natural Standard (58.8% vs 80.4%, respectively) (p < 0.01). For composite scores, AltMedDex and Natural Standard online were better than their PDA counterparts (p < 0.01). Natural Medicines Comprehensive Database achieved significantly higher scope, completeness, and composite scores compared with other dietary supplement PDA CDSTs in this study. There was no difference between the PDA and online databases for Lexi-Natural and Natural Medicines Comprehensive Database, whereas online versions of AltMedDex and Natural Standard were significantly better than their PDA counterparts.
Standard of practice and Flynn Effect testimony in death penalty cases.

PubMed

Gresham, Frank M; Reschly, Daniel J

2011-06-01

The Flynn Effect is a well-established psychometric fact documenting substantial increases in measured intelligence test performance over time. Flynn's (1984) review of the literature established that Americans gain approximately 0.3 points per year or 3 points per decade in measured intelligence. The accurate assessment and interpretation of intellectual functioning becomes critical in death penalty cases that seek to determine whether an individual meets the criteria for intellectual disability and thereby is ineligible for execution under Atkins v. Virginia (2002). We reviewed the literature on the Flynn Effect and demonstrated how failure to adjust intelligence test scores based on this phenomenon invalidates test scores and may be in violation of the Standards for Educational and Psychological Testing as well as the "Ethical Principles for Psychologists and Code of Conduct." Application of the Flynn Effect and score adjustments for obsolete norms clearly is supported by science and should be implemented by practicing psychologists.
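The arithmetic behind a Flynn Effect adjustment can be sketched as follows. The 0.3 points per year rate comes from the abstract above; the subtraction rule, the function name `flynn_adjusted_score`, and the example years and score are assumptions made for illustration and are not taken from the article.

```python
def flynn_adjusted_score(obtained_score, test_year, norm_year, gain_per_year=0.3):
    """Subtract the expected norm-obsolescence inflation from an obtained score.

    Assumes a linear gain of `gain_per_year` points for each year between
    the year the test was normed and the year it was administered.
    """
    years_elapsed = test_year - norm_year
    return obtained_score - gain_per_year * years_elapsed

# Hypothetical case: a score of 75 on a test normed 15 years before administration.
print(flynn_adjusted_score(75, test_year=2010, norm_year=1995))
```

Under these assumptions, 15 years of norm aging corresponds to 4.5 points, so the hypothetical obtained score of 75 would be read as 70.5.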
Opportunity to learn: Investigating possible predictors for pre-course Test Of Astronomy STandards TOAST scores

NASA Astrophysics Data System (ADS)

Berryhill, Katie J.

As astronomy education researchers become more interested in experimentally testing innovative teaching strategies to enhance learning in introductory astronomy survey courses ("ASTRO 101"), scholars are placing increased attention toward better understanding factors impacting student gain scores on the widely used Test Of Astronomy STandards (TOAST). Usually used in a pre-test and post-test research design, one might naturally assume that the pre-course differences observed between high- and low-scoring college students might be due in large part to their pre-existing motivation, interest, experience in science, and attitudes about astronomy. To explore this notion, 11 non-science majoring undergraduates taking ASTRO 101 at west coast community colleges were interviewed in the first few weeks of the course to better understand students' pre-existing affect toward learning astronomy with an eye toward predicting student success. In answering this question, we hope to contribute to our understanding of the incoming knowledge of students taking undergraduate introductory astronomy classes, but also gain insight into how faculty can best meet those students' needs and assist them in achieving success. Perhaps surprisingly, there was only weak correlation between students' motivation toward learning astronomy and their pre-test scores. Instead, the most fruitful predictor of TOAST pre-test scores was the quantity of pre-existing, informal, self-directed astronomy learning experiences.
The Use of Quality Control and Data Mining Techniques for Monitoring Scaled Scores: An Overview. Research Report. ETS RR-12-20

ERIC Educational Resources Information Center

von Davier, Alina A.

2012-01-01

Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…
Turkish version of the modified Constant-Murley score and standardized test protocol: reliability and validity.

PubMed

Çelik, Derya

2016-01-01

The Constant-Murley score (CMS) is widely used to evaluate disabilities associated with shoulder injuries, but it has been criticized for relying on imprecise terminology and a lack of standardized methodology. A modified guideline, therefore, was published in 2008 with several recommendations. This new version has not yet been translated or culturally adapted for Turkish-speaking populations. The purpose of this study was to translate and cross-culturally adapt the modified CMS and its test protocol, as well as define and measure its reliability and validity. The modified CMS was translated into Turkish, consistent with published methodological guidelines. The measurement properties of the Turkish version of the modified CMS were tested in 30 patients (12 males, 18 females; mean age: 59.5±13.5 years) with a variety of shoulder pathologies. Intraclass correlation coefficients (ICC) were used to estimate test-retest reliability. Construct validity was analyzed with the Turkish version of the American Shoulder and Elbow Surgeons (ASES) Standardized Shoulder Assessment Form and Short-Form Health Survey (SF-12). No difficulties were found in the translation process. The Turkish version of the modified CMS showed excellent test-retest reliability (ICC=0.86). The correlation coefficients between the Turkish version of the modified CMS and the ASES, SF-12-physical component score, and SF-12 mental component scores were found to be 0.48, 0.35, and 0.05, respectively. No floor or ceiling effects were found. The translation and cultural adaptation of the modified CMS and its standardized test protocol into Turkish were successful. The Turkish version of the modified CMS has sufficient reliability and validity to measure a variety of shoulder disorders for Turkish-speaking individuals.
Ideal Standards, Acceptance, and Relationship Satisfaction: Latitudes of Differential Effects

PubMed Central

Buyukcan-Tetik, Asuman; Campbell, Lorne; Finkenauer, Catrin; Karremans, Johan C.; Kappen, Gesa

2017-01-01

We examined whether the relations of consistency between ideal standards and perceptions of a current romantic partner with partner acceptance and relationship satisfaction level off, or decelerate, above a threshold. We tested our hypothesis using a 3-year longitudinal data set collected from heterosexual newlywed couples. We used two indicators of consistency: pattern correspondence (within-person correlation between ideal standards and perceived partner ratings) and mean-level match (difference between ideal standards score and perceived partner score). Our results revealed that pattern correspondence had no relation with partner acceptance, but a positive linear/exponential association with relationship satisfaction. Mean-level match had a significant positive association with actor’s acceptance and relationship satisfaction up to the point where perceived partner score equaled ideal standards score. Partner effects did not show a consistent pattern. The results suggest that the consistency between ideal standards and perceived partner attributes has a non-linear association with acceptance and relationship satisfaction, although the results were more conclusive for mean-level match. PMID:29033876
An Improved Correction for Range Restricted Correlations Under Extreme, Monotonic Quadratic Nonlinearity and Heteroscedasticity.

PubMed

Culpepper, Steven Andrew

2016-06-01

Standardized tests are frequently used for selection decisions, and the validation of test scores remains an important area of research. This paper builds upon prior literature about the effect of nonlinearity and heteroscedasticity on the accuracy of standard formulas for correcting correlations in restricted samples. Existing formulas for direct range restriction require three assumptions: (1) the criterion variable is missing at random; (2) a linear relationship between independent and dependent variables; and (3) constant error variance or homoscedasticity. The results in this paper demonstrate that the standard approach for correcting restricted correlations is severely biased in cases of extreme monotone quadratic nonlinearity and heteroscedasticity. This paper offers at least three significant contributions to the existing literature. First, a method from the econometrics literature is adapted to provide more accurate estimates of unrestricted correlations. Second, derivations establish bounds on the degree of bias attributed to quadratic functions under the assumption of a monotonic relationship between test scores and criterion measurements. New results are presented on the bias associated with using the standard range restriction correction formula, and the results show that the standard correction formula yields estimates of unrestricted correlations that deviate by as much as 0.2 for high to moderate selectivity. Third, Monte Carlo simulation results demonstrate that the new procedure for correcting restricted correlations provides more accurate estimates in the presence of quadratic and heteroscedastic test score and criterion relationships.
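For orientation, the standard correction for direct range restriction that the abstract above criticizes is commonly written as Thorndike's Case II formula. The sketch below implements that textbook formula only; it is not the improved procedure proposed in the paper, and the function name and example values are hypothetical.

```python
import math

def correct_direct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Textbook (Thorndike Case II) correction for direct range restriction.

    r_restricted: correlation observed in the selected sample.
    sd_unrestricted / sd_restricted: predictor standard deviations in the
    applicant pool and in the selected sample. Assumes linearity and
    homoscedasticity, the assumptions the abstract says can fail.
    """
    k = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * k) / math.sqrt(1 - r**2 + (r**2) * (k**2))

# Hypothetical values: observed r = 0.30, and the selected sample has
# half the predictor standard deviation of the full applicant pool.
print(round(correct_direct_range_restriction(0.30, 10.0, 5.0), 4))
```

When the two standard deviations are equal the formula returns the observed correlation unchanged, which is a quick sanity check on any implementation.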
The Analysis of a Teacher Test Preparation Tutorial to Learner Test Scores: An Action Research Study

ERIC Educational Resources Information Center

Mild, Toni L. Hittle

2014-01-01

Many Pennsylvania colleges and universities require that teacher candidates pass a standardized assessment in order to gain formal entry in to their education programs. Standardized tests are also required for Level I teacher certification within Pennsylvania. The initial assessment required of all Pennsylvania preservice teachers for…
Test-Based Accountability: The Promise and the Perils

ERIC Educational Resources Information Center

Loveless, Tom

2005-01-01

In the early 1990s, states began establishing standards in academic subjects backed by test-based accountability systems to see that the standards were met. Incentives were implemented for schools and students based on pupil test scores. These early accountability systems paved the way for passage of landmark federal legislation, the No Child Left…
Gross Motor Development in Children Aged 3-5 Years, United States 2012.

PubMed

Kit, Brian K; Akinbami, Lara J; Isfahani, Neda Sarafrazi; Ulrich, Dale A

2017-07-01

Objective: Gross motor development in early childhood is important in fostering greater interaction with the environment. The purpose of this study is to describe gross motor skills among US children aged 3-5 years using the Test of Gross Motor Development (TGMD-2). Methods: We used 2012 NHANES National Youth Fitness Survey (NNYFS) data, which included TGMD-2 scores obtained according to an established protocol. Outcome measures included locomotor and object control raw and age-standardized scores. Means and standard errors were calculated for demographic and weight status with SUDAAN using sample weights to calculate nationally representative estimates, and survey design variables to account for the complex sampling methods. Results: The sample included 339 children aged 3-5 years. As expected, locomotor and object control raw scores increased with age. Overall mean standardized scores for locomotor and object control were similar to the mean value previously determined using a normative sample. Girls had a higher mean locomotor, but not mean object control, standardized score than boys (p < 0.05). However, the mean locomotor standardized scores for both boys and girls fell into the range categorized as "average." There were no other differences by age, race/Hispanic origin, weight status, or income in either of the subtest standardized scores (p > 0.05). Conclusions: In a nationally representative sample of US children aged 3-5 years, TGMD-2 mean locomotor and object control standardized scores were similar to the established mean. These results suggest that standardized gross motor development among young children generally did not differ by demographic or weight status.

Is the standard SF-12 health survey valid and equivalent for a Chinese population?

PubMed

Lam, Cindy L K; Tse, Eileen Y Y; Gandek, Barbara

2005-03-01

The Chinese are the world's largest ethnic group, but few health-related quality of life (HRQoL) measures have been tested on them. The aim of this study was to determine if the standard SF-12 was valid and equivalent for a Chinese population. The SF-36 data of 2410 Chinese adults randomly selected from the general population of Hong Kong (HK) were analysed. The Chinese (HK) specific SF-12 items and scoring algorithm were derived from the HK Chinese population data by multiple regressions. The SF-36 PCS and MCS scores were used as criteria to assess the content and criterion validity of the SF-12. The standard and Chinese (HK) specific SF-12 PCS and MCS scores were compared for equivalence. The standard SF-12 explained 82% and 89% of the variance of the SF-36 PCS and MCS scores, respectively, and the effect size differences between the standard SF-36 and SF-12 scores were less than 0.3. Six of the Chinese (HK) specific SF-12 items were different from those of the standard SF-12, but the effect size differences between the Chinese (HK) specific and standard SF-12 scores were mostly less than 0.3. The standard SF-12 was valid and equivalent for the Chinese, which would enable more Chinese to be included in clinical trials that measure HRQoL.
The Science Camp Model based on maker movement and tinkering activity for developing concept of electricity in middle school students to meet standard evaluation of ordinary national educational test (O-NET)

NASA Astrophysics Data System (ADS)

Chamrat, Suthida

2018-01-01

The standard evaluation of Thai education relies excessively on the Ordinary National Educational Test, widely known as O-NET. However, a focus on O-NET results can lead to unsatisfactory teaching practices, especially in science subjects. Among the negative consequences is that schools frequently engage in "cramming" practices in order to elevate their O-NET scores. Higher education, which is committed to generating and applying knowledge by socially engaged scholars, needs to take account of this situation. This research article portrays the collaboration between the faculty of education at Chiang Mai University and an educational service area to develop a science camp model. The activities designed for the Science Camp Model were based on the Tinkering and Maker Movement. Specifically, the Science Camp Model was designed to enhance the conceptualization of electricity for middle school students in order to meet the standard evaluation of the Ordinary National Educational Test. The hands-on activities consisted of 5 modules: simple electrical circuits, paper circuits, electrical measurement roleplay, motor art robots, and Force from Motor. The data were collected with an 11-item Electricity Socratic-based Test adapted from cumulative published O-NET tests focused on the concept of electricity. The qualitative data were also collected virtually via Flinga.com. The results indicated that, after participating in the 5 modules of the science camp based on the Maker Movement and tinkering activity, students raised their average percentage test scores from 33.64 to 65.45. Gain score analysis using a dependent t-test compared pretest and posttest mean scores. The p value was found to be statistically significant (less than 0.001). The posttest had a considerably higher mean score compared with the pretest. Qualitative data also indicated that students could explain the main concepts of electrical circuits, and the transformation of electrical energy to mechanical energy. The schools were satisfied, and expressed greater confidence in the Science Camp Model as an alternative way to improve results on the standard evaluation of the Ordinary National Educational Test.
Visuomotor Performance in KCNJ11-Related Neonatal Diabetes Is Impaired in Children With DEND-Associated Mutations and May Be Improved by Early Treatment With Sulfonylureas

PubMed Central

Shah, Reshma P.; Spruyt, Karen; Kragie, Brigette C.; Greeley, Siri Atma W.; Msall, Michael E.

2012-01-01

OBJECTIVE To assess performance on an age-standardized neuromotor coordination task among sulfonylurea-treated KCNJ11-related neonatal diabetic patients. RESEARCH DESIGN AND METHODS Nineteen children carrying KCNJ11 mutations associated with isolated diabetes (R201H; n = 8), diabetes with neurodevelopmental impairment (V59M or V59A [V59M/A]; n = 8), or diabetes not consistently associated with neurodevelopmental disability (Y330C, E322K, or R201C; n = 3) were studied using the age-standardized Beery-Buktenica Developmental Test of Visual-Motor Integration (VMI). RESULTS Although R201H subjects tested in the normal range (median standard score = 107), children with V59M/A mutations had significantly lower than expected VMI standard scores (median = 49). The scores for all three groups were significantly different from each other (P = 0.0017). The age of sulfonylurea initiation was inversely correlated with VMI scores in the V59M/A group (P < 0.05). CONCLUSIONS Neurodevelopmental disability in KCNJ11-related diabetes includes visuomotor problems that may be ameliorated by early sulfonylurea treatment. Comprehensive longitudinal assessment on larger samples will be imperative. PMID:22855734
Computerized Maze Navigation and On-Road Performance by Drivers With Dementia

PubMed Central

Ott, Brian R.; Festa, Elena K.; Amick, Melissa M.; Grace, Janet; Davis, Jennifer D.; Heindel, William C.

2012-01-01

This study examined the ability of computerized maze test performance to predict the road test performance of cognitively impaired and normal older drivers. The authors examined 133 older drivers, including 65 with probable Alzheimer disease, 23 with possible Alzheimer disease, and 45 control subjects without cognitive impairment. Subjects completed 5 computerized maze tasks employing a touch screen and pointer as well as a battery of standard neuropsychological tests. Parameters measured for mazes included errors, planning time, drawing time, and total time. Within 2 weeks, subjects were examined by a professional driving instructor on a standardized road test modeled after the Washington University Road Test. Road test total score was significantly correlated with total time across the 5 mazes. This maze score was significant for both Alzheimer disease subjects and control subjects. One maze in particular, requiring less than 2 minutes to complete, was highly correlated with driving performance. For the standard neuropsychological tests, highest correlations were seen with Trail Making A (TrailsA) and the Hopkins Verbal Learning Tests Trial 1 (HVLT1). Multiple regression models for road test score using stepwise subtraction of maze and neuropsychological test variables revealed significant independent contributions for total maze time, HVLT1, and TrailsA for the entire group; total maze time and HVLT1 for Alzheimer disease subjects; and TrailsA for normal subjects. As a visual analog of driving, a brief computerized test of maze navigation time compares well to standard neuropsychological tests of psychomotor speed, scanning, attention, and working memory as a predictor of driving performance by persons with early Alzheimer disease and normal elders. Measurement of maze task performance appears to be useful in the assessment of older drivers at risk for hazardous driving. PMID:18287166
College Planning: The Savvy Parent's Guide

ERIC Educational Resources Information Center

Berger, Sandra

2008-01-01

Because college admission has become much more competitive, parents and students need to know that excellent grades and test scores may not be enough to gain placement, especially in highly selective schools. Keep in mind that there are more than 25,000 class valedictorians every year, most with nearly perfect standardized test scores. Also, the…
Teacher Use of Achievement Test Score Data

ERIC Educational Resources Information Center

Miller, Steven C.

2012-01-01

The Wyoming Department of Education (WDE) has invested time and money developing standardized achievement test score reports designed to give teachers data about each of their students' levels of mastery of particular concepts in order to differentiate their instruction. The purpose of this study was to determine the extent to which eighth-grade…
Relationship of Self Esteem of the Disadvantaged to School Success.

ERIC Educational Resources Information Center

Frerichs, Allen H.

This study shows that there is a positive correlation between self esteem and academic achievement for inner city black children. Seventy-eight grade 6 black students were divided into the following categories: upper one-third and lower third based on intelligence test scores, standardized reading test scores, and grade point average (GPA) from…
Increasing Student Learning in Mathematics with the Use of Collaborative Teaching Strategies

ERIC Educational Resources Information Center

Di Fatta, Jenna; Garcia, Sarah; Gorman, Stephanie

2009-01-01

Three teacher researchers conducted this action research project to increase their 54 high school students' achievements in mathematics. The teacher researchers had noticed a trend of low scores on teacher-made chapter tests and non-completion of daily homework. Standardized tests showed that most students scored below average on the mathematics…
Methodological Approaches to Online Scoring of Essays.

ERIC Educational Resources Information Center

Chung, Gregory K. W. K.; O'Neil, Harold F., Jr.

This report examines the feasibility of scoring essays using computer-based techniques. Essays have been incorporated into many of the standardized testing programs. Issues of validity and reliability must be addressed to deploy automated approaches to scoring fully. Two approaches that have been used to classify documents, surface- and word-based…
A standardized test battery for the study of synesthesia

PubMed Central

Eagleman, David M.; Kagan, Arielle D.; Nelson, Stephanie S.; Sagaram, Deepak; Sarma, Anand K.

2014-01-01

Synesthesia is an unusual condition in which stimulation of one modality evokes sensation or experience in another modality. Although discussed in the literature well over a century ago, synesthesia slipped out of the scientific spotlight for decades because of the difficulty in verifying and quantifying private perceptual experiences. In recent years, the study of synesthesia has enjoyed a renaissance due to the introduction of tests that demonstrate the reality of the condition, its automatic and involuntary nature, and its measurable perceptual consequences. However, while several research groups now study synesthesia, there is no single protocol for comparing, contrasting and pooling synesthetic subjects across these groups. There is no standard battery of tests, no quantifiable scoring system, and no standard phrasing of questions. Additionally, the tests that exist offer no means for data comparison. To remedy this deficit we have devised the Synesthesia Battery. This unified collection of tests is freely accessible online (http://www.synesthete.org). It consists of a questionnaire and several online software programs, and test results are immediately available for use by synesthetes and invited researchers. Performance on the tests is quantified with a standard scoring system. We introduce several novel tests here, and offer the software for running the tests. By presenting standardized procedures for testing and comparing subjects, this endeavor hopes to speed scientific progress in synesthesia research. PMID:16919755
The Best of Both Worlds

ERIC Educational Resources Information Center

Schneider, Jack; Feldman, Joe; French, Dan

2016-01-01

Relying on teachers' assessments for the information currently provided by standardized test scores would save instructional time, better capture the true abilities of diverse students, and reduce the problem of teaching to the test. A California high school is implementing standards-based reporting, ensuring that teacher-issued grades function as…
Comparing NET and ERI standardized exam scores between baccalaureate graduates who pass or fail the NCLEX-RN.

PubMed

Bondmass, Mary D; Moonie, Sheniz; Kowalski, Susan

2008-01-01

In the United States, nursing programs are commonly evaluated by their graduates' success on the National Council Licensure Examination for Registered Nurses (NCLEX-RN). The purpose of this paper is to describe a change in NCLEX-RN success rates following the addition of standardized exams throughout our program's curriculum, and to compare these exam scores between graduates who pass NCLEX-RN and those who do not. Our results indicate an 8.5% change (p < 0.000) in the NCLEX-RN pass rate from our previous 5-year mean pass rate, and significant differences in standardized test scores for those who pass the NCLEX-RN compared to those who do not (p < 0.03). We conclude that our selected standardized exam scores are able to significantly identify graduates who are more likely to pass NCLEX-RN than not.
Examining the Relationship between Students' Mathematics Test Scores and Computer Use at Home and at School

ERIC Educational Resources Information Center

O'Dwyer, Laura M.; Russell, Michael; Bebell, Damian; Seeley, Kevon

2008-01-01

Over the past decade, standardized test results have become the primary tool used to judge the effectiveness of schools and educational programs, and today, standardized testing serves as the keystone for educational policy at the state and federal levels. This paper examines the relationship between fourth grade mathematics achievement and…
The Relationship between English Language Learners' Language Proficiency and Standardized Test Scores

ERIC Educational Resources Information Center

Thakkar, Darshan

2013-01-01

It is generally theorized that English Language Learner (ELL) students do not succeed on state standardized tests because ELL students lack the cognitive academic language skills necessary to function on the large scale content assessments. The purpose of this dissertation was to test that theory. Through the use of quantitative methodology, ELL…
SDT: The Brazilian Standardization of the Silver Drawing Test of Cognition and Emotion.

ERIC Educational Resources Information Center

Allessandrini, Cristina Dias; Duarte, Jose Luclano Miranda; Bianco, Marisa Fernandes; Dupas, Margarida Azevedo

1998-01-01

The Silver Drawing Test of Cognition and Emotion was standardized for Brazilian children (N=2,000). ANOVA results are presented for age and education groups from early grades on, including distinguishing adult education levels; results are compared for U.S. and Brazilian populations. Growth in test scores, emotional content responses, and…
Health Behaviors and Standardized Test Scores: The Impact of School Health Climate on Performance

ERIC Educational Resources Information Center

Gunter, Whitney D.; Daly, Kevin

2013-01-01

Research has found that many characteristics are related to performance on standardized tests. Many of these are not necessarily "academic" attributes. One area of this research is on the connection between physical health or lifestyles and test performance. The research that exists in this area is often disconnected with each other and…
Pediatric residents' learning styles and temperaments and their relationships to standardized test scores.

PubMed

Tuli, Sanjeev Y; Thompson, Lindsay A; Saliba, Heidi; Black, Erik W; Ryan, Kathleen A; Kelly, Maria N; Novak, Maureen; Mellott, Jane; Tuli, Sonal S

2011-12-01

Board certification is an important professional qualification and a prerequisite for credentialing, and the Accreditation Council for Graduate Medical Education (ACGME) assesses board certification rates as a component of residency program effectiveness. To date, research has shown that preresidency measures, including National Board of Medical Examiners scores, Alpha Omega Alpha Honor Medical Society membership, or medical school grades poorly predict postresidency board examination scores. However, learning styles and temperament have been identified as factors that affect test-taking performance. The purpose of this study is to characterize the learning styles and temperaments of pediatric residents and to evaluate their relationships to yearly in-service and postresidency board examination scores. This cross-sectional study analyzed the learning styles and temperaments of current and past pediatric residents by administration of 3 validated tools: the Kolb Learning Style Inventory, the Keirsey Temperament Sorter, and the Felder-Silverman Learning Style test. These results were compared with known, normative, general and medical population data and evaluated for correlation to in-service examination and postresidency board examination scores. The predominant learning style for pediatric residents was converging 44% (33 of 75 residents) and the predominant temperament was guardian 61% (34 of 56 residents). The learning style and temperament distribution of the residents was significantly different from published population data (P = .002 and .04, respectively). Learning styles, with one exception, were found to be unrelated to standardized test scores. The predominant learning style and temperament of pediatric residents is significantly different than that of the populations of general and medical trainees. However, learning styles and temperament do not predict outcomes on standardized in-service and board examinations in pediatric residents.
Pediatric Residents' Learning Styles and Temperaments and Their Relationships to Standardized Test Scores

PubMed Central

Tuli, Sanjeev Y.; Thompson, Lindsay A.; Saliba, Heidi; Black, Erik W.; Ryan, Kathleen A.; Kelly, Maria N.; Novak, Maureen; Mellott, Jane; Tuli, Sonal S.

2011-01-01

Background: Board certification is an important professional qualification and a prerequisite for credentialing, and the Accreditation Council for Graduate Medical Education (ACGME) assesses board certification rates as a component of residency program effectiveness. To date, research has shown that preresidency measures, including National Board of Medical Examiners scores, Alpha Omega Alpha Honor Medical Society membership, or medical school grades poorly predict postresidency board examination scores. However, learning styles and temperament have been identified as factors that affect test-taking performance. The purpose of this study is to characterize the learning styles and temperaments of pediatric residents and to evaluate their relationships to yearly in-service and postresidency board examination scores. Methods: This cross-sectional study analyzed the learning styles and temperaments of current and past pediatric residents by administration of 3 validated tools: the Kolb Learning Style Inventory, the Keirsey Temperament Sorter, and the Felder-Silverman Learning Style test. These results were compared with known, normative, general and medical population data and evaluated for correlation to in-service examination and postresidency board examination scores. Results: The predominant learning style for pediatric residents was converging 44% (33 of 75 residents) and the predominant temperament was guardian 61% (34 of 56 residents). The learning style and temperament distribution of the residents was significantly different from published population data (P = .002 and .04, respectively). Learning styles, with one exception, were found to be unrelated to standardized test scores. Conclusions: The predominant learning style and temperament of pediatric residents is significantly different than that of the populations of general and medical trainees. However, learning styles and temperament do not predict outcomes on standardized in-service and board examinations in pediatric residents. PMID:23205211
Diagnostic Profiles: A Standard Setting Method for Use with a Cognitive Diagnostic Model

ERIC Educational Resources Information Center

Skaggs, Gary; Hein, Serge F.; Wilkins, Jesse L. M.

2016-01-01

This article introduces the Diagnostic Profiles (DP) standard setting method for setting a performance standard on a test developed from a cognitive diagnostic model (CDM), the outcome of which is a profile of mastered and not-mastered skills or attributes rather than a single test score. In the DP method, the key judgment task for panelists is a…
Outcomes of Fundamentals of Laparoscopic Surgery (FLS) mastery training standards applied to an ergonomically different, lower cost platform.

PubMed

Placek, Sarah B; Franklin, Brenton R; Haviland, Sarah M; Wagner, Mercy D; O'Donnell, Mary T; Cryer, Chad T; Trinca, Kristen D; Silverman, Elliott; Matthew Ritter, E

2017-06-01

Using previously established mastery learning standards, this study compares outcomes of training on standard FLS (FLS) equipment with training on an ergonomically different (ED-FLS), but more portable, lower cost platform. Subjects completed a pre-training FLS skills test on the standard platform and were then randomized to train on the FLS training platform (n = 20) or the ED-FLS platform (n = 19). A post-training FLS skills test was administered to both groups on the standard FLS platform. Group performance on the pretest was similar. Fifty percent of FLS and 32 % of ED-FLS subjects completed the entire curriculum. 100 % of subjects completing the curriculum achieved passing scores on the post-training test. There was no statistically discernible difference in scores on the final FLS exam (FLS 93.4, ED-FLS 93.3, p = 0.98) or training sessions required to complete the curriculum (FLS 7.4, ED-FLS 9.8, p = 0.13). These results show that when applying mastery learning theory to an ergonomically different platform, skill transfer occurs at a high level and prepares subjects to pass the standard FLS skills test.

Evaluation of an Innovative Digital Assessment Tool in Dental Anatomy.

PubMed

Lam, Matt T; Kwon, So Ran; Qian, Fang; Denehy, Gerald E

2015-05-01

The E4D Compare software is an innovative tool that provides immediate feedback to students' projects and competencies. It should provide consistent scores even when different scanners are used which may have inherent subtle differences in calibration. This study aimed to evaluate potential discrepancies in evaluation using the E4D Compare software based on four different NEVO scanners in dental anatomy projects. Additionally, correlation between digital and visual scores was evaluated. Thirty-five projects of maxillary left central incisors were evaluated. Among these, thirty wax-ups were performed by four operators and five consisted of standard dentoform teeth. Five scores were obtained for each project: one from an instructor who visually graded the project and four from different NEVO scanners. A faculty member involved in teaching the dental anatomy course blindly scored the 35 projects. One operator scanned all projects on four NEVO scanners (D4D Technologies, Richardson, TX, USA). The images were aligned to the gold standard, and tolerance set at 0.3 mm to generate a score. The score reflected percentage match between the project and the gold standard. One-way ANOVA with repeated measures was used to determine whether there was a significant difference in scores among the four NEVO scanners. Paired-sample t-test was used to detect any difference between visual scores and the average scores of the four NEVO scanners. Pearson's correlation test was used to assess the relationship between visual and average scores of NEVO scanners. There was no significant difference in mean scores among four different NEVO scanners [F(3, 102) = 2.27, p = 0.0852; one-way ANOVA with repeated measures]. Moreover, the data provided strong evidence that a significant difference existed between visual and digital scores (p = 0.0217; paired-sample t-test). Mean visual scores were significantly lower than digital scores (72.4 vs 75.1). Pearson's correlation coefficient of 0.85 indicated a strong correlation between visual and digital scores (p < 0.0001). The E4D Compare software provides consistent scores even when different scanners are used and correlates well with visual scores. The use of innovative digital assessment tools in dental education is promising with the E4D Compare software correlating well with visual scores and providing consistent scores even when different scanners are used.
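The abstract above describes a score defined as the percentage match between a scanned project and a gold standard within a 0.3 mm tolerance. A minimal sketch of that idea follows, assuming the comparison reduces to a list of per-point deviations in millimetres; the function name `percent_match` and the input format are hypothetical, and this is not a description of how the E4D Compare software computes its score.

```python
def percent_match(deviations_mm, tolerance_mm=0.3):
    """Percentage of sampled points whose absolute deviation from the
    gold standard is within the tolerance (assumed input: one signed
    deviation in mm per sampled surface point)."""
    if not deviations_mm:
        raise ValueError("need at least one deviation")
    within = sum(1 for d in deviations_mm if abs(d) <= tolerance_mm)
    return 100.0 * within / len(deviations_mm)


if __name__ == "__main__":
    # Hypothetical deviations for four sampled points; one exceeds 0.3 mm.
    print(percent_match([0.1, -0.2, 0.5, 0.3]))
```

A point lying exactly on the tolerance is counted as a match here; that boundary choice is also an assumption.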
Using Norm-Referenced Data to Set Standards for a Minimum Competency Program in the State of South Carolina.

ERIC Educational Resources Information Center

Garcia-Quintana, Roan A.; Mappus, M. Lynne

1980-01-01

Norm referenced data were utilized for determining the mastery cutoff score on a criterion referenced test. Once a cutoff score on the norm referenced measure is selected, the cutoff score on the criterion referenced measure becomes that score which maximizes proportion of consistent classifications and proportion of improvement beyond chance. (CP)
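The selection rule summarized above can be sketched as a simple search. The example below covers only the first criterion, the proportion of consistent pass/fail classifications; it does not implement the "improvement beyond chance" index, and the function name, the tie-breaking rule (lowest cutoff wins), and the sample data are illustrative assumptions rather than the authors' procedure.

```python
def best_criterion_cutoff(norm_scores, crit_scores, norm_cutoff):
    """Return (cutoff, proportion) for the criterion-referenced cutoff that
    maximizes agreement with pass/fail decisions on the norm-referenced
    measure. Ties are resolved in favor of the lowest cutoff (assumption)."""
    if len(norm_scores) != len(crit_scores) or not norm_scores:
        raise ValueError("score lists must be non-empty and equal in length")
    norm_pass = [s >= norm_cutoff for s in norm_scores]
    best_cut, best_prop = None, -1.0
    for cut in sorted(set(crit_scores)):
        # Count examinees classified the same way by both measures.
        agree = sum((c >= cut) == p for c, p in zip(crit_scores, norm_pass))
        prop = agree / len(crit_scores)
        if prop > best_prop:
            best_cut, best_prop = cut, prop
    return best_cut, best_prop


if __name__ == "__main__":
    # Hypothetical paired scores for six examinees.
    norm = [40, 45, 50, 55, 60, 65]
    crit = [10, 12, 11, 15, 16, 18]
    print(best_criterion_cutoff(norm, crit, norm_cutoff=50))
```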
An alternative to the balance error scoring system: using a low-cost balance board to improve the validity/reliability of sports-related concussion balance testing.

PubMed

Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J

2014-05-01

Recent guidelines advocate sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7 day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores were determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
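The validity figures in the abstract above are correlations between a device's scores and the force plate reference. As a rough illustration, a Pearson correlation can be computed as below; the helper name `pearson_r` is hypothetical. Note that the study reports test-retest reliability with intraclass correlation coefficients, which this sketch does not compute.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical path lengths from a reference device and a low-cost board.
    reference = [10.0, 12.5, 15.0, 17.5]
    low_cost = [20.0, 25.0, 30.0, 35.0]
    print(pearson_r(reference, low_cost))
```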
Holistic Approach to Partial Covalent Interactions in Protein Structure Prediction and Design with Rosetta.

PubMed

Combs, Steven A; Mueller, Benjamin K; Meiler, Jens

2018-05-29

Partial covalent interactions (PCIs) in proteins, which include hydrogen bonds, salt bridges, cation-π, and π-π interactions, contribute to thermodynamic stability and facilitate interactions with other biomolecules. Several score functions have been developed within the Rosetta protein modeling framework that identify and evaluate these PCIs through analyzing the geometry between participating atoms. However, we hypothesize that PCIs can be unified through a simplified electron orbital representation. To test this hypothesis, we have introduced orbital based chemical descriptors for PCIs into Rosetta, called the PCI score function. Optimal geometries for the PCIs are derived from a statistical analysis of high-quality protein structures obtained from the Protein Data Bank (PDB), and the relative orientation of electron deficient hydrogen atoms and electron-rich lone pair or π orbitals are evaluated. We demonstrate that nativelike geometries of hydrogen bonds, salt bridges, cation-π, and π-π interactions are recapitulated during minimization of protein conformation. The packing density of tested protein structures increased from 0.62 with the standard score function to 0.64, closer to the native value of 0.70. Overall, rotamer recovery improved when using the PCI score function (75%) as compared to the standard Rosetta score function (74%). The PCI score function represents an improvement over the standard Rosetta score function for protein model scoring; in addition, it provides a platform for future directions in the analysis of small molecule to protein interactions, which depend on partial covalent interactions.
Crossing the North Sea seems to make DCD disappear: cross-validation of Movement Assessment Battery for Children-2 norms.

PubMed

Niemeijer, Anuschka S; van Waelvelde, Hilde; Smits-Engelsman, Bouwien C M

2015-02-01

The Movement Assessment Battery for Children has been revised as the Movement ABC-2 (Henderson, Sugden, & Barnett, 2007). In Europe, the 15th percentile score on this test is recommended for one of the DSM-IV diagnostic criteria for Developmental Coordination Disorder (DCD). A representative sample of Dutch and Flemish children was tested to cross-validate the UK standard scores, including the 15th percentile score. First, the mean, SD and percentile scores of Dutch children were compared to those of UK normative samples. Item standard scores of Dutch speaking children deviated from the UK reference values suggesting necessary adjustments. Except for very young children, the Dutch-speaking samples performed better. Second, based on the mean and SD and clinically relevant cut-off scores (5th and 15th percentile), norms were adjusted for the Dutch population. For diagnostic use, researchers and clinicians should use the reference norms that are valid for the group of children they are testing. The results indicate that there possibly is an effect of testing procedure in other countries that validated the UK norms and/or cultural influence on the age norms of the Movement ABC-2. It is suggested to formulate criterion-based norms for age groups in addition to statistical norms. Copyright © 2014 Elsevier B.V. All rights reserved.
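The abstract above turns on percentile cut-offs (5th and 15th) taken from a normative sample. A minimal sketch of deriving such a cut-off with the nearest-rank method follows; the method choice, the function name, and the sample data are assumptions for illustration and do not reproduce the Movement ABC-2 norming procedure.

```python
def nearest_rank_percentile(scores, pct):
    """Nearest-rank percentile: the smallest score such that at least
    pct percent of the sample is at or below it. pct must be an integer
    from 1 to 100 so the rank can be computed without float rounding."""
    if not scores:
        raise ValueError("need at least one score")
    if not isinstance(pct, int) or not 1 <= pct <= 100:
        raise ValueError("pct must be an integer between 1 and 100")
    ordered = sorted(scores)
    # Integer ceiling of pct * n / 100.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical normative sample of 20 total scores.
    sample = list(range(1, 21))
    print(nearest_rank_percentile(sample, 15), nearest_rank_percentile(sample, 5))
```

Because the cut-off depends entirely on the sample it is computed from, the same raw score can fall above the 15th percentile in one population and below it in another, which is the concern the abstract raises about reusing UK norms.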
Raise Test Scores without Selling Your Soul: An Interview with Scott Mandel

ERIC Educational Resources Information Center

Curriculum Review, 2006

2006-01-01

With his 10th book, Improving Test Scores: A Practical Approach for Teachers and Administrators, Scott Mandel outlines steps educators can take to boost achievement on standardized exams while maintaining the integrity of their day-to-day teaching. Mandel, who holds a Ph.D. in curriculum and instruction from USC, teaches history and English at…
Can a Two-Question Test Be Reliable and Valid for Predicting Academic Outcomes?

ERIC Educational Resources Information Center

Bridgeman, Brent

2016-01-01

Scores on essay-based assessments that are part of standardized admissions tests are typically given relatively little weight in admissions decisions compared to the weight given to scores from multiple-choice assessments. Evidence is presented to suggest that more weight should be given to these assessments. The reliability of the writing scores…
Manual for the USES General Aptitude Test Battery. Section IV: Norms, Specific Occupations.

ERIC Educational Resources Information Center

Manpower Administration (DOL), Washington, DC.

Adult norms are shown as cutting scores for each of the aptitudes judged significant for a given occupation. Tables for converting adult scores to their ninth and tenth grade equivalents are included. The standard error of measurement is reported for each of the nine aptitudes of the General Aptitude Test Battery (GATB): intelligence, verbal…
Redundancy, Discrimination and Corruption in the Multibillion-Dollar Business of College Admissions Testing

ERIC Educational Resources Information Center

Rizzo, Monica Ellen

2012-01-01

Most American colleges and universities require standardized entrance exams when making admissions decisions. Scores on these exams help determine if, when and where students will be allowed to pursue higher education. These scores are also used to determine eligibility for merit based financial aid. This testing persists even though half of the…
Race, Poverty and SAT Scores: Modeling the Influences of Family Income on Black and White High School Students' SAT Performance

ERIC Educational Resources Information Center

Dixon-Roman, Ezekiel J.; Everson, Howard T.; McArdle, John J.

2013-01-01

Background: Educational policy makers and test critics often assert that standardized test scores are strongly influenced by factors beyond individual differences in academic achievement such as family income and wealth. Unfortunately, few empirical studies consider the simultaneous and related influences of family income, parental education, and…
The effect of peer-group size on the delivery of feedback in basic life support refresher training: a cluster randomized controlled trial.

PubMed

Cho, Youngsuk; Je, Sangmo; Yoon, Yoo Sang; Roh, Hye Rin; Chang, Chulho; Kang, Hyunggoo; Lim, Taeho

2016-07-04

When an instructor facilitates peer feedback rather than teaching directly in group training, students largely provide feedback to one another. The number of students in a group affects the learning of students in group training. We aimed to investigate whether a larger group size increases students' test scores on a post-training test with peer feedback facilitated by an instructor after video-guided basic life support (BLS) refresher training. Students' one-rescuer adult BLS skills were assessed by a 2-min checklist-based test 1 year after the initial training. A cluster randomized controlled trial was conducted to evaluate the effect of student number in a group on BLS refresher training. Participants included 115 final-year medical students undergoing their emergency medicine clerkship. The median number of students was 8 in the large groups and 4 in the standard groups. The primary outcome was the group difference in post-training test scores after video-guided BLS training. Secondary outcomes included the feedback time, number of feedback topics, and results of end-of-training evaluation questionnaires. Scores on the post-training test increased over three consecutive tests with instructor-led peer feedback, but did not differ between large and standard groups. The feedback time was longer and the number of feedback topics generated by students was higher in standard groups compared to large groups on the first and second tests. The end-of-training questionnaire revealed that the students in large groups preferred a smaller group size to their actual group size. In this BLS refresher training, the instructor-led group feedback increased the test score after tutorial video-guided BLS learning, irrespective of the group size. A smaller group size allowed more participation in peer feedback.
Creating School Communities through Music

ERIC Educational Resources Information Center

Marasco, Katelyn

2011-01-01

There are many problems facing educators today. Student retention, standardized test scores, and motivational issues are only a few. It seems that students are dropping out of school at higher rates and having more difficulty finding motivation to do well on their school work and standardized tests. This study sought to investigate strategies that…
Robust Confidence Interval for a Ratio of Standard Deviations

ERIC Educational Resources Information Center

Bonett, Douglas G.

2006-01-01

Comparing variability of test scores across alternate forms, test conditions, or subpopulations is a fundamental problem in psychometrics. A confidence interval for a ratio of standard deviations is proposed that performs as well as the classic method with normal distributions and performs dramatically better with nonnormal distributions. A simple…
The potential of standards-based agriculture biology as an alternative to traditional biology in California

NASA Astrophysics Data System (ADS)

Sellu, George Sahr

Over the past five decades, several waves of educational reform have influenced K-12 science course offerings and classroom instruction in public education. The effectiveness of educational policies has been increasingly measured by standardized tests. The focus on test scores, content standards, and performance standards (which is a product of recent educational policies) has influenced course offerings and the depth and breadth of curriculum coverage (Linn, 2000). For the better part of the last hundred years, vocational education and traditional education have followed two separate tracks in terms of objectives, policies, and values (Hillison, 1996). Educational reform policies have had varying influences on school programs. For example, elective courses such as Career Technical Education (CTE) courses, which are not considered core academic courses, have been negatively influenced by current educational reform. In the past three decades, there has been gradual movement toward merging vocational and traditional education. It has been difficult for policies from both sides to merge because of differences in objectives for both tracks. Traditional courses have been guided by federal policies, such as No Child Left Behind (NCLB) and Common Core State Standards (CCSS) while the Carl D. Perkins Act (Perkins Act) has shaped CTE courses. It appears that several of the requirements of the Perkins Act meet expectations of traditional education policies. However, there is no direct metric for measuring the contribution of CTE courses toward increased achievement in science as measured by standardized tests. As such, CTE courses will continue to lose resources in order to support courses that prepare students for standardized tests.
In order to address some of these challenges over the last three decades, agriculture educators have developed integrated science courses as a means for increasing science achievement scores for agriculture education students in K-12 public schools. Thoron & Meyer (2011) suggested that research into the contribution of integrated science courses toward higher test scores yielded mixed results. This finding may have been due in part to the fact that integrated science courses only incorporate select topics into agriculture education courses. In California, however, agriculture educators have developed standards-based courses such as Agriculture Biology (AgBio) that cover the same content standards as core traditional courses such as traditional biology. Students in both AgBio and traditional biology take the same standardized biology test. This is the first time there has been an opportunity for a fair comparison and a uniform metric for an agriscience course such as AgBio to be directly compared to traditional biology. This study will examine whether there are differences between AgBio and traditional biology with regard to standardized test scores in biology. Furthermore, the study examines differences in perception between teachers and students regarding teaching and learning activities associated with higher achievement in science. The findings of the study could provide a basis for presenting AgBio as a potential alternative to traditional biology. The findings of this study suggest that there are no differences between AgBio and traditional biology students with regard to standardized biology test scores. Additionally, the findings indicate that co-curricular activities in AgBio could contribute to higher student achievement in biology. However, further research is required to identify specific activities in AgBio that contribute to higher achievement in science.
Categorical Differences in Statewide Standardized Testing Scores of Students with Disabilities

ERIC Educational Resources Information Center

Trexler, Ellen L.

2013-01-01

The No Child Left Behind Act requires all students be proficient in reading and mathematics by 2014, and students in subgroups to make Adequate Yearly Progress. One of these groups is students with disabilities, who continue to score well below their general education peers. This quantitative study identified scoring differences between disability…
Role of a computer-generated three-dimensional laryngeal model in anatomy teaching for advanced learners.

PubMed

Tan, S; Hu, A; Wilson, T; Ladak, H; Haase, P; Fung, K

2012-04-01

(1) To investigate the efficacy of a computer-generated three-dimensional laryngeal model for laryngeal anatomy teaching; (2) to explore the relationship between students' spatial ability and acquisition of anatomical knowledge; and (3) to assess participants' opinion of the computerised model. Forty junior doctors were randomised to undertake laryngeal anatomy study supplemented by either a three-dimensional computer model or two-dimensional images. Outcome measurements comprised a laryngeal anatomy test, the modified Vandenberg and Kuse mental rotation test, and an opinion survey. Mean scores ± standard deviations for the anatomy test were 15.7 ± 2.0 for the 'three dimensions' group and 15.5 ± 2.3 for the 'standard' group (p = 0.7222). Pearson's correlation between the rotation test scores and the scores for the spatial ability questions in the anatomy test was 0.4791 (p = 0.086, n = 29). Opinion survey answers revealed significant differences in respondents' perceptions of the clarity and 'user friendliness' of, and their preferences for, the three-dimensional model as regards anatomical study. The three-dimensional computer model was equivalent to standard two-dimensional images, for the purpose of laryngeal anatomy teaching. There was no association between students' spatial ability and functional anatomy learning. However, students preferred to use the three-dimensional model.
The impact of a scheduling change on ninth grade high school performance on biology benchmark exams and the California Standards Test

NASA Astrophysics Data System (ADS)

Leonardi, Marcelo

The primary purpose of this study was to examine the impact of a scheduling change from a trimester 4x4 block schedule to a modified hybrid schedule on student achievement in ninth grade biology courses. This study examined the impact of the scheduling change on student achievement through teacher-created benchmark assessments in Genetics, DNA, and Evolution and on the California Standardized Test in Biology. The secondary purpose of this study was to examine ninth grade biology teachers' perceptions of ninth grade biology student achievement. Using a mixed methods research approach, data were collected both quantitatively and qualitatively as aligned to the research questions. Quantitative methods included gathering data from departmental benchmark exams and the California Standardized Test in Biology and conducting multiple analysis of covariance and analysis of covariance to determine significant differences. Qualitative methods included journal entry questions and focus group interviews. The results revealed a statistically significant increase in scores on both the DNA and Evolution benchmark exams following the change in scheduling format. The scheduling change was responsible for 1.5% of the increase in DNA benchmark scores and 2% of the increase in Evolution benchmark scores. The results revealed a statistically significant decrease in scores on the Genetics benchmark exam as a result of the scheduling change. The scheduling change was responsible for 1% of the decrease in Genetics benchmark scores. The results also revealed a statistically significant increase in scores on the CST Biology exam. The scheduling change was responsible for 0.7% of the increase in CST Biology scores. Results of the focus group discussions indicated that all teachers preferred the modified hybrid schedule over the trimester schedule and that it improved student achievement.
Linking English-Language Test Scores onto the Common European Framework of Reference: An Application of Standard-Setting Methodology. TOEFL iBT Research Report TOEFL iBt-06. ETS RR-08-34

ERIC Educational Resources Information Center

Tannenbaum, Richard J.; Wylie, E. Caroline

2008-01-01

The Common European Framework of Reference (CEFR) describes language proficiency in reading, writing, speaking, and listening on a 6-level scale. In this study, English-language experts from across Europe linked CEFR levels to scores on three tests: the TOEFL® iBT test, the TOEIC® assessment, and the TOEIC "Bridge"™ test.…
The MCCB impairment profile for schizophrenia outpatients: results from the MATRICS psychometric and standardization study.

PubMed

Kern, Robert S; Gold, James M; Dickinson, Dwight; Green, Michael F; Nuechterlein, Keith H; Baade, Lyle E; Keefe, Richard S E; Mesholam-Gately, Raquelle I; Seidman, Larry J; Lee, Cathy; Sugar, Catherine A; Marder, Stephen R

2011-03-01

The MATRICS Psychometric and Standardization Study was conducted as a final stage in the development of the MATRICS Consensus Cognitive Battery (MCCB). The study included 176 persons with schizophrenia or schizoaffective disorder and 300 community residents. Data were analyzed to examine the cognitive profile of clinically stable schizophrenia patients on the MCCB. Secondarily, the data were analyzed to identify which combination of cognitive domains and corresponding cut-off scores best discriminated patients from community residents, and patients competitively employed vs. those not. Raw scores on the ten MCCB tests were entered into the MCCB scoring program which provided age- and gender-corrected T-scores on seven cognitive domains. To test for between-group differences, we conducted a 2 (group)×7 (cognitive domain) MANOVA with follow-up independent t-tests on the individual domains. Classification and regression trees (CART) were used for the discrimination analyses. Examination of patient T-scores across the seven cognitive domains revealed a relatively compact profile with T-scores ranging from 33.4 for speed of processing to 39.3 for reasoning and problem-solving. Speed of processing and social cognition best distinguished individuals with schizophrenia from community residents; speed of processing along with visual learning and attention/vigilance optimally distinguished patients competitively employed from those who were not. The cognitive profile findings provide a standard to which future studies can compare results from other schizophrenia samples and related disorders; the classification results point to specific areas and levels of cognitive impairment that may advance work rehabilitation efforts. Published by Elsevier B.V.
The MCCB Impairment Profile for Schizophrenia Outpatients: Results from the MATRICS Psychometric and Standardization Study

PubMed Central

Kern, Robert S.; Gold, James M.; Dickinson, Dwight; Green, Michael F.; Nuechterlein, Keith H.; Baade, Lyle E.; Keefe, Richard S. E.; Mesholam-Gately, Raquelle I.; Seidman, Larry J.; Lee, Cathy; Sugar, Catherine A.; Marder, Stephen R.

2010-01-01

The MATRICS Psychometric and Standardization Study was conducted as a final stage in the development of the MATRICS Consensus Cognitive Battery (MCCB). The study included 176 persons with schizophrenia or schizoaffective disorder and 300 community residents. Data were analyzed to examine the cognitive profile of clinically stable schizophrenia patients on the MCCB. Secondarily, the data were analyzed to identify which combination of cognitive domains and corresponding cut-off scores best discriminated patients from community residents, and patients competitively employed vs. those not. Raw scores on the ten MCCB tests were entered into the MCCB scoring program which provided age- and gender-corrected T-scores on seven cognitive domains. To test for between-group differences, we conducted a 2 (group) × 7 (cognitive domain) MANOVA with follow-up independent t-tests on the individual domains. Classification and regression trees (CART) were used for the discrimination analyses. Examination of patient T-scores across the seven cognitive domains revealed a relatively compact profile with T-scores ranging from 33.4 for speed of processing to 39.3 for reasoning and problem-solving. Speed of processing and social cognition best distinguished individuals with schizophrenia from community residents; speed of processing along with visual learning and attention/vigilance optimally distinguished patients competitively employed from those who were not. The cognitive profile findings provide a standard to which future studies can compare results from other schizophrenia samples and related disorders; the classification results point to specific areas and levels of cognitive impairment that may advance work rehabilitation efforts. PMID:21159492

A Cross-Cultural Test of Sex Bias in the Predictive Validity of Scholastic Aptitude Examinations: Some Israeli Findings.

ERIC Educational Resources Information Center

Zeidner, Moshe

1987-01-01

This study examined the cross-cultural validity of the sex bias contention with respect to standardized aptitude testing, used for academic prediction purposes in Israel. Analyses were based on the grade point average and scores of 1778 Jewish and 1017 Arab students who were administered standardized college entrance test batteries. (Author/LMO)
Investigating a Judgemental Rank-Ordering Method for Maintaining Standards in UK Examinations

ERIC Educational Resources Information Center

Black, Beth; Bramley, Tom

2008-01-01

A new judgemental method of equating raw scores on two tests, based on rank-ordering scripts from both tests, has been developed by Bramley. The rank-ordering method has potential application as a judgemental standard-maintaining mechanism, because given a mark on one test (e.g. the A grade boundary mark), the equivalent mark (i.e. at the same…
The Myths of Standardized Tests: Why They Don't Tell You What You Think They Do

ERIC Educational Resources Information Center

Harris, Phillip; Smith, Bruce M.; Harris, Joan

2011-01-01

Pundits, politicians, and business leaders continually make claims for what standardized tests can do, and those claims go largely unchallenged because they are in line with popular assumptions about what these tests can do, what the scores mean, and the psychology of human motivation. But most of what these opinion leaders say--and the…
Face recognition performance of individuals with Asperger syndrome on the Cambridge Face Memory Test.

PubMed

Hedley, Darren; Brewer, Neil; Young, Robyn

2011-12-01

Although face recognition deficits in individuals with Autism Spectrum Disorder (ASD), including Asperger syndrome (AS), are widely acknowledged, the empirical evidence is mixed. This in part reflects the failure to use standardized and psychometrically sound tests. We contrasted standardized face recognition scores on the Cambridge Face Memory Test (CFMT) for 34 individuals with AS with those for 42 IQ-matched non-ASD individuals, and with age-standardized scores from a large Australian cohort. We also examined the influence of IQ, autistic traits, and negative affect on face recognition performance. Overall, participants with AS performed significantly worse on the CFMT than the non-ASD participants and when evaluated against standardized test norms. However, while 24% of participants with AS presented with severe face recognition impairment (>2 SDs below the mean), many individuals performed at or above the typical level for their age: 53% scored within +/- 1 SD of the mean and 9% demonstrated superior performance (>1 SD above the mean). Regression analysis provided no evidence that IQ, autistic traits, or negative affect significantly influenced face recognition: diagnostic group membership was the only significant predictor of face recognition performance. In sum, face recognition performance in ASD is on a continuum, but with average levels significantly below non-ASD levels of performance. Copyright © 2011, International Society for Autism Research, Wiley-Liss, Inc.
Intuitive Sense of Number Correlates With Math Scores on College-Entrance Examination

PubMed Central

Libertus, Melissa E.; Odic, Darko; Halberda, Justin

2012-01-01

Many educated adults possess exact mathematical abilities in addition to an approximate, intuitive sense of number, often referred to as the Approximate Number System (ANS). Here we investigate the link between ANS precision and mathematics performance in adults by testing participants on an ANS-precision test and collecting their scores on the Scholastic Aptitude Test (SAT), a standardized college-entrance exam in the USA. In two correlational studies, we found that ANS precision correlated with SAT-Quantitative (i.e., mathematics) scores. This relationship remained robust even when controlling for SAT-Verbal scores, suggesting a small but specific relationship between our primitive sense for number and formal mathematical abilities. PMID:23098904
Correlation between musical responsiveness and developmental age among early age children as assessed by the Non-Verbal Measurement of the Musical Responsiveness of Children.

PubMed

Matsuyama, Kumi

2005-10-01

The currently available standardized music tests are not suitable for administration to young children and children with special needs because they are complicated and require verbal instructions and verbal responses. A test, named the Non-Verbal Measurement of the Musical Responsiveness of Children, was developed to assess the musical responsiveness of young children. This test does not depend on verbal instructions, and is composed of two parts, Rhythm and Melody. Ninety-two children [age range, 6-69 months; 36.39 +/- 17.61 months (mean +/- standard deviation)] who attended mainstream pre-schools were studied. Each child was tested to see whether the child correctly imitated 7 different patterns of rhythm and 6 different patterns of melody that were delivered by clapping of hands or the voice of the examiner, respectively. The examiner rated whether the child could imitate each pattern and the total score was the sum of successfully reproduced patterns. Two independent observers viewed videotapes of the testing sessions and assigned scores in a similar manner. The inter-rater reliability among the three raters was assessed. The total score in Melody (R=0.63, p<0.001) and the total score in Rhythm (R=0.81, p<0.001) were each correlated with developmental age. The inter-rater reliability was good (Melody: Kendall's W=0.78, Rhythm: Kendall's W=0.95). The degree of musical responsiveness of normal young children is correlated with general development. This measurement tool is valid and reliable for use in young children who lack sufficient verbal understanding to take standardized music tests. This test may also be administered to children with special needs.
A descriptive study of the U.S. Marine Corps fitness tests (2000-2012).

PubMed

Bartlett, Jamie L; Phillips, Jennifer; Galarneau, Michael R

2015-05-01

This article describes the performance of active duty U.S. Marines on the Physical Fitness Test (PFT) and Combat Fitness Test (CFT) during calendar years 2000 through 2012. Our study sample included PFT composite scores (n = 543,185), PFT and CFT composite scores (n = 160,936), and PFT and CFT event scores (n = 135,926 and n = 201,953, respectively). In general, all Marines performed very well on each fitness test, with overall annual improvements. Interestingly, the majority of female Marines passed the minimum male standard on the CFT. Further studies will evaluate the relationship of fitness test performance and injury. Reprint & Copyright © 2015 Association of Military Surgeons of the U.S.
Daylight Makes a Difference: Daylight in the Classroom Can Boost Standardized Test Scores and Learning. [Audiotape].

ERIC Educational Resources Information Center

Kosik, Kenneth S.; Heschong, Lisa

An audiotape presents study analysis of the effect of daylighting on student performance. The study includes a focus on skylighting as a way to isolate daylight as an illumination source, and separate illumination effects from other qualities associated with daylighting from windows. Results from test scores of over 21,000 student records, along…
Classroom Organizational Structure in Fifth Grade Math Classrooms and the Effect on Standardized Test Scores

ERIC Educational Resources Information Center

Lane, Dallas Marie

2017-01-01

The purpose of this study was to determine if there is a relationship between the classroom organizational structure and MCT2 test scores of fifth-grade math students. The researcher gained insight regarding which structure teachers believe is most beneficial to them and students, and whether or not their belief of classroom organizational…
Are students' impressions of improved learning through active learning methods reflected by improved test scores?

PubMed

Everly, Marcee C

2013-02-01

To report the transformation from lecture to more active learning methods in a maternity nursing course and to evaluate whether student perception of improved learning through active-learning methods is supported by improved test scores. The process of transforming a course into an active-learning model of teaching is described. A voluntary mid-semester survey for student acceptance of the new teaching method was conducted. Course examination results, from both a standardized exam and a cumulative final exam, among students who received lecture in the classroom and students who had active learning activities in the classroom were compared. Active learning activities were very acceptable to students. The majority of students reported learning more from having active-learning activities in the classroom rather than lecture-only and this belief was supported by improved test scores. Students who had active learning activities in the classroom scored significantly higher on a standardized assessment test than students who received lecture only. The findings support the use of student reflection to evaluate the effectiveness of active-learning methods and help validate the use of student reflection of improved learning in other research projects. Copyright © 2011 Elsevier Ltd. All rights reserved.
An Analysis of Grade 4 Teachers' Mathematical Instructional Strategies

ERIC Educational Resources Information Center

Wilson-Patrick, Dedra

2016-01-01

The standardized math test scores of approximately 48 African American and Hispanic students from 4 different classes at a rural Title I elementary school located in the southern United States decreased by 10 points on the Palmetto Assessment of State Standards Test. For this qualitative case study, purposive sampling was used to recruit four…
Academic status and progress of deaf and hard-of-hearing students in general education classrooms.

PubMed

Antia, Shirin D; Jones, Patricia B; Reed, Susanne; Kreimeyer, Kathryn H

2009-01-01

The study participants were 197 deaf or hard-of-hearing students with mild to profound hearing loss who attended general education classes for 2 or more hours per day. We obtained scores on standardized achievement tests of math, reading, and language/writing, and standardized teacher's ratings of academic competence annually, for 5 years, together with other demographic and communication data. Results on standardized achievement tests indicated that, over the 5-year period, 63%-79% of students scored in the average or above-average range in math, 48%-68% in reading, and 55%-76% in language/writing. The standardized test scores for the group were, on average, half an SD below hearing norms. Average student progress in each subject area was consistent with or better than that made by the norm group of hearing students, and 79%-81% of students made one or more year's progress annually. Teachers rated 69%-81% of students as average or above average in academic competence over the 5 years. The teacher's ratings also indicated that 89% of students made average or above-average progress. Students' expressive and receptive communication, classroom participation, communication mode, and parental participation in school were significantly, but moderately, related to academic outcomes.
INTRA-RATER RELIABILITY OF THE MULTIPLE SINGLE-LEG HOP-STABILIZATION TEST AND RELATIONSHIPS WITH AGE, LEG DOMINANCE AND TRAINING.

PubMed

Sawle, Leanne; Freeman, Jennifer; Marsden, Jonathan

2017-04-01

Balance is a complex construct, affected by multiple components such as strength and co-ordination. However, whilst assessing an athlete's dynamic balance is an important part of clinical examination, there is no gold standard measure. The multiple single-leg hop-stabilization test is a functional test which may offer a method of evaluating the dynamic attributes of balance, but it needs to show adequate intra-tester reliability. The purpose of this study was to assess the intra-rater reliability of a dynamic balance test, the multiple single-leg hop-stabilization test, on the dominant and non-dominant legs. The design was an intra-rater reliability study. Fifteen active participants were tested twice with a 10-minute break between tests. The outcome measure was the multiple single-leg hop-stabilization test score, based on a clinically assessed numerical scoring system. Results were analysed using an Intraclass Correlation Coefficient (ICC(2,1)) and Bland-Altman plots. Regression analyses explored relationships between test scores, leg dominance, age and training (an alpha level of 0.05 was selected). ICCs for intra-rater reliability were 0.85 for both the dominant and non-dominant legs (confidence intervals = 0.62-0.95 and 0.61-0.95 respectively). Bland-Altman plots showed scores within two standard deviations. A significant correlation was observed between the dominant and non-dominant leg on balance scores (R² = 0.49, p < 0.05), and better balance was associated with younger participants in their non-dominant leg (R² = 0.28, p < 0.05) and their dominant leg (R² = 0.39, p < 0.05), and with a higher number of hours spent training for the non-dominant leg (R² = 0.37, p < 0.05). The multiple single-leg hop-stabilisation test demonstrated strong intra-tester reliability with active participants. Younger participants who trained more have better balance scores. This test may be a useful measure for evaluating the dynamic attributes of balance. Level of evidence: 3.
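The Bland-Altman analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the conventional definition (bias = mean of the paired differences, limits of agreement = bias ± 1.96 standard deviations of the differences); the abstract does not state which multiplier the authors used, and the function name and scores below are hypothetical, not data from the study.

```python
from statistics import mean, stdev

def bland_altman_limits(test1, test2, k=1.96):
    """Return (bias, lower limit, upper limit) for paired test-retest scores.

    k is the SD multiplier; 1.96 is the conventional choice and is an
    assumption here, not a value taken from the abstract.
    """
    diffs = [a - b for a, b in zip(test1, test2)]
    bias = mean(diffs)        # systematic difference between sessions
    sd = stdev(diffs)         # sample SD of the paired differences
    return bias, bias - k * sd, bias + k * sd

# Hypothetical hop-stabilization scores from two sessions (illustration only).
session1 = [10, 12, 9, 11]
session2 = [9, 12, 10, 10]
bias, lower, upper = bland_altman_limits(session1, session2)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

Paired differences that fall between the two limits are what a Bland-Altman plot would show as lying within the agreement band.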
Shifting the Curve: Fostering Academic Success in a Diverse Student Body.

PubMed

Elks, Martha L; Herbert-Carter, Janice; Smith, Marjorie; Klement, Brenda; Knight, Brandi Brandon; Anachebe, Ngozi F

2018-01-01

Diversity in the health care workforce is key to achieving health equity. Although U.S. medical schools have worked to increase the matriculation and academic success of underrepresented minority (URM) students (African Americans, Latinos, others), they have had only limited success. Lower standardized test scores, including on the Medical College Admission Test (MCAT), have been a barrier to matriculation for many URM applicants. Lower subsequent standardized exam scores, including on the United States Medical Licensing Exam Step 1, also have been an impediment to students' progress, with mean scores for URM students lagging behind those for others. Faculty at the Morehouse School of Medicine developed and implemented interventions to enhance the academic success of their URM students (about 75% are African American, and 5% are from other URM groups). To assess the outcomes of this work, the authors analyzed the MCAT scores and subsequent Step 1 scores of students in the graduating classes of 2009-2014. They also reviewed course evaluations, Graduation Questionnaires, and student and faculty interviews and focus groups. Students' Step 1 scores exceeded those expected based on their MCAT scores. This success was due to three key elements: (1) milieu and mentoring, (2) structure and content of the curriculum, and (3) monitoring. A series of mixed-method studies are planned to better discern the core elements of faculty-student relationships that are key to students' success. Lower test scores are not a fixed attribute; with the elements described, success is attainable for all students.
Parent's Guide to Understanding Tests.

ERIC Educational Resources Information Center

CTB / McGraw-Hill, Monterey, CA.

This brief introduction to testing is geared to parents. Types of tests are defined, such as standardized tests, achievement tests, norm referenced tests, criterion referenced tests, and aptitude tests. Various types of scores (grade equivalent, percentile rank, and stanine) are also defined, and the uses made of tests by administrators, teachers,…
Comparison of two methods of standard setting: the performance of the three-level Angoff method.

PubMed

Jalili, Mohammad; Hejri, Sara M; Norcini, John J

2011-12-01

Cut-scores, reliability and validity vary among standard-setting methods. The modified Angoff method (MA) is a well-known standard-setting procedure, but the three-level Angoff approach (TLA), a recent modification, has not been extensively evaluated. This study aimed to compare standards and pass rates in an objective structured clinical examination (OSCE) obtained using two methods of standard setting with discussion and reality checking, and to assess the reliability and validity of each method. A sample of 105 medical students participated in a 14-station OSCE. Fourteen and 10 faculty members took part in the MA and TLA procedures, respectively. In the MA, judges estimated the probability that a borderline student would pass each station. In the TLA, judges estimated whether a borderline examinee would perform the task correctly or not. Having given individual ratings, judges discussed their decisions. One week after the examination, the procedure was repeated using normative data. The mean score for the total test was 54.11% (standard deviation: 8.80%). The MA cut-scores for the total test were 49.66% and 51.52% after discussion and reality checking, respectively (the consequent percentages of passing students were 65.7% and 58.1%, respectively). The TLA yielded mean pass scores of 53.92% and 63.09% after discussion and reality checking, respectively (rates of passing candidates were 44.8% and 12.4%, respectively). Compared with the TLA, the MA showed higher agreement between judges (0.94 versus 0.81) and a narrower 95% confidence interval in standards (3.22 versus 11.29). The MA seems a more credible and reliable procedure with which to set standards for an OSCE than does the TLA, especially when a reality check is applied. © Blackwell Publishing Ltd 2011.
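As a rough illustration of how a modified Angoff cut-score is commonly derived, the sketch below averages each judge's borderline-pass probabilities across stations and then averages across judges. This aggregation rule is an assumption, since the abstract does not describe the exact computation; the function names, ratings, and examinee scores are hypothetical.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """ratings: one list per judge of probabilities (0-1), one per station,
    that a borderline student passes. Returns a percentage cut-score."""
    per_judge = [mean(judge) for judge in ratings]   # each judge's implied standard
    return 100.0 * mean(per_judge)                   # panel standard

def pass_rate(scores, cut):
    """Percentage of examinee total scores at or above the cut-score."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)

# Hypothetical panel of two judges rating three stations (illustration only).
ratings = [
    [0.50, 0.60, 0.40],
    [0.55, 0.50, 0.45],
]
cut = angoff_cut_score(ratings)
rate = pass_rate([45.0, 50.0, 58.0, 62.0], cut)
print(f"cut-score={cut:.1f}%, pass rate={rate:.1f}%")
```

A higher cut-score lowers the pass rate for the same cohort, which is the pattern the abstract reports when comparing the two methods.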
A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

ERIC Educational Resources Information Center

Attali, Yigal

2016-01-01

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…
A pre-test and post-test study of the physical and psychological effects of out-of-home respite care on caregivers of children with life-threatening conditions.

PubMed

Remedios, Cheryl; Willenberg, Lisa; Zordan, Rachel; Murphy, Andrea; Hessel, Gail; Philip, Jennifer

2015-03-01

Respite services are recommended as an important support for caregivers of children with life-threatening conditions. However, the benefits of respite have not been convincingly demonstrated through quantitative research. To determine the impact of out-of-home respite care on levels of fatigue, psychological adjustment, quality of life and relationship satisfaction among caregivers of children with life-threatening conditions. The study used a mixed-methods, pre-test and post-test design with a consecutive sample of 58 parental caregivers whose children were admitted to a children's hospice for out-of-home respite over an average of 4 days. Caregivers had below-standard levels of quality of life compared to normative populations. Paired t-tests demonstrated that caregivers' average psychological adjustment scores significantly improved from pre-respite (mean = 13.9, standard error = 0.71) to post-respite (mean = 10.7, standard error = 1; p < 0.001, 95% confidence interval: 1.25-5.11). Furthermore, caregivers' average fatigue scores significantly improved from pre-respite (mean = 14.3, standard error = 0.85) to post-respite (mean = 10.9, standard error = 1.01; p < 0.001, 95% confidence interval: 1.69-7.94), and caregivers' average mental health quality of life scores significantly improved from pre-respite (mean = 44.2, standard error = 1.8) to post-respite (mean = 49.1, standard error = 1.6; p < 0.01, 95% confidence interval: -9.56 to 0.36). Qualitative data showed caregivers sought respite for relief from intensive care provision and believed this was essential to their well-being. Findings indicate the effectiveness of out-of-home respite care in improving the fatigue and psychological adjustment of caregivers of children with life-threatening conditions. Study outcomes inform service provision and future research efforts in paediatric palliative care. © The Author(s) 2015.
The remote, the mouse, and the no. 2 pencil: the household media environment and academic achievement among third grade students.

PubMed

Borzekowski, Dina L G; Robinson, Thomas N

2005-07-01

Media can influence aspects of a child's physical, social, and cognitive development; however, the associations between a child's household media environment, media use, and academic achievement have yet to be determined. To examine relationships among a child's household media environment, media use, and academic achievement. During a single academic year, data were collected through classroom surveys and telephone interviews from an ethnically diverse sample of third grade students and their parents from 6 northern California public elementary schools. The majority of our analyses derive from spring 2000 data, including academic achievement assessed through the mathematics, reading, and language arts sections of the Stanford Achievement Test. We fit linear regression models to determine the associations between variations in household media and performance on the standardized tests, adjusting for demographic and media use variables. The household media environment is significantly associated with students' performance on the standardized tests. It was found that having a bedroom television set was significantly and negatively associated with students' test scores, while home computer access and use were positively associated with the scores. Regression models significantly predicted up to 24% of the variation in the scores. Absence of a bedroom television combined with access to a home computer was consistently associated with the highest standardized test scores. This study adds to the growing literature reporting that having a bedroom television set may be detrimental to young elementary school children. It also suggests that having and using a home computer may be associated with better academic achievement.
Development and standardization of Arabic words in noise test in Egyptian children.

PubMed

Abdel Rahman, Tayseer Taha

2018-05-01

To develop and establish norms for the Arabic Words in Noise test in Egyptian children. The total number of participants was 152, all with normal hearing and ranging in age from 5 to 12 years. They were subdivided into two main groups: a standardization group, which comprised 120 children with normal scholastic achievement, and an application group, which comprised 32 children with different types of central auditory processing disorders. Arabic versions of both the Speech Perception in Noise (SPIN) and Words in Noise (WIN) tests were presented in each ear at zero signal-to-noise ratio (SNR) using ipsilateral cafeteria noise fixed at 50 dB sensation level (dBSL). The lowest performance on the WIN test occurred between 5 and 7 years and the highest scores from 9 to 12 years. However, no statistically significant difference was found among the three standardization age groups. Moreover, no statistically significant difference was found between right and left ear scores or among the three lists. When the WIN test was compared to the SPIN test in children with and without abnormal SPIN scores, it showed highly consistent results except in children suffering from memory deficit, indicating that the WIN test is more accurate than SPIN in this group of children. The Arabic WIN test can be used in children as young as 5 years. It can also serve as a good cross-check with the SPIN test, or be used to follow up hearing-impaired children after a rehabilitation program or children with selective auditory attention deficit after central auditory remediation. Copyright © 2017. Published by Elsevier B.V.

The New Peabody Picture Vocabulary Test-III: An Illusion of Unbiased Assessment?

PubMed

Stockman, Ida J

2000-10-01

This article examines whether changes in the ethnic minority composition of the standardization sample for the latest edition of the Peabody Picture Vocabulary Test (PPVT-III, Dunn & Dunn, 1997) can be used as the sole explanation for children's better test scores when compared to an earlier edition, the Peabody Picture Vocabulary Test-Revised (PPVT-R, Dunn & Dunn, 1981). Results from a comparative analysis of these two test editions suggest that other factors may explain improved performances. Among these factors are the number of words and age levels sampled, the types of words and pictures used, and characteristics of the standardization sample other than its ethnic minority composition. This analysis also raises questions regarding the usefulness of converting scores from one edition to the other and the type of criteria that could be used to evaluate whether the PPVT-III is an unbiased test of vocabulary for children from diverse cultural and linguistic backgrounds.
Academic Outcome Measures of a Dedicated Education Unit Over Time: Help or Hinder?

PubMed

Smyer, Tish; Gatlin, Tricia; Tan, Rhigel; Tejada, Marianne; Feng, Du

2015-01-01

Critical thinking, nursing process, quality and safety measures, and standardized RN exit examination scores were compared between students (n = 144) placed in a dedicated education unit (DEU) and those in a traditional clinical model. Differences in standardized test scores between the clinical groups were not statistically significant. This study shows that the DEU model is one approach to clinical education that can enhance students' academic outcomes.
Reliable change indices and standardized regression-based change score norms for evaluating neuropsychological change in children with epilepsy.

PubMed

Busch, Robyn M; Lineweaver, Tara T; Ferguson, Lisa; Haut, Jennifer S

2015-06-01

Reliable change indices (RCIs) and standardized regression-based (SRB) change score norms permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRB change score norms for use in children with epilepsy. Sixty-three children with epilepsy (age range: 6-16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice effect-adjusted RCIs and SRB change score norms were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children's Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. Reliable change indices and SRB change score norms for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRB change score norms for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An Excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. Copyright © 2015 Elsevier Inc. All rights reserved.
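The practice effect-adjusted reliable change index described above can be illustrated with a minimal sketch. It assumes the commonly used form in which the mean practice effect is subtracted from the observed change and the result is divided by the standard error of the difference; the abstract does not state the exact formula the authors used, and the function names and all numbers below are hypothetical.

```python
import math


def standard_error_of_difference(sd_baseline: float, reliability: float) -> float:
    """Standard error of the difference between two administrations.

    Assumes SEM = SD * sqrt(1 - r) and SE_diff = sqrt(2) * SEM,
    where r is the test-retest reliability coefficient.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0) * sem


def practice_adjusted_rci(score_t1: float, score_t2: float,
                          mean_practice_effect: float,
                          sd_baseline: float, reliability: float) -> float:
    """Observed change minus the expected practice gain, in SE_diff units."""
    se_diff = standard_error_of_difference(sd_baseline, reliability)
    return (score_t2 - score_t1 - mean_practice_effect) / se_diff


# Hypothetical child: score drops from 95 to 85 on a measure with
# SD = 15, test-retest r = 0.80, and a mean practice gain of 3 points.
rci = practice_adjusted_rci(95.0, 85.0, 3.0, 15.0, 0.80)

# A common (assumed) convention flags |RCI| > 1.645 as reliable change
# at the 90% confidence level.
is_reliable_change = abs(rci) > 1.645
```

With these illustrative inputs the index is roughly -1.37, so the decline would not be flagged under the assumed 90% threshold. Lower reliability widens the standard error of the difference, which is consistent with the authors' choice to provide norms only for measures with reliability above 0.50.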
Is the NIHSS Certification Process Too Lenient?

PubMed Central

Hills, Nancy K.; Josephson, S. Andrew; Lyden, Patrick D.; Johnston, S. Claiborne

2009-01-01

Background and Purpose: The National Institutes of Health Stroke Scale (NIHSS) is a widely used measure of neurological function in clinical trials and patient assessment; inter-rater scoring variability could impact communications and trial power. The manner in which the rater certification test is scored yields multiple correct answers that have changed over time. We examined the range of possible total NIHSS scores from answers given in certification tests by over 7,000 individual raters who were certified. Methods: We analyzed the results of all raters who completed one of two standard multiple-patient videotaped certification examinations between 1998 and 2004. The range for the correct score, calculated using NIHSS ‘correct answers’, was determined for each patient. The distribution of scores derived from those who passed the certification test then was examined. Results: A total of 6,268 raters scored 5 patients on Test 1; 1,240 scored 6 patients on Test 2. Using a National Stroke Association (NSA) answer key, we found that correct total scores ranged from 2 correct scores to as many as 12 different correct total scores. Among raters who achieved a passing score and were therefore qualified to administer the NIHSS, score distributions were even wider, with 1 certification patient receiving 18 different correct total scores. Conclusions: Allowing multiple acceptable answers for questions on the NIHSS certification test introduces scoring variability. It seems reasonable to assume that the wider the range of acceptable answers in the certification test, the greater the variability in the performance of the test in trials and clinical practice by certified examiners. Greater consistency may be achieved by deriving a set of ‘best’ answers through expert consensus on all questions where this is possible, then teaching raters how to derive these answers using a required interactive training module. PMID: 19295205
42 CFR 493.845 - Standard; Toxicology.

Code of Federal Regulations, 2012 CFR

2012-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.851 - Standard; Hematology.

Code of Federal Regulations, 2014 CFR

2014-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.843 - Standard; Endocrinology.

Code of Federal Regulations, 2013 CFR

2013-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.845 - Standard; Toxicology.

Code of Federal Regulations, 2014 CFR

2014-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.845 - Standard; Toxicology.

Code of Federal Regulations, 2013 CFR

2013-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.851 - Standard; Hematology.

Code of Federal Regulations, 2013 CFR

2013-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.843 - Standard; Endocrinology.

Code of Federal Regulations, 2012 CFR

2012-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.843 - Standard; Endocrinology.

Code of Federal Regulations, 2014 CFR

2014-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
42 CFR 493.851 - Standard; Hematology.

Code of Federal Regulations, 2012 CFR

2012-10-01

... acceptable responses for each analyte in each testing event is unsatisfactory analyte performance for the... testing event. (e)(1) For any unsatisfactory analyte or test performance or testing event for reasons... any unacceptable analyte or testing event score, remedial action must be taken and documented, and the...
Predicting clinical concussion measures at baseline based on motivation and academic profile.

PubMed

Trinidad, Katrina J; Schmidt, Julianne D; Register-Mihalik, Johna K; Groff, Diane; Goto, Shiho; Guskiewicz, Kevin M

2013-11-01

The purpose of this study was to predict baseline neurocognitive and postural control performance using a measure of motivation, high school grade point average (hsGPA), and Scholastic Aptitude Test (SAT) score. Cross-sectional. Clinical research center. Eighty-eight National Collegiate Athletic Association Division I incoming student-athletes (freshmen and transfers). Participants completed baseline clinical concussion measures, including a neurocognitive test battery (CNS Vital Signs), a balance assessment [Sensory Organization Test (SOT)], and motivation testing (Rey Dot Counting). Participants granted permission to access hsGPA and SAT total score. Standard scores for each CNS Vital Signs domain and SOT composite score. Baseline motivation, hsGPA, and SAT explained a small percentage of the variance of complex attention (11%), processing speed (12%), and composite SOT score (20%). Motivation, hsGPA, and total SAT score do not explain a significant amount of the variance in neurocognitive and postural control measures but may still be valuable to consider when interpreting neurocognitive and postural control measures.
The Chinese-Western Intercultural Couple Standards Scale.

PubMed

Hiew, Danika N; Halford, W Kim; van de Vijver, Fons J R; Liu, Shuang

2015-09-01

We developed the Chinese-Western Intercultural Couple Standards Scale (CWICSS) to assess relationship standards that may differ between Chinese and Western partners and may challenge intercultural couples. The scale assesses 4 Western-derived relationship standards (demonstrations of love, demonstrations of caring, intimacy expression, and intimacy responsiveness) and 4 Chinese-derived relationship standards (relations with the extended family, relational harmony, face, and gender roles). We administered the CWICSS to 983 Chinese and Western participants living in Australia to assess the psychometric properties of the scores as measures of respondents' relationship standards. The CWICSS has a 2-level factor structure with the items reflecting the 8 predicted standards. The 4 Western derived standards loaded onto a higher order factor of couple bond, and the 4 Chinese derived standards loaded onto a higher order factor of family responsibility. The scale scores were structurally equivalent across cultures, genders, and 2 independent samples, and good convergent and discriminant validity was found for the interpretation of scale scores as respondents' endorsement of the predicted standards. Scores on the 8 scales and 2 superordinate scales showed high internal consistency and test-retest coefficients. Chinese endorsed all 4 family responsibility standards more strongly than did Westerners, but Chinese and Western participants were similar in endorsement of couple bond standards. Across both cultures, couple bond standards were endorsed more highly than were family responsibility standards. The CWICSS assesses potential areas of conflict in Chinese-Western relationships. (c) 2015 APA, all rights reserved.
Psychoeducational Characteristics of Children with Hypohidrotic Ectodermal Dysplasia

PubMed Central

Maxim, Rolanda A.; Zinner, Samuel H.; Matsuo, Hisako; Prosser, Theresa M.; Fete, Mary; Leet, Terry L.; Fete, Timothy J.

2012-01-01

Objective. Hypohidrotic ectodermal dysplasia (HED) is an X-linked hereditary disorder characterized by hypohidrosis, hypotrichosis, and anomalous dentition. Estimates of up to 50% of affected children having intellectual disability are controversial. Method. In a cross-sectional study, 45 youth with HED (77% males, mean age 9.75 years) and 59 matched unaffected controls (70% males, mean age 9.79 years) were administered the Kaufman Brief Intelligence Test and the Kaufman Test of Educational Achievement, and their parents completed standardized neurodevelopmental and behavioral measures, educational, and health-related information regarding their child, as well as standardized and nonstandardized data regarding socioeconomic information for their family. Results. There were no statistically significant differences between the two groups in intelligence quotient composite and educational achievement scores, suggesting absence of learning disability in either group. No gender differences within or between groups were found on any performance measures. Among affected youth, parental education level correlated positively with (1) cognitive vocabulary scores and cognitive composite scores; (2) educational achievement for mathematics, reading, and composite scores. Conclusion. Youth affected with HED and unaffected matched peers have similar profiles on standardized measures of cognition, educational achievement, and adaptive functioning although children with HED may be at increased risk for ADHD. PMID:22536143
Highlights of Conference on Using Student Test Scores to Measure Teacher Performance: The State of the Art in Research and Practice

ERIC Educational Resources Information Center

Guarino, Cassandra; Reckase, Mark D.; Wooldridge, Jeffrey M.

2013-01-01

The push for accountability in public schooling has extended to the measurement of teacher performance, accelerated by federal efforts through Race to the Top. Currently, a large number of states and districts across the country are computing measures of teacher performance based on the standardized test scores of their students and using them to…
Setting Cut Scores on an EFL Placement Test Using the Prototype Group Method: A Receiver Operating Characteristic (ROC) Analysis

ERIC Educational Resources Information Center

Eckes, Thomas

2017-01-01

This paper presents an approach to standard setting that combines the prototype group method (PGM; Eckes, 2012) with a receiver operating characteristic (ROC) analysis. The combined PGM-ROC approach is applied to setting cut scores on a placement test of English as a foreign language (EFL). To implement the PGM, experts first named learners whom…
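To make the idea of deriving a cut score from a ROC analysis concrete, here is a minimal sketch. It assumes two expert-named groups of learners (a lower-level and a higher-level group) and picks the score that maximizes Youden's J (sensitivity minus false positive rate). The truncated abstract does not say which criterion the paper uses, so Youden's J, the function name, and the data are all assumptions for illustration.

```python
def youden_cut_score(scores, labels):
    """Return (cut, J) maximizing Youden's J over observed scores.

    labels: 1 = learner named to the higher-level group, 0 = lower-level group.
    A learner is classified as higher-level when score >= cut.
    Ties are resolved in favor of the lowest candidate cut.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both groups must be non-empty")

    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cut)
        false_pos = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= cut)
        j = true_pos / n_pos - false_pos / n_neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical placement-test scores and expert group assignments.
example_scores = [10, 12, 14, 15, 16, 18, 20, 22]
example_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = youden_cut_score(example_scores, example_labels)
```

For this toy data the selected cut is 15 with J = 0.75; a cut of 18 gives the same J, which shows why a tie-breaking rule has to be chosen deliberately in practice.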
Testing for independence in J×K contingency tables with complex sample survey data.

PubMed

Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C

2015-09-01

The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.
The Objective Borderline Method: A Probabilistic Method for Standard Setting

ERIC Educational Resources Information Center

Shulruf, Boaz; Poole, Phillippa; Jones, Philip; Wilkinson, Tim

2015-01-01

A new probability-based standard setting technique, the Objective Borderline Method (OBM), was introduced recently. This was based on a mathematical model of how test scores relate to student ability. The present study refined the model and tested it using 2500 simulated data-sets. The OBM was feasible to use. On average, the OBM performed well…

The Relationship between Mathematics Scores and Family and Consumer Science Education

ERIC Educational Resources Information Center

Welle, Stacy L.

2013-01-01

With the passage of the No Child Left Behind Act of 2001, public school districts in the United States are working to improve the achievement of students on state standardized tests for accountability. Teachers, administrators, and districts need to find ways to get all students to pass standardized tests. Mathematical concepts appear throughout…
An evaluation of nonclinical dissociation utilizing a virtual environment shows enhanced working memory and attention.

PubMed

Saidel-Goley, Isaac N; Albiero, Erin E; Flannery, Kathleen A

2012-02-01

Dissociation is a mental process resulting in the disruption of memory, perception, and sometimes identity. At a nonclinical level, only mild dissociative experiences occur. The nature of nonclinical dissociation is disputed in the literature, with some asserting that it is a beneficial information processing style and others positing that it is a psychopathological phenomenon. The purpose of this study was to further the understanding of nonclinical dissociation with respect to memory and attention, by including a more ecologically valid virtual reality (VR) memory task along with standard neuropsychological tasks. Forty-five undergraduate students from a small liberal arts college in the northeast participated for course credit. The participants completed a battery of tasks including two standard memory tasks, a standard attention task, and an experimental VR memory task; the VR task included immersion in a virtual apartment, followed by incidental object-location recall for objects in the virtual apartment. Support for the theoretical model portraying nonclinical dissociation as a beneficial information processing style was found in this study. Dissociation scores were positively correlated with working memory scores and attentional processing scores on the standard neuropsychological tasks. In terms of the VR task, dissociation scores were positively correlated with more false positive memories that could be the result of a tendency of nonclinical highly dissociative individuals to create more elaborative schemas. This study also demonstrates that VR paradigms add to the prediction of cognitive functioning in testing protocols using standard neuropsychological tests, while simultaneously increasing ecological validity.
The Pittsburgh Sleep Quality Index: validation of the Urdu translation.

PubMed

Hashmi, Ali Madeeh; Khawaja, Imran Shuja; Butt, Zeeshan; Umair, Muhammad; Naqvi, Suhaib Haider; Jawad-Ul-Haq

2014-02-01

To translate and validate the Pittsburgh Sleep Quality Index (PSQI), a standardized self-administered questionnaire for the assessment of subjective sleep quality, into the Urdu language. Validation study. Mayo Hospital, Lahore, from March to April 2012. The PSQI was translated into Urdu following standard guidelines. The final Urdu version (PSQI-U) was administered to 200 healthy volunteers comprising medical students, nursing staff and doctors. Inter-item correlation was assessed by calculating Cronbach's alpha. Correlation of component scores with the global score was assessed by calculating the Spearman correlation coefficient. Correlation between global PSQI-U scores at baseline and global scores for each of the PSQI-U and PSQI-E at a 4-week interval was evaluated by calculating the Spearman correlation coefficient. Moreover, scores on individual items of the scale at baseline were compared with the respective scores after 4 weeks by t-test. One hundred and eighty-five (185) participants completed the PSQI-U at baseline. The Cronbach's alpha for the PSQI-U was 0.56. Scores on individual components of the PSQI-U and composite scores were all highly correlated with each other (all p-values < 0.01). Composite scores for the PSQI-U at baseline and the PSQI-E at a 4-week interval were also highly correlated with each other (Spearman correlation coefficient 0.74, p-value < 0.01), indicating good linguistic interchangeability. Composite scores for the PSQI-U at baseline and at a 4-week interval were positively correlated with each other (Spearman correlation coefficient 0.70, p < 0.01), indicating good test-retest reliability. The PSQI-U is a valid and reliable instrument for the assessment of sleep quality. It shows good linguistic interchangeability and test-retest reliability in comparison to the original English version when applied to individuals who speak the Urdu language. The PSQI-U can be a tool either for clinical management or research.
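For readers unfamiliar with the statistic, the following minimal sketch shows how Cronbach's alpha is computed from component scores using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the response data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: list of k lists, each holding one item's scores for the same
    n respondents in the same order. Uses sample variances (n - 1).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must have the same number of respondents")

    # Total score per respondent across all items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical scores on three 0-3 components for five respondents.
demo_items = [
    [0, 1, 2, 1, 3],
    [1, 1, 2, 0, 3],
    [0, 2, 1, 1, 2],
]
demo_alpha = cronbach_alpha(demo_items)

# Sanity check: perfectly redundant items give alpha = 1.
identical_alpha = cronbach_alpha([[1, 2, 3, 4, 5]] * 3)
```

Alpha approaches 1 when items move together and falls when they share little variance, which is the scale on which the reported value of 0.56 should be read.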
Procedures for Constructing and Using Criterion-Referenced Performance Tests.

ERIC Educational Resources Information Center

Campbell, Clifton P.; Allender, Bill R.

1988-01-01

Criterion-referenced performance tests (CRPT) provide a realistic method for objectively measuring task proficiency against predetermined attainment standards. This article explains the procedures of constructing, validating, and scoring CRPTs and includes a checklist for a welding test. (JOW)
49 CFR 383.135 - Passing knowledge and skills tests.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 49 Transportation 5 2013-10-01 2013-10-01 false Passing knowledge and skills tests. 383.135... COMMERCIAL DRIVER'S LICENSE STANDARDS; REQUIREMENTS AND PENALTIES Tests § 383.135 Passing knowledge and skills tests. (a) Knowledge tests. (1) To achieve a passing score on each of the knowledge tests, a...
49 CFR 383.135 - Passing knowledge and skills tests.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 49 Transportation 5 2011-10-01 2011-10-01 false Passing knowledge and skills tests. 383.135... COMMERCIAL DRIVER'S LICENSE STANDARDS; REQUIREMENTS AND PENALTIES Tests § 383.135 Passing knowledge and skills tests. (a) Knowledge tests. (1) To achieve a passing score on each of the knowledge tests, a...
49 CFR 383.135 - Passing knowledge and skills tests.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 49 Transportation 5 2014-10-01 2014-10-01 false Passing knowledge and skills tests. 383.135... COMMERCIAL DRIVER'S LICENSE STANDARDS; REQUIREMENTS AND PENALTIES Tests § 383.135 Passing knowledge and skills tests. (a) Knowledge tests. (1) To achieve a passing score on each of the knowledge tests, a...
Comparing the MMPI-2 Scale Scores of Parents Involved in Parental Competency and Child Custody Assessments

ERIC Educational Resources Information Center

Resendes, John; Lecci, Len

2012-01-01

MMPI-2 scores from a parent competency sample (N = 136 parents) are compared with a previously published data set of MMPI-2 scores for child custody litigants (N = 508 parents; Bathurst et al., 1997). Independent samples t tests yielded significant and in some cases substantial differences on the standard MMPI-2 clinical scales (especially Scales…
Intelligent Use of Intelligence Tests: Empirical and Clinical Support for Canadian WAIS-IV Norms

ERIC Educational Resources Information Center

Miller, Jessie L.; Weiss, Lawrence G.; Beal, A. Lynne; Saklofske, Donald H.; Zhu, Jianjun; Holdnack, James A.

2015-01-01

It is well established that Canadians produce higher raw scores than their U.S. counterparts on intellectual assessments. As a result of these differences in ability along with smaller variability in the population's intellectual performance, Canadian normative data will yield lower standard scores for most raw score points compared to U.S. norms.…
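The effect of the norming sample on standard scores can be illustrated with a small sketch. It assumes a simple linear conversion to a scale with mean 100 and SD 15; actual Wechsler scoring relies on published norm tables rather than this formula, and the norm means and SDs below are hypothetical values chosen only to mirror the pattern the abstract describes (a higher mean and smaller spread in the Canadian norms).

```python
def standard_score(raw: float, norm_mean: float, norm_sd: float,
                   scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Linear standard score: rescale the z-score to the reporting scale."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


# Hypothetical norms: the second sample has a higher mean and a smaller SD.
US_NORM = {"norm_mean": 50.0, "norm_sd": 10.0}
CA_NORM = {"norm_mean": 53.0, "norm_sd": 9.0}

for raw in (40.0, 50.0, 60.0, 70.0):
    us = standard_score(raw, **US_NORM)
    ca = standard_score(raw, **CA_NORM)
    print(f"raw={raw:.0f}  US norms: {us:.1f}  Canadian norms: {ca:.1f}")
```

Under these assumed values a raw score of 50 maps to 100 on the first set of norms and to about 95 on the second. Because the second SD is smaller, the gap narrows as raw scores rise and the two conversions meet at a raw score of 80, which matches the abstract's qualifier that lower standard scores result for most, not all, raw score points.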
CBM Maze-Scores as Indicators of Reading Level and Growth for Seventh-Grade Students

ERIC Educational Resources Information Center

Chung, Siuman; Espin, Christine A.; Stevenson, Claire E.

2018-01-01

The technical adequacy of CBM maze-scores as indicators of reading level and growth for seventh-grade secondary-school students was examined. Participants were 452 Dutch students who completed weekly maze measures over a period of 23 weeks. Criterion measures were school level, dyslexia status, scores and growth on a standardized reading test.…
42 CFR 493.841 - Standard; Routine chemistry.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 42 Public Health 5 2010-10-01 2010-10-01 false Standard; Routine chemistry. 493.841 Section 493.841 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.841 Standard; Routine chemistry. (a) Failure to attain a score of at least 80 percent...
42 CFR 493.843 - Standard; Endocrinology.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 42 Public Health 5 2010-10-01 2010-10-01 false Standard; Endocrinology. 493.843 Section 493.843 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.843 Standard; Endocrinology. (a) Failure to attain a score of at least 80 percent of...
42 CFR 493.841 - Standard; Routine chemistry.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 42 Public Health 5 2011-10-01 2011-10-01 false Standard; Routine chemistry. 493.841 Section 493.841 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.841 Standard; Routine chemistry. (a) Failure to attain a score of at least 80 percent...
42 CFR 493.841 - Standard; Routine chemistry.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 42 Public Health 5 2013-10-01 2013-10-01 false Standard; Routine chemistry. 493.841 Section 493.841 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.841 Standard; Routine chemistry. (a) Failure to attain a score of at least 80 percent...
42 CFR 493.841 - Standard; Routine chemistry.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 42 Public Health 5 2014-10-01 2014-10-01 false Standard; Routine chemistry. 493.841 Section 493.841 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.841 Standard; Routine chemistry. (a) Failure to attain a score of at least 80 percent...
42 CFR 493.841 - Standard; Routine chemistry.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 42 Public Health 5 2012-10-01 2012-10-01 false Standard; Routine chemistry. 493.841 Section 493.841 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.841 Standard; Routine chemistry. (a) Failure to attain a score of at least 80 percent...
42 CFR 493.837 - Standard; General immunology.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 42 Public Health 5 2014-10-01 2014-10-01 false Standard; General immunology. 493.837 Section 493.837 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.837 Standard; General immunology. (a) Failure to attain a score of at least 80 percent...
42 CFR 493.837 - Standard; General immunology.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 42 Public Health 5 2013-10-01 2013-10-01 false Standard; General immunology. 493.837 Section 493.837 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.837 Standard; General immunology. (a) Failure to attain a score of at least 80 percent...
42 CFR 493.837 - Standard; General immunology.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 42 Public Health 5 2012-10-01 2012-10-01 false Standard; General immunology. 493.837 Section 493.837 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES... These Tests § 493.837 Standard; General immunology. (a) Failure to attain a score of at least 80 percent...
Testing Our Limits

ERIC Educational Resources Information Center

Tempel, Melissa Bollow

2012-01-01

Computerized testing, including the widely used MAP test, has infiltrated the public schools in Milwaukee and across the nation, bringing with it a frightening future for public education. High-stakes standardized tests can be scored almost immediately via the internet, and testing companies can now easily link districts to their online data…

Total recognition discriminability in Huntington's and Alzheimer's disease.

PubMed

Graves, Lisa V; Holden, Heather M; Delano-Wood, Lisa; Bondi, Mark W; Woods, Steven Paul; Corey-Bloom, Jody; Salmon, David P; Delis, Dean C; Gilbert, Paul E

2017-03-01

Both the original and second editions of the California Verbal Learning Test (CVLT) provide an index of total recognition discriminability (TRD) but respectively utilize nonparametric and parametric formulas to compute the index. However, the degree to which population differences in TRD may vary across applications of these nonparametric and parametric formulas has not been explored. We evaluated individuals with Huntington's disease (HD), individuals with Alzheimer's disease (AD), healthy middle-aged adults, and healthy older adults who were administered the CVLT-II. Yes/no recognition memory indices were generated, including raw nonparametric TRD scores (as used in CVLT-I) and raw and standardized parametric TRD scores (as used in CVLT-II), as well as false positive (FP) rates. Overall, the patient groups had significantly lower TRD scores than their comparison groups. The application of nonparametric and parametric formulas resulted in comparable effect sizes for all group comparisons on raw TRD scores. Relative to the HD group, the AD group showed comparable standardized parametric TRD scores (despite lower raw nonparametric and parametric TRD scores), whereas the previous CVLT literature has shown that standardized TRD scores are lower in AD than in HD. Possible explanations for the similarity in standardized parametric TRD scores in the HD and AD groups in the present study are discussed, with an emphasis on the importance of evaluating TRD scores in the context of other indices such as FP rates in an effort to fully capture recognition memory function using the CVLT-II.
Determining Learning Disabilities in Mathematics.

ERIC Educational Resources Information Center

Dunlap, William P.; And Others

1979-01-01

To determine the generalizability of reading expectancy formulas in ascertaining mathematics expectancy levels, correlation coefficients were computed between the scores of 150 Ss (7 to 12 years old) with learning problems on standardized mathematics and reading tests and expectancy scores. Formulas correlated higher with Ss' actual mathematics…
College Readiness Standards[TM] for EXPLORE[R], PLAN[R], and the ACT[R]: Includes Ideas for Progress

ERIC Educational Resources Information Center

ACT, Inc., 2008

2008-01-01

At the foundation of the Educational Planning and Assessment System (EPAS) programs are ACT's College Readiness Standards. The Standards offer learning strategies that are likely to help students meet state standards and acquire the more advanced concepts associated with higher EPAS test scores and, more importantly, increased college readiness.…
Age-related invariance of abilities measured with the Wechsler Adult Intelligence Scale-IV.

PubMed

Sudarshan, Navaneetham J; Bowden, Stephen C; Saklofske, Donald H; Weiss, Lawrence G

2016-11-01

Assessment of measurement invariance across populations is essential for meaningful comparison of test scores, and is especially relevant where repeated measurements are required for educational assessment or clinical diagnosis. Establishing measurement invariance legitimizes the assumption that test scores reflect the same psychological trait in different populations or across different occasions. Examination of Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) U.S. standardization samples revealed that a first-order 5-factor measurement model was best fitting across 9 age groups from 16 years to 69 years. Strong metric invariance was found for 3 of 5 factors and partial intercept invariance for the remaining 2. Pairwise comparisons of adjacent age groups supported the inference that cognitive-trait group differences are manifested by group differences in the test scores. In educational and clinical settings these findings provide theoretical and empirical support to interpret changes in the index or subtest scores as reflecting changes in the corresponding cognitive abilities. Further, where clinically relevant, the subtest score composites can be used to compare changes in respective cognitive abilities. The model was supported in the Canadian standardization data with pooled age groups but the sample sizes were not adequate for detailed examination of separate age groups in the Canadian sample. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Shifting the Curve: Fostering Academic Success in a Diverse Student Body

PubMed Central

Herbert-Carter, Janice; Smith, Marjorie; Klement, Brenda; Knight, Brandi Brandon; Anachebe, Ngozi F.

2018-01-01

Problem: Diversity in the health care workforce is key to achieving health equity. Although U.S. medical schools have worked to increase the matriculation and academic success of underrepresented minority (URM) students (African Americans, Latinos, others), they have had only limited success. Lower standardized test scores, including on the Medical College Admission Test (MCAT), have been a barrier to matriculation for many URM applicants. Lower subsequent standardized exam scores, including on the United States Medical Licensing Exam Step 1, also have been an impediment to students’ progress, with mean scores for URM students lagging behind those for others. Approach: Faculty at the Morehouse School of Medicine developed and implemented interventions to enhance the academic success of their URM students (about 75% are African American, and 5% are from other URM groups). To assess the outcomes of this work, the authors analyzed the MCAT scores and subsequent Step 1 scores of students in the graduating classes of 2009–2014. They also reviewed course evaluations, Graduation Questionnaires, and student and faculty interviews and focus groups. Outcomes: Students’ Step 1 scores exceeded those expected based on their MCAT scores. This success was due to three key elements: (1) milieu and mentoring, (2) structure and content of the curriculum, and (3) monitoring. Next Steps: A series of mixed-method studies are planned to better discern the core elements of faculty–student relationships that are key to students’ success. Lower test scores are not a fixed attribute; with the elements described, success is attainable for all students. PMID:28678099
New graduate students' baseline knowledge of the responsible conduct of research.

PubMed

Heitman, Elizabeth; Olsen, Cara H; Anestidou, Lida; Bulger, Ruth Ellen

2007-09-01

To assess (1) new biomedical science graduate students' baseline knowledge of core concepts and standards in responsible conduct of research (RCR), (2) differences in graduate students' baseline knowledge overall and across the Office of Research Integrity's nine core areas, and (3) demographic and educational factors in these differences. A 30-question, computer-scored multiple-choice test on core concepts and standards of RCR was developed following content analysis of 20 United States-published RCR texts, and combined with demographic questions on undergraduate experience with RCR developed from graduate student focus groups. Four hundred two new graduate students at three health science universities were recruited for Scantron and online testing before beginning RCR instruction. Two hundred fifty-one of 402 eligible trainees (62%) at three universities completed the test; scores ranged from 26.7% to 83.3%, with a mean of 59.5%. Only seven (3%) participants scored 80% or above. Students who received their undergraduate education outside the United States scored significantly lower (mean 52.0%) than those with U.S. bachelor's degrees (mean 60.5%, P < .001). Participants with prior graduate biomedical or health professions education scored marginally higher than new students, but both groups' mean scores were well below 80%. The mean score of 16 participants who reported previous graduate-level RCR instruction was 67.7%. Participants' specific knowledge varied, but overall scores were universally low. New graduate biomedical sciences students have inadequate and inconsistent knowledge of RCR, irrespective of their prior education or experience. Incoming trainees with previous graduate RCR education may also have gaps in core knowledge.
Feeding the pipeline: academic skills training for predental students.

PubMed

Markel, Geraldine; Woolfolk, Marilyn; Inglehart, Marita Rohr

2008-06-01

This article reports the outcomes of an evaluation conducted to determine if an academic skills training program for undergraduate predental students from underrepresented minority backgrounds increased the students' standardized academic skills test scores for vocabulary, reading comprehension, reading rates, spelling, and math as well as subject-specific test results in biology, chemistry, and physics. Data from standardized academic skill tests and subject-specific tests were collected at the beginning and end of the 1998 to 2006 Pipeline Programs, six-week summer enrichment programs for undergraduate predental students from disadvantaged backgrounds. In total, 179 students (75.4 percent African American, 7.3 percent Hispanic, 5.6 percent Asian American, 5 percent white) attended the programs during these nine summers. Scores on the Nelson-Denny Reading Test showed that the students improved their vocabulary scores (percentile ranks before/after: 46.80 percent/59.56 percent; p<.001), reading comprehension scores (47.21 percent/62.67 percent; p<.001), and reading rates (34.01 percent/78.31 percent; p<.001) from the beginning to the end of the summer programs. Results on the Wide Range Achievement Test III showed increases in spelling (73.58 percent/86.22 percent; p<.001) and math scores (56.98 percent/81.28 percent; p<.001). The students also improved their subject-specific scores in biology (39.07 percent/63.42 percent; p<.001), chemistry (20.54 percent/51.01 percent; p<.001), and physics (35.12 percent/61.14 percent; p<.001). To increase the number of underrepresented minority students in the dental school admissions pool, efforts are needed to prepare students from disadvantaged backgrounds for this process. These data demonstrate that a six-week enrichment program significantly improved the academic skills and basic science knowledge scores of undergraduate predental students. 
These improvements have the potential to enhance the performance of these students in college courses and thus increase their level of competitiveness in the dental school admissions process.
Validation of the French version of the BACS (the brief assessment of cognition in schizophrenia) among 50 French schizophrenic patients.

PubMed

Bralet, Marie-Cécile; Falissard, Bruno; Neveu, Xavier; Lucas-Ross, Margaret; Eskenazi, Anne-Marie; Keefe, Richard S E

2007-09-01

Schizophrenic patients demonstrate impairments in several key dimensions of cognition. These impairments are correlated with important aspects of functional outcome. While assessment of these cognition disorders is increasingly becoming a part of clinical and research practice in schizophrenia, there is no standard and easily administered test battery. The BACS (Brief Assessment of Cognition in Schizophrenia) has been validated in English [Keefe RSE, Goldberg TE, Harvey PD, Gold JM, Poe MP, Coughenour L. The Brief Assessment of Cognition in Schizophrenia: reliability, sensitivity, and comparison with a standard neurocognitive battery. Schizophr. Res 2004;68:283-97], and was found to be as sensitive to cognitive dysfunction as a standard battery of tests, with the advantage of requiring less than 35 min to complete. We developed a French adaptation of the BACS and this study tested its ease of administration and concurrent validity. Correlation analyses between the BACS (version A) and a standard battery were performed. A sample of 50 stable schizophrenic patients received the French Version A of the BACS in a first session, and in a second session a standard battery. All the patients completed each of the subtests of the French BACS. The mean duration of completion for the BACS French version was 36 min (S.D.=5.56). A correlation analysis between the BACS (version A) global score and the standard battery global score showed a significant result (r=0.81, p<0.0001). The correlation analysis between the BACS (version A) sub-scores and the standard battery sub-scores showed significant results for verbal memory, working memory, verbal fluency, attention and speed of information processing and executive functions (p<0.001) and for motor speed (p<0.05).
The French Version of the BACS is easier to use in French schizophrenic patients compared to a standard battery (administration shorter and completion rate better) and its good psychometric properties suggest that the French Version of the BACS may be a useful tool for assessing cognition in schizophrenic patients with French as their primary language.
Do "TOEFL iBT"® Scores Reflect Improvement in English-Language Proficiency? Extending the TOEFL iBT Validity Argument. Research Report. ETS RR-14-09

ERIC Educational Resources Information Center

Ling, Guangming; Powers, Donald E.; Adler, Rachel M.

2014-01-01

One fundamental way to determine the validity of standardized English-language test scores is to investigate the extent to which they reflect anticipated learning effects in different English-language programs. In this study, we investigated the extent to which the "TOEFL iBT"® practice test reflects the learning effects of students at…
Pilot study: EatFit impacts sixth graders' academic performance on achievement of mathematics and English education standards.

PubMed

Shilts, Mical Kay; Lamp, Cathi; Horowitz, Marcel; Townsend, Marilyn S

2009-01-01

Investigate the impact of a nutrition education program on student academic performance as measured by achievement of education standards. Quasi-experimental crossover-controlled study. California Central Valley suburban elementary school (58% qualified for free or reduced-priced lunch). All sixth-grade students (n = 84) in the elementary school clustered in 3 classrooms. 9-lesson intervention with an emphasis on guided goal setting and driven by the Social Cognitive Theory. Multiple-choice survey assessing 5 education standards for sixth-grade mathematics and English at 3 time points: baseline (T1), 5 weeks (T2), and 10 weeks (T3). Repeated measures, paired t test, and analysis of covariance. Changes in total scores were statistically different (P < .05), with treatment scores (T3 - T2) generating more gains. The change scores for 1 English (P < .01) and 2 mathematics standards (P < .05; P < .001) were statistically greater for the treatment period (T3 - T2) compared to the control period (T2 - T1). Using standardized tests, results of this pilot study suggest that EatFit can improve academic performance measured by achievement of specific mathematics and English education standards. Nutrition educators can show school administrators and wellness committee members that this program can positively impact academic performance, concomitant to its primary objective of promoting healthful eating and physical activity.
[Impact of passing items above the ceiling on the assessment results of Peabody developmental motor scales].

PubMed

Zhao, Gai; Bian, Yang; Li, Ming

2013-12-18

To analyze the impact of passing items above the ceiling level in the gross motor subtest of the Peabody Developmental Motor Scales (PDMS-2) on its assessment results. The PDMS-2 subtests were administered to 124 children aged 1.2 to 71 months. In addition to the original scoring method, a new scoring method that includes passing items above the ceiling was developed. The standard scores and quotients of the two scoring methods were compared using the independent-samples t test. Only one child passed items above the ceiling in the stationary subtest, compared with 19 children in the locomotion subtest and 17 children in the visual-motor integration subtest. When the scores of these passing items were included in the raw scores, the total raw scores increased by 1-12 points, the standard scores by 0-1 points, and the motor quotients by 0-3 points. The diagnostic classification changed in only two children. There was no significant difference between the two methods in motor quotients or standard scores for the specific subtests (P>0.05). Passing items above the ceiling of the PDMS-2 is not a rare situation. It usually occurs in the locomotion and visual-motor integration subtests. Including these passing items in the scoring system does not make a significant difference in the standard scores of the subtests or the developmental motor quotients (DMQ), which supports the original setting of a ceiling established by failing to pass 3 items in a row. However, adding the passing items above the ceiling to the raw score will improve tracking of children's developmental trajectory and intervention effects.
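As a rough illustration of the two scoring methods compared in this record, the Python sketch below computes a subtest raw score with and without items passed above the ceiling. The 0/1/2 item scale, the function name raw_scores, and the sample responses are assumptions made for illustration only; actual PDMS-2 administration rules (basal levels, for example) are more involved.

```python
def raw_scores(item_scores, ceiling_run=3):
    """Return (original, extended) raw scores for one subtest.

    item_scores: per-item scores in administration order (assumed 0, 1, or 2).
    The ceiling is assumed to be reached after `ceiling_run` consecutive
    items scored 0; the original method ignores every item after that point.
    """
    run = 0
    ceiling_index = len(item_scores)  # default: no ceiling reached
    for i, score in enumerate(item_scores):
        run = run + 1 if score == 0 else 0
        if run == ceiling_run:
            ceiling_index = i + 1
            break
    original = sum(item_scores[:ceiling_index])
    extended = sum(item_scores)  # also counts items passed above the ceiling
    return original, extended


# Hypothetical responses: the ceiling is reached at the sixth item,
# but the child still earns points on two later items.
responses = [2, 2, 1, 0, 0, 0, 2, 0, 1]
original, extended = raw_scores(responses)
print(original, extended)
```

The difference between the two returned values corresponds to the "added points" described in the abstract; converting raw scores to standard scores or quotients would require the published norm tables, which are not reproduced here.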
The effects of calculator-based laboratories on standardized test scores

NASA Astrophysics Data System (ADS)

Stevens, Charlotte Bethany Rains

Nationwide, the goal of providing a productive science and math education in today's educational institutions increasingly centers on the technology used in classrooms. In this age of digital technology, educational software and calculator-based laboratories (CBL) have become significant devices in the teaching of science and math for many states across the United States. The Texas Instruments graphing calculator and the Vernier LabPro interface are among the calculator-based laboratories becoming increasingly popular among middle and high school science and math teachers in many school districts across the country. In Tennessee, however, it is reported that this type of technology is not regularly utilized at the student level in most high school science classrooms, especially in the area of Physical Science (Vernier, 2006). This research explored the effect of calculator-based laboratory instruction on standardized test scores. The purpose of this study was to determine the effect of traditional teaching methods versus graphing calculator teaching methods on the state-mandated End-of-Course (EOC) Physical Science exam based on ability, gender, and ethnicity. The sample included 187 total tenth and eleventh grade physical science students, 101 of whom belonged to a control group and 87 of whom belonged to the experimental group. Physical Science End-of-Course scores obtained from the Tennessee Department of Education during the spring of 2005 and the spring of 2006 were used to examine the hypotheses. The findings of this research study suggested the type of teaching method, traditional or calculator based, did not have an effect on standardized test scores. However, the students' ability level, as demonstrated on the End-of-Course test, had a significant effect on End-of-Course test scores.
This study focused on a limited population of high school physical science students in the middle Tennessee Putnam County area. The study should be reproduced in various school districts in the state of Tennessee to compare the findings.
AMP!: A Cross-site Analysis of the Effects of a Theater-based Intervention on Adolescent Awareness, Attitudes, and Knowledge about HIV.

PubMed

Taggart, Tamara; Taboada, Arianna; Stein, Judith A; Milburn, Norweeta G; Gere, David; Lightfoot, Alexandra F

2016-07-01

AMP! (Arts-based, Multiple component, Peer-education) is an HIV intervention developed for high school adolescents. AMP! uses interactive theater-based scenarios developed by trained college undergraduates to deliver messages addressing HIV/STI prevention strategies, healthy relationships, and stigma reduction towards people living with HIV/AIDS. We used a pre-test/post-test, control group study design to simultaneously assess intervention effect on ninth grade students in an urban county in California (N = 159) and a suburban county in North Carolina (N = 317). In each location, the control group received standard health education curricula delivered by teachers; the intervention group received AMP! in addition to standard health education curricula. Structural equation modeling was used to determine intervention effects. The post-test sample was 46 % male, 90 % self-identified as heterosexual, 32 % reported receiving free or reduced lunch, and 49 % White. Structural models indicated that participation in AMP! predicted higher scores on HIV knowledge (p = 0.05), HIV awareness (p = 0.01), and HIV attitudes (p = 0.05) at the post-test. Latent means comparison analyses revealed post-test scores were significantly higher than pre-test scores on HIV knowledge (p = 0.001), HIV awareness (p = 0.001), and HIV attitudes (p = 0.001). Further analyses indicated that scores rose for both groups, but the post-test scores of intervention participants were significantly higher than controls (HIV knowledge (p = 0.01), HIV awareness (p = 0.01), and HIV attitudes (p = 0.05)). Thus, AMP!'s theater-based approach shows promise for addressing multiple adolescent risk factors and attitudes concerning HIV in school settings.
Validation of the tablet-administered Brief Assessment of Cognition (BAC App).

PubMed

Atkins, Alexandra S; Tseng, Tina; Vaughan, Adam; Twamley, Elizabeth W; Harvey, Philip; Patterson, Thomas; Narasimhan, Meera; Keefe, Richard S E

2017-03-01

Computerized tests benefit from automated scoring procedures and standardized administration instructions. These methods can reduce the potential for rater error. However, especially in patients with severe mental illnesses, the equivalency of traditional and tablet-based tests cannot be assumed. The Brief Assessment of Cognition in Schizophrenia (BACS) is a pen-and-paper cognitive assessment tool that has been used in hundreds of research studies and clinical trials, and has normative data available for generating age- and gender-corrected standardized scores. A tablet-based version of the BACS called the BAC App has been developed. This study compared performance on the BACS and the BAC App in patients with schizophrenia and healthy controls. Test equivalency was assessed, and the applicability of paper-based normative data was evaluated. Results demonstrated the distributions of standardized composite scores for the tablet-based BAC App and the pen-and-paper BACS were indistinguishable, and the between-methods mean differences were not statistically significant. The discrimination between patients and controls was similarly robust. The between-methods correlations for individual measures in patients were r>0.70 for most subtests. When data from the Token Motor Test was omitted, the between-methods correlation of composite scores was r=0.88 (df=48; p<0.001) in healthy controls and r=0.89 (df=46; p<0.001) in patients, consistent with the test-retest reliability of each measure. Taken together, results indicate that the tablet-based BAC App generates results consistent with the traditional pen-and-paper BACS, and support the notion that the BAC App is appropriate for use in clinical trials and clinical practice. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Predicting motor outcome at preschool age for infants tested at 7, 30, 60, and 90 days after term age using the Test of Infant Motor Performance.

PubMed

Kolobe, Thubi H A; Bulanda, Michelle; Susman, Louisa

2004-12-01

Accurate and diagnostic measures are central to early identification and intervention with infants who are at risk for developmental delays or disabilities. The purpose of this study was to examine (1) the ability of infants' Test of Infant Motor Performance (TIMP) scores at 7, 30, 60 and 90 days after term age to predict motor development at preschool age and (2) the contribution of the home environment and medical risk to the prediction. Sixty-one children from an original cohort of 90 infants who were assessed weekly with the TIMP, between 34 weeks gestational age and 4 months after term age, participated in this follow-up study. The Peabody Developmental Motor Scales, 2nd edition (PDMS-2), were administered to the children at the mean age of 57 months (SD=4.8 months). The quality and quantity of the home environment also were assessed at this age using the Early Childhood Home Observation for Measurement of the Environment (EC-HOME). Pearson product moment correlation coefficients, multiple regression, sensitivity and specificity, and positive and negative predictive values were used to assess the relationship among the TIMP, HOME, medical risk, and PDMS-2 scores. The correlation coefficients between the TIMP and PDMS-2 scores were statistically significant for all ages except at 7 days. The highest correlation coefficient was at 90 days (r=.69, P=.001). The TIMP scores at 30, 60, and 90 days after term; medical risk scores; and EC-HOME scores explained 24%, 23%, and 52% of the variance in the PDMS-2 scores, respectively. The TIMP score at 90 days after term was the most significant contributor to the prediction. The TIMP cutoff score of -0.5 standard deviation below the mean correctly classified 80%, 79%, and 87% of the children using a cutoff score of -2 standard deviations on the PDMS-2 at 30, 60, and 90 days, respectively. The results compare favorably with those of developmental tests administered to infants at 6 months of age or older. 
These findings underscore the need for age-specific test values and developmental surveillance of infants before making referrals.
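The classification measures named in this record (sensitivity, specificity, positive and negative predictive values) all derive from a 2x2 table of screening result against later outcome. The Python sketch below shows the arithmetic; the function name and the counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute screening accuracy measures from a 2x2 table.

    tp/fn: children with a later delay who screened positive/negative.
    fp/tn: children without a later delay who screened positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for a follow-up sample of 61 children (illustrative only).
metrics = classification_metrics(tp=6, fp=10, fn=2, tn=43)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The "correctly classified" percentages quoted in the abstract appear to correspond to the accuracy entry, though the abstract does not give the underlying counts.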
Test-Retest Reliability of Standard and Emotional Stroop Tasks: An Investigation of Color-Word and Picture-Word Versions

ERIC Educational Resources Information Center

Strauss, Gregory P.; Allen, Daniel N.; Jorgensen, Melinda L.; Cramer, Stacey L.

2005-01-01

Previous studies have examined the reliability of scores derived from various Stroop tasks. However, few studies have compared reliability of more recently developed Stroop variants such as emotional Stroop tasks to standard versions of the Stroop. The current study developed four different single-stimulus Stroop tasks and compared test-retest…
Curriculum-Based Measurement in Writing: Predicting the Success of High-School Students on State Standards Tests

ERIC Educational Resources Information Center

Espin, Christine; Wallace, Teri; Campbell, Heather; Lembke, Erica S.; Long, Jeffrey D.; Ticha, Renata

2008-01-01

We examined the technical adequacy of writing progress measures as indicators of success on state standards tests. Tenth-grade students wrote for 10 min, marking their samples at 3, 5, and 7 min. Samples were scored for words written, words spelled correctly, and correct and correct minus incorrect word sequences. The number of correct minus…
The "Pedagogy of the Oppressed": The Necessity of Dealing with Problems in Students' Lives

ERIC Educational Resources Information Center

Reynolds, Patricia R.

2007-01-01

Students have problems in their lives, but can teachers help them? Should teachers help? The No Child Left Behind (NCLB) act and its emphasis on standardized test results have forced school systems to produce high scores, and in turn school administrators pressure teachers to prepare students for taking standardized tests. Teachers may want to…
Cross-cultural adaptation, reliability and validity of the Turkish version of the Hospital for Special Surgery (HSS) Knee Score.

PubMed

Narin, Selnur; Unver, Bayram; Bakırhan, Serkan; Bozan, Ozgür; Karatosun, Vasfi

2014-01-01

The purpose of this study was to adapt the English version of the Hospital for Special Surgery (HSS) knee score for use in a Turkish population and to evaluate its validity, reliability and cultural adaptation. Standard forward-back translation of the HSS knee score was performed and the Turkish version was applied in 73 patients. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Mini-Mental State Examination and sit-to-stand test were also performed and analyzed. Internal consistency reliability was tested using Cronbach's alpha. The intraclass correlation coefficient (ICC) was used to calculate the test-retest reliability at one-week intervals. Validity was assessed by calculating the Pearson correlation between the HSS, WOMAC and sit-to-stand test scores. The ICC ranged from 0.98 to 0.99 with high internal consistency (Cronbach's alpha: 0.87). The WOMAC score correlated with total HSS score (r: -0.80, p<0.001) and sit-to-stand score (r: 0.12, p: 0.312). The Turkish version of the HSS knee score is reliable and valid in evaluating the total knee arthroplasty in Turkish patients.
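For readers unfamiliar with the internal consistency statistic reported in this record, the following Python sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy data are assumptions for illustration and have nothing to do with the HSS knee score items.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores for 4 respondents on 3 items (illustrative only).
toy = [[3, 4, 3], [2, 2, 3], [4, 5, 5], [1, 2, 2]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Sample variances (n - 1 denominator) are used for both the items and the totals; the ratio is unchanged as long as the same denominator is used throughout.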
Development and validation of a composite scoring system for robot-assisted surgical training--the Robotic Skills Assessment Score.

PubMed

Chowriappa, Ashirwad J; Shi, Yi; Raza, Syed Johar; Ahmed, Kamran; Stegemann, Andrew; Wilding, Gregory; Kaouk, Jihad; Peabody, James O; Menon, Mani; Hassett, James M; Kesavadas, Thenkurussi; Guru, Khurshid A

2013-12-01

A standardized scoring system does not exist in virtual reality-based assessment metrics to describe safe and crucial surgical skills in robot-assisted surgery. This study aims to develop an assessment score along with its construct validation. All subjects performed key tasks on previously validated Fundamental Skills of Robotic Surgery curriculum, which were recorded, and metrics were stored. After an expert consensus for the purpose of content validation (Delphi), critical safety determining procedural steps were identified from the Fundamental Skills of Robotic Surgery curriculum and a hierarchical task decomposition of multiple parameters using a variety of metrics was used to develop the Robotic Skills Assessment Score (RSA-Score). Robotic Skills Assessment mainly focuses on safety in operative field, critical error, economy, bimanual dexterity, and time. Subsequently, the RSA-Score was further evaluated for construct validation and feasibility. Spearman correlation tests performed between tasks using the RSA-Scores indicate no cross correlation. Wilcoxon rank sum tests were performed between the two groups. The proposed RSA-Score was evaluated on non-robotic surgeons (n = 15) and on expert-robotic surgeons (n = 12). The expert group demonstrated significantly better performance on all four tasks in comparison to the novice group. Validation of the RSA-Score in this study was carried out on the Robotic Surgical Simulator. The RSA-Score is a valid scoring system that could be incorporated in any virtual reality-based surgical simulator to achieve standardized assessment of fundamental surgical tenets during robot-assisted surgery. Copyright © 2013 Elsevier Inc. All rights reserved.

Assessment issues in the testing of children at school entry.

PubMed

Rock, Donald A; Stenner, A Jackson

2005-01-01

The authors introduce readers to the research documenting racial and ethnic gaps in school readiness. They describe the key tests, including the Peabody Picture Vocabulary Test (PPVT), the Early Childhood Longitudinal Study (ECLS), and several intelligence tests, and describe how they have been administered to several important national samples of children. Next, the authors review the different estimates of the gaps and discuss how to interpret these differences. In interpreting test results, researchers use the statistical term "standard deviation" to compare scores across the tests. On average, the tests find a gap of about 1 standard deviation. The ECLS-K estimate is the lowest, about half a standard deviation. The PPVT estimate is the highest, sometimes more than 1 standard deviation. When researchers adjust those gaps statistically to take into account different outside factors that might affect children's test scores, such as family income or home environment, the gap narrows but does not disappear. Why such different estimates of the gap? The authors consider explanations such as differences in the samples, racial or ethnic bias in the tests, and whether the tests reflect different aspects of school "readiness," and conclude that none is likely to explain the varying estimates. Another possible explanation is the Spearman Hypothesis-that all tests are imperfect measures of a general ability construct, g; the more highly a given test correlates with g, the larger the gap will be. But the Spearman Hypothesis, too, leaves questions to be investigated. A gap of 1 standard deviation may not seem large, but the authors show clearly how it results in striking disparities in the performance of black and white students and why it should be of serious concern to policymakers.
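To give a sense of why a gap expressed in standard deviations can translate into large disparities, the Python sketch below computes the share of a lower-scoring group that would score above the higher-scoring group's mean. It assumes both groups are normally distributed with equal standard deviations; that assumption, and the function name, are ours and are not stated in the abstract.

```python
from statistics import NormalDist


def share_above_reference_mean(gap_sd):
    """Share of the lower-scoring group that exceeds the other group's mean.

    gap_sd: difference between group means in standard deviation units.
    Assumes normal distributions with equal standard deviations.
    """
    return 1 - NormalDist().cdf(gap_sd)


# Gaps of about half and about one standard deviation, as cited in the abstract.
for gap in (0.5, 1.0):
    print(gap, round(share_above_reference_mean(gap), 3))
```

Under these assumptions, roughly 31% of the lower-scoring group exceeds the other group's mean at a gap of 0.5 standard deviations, and roughly 16% at a gap of 1 standard deviation.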
Professional Testing Standards: What Educators Need To Know.

ERIC Educational Resources Information Center

Camara, Wayne J.

Real and perceived misuses of educational tests, errors in test scoring and test use, and incidents of cheating on tests have been widely reported in local and national media. As educational tests take on additional importance for students, teachers, and schools, there is appropriate concern about the quality of assessments and the appropriate use…
High Stakes in the Classroom, High Stakes on the Street: The Effects of Community Violence on Students' Standardized Test Performance. Working Paper #03-13

ERIC Educational Resources Information Center

Sharkey, Patrick; Schwartz, Amy Ellen; Ellen, Ingrid Gould; Lacoe, Johanna

2013-01-01

This paper examines the effect of exposure to violent crime on students' standardized test performance among a sample of students in New York City public schools. To identify the effect of exposure to community violence on children's test scores, we compare students exposed to an incident of violent crime on their own blockface in the week prior…
42 CFR 493.1236 - Standard: Evaluation of proficiency testing performance.

Code of Federal Regulations, 2011 CFR

2011-10-01

... the following: (1) Any analyte or subspecialty without analytes listed in subpart I of this part that is not evaluated or scored by a CMS-approved proficiency testing program. (2) Any analyte, specialty...
42 CFR 493.1236 - Standard: Evaluation of proficiency testing performance.

Code of Federal Regulations, 2010 CFR

2010-10-01

... the following: (1) Any analyte or subspecialty without analytes listed in subpart I of this part that is not evaluated or scored by a CMS-approved proficiency testing program. (2) Any analyte, specialty...
42 CFR 493.1236 - Standard: Evaluation of proficiency testing performance.

Code of Federal Regulations, 2014 CFR

2014-10-01

... the following: (1) Any analyte or subspecialty without analytes listed in subpart I of this part that is not evaluated or scored by a CMS-approved proficiency testing program. (2) Any analyte, specialty...
42 CFR 493.1236 - Standard: Evaluation of proficiency testing performance.

Code of Federal Regulations, 2012 CFR

2012-10-01

... the following: (1) Any analyte or subspecialty without analytes listed in subpart I of this part that is not evaluated or scored by a CMS-approved proficiency testing program. (2) Any analyte, specialty...
42 CFR 493.1236 - Standard: Evaluation of proficiency testing performance.

Code of Federal Regulations, 2013 CFR

2013-10-01

... the following: (1) Any analyte or subspecialty without analytes listed in subpart I of this part that is not evaluated or scored by a CMS-approved proficiency testing program. (2) Any analyte, specialty...
Higher Education Faculty Engagement in a Modified Mapmark Standard Setting

ERIC Educational Resources Information Center

Horst, S. Jeanne; DeMars, Christine E.

2016-01-01

The Mapmark standard setting method was adapted to a higher education setting in which faculty leaders were highly involved. Eighteen university faculty members participated in a day-long standard setting for a general education communications test. In Round 1, faculty set initial cut-scores for each of four student learning objectives. In Rounds…
Evaluation of modifications of the traditional patch test in assessing the chemical irritation potential of feminine hygiene products.

PubMed

Farage, Miranda A; Meyer, Sandy; Walter, Dave

2004-05-01

The first main objective of the work presented in this paper was to investigate ways of optimizing the current arm patch test protocol by (1) increasing the sensitivity of the test in order to evaluate more effectively the products that are inherently non-irritating, and/or (2) reducing the costs of these types of studies by shortening the protocol. The second main objective was to use the results of these studies and the results of the parallel studies conducted using the behind-the-knee method to better understand the contribution of mechanical irritation to the skin effects produced by these types of products. In addition, we were interested in continuing the evaluation of sensory effects and their relationship to objective measures of irritation. Test materials were prepared from three, currently marketed feminine protection pads. Wet and dry samples were applied to the upper arm using the standard 24-h patch test. Applications were repeated daily for 4 consecutive days. The test sites were scored for irritation prior to the first patch application, and 30-60 min after removal of each patch. Some test sites were treated by tape stripping the skin prior to the initial patch application. In addition, in one experiment, panelists were asked to keep a daily diary describing any sensory skin effects they noticed at each test site. All protocol variations ([intact skin/dry samples], [compromised skin/dry samples], [intact skin/wet samples], and [compromised skin/wet samples]) gave similar results for the products tested. When compared to the behind-the-knee test method, the standard upper arm patch test gave consistently lower levels of irritation when the test sites were scored shortly after patch removal, even though the sample application was longer (24 vs. 6 h) in the standard patch test. The higher level of irritation in the behind-the-knee method was likely due to mechanical irritation. 
The sensory skin effects did not appear to be related to a particular test product or a particular protocol variation. However, the mean irritation scores at those sites where a sensory effect was reported were higher than the mean irritation scores at those sites where no sensory effects were reported. All four protocol variations of the standard upper arm patch test can be used to assess the inherent chemical irritant properties of feminine protection products. For these products, which are inherently non-irritating, tape stripping and/or applying wet samples does not increase the sensitivity of the patch test method. Differences in irritation potential were apparent after one to three 24-h applications. Therefore, the standard patch test protocol can be shortened to three applications without compromising our ability to detect differences in the chemical irritation produced by the test materials. The patch test can be used to evaluate effectively the inherent chemical irritation potential of these types of products. However, this method is not suitable for testing the mechanical irritation due to friction that occurs during product use. There is no relationship between specific test conditions, i.e., compromised skin and/or testing wet samples, and reports of perceived sensory reactions. However, there seems to be a clear relationship between sensory reactions and objective irritation scores.
Testing accommodation or modification? The effects of integrated object representation on enhancing geometry performance in children with and without geometry difficulties.

PubMed

Zhang, Dake; Wang, Qiu; Ding, Yi; Liu, Jeremy Jian

2014-01-01

According to the National Council of Teachers of Mathematics, geometry and spatial sense are fundamental components of mathematics learning. However, learning disabilities (LD) research has shown that many K-12 students encounter particular geometry difficulties (GD). This study examined the effect of an integrated object representation (IOR) accommodation on the test performance of students with GD compared to students without GD. Participants were 118 elementary students who took a researcher-developed geometry problem solving test under both a standard testing condition and an IOR accommodation condition. A total of 36 students who were classified with GD scored below 40% correct in the geometry problem solving test in the standard testing condition, and 82 students who were classified without GD scored equal to or above 40% correct in the same test and condition. All students were tested in both standard testing condition and IOR accommodation condition. The results from both ANOVA and regression discontinuity (RD) analyses suggested that students with GD benefited more than students without GD from the IOR accommodation. Implications of the study are discussed in terms of providing accommodations for students with mathematics learning difficulties and recommending RD design in LD research. © Hammill Institute on Disabilities 2013.
The Effects of Math Anxiety

ERIC Educational Resources Information Center

Andrews, Amanda; Brown, Jennifer

2015-01-01

Math anxiety is a reoccurring problem for many students, and the effects of this anxiety on college students are increasing. The purpose of this study was to examine the association between pre-enrollment math anxiety, standardized test scores, math placement scores, and academic success during freshman math coursework (i.e., pre-algebra, college…
Music Achievement and Academic Achievement: Isolating the School as a Unit of Study

ERIC Educational Resources Information Center

Frey-Clark, Marta

2015-01-01

Music participation and academic achievement have long been of interest to educators, researchers and policy makers. The literature is replete with studies linking music participation to higher state assessment scores, grade point averages, and Standardized Achievement Test (SAT) scores. If students from quality music programs academically…
Against Conventional Wisdom: Factors Influencing Hispanic Students' Reading Achievement

ERIC Educational Resources Information Center

Percell, Jay C.; Kaufman, Kristina

2013-01-01

The researchers performed a variable analysis of the 2002 Educational Longitudinal Study data investigating factors that influence students' reading scores on standardized tests. Hispanic and non-Hispanic scores were analyzed, and controlling variables were compared to determine the effect of each on both populations. Certain variables commonly…
Self-regulated learning and achievement by middle-school children.

PubMed

Sink, C A; Barnett, J E; Hixon, J E

1991-12-01

The relationship of self-regulated learning to the achievement test scores of 62 Grade 6 students was studied. Generally, the metacognitive and affective variables correlated significantly with teachers' grades and standardized test scores in mathematics, reading, and science. Planning and self-assessment significantly predicted the six measures of achievement. Step-wise multiple regression analyses using the metacognitive and affective variables largely indicate that students' and teachers' perceptions of scholastic ability and planning appear to be the most salient factors in predicting academic performance. The locus of control dimension had no utility in predicting classroom grades and performance on standardized measures of achievement. The implications of the findings for teaching and learning are discussed.
Testing in semiparametric models with interaction, with applications to gene-environment interactions.

PubMed

Maity, Arnab; Carroll, Raymond J; Mammen, Enno; Chatterjee, Nilanjan

2009-01-01

Motivated by the problem of testing for genetic effects on complex traits in the presence of gene-environment interaction, we develop score tests in general semiparametric regression problems that involve a Tukey-style 1 degree-of-freedom form of interaction between parametrically and non-parametrically modelled covariates. We find that the score test in this type of model, as recently developed by Chatterjee and co-workers in the fully parametric setting, is biased and requires undersmoothing to be valid in the presence of non-parametric components. Moreover, in the presence of repeated outcomes, the asymptotic distribution of the score test depends on the estimation of functions which are defined as solutions of integral equations, making implementation difficult and computationally taxing. We develop profiled score statistics which are unbiased and asymptotically efficient and can be performed by using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give easy interpretations of the target functions, which in turn allow us to develop estimation procedures that can be easily implemented by using standard computational methods. We present simulation studies to evaluate type I error and power of the method proposed compared with a naive test that does not consider interaction. Finally, we illustrate our methodology by analysing data from a case-control study of colorectal adenoma that was designed to investigate the association between colorectal adenoma and the candidate gene NAT2 in relation to smoking history.
Do racial and ethnic group differences in performance on the MCAT exam reflect test bias?

PubMed

Davis, Dwight; Dorsey, J Kevin; Franks, Ronald D; Sackett, Paul R; Searcy, Cynthia A; Zhao, Xiaohui

2013-05-01

The Medical College Admission Test (MCAT) is a standardized examination that assesses fundamental knowledge of scientific concepts, critical reasoning ability, and written communication skills. Medical school admission officers use MCAT scores, along with other measures of academic preparation and personal attributes, to select the applicants they consider the most likely to succeed in medical school. In 2008-2011, the committee charged with conducting a comprehensive review of the MCAT exam examined four issues: (1) whether racial and ethnic groups differ in mean MCAT scores, (2) whether any score differences are due to test bias, (3) how group differences may be explained, and (4) whether the MCAT exam is a barrier to medical school admission for black or Latino applicants. This analysis showed that black and Latino examinees' mean MCAT scores are lower than white examinees', mirroring differences on other standardized admission tests and in the average undergraduate grades of medical school applicants. However, there was no evidence that the MCAT exam is biased against black and Latino applicants as determined by their subsequent performance on selected medical school performance indicators. Among other factors which could contribute to mean differences in MCAT performance, whites, blacks, and Latinos interested in medicine differ with respect to parents' education and income. Admission data indicate that admission committees accept majority and minority applicants at similar rates, which suggests that medical students are selected on the basis of a combination of attributes and competencies rather than on MCAT scores alone.
A new computer-based Farnsworth Munsell 100-hue test for evaluation of color vision.

PubMed

Ghose, Supriyo; Parmar, Twinkle; Dada, Tanuj; Vanathi, Murugesan; Sharma, Sourabh

2014-08-01

To evaluate a computer-based Farnsworth-Munsell (FM) 100-hue test and compare it with a manual FM 100-hue test in normal and congenital color-deficient individuals. Fifty color defective subjects and 200 normal subjects with a best-corrected visual acuity ≥ 6/12 were compared using a standard manual FM 100-hue test and a computer-based FM 100-hue test under standard operating conditions as recommended by the manufacturer after initial trial testing. Parameters evaluated were total error scores (TES), type of defect and testing time. Pearson's correlation coefficient was used to determine the relationship between the test scores. Cohen's kappa was used to assess agreement of color defect classification between the two tests. A receiver operating characteristic curve was used to determine the optimal cut-off score for the computer-based FM 100-hue test. The mean time was 16 ± 1.5 (range 6-20) min for the manual FM 100-hue test and 7.4 ± 1.4 (range 5-13) min for the computer-based FM 100-hue test, thus reducing testing time to <50 % (p < 0.05). For grading color discrimination, Pearson's correlation coefficient for TES between the two tests was 0.91 (p < 0.001). For color defect classification, Cohen's agreement coefficient was 0.98 (p < 0.01). The computer-based FM 100-hue is an effective and rapid method for detecting, classifying and grading color vision anomalies.
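The agreement statistic named above, Cohen's kappa, can be sketched in a few lines. The function below is a minimal illustration of the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The 2 x 2 table of counts is invented purely to exercise the code; it is not data from the study, and the function name is hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts, where table[i][j] is the
    number of subjects placed in category i by test A and j by test B."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    row_props = [sum(table[i]) / n for i in range(k)]
    col_props = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_props, col_props))
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts (NOT from the study): rows = manual test, columns = computer test.
toy_table = [
    [45, 5],   # manual "normal":    45 agree, 5 disagree
    [5, 45],   # manual "defective": 5 disagree, 45 agree
]
print(f"kappa = {cohens_kappa(toy_table):.2f}")
```

Kappa corrects raw agreement for chance, which is why it is typically reported alongside, rather than instead of, a correlation coefficient for the continuous error scores.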
Effects of training students to identify the semantic base of prose materials

PubMed Central

Glover, John A.; Zimmer, John W.; Filbeck, Robert W.; Plake, Barbara S.

1980-01-01

Feedback and feedback plus points toward a course grade were applied to the attentional behaviors (defined as the ability to identify the semantic base of text passages) of 30 undergraduate students participating in a reading comprehension development program. Correct underlining was increased, extraneous underlining was decreased, and postreading comprehension test scores improved as a result of the procedures. Scores on a standardized test of reading comprehension also increased significantly. PMID:16795637
The correlation of symptoms, pulmonary function tests and exercise testing with high-resolution computed tomography in patients with idiopathic interstitial pneumonia in a tertiary care hospital in South India.

PubMed

Isaac, Barney Thomas Jesudason; Thangakunam, Balamugesh; Cherian, Rekha A; Christopher, Devasahayam Jesudas

2015-01-01

For the follow-up of patients with idiopathic interstitial pneumonias (IIP), it is unclear which parameters of pulmonary function tests (PFT) and exercise testing would correlate best with high-resolution computed tomography (HRCT). To find out the correlation of symptom scores, PFTs and exercise testing with HRCT scoring in patients diagnosed as idiopathic interstitial pneumonia. Cross-sectional study done in pulmonary medicine outpatients department of a tertiary care hospital in South India. Consecutive patients who were diagnosed as IIP by a standard algorithm were included into the study. Cough and dyspnea were graded for severity and duration. Pulmonary function tests and exercise testing parameters were noted. HRCT was scored based on an alveolar score, an interstitial score and a total score. The HRCT was correlated with each of the clinical and physiologic parameters. Pearson's/Spearman's correlation coefficient was used for the correlation of symptoms and parameters of ABG, PFT and 6MWT with the HRCT scores. A total of 94 patients were included in the study. Cough and dyspnea severity (r = 0.336 and 0.299), FVC (r = -0.48), TLC (r = -0.439) and DLCO and distance saturation product (DSP) (r = -0.368) and lowest saturation (r = -0.324) had significant correlation with total HRCT score. Among these, DLCO, particularly DLCO corrected % of predicted, correlated best with HRCT score (r = -0.721). Symptoms, PFT and exercise testing had good correlation with HRCT. DLCO corrected % of predicted correlated best with HRCT.

Estimating premorbid general cognitive functioning for children and adolescents using the American Wechsler Intelligence Scale for Children-Fourth Edition: demographic and current performance approaches.

PubMed

Schoenberg, Mike R; Lange, Rael T; Brickell, Tracey A; Saklofske, Donald H

2007-04-01

Neuropsychologic evaluation requires current test performance be contrasted against a comparison standard to determine if change has occurred. An estimate of premorbid intelligence quotient (IQ) is often used as a comparison standard. The Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) is a commonly used intelligence test. However, there is no method to estimate premorbid IQ for the WISC-IV, limiting the test's utility for neuropsychologic assessment. This study develops algorithms to estimate premorbid Full Scale IQ scores. Participants were the American WISC-IV standardization sample (N = 2172). The sample was randomly divided into 2 groups (development and validation). The development group was used to generate 12 algorithms. These algorithms were accurate predictors of WISC-IV Full Scale IQ scores in healthy children and adolescents. These algorithms hold promise as a method to predict premorbid IQ for patients with known or suspected neurologic dysfunction; however, clinical validation is required.
A Large Sample Procedure for Testing Coefficients of Ordinal Association: Goodman and Kruskal's Gamma and Somers' d_ba and d_ab

ERIC Educational Resources Information Center

Berry, Kenneth J.; And Others

1977-01-01

A FORTRAN program, GAMMA, computes Goodman and Kruskal's coefficient of ordinal association, gamma, and Somers' coefficient. The program also provides associated standard errors, standard scores, and probability values. (Author/JKS)
An Item Response Theory Model for Test Bias.

ERIC Educational Resources Information Center

Shealy, Robin; Stout, William

This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
Establishing Inter- and Intrarater Reliability for High-Stakes Testing Using Simulation.

PubMed

Kardong-Edgren, Suzan; Oermann, Marilyn H; Rizzolo, Mary Anne; Odom-Maryon, Tamara

This article reports one method to develop a standardized training method to establish the inter- and intrarater reliability of a group of raters for high-stakes testing. Simulation is used increasingly for high-stakes testing, but without research into the development of inter- and intrarater reliability for raters. Eleven raters were trained using a standardized methodology. Raters scored 28 student videos over a six-week period. Raters then rescored all videos over a two-day period to establish both intra- and interrater reliability. One rater demonstrated poor intrarater reliability; a second rater failed all students. Kappa statistics improved from the moderate to substantial agreement range with the exclusion of the two outlier raters' scores. There may be faculty who, for different reasons, should not be included in high-stakes testing evaluations. All faculty are content experts, but not all are expert evaluators.
TEST-RETEST RELIABILITY OF THE CLOSED KINETIC CHAIN UPPER EXTREMITY STABILITY TEST (CKCUEST) IN ADOLESCENTS: RELIABILITY OF CKCUEST IN ADOLESCENTS.

PubMed

de Oliveira, Valéria M A; Pitangui, Ana C R; Nascimento, Vinícius Y S; da Silva, Hítalo A; Dos Passos, Muana H P; de Araújo, Rodrigo C

2017-02-01

The Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) has been proposed as an option to assess upper limb function and stability; however, there are few studies that support the use of this test in adolescents. The purpose of the present study was to investigate the intersession reliability and agreement of three CKCUEST scores in adolescents and establish clinimetric values for this test. Test-retest reliability. Twenty-five healthy adolescents of both sexes were evaluated. The subjects performed two CKCUEST with an interval of one week between the tests. An intraclass correlation coefficient (ICC 3,3) two-way mixed model with a 95% confidence interval was utilized to determine intersession reliability. A Bland-Altman graph was plotted to analyze the agreement between assessments. The presence of systematic error was evaluated by a one-sample t test. The difference between the evaluation and reevaluation was observed using a paired-sample t test. The level of significance was set at 0.05. Standard errors of measurement and minimal detectable changes were calculated. The intersession reliability of the average touches score, normalized score, and power score was 0.68, 0.68 and 0.87, the standard errors of measurement were 2.17, 1.35 and 6.49, and the minimal detectable changes were 6.01, 3.74 and 17.98, respectively. The presence of systematic error (p < 0.014), the significant difference between the measurements (p < 0.05), and the analysis of the Bland-Altman graph indicate that CKCUEST is a discordant test with moderate to excellent reliability when used with adolescents. The CKCUEST is a measurement with moderate to excellent reliability for adolescents. 2b.
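The abstract does not state how the minimal detectable change (MDC) was derived from the standard error of measurement (SEM). The reported pairs are, however, consistent with the commonly used 95% formula MDC95 = 1.96 × √2 × SEM, and the short sketch below checks that arithmetic. Treat the formula as an assumption about the authors' method rather than a confirmed detail; the function name is hypothetical.

```python
from math import sqrt


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% level from a standard error of
    measurement, using the common formula 1.96 * sqrt(2) * SEM
    (assumed here; the abstract does not name the formula)."""
    return 1.96 * sqrt(2.0) * sem


# SEM and MDC values as reported in the abstract, in the order
# average touches score, normalized score, power score.
reported = [(2.17, 6.01), (1.35, 3.74), (6.49, 17.98)]
for sem, reported_mdc in reported:
    print(f"SEM = {sem:5.2f}  computed MDC95 = {mdc95(sem):6.2f}  reported = {reported_mdc:6.2f}")
```

The factor √2 reflects that a change score combines measurement error from two test occasions.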
Student science achievement and the integration of Indigenous knowledge on standardized tests

NASA Astrophysics Data System (ADS)

Dupuis, Juliann; Abrams, Eleanor

2017-09-01

In this article, we examine how American Indian students in Montana performed on standardized state science assessments when a small number of test items based upon traditional science knowledge from a cultural curriculum, "Indian Education for All", were included. Montana is the first state in the US to mandate the use of a culturally relevant curriculum in all schools and to incorporate this curriculum into a portion of the standardized assessment items. This study compares White and American Indian student test scores on these particular test items to determine how White and American Indian students perform on culturally relevant test items compared to traditional standard science test items. The connections between student achievement on adapted culturally relevant science test items versus traditional items bring valuable insights to the fields of science education, research on student assessments, and Indigenous studies.
Using the Clinical Interview as a Complementary Assessment for Minority Elementary Students to Determine Their In-Depth Understanding of Mathematical Concepts

ERIC Educational Resources Information Center

Crisp, Nicola Elinor

2013-01-01

While some African American students perform as well as or better than their White peers on standardized tests, African Americans as a group attain lower scores on standardized tests than their White peers. This phenomenon has been addressed extensively in educational research. However, not much empirical research has been conducted to investigate…
Longitudinal Study Using a Standardized Test Battery as Predictors of Student Outcomes in a Rural County School System.

ERIC Educational Resources Information Center

Twale, Darla J.; Thompson, Mary J.

This longitudinal study focused on predicting student outcomes through multiple test scores and vocational preferences using standardized instruments and self-reports of career plans. A total of 444 students in the class of 1986 were enrolled in either a non-vocational or vocational curriculum at one of 4 high schools in a small, rural,…
Flexner 3.0-Democratization of Medical Knowledge for the 21st Century: Teaching Medical Science Using K-12 General Pathology as a Gateway Course.

PubMed

Weinstein, Ronald S; Krupinski, Elizabeth A; Weinstein, John B; Graham, Anna R; Barker, Gail P; Erps, Kristine A; Holtrust, Angelette L; Holcomb, Michael J

2016-01-01

A medical school general pathology course has been reformatted into a K-12 general pathology course. This new course has been implemented at a series of 7 to 12 grade levels and the student outcomes compared. Typically, topics covered mirrored those in a medical school general pathology course serving as an introduction to the mechanisms of diseases. Assessment of student performance was based on their score on a multiple-choice final examination modeled after an examination given to medical students. Two Tucson area schools, in a charter school network, participated in the study. Statistical analysis of examination performances showed that there were no significant differences as a function of school (F = 0.258, P = .6128), with students at school A having an average test score of 87.03 (standard deviation = 8.99) and school B 86.00 (standard deviation = 8.18; F = 0.258, P = .6128). Analysis of variance was also conducted on the test scores as a function of gender and class grade. There were no significant differences as a function of gender (F = 0.608, P = .4382), with females having an average score of 87.18 (standard deviation = 7.24) and males 85.61 (standard deviation = 9.85). There were also no significant differences as a function of grade level (F = 0.627, P = .6003), with 7th graders having an average of 85.10 (standard deviation = 8.90), 8th graders 86.00 (standard deviation = 9.95), 9th graders 89.67 (standard deviation = 5.52), and 12th graders 86.90 (standard deviation = 7.52). The results demonstrated that middle and upper school students performed equally well in K-12 general pathology. Student course evaluations showed that the course met the student's expectations. One class voted K-12 general pathology their "elective course-of-the-year."
Flexner 3.0—Democratization of Medical Knowledge for the 21st Century

PubMed Central

Krupinski, Elizabeth A.; Weinstein, John B.; Graham, Anna R.; Barker, Gail P.; Erps, Kristine A.; Holtrust, Angelette L.; Holcomb, Michael J.

2016-01-01

A medical school general pathology course has been reformatted into a K-12 general pathology course. This new course has been implemented at a series of 7 to 12 grade levels and the student outcomes compared. Typically, topics covered mirrored those in a medical school general pathology course serving as an introduction to the mechanisms of diseases. Assessment of student performance was based on their score on a multiple-choice final examination modeled after an examination given to medical students. Two Tucson area schools, in a charter school network, participated in the study. Statistical analysis of examination performances showed that there were no significant differences as a function of school (F = 0.258, P = .6128), with students at school A having an average test score of 87.03 (standard deviation = 8.99) and school B 86.00 (standard deviation = 8.18; F = 0.258, P = .6128). Analysis of variance was also conducted on the test scores as a function of gender and class grade. There were no significant differences as a function of gender (F = 0.608, P = .4382), with females having an average score of 87.18 (standard deviation = 7.24) and males 85.61 (standard deviation = 9.85). There were also no significant differences as a function of grade level (F = 0.627, P = .6003), with 7th graders having an average of 85.10 (standard deviation = 8.90), 8th graders 86.00 (standard deviation = 9.95), 9th graders 89.67 (standard deviation = 5.52), and 12th graders 86.90 (standard deviation = 7.52). The results demonstrated that middle and upper school students performed equally well in K-12 general pathology. Student course evaluations showed that the course met the student’s expectations. One class voted K-12 general pathology their “elective course-of-the-year.” PMID:28725762
An Audit of Emergency Department Accreditation Based on Joint Commission International Standards (JCI).

PubMed

Hashemi, Behrooz; Motamedi, Maryam; Etemad, Mania; Rahmati, Farhad; Forouzanfar, Mohammad Mehdi; Kaghazchi, Fatemeh

2014-01-01

Although medical knowledge dates back thousands of years, health care systems were established only relatively recently. Accreditation is an effective mechanism for performance evaluation, quality enhancement, and the safety of health care systems. This study was conducted to assess the results of emergency department (ED) accreditation in Shohadaye Tajrish Hospital, Tehran, Iran, in 2013 in terms of the domesticated version of the Joint Commission International (JCI) standards. This cohort study with a four-month follow-up was conducted in the ED of Shohadaye Tajrish Hospital in 2013. The standard evaluation checklist of Iranian hospitals (based on JCI standards), which includes 24 headings and 337 subheadings, was used for this purpose. The possible causes of weak spots were identified and solutions were considered. After correction, the accreditation assessment was repeated. Finally, the results of the two periods were analyzed using SPSS version 20. Quality improvement, admission in department and patient assessment, competency and capability testing for staff, collection and analysis of data, training of patients, and facilities scored below 50%. The mean total accreditation score in the ED was 60.4±30.15 percent in the first period and 68.9±22.9 percent in the second period (p=0.005). Strategic plans, head of department, head nurse, resident physician, responsible nurse for the shift, and personnel file achieved a score of 100%. Of the headings that scored below 50% in the first period, only two rose above 50%: collection and analysis of data, with growth of 40%, and competency and capability testing for staff, with growth of 17%. Based on the findings of the present study, the ED of Shohadaye Tajrish Hospital scored below 50% in six headings: quality improvement, admission in department and patient assessment, competency and capability testing for staff, collection and analysis of data, training of patients, and facilities.
In contrast, the score for strategic plans, head of department, head nurse, resident physician, responsible nurse for the shift, and personnel file was 100%.
Estimation of Uncertainties in the Global Distance Test (GDT_TS) for CASP Models.

PubMed

Li, Wenlin; Schaeffer, R Dustin; Otwinowski, Zbyszek; Grishin, Nick V

2016-01-01

The Critical Assessment of techniques for protein Structure Prediction (or CASP) is a community-wide blind test experiment to reveal the best accomplishments of structure modeling. Assessors have been using the Global Distance Test (GDT_TS) measure to quantify prediction performance since CASP3 in 1998. However, identifying significant score differences between close models is difficult because of the lack of uncertainty estimations for this measure. Here, we utilized the atomic fluctuations caused by structure flexibility to estimate the uncertainty of GDT_TS scores. Structures determined by nuclear magnetic resonance are deposited as ensembles of alternative conformers that reflect the structural flexibility, whereas standard X-ray refinement produces the static structure averaged over time and space for the dynamic ensembles. To recapitulate the structural heterogeneous ensemble in the crystal lattice, we performed time-averaged refinement for X-ray datasets to generate structural ensembles for our GDT_TS uncertainty analysis. Using those generated ensembles, our study demonstrates that the time-averaged refinements produced structure ensembles with better agreement with the experimental datasets than the averaged X-ray structures with B-factors. The uncertainty of the GDT_TS scores, quantified by their standard deviations (SDs), increases for scores lower than 50 and 70, with maximum SDs of 0.3 and 1.23 for X-ray and NMR structures, respectively. We also applied our procedure to the high accuracy version of GDT-based score and produced similar results with slightly higher SDs. To facilitate score comparisons by the community, we developed a user-friendly web server that produces structure ensembles for NMR and X-ray structures and is accessible at http://prodata.swmed.edu/SEnCS. Our work helps to identify the significance of GDT_TS score differences, as well as to provide structure ensembles for estimating SDs of any scores.
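For readers unfamiliar with the measure, GDT_TS is conventionally reported as the mean, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within the cutoff of its position in the reference structure. The sketch below is a simplified illustration and not the authors' procedure: it assumes the model has already been superimposed onto each reference conformer (the full GDT algorithm searches for the best superposition per cutoff), and the function name and sample distances are hypothetical. It then takes the standard deviation of the scores across an ensemble, which is how the abstract quantifies uncertainty.

```python
from statistics import mean, stdev

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms


def gdt_ts(distances, cutoffs=CUTOFFS):
    """Simplified GDT_TS from per-residue C-alpha distances.

    Assumes one fixed superposition; the real algorithm optimizes
    the superposition separately for each cutoff.
    """
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * mean(fractions)


# Hypothetical distances (angstroms) between one model and three
# conformers of a reference ensemble, four residues each.
ensemble_distances = [
    [0.5, 1.5, 3.0, 9.0],
    [0.5, 0.9, 3.0, 7.0],
    [1.2, 1.5, 5.0, 9.0],
]

scores = [gdt_ts(d) for d in ensemble_distances]
score_sd = stdev(scores)  # spread of GDT_TS across the ensemble
print(scores, score_sd)
```

Two models whose GDT_TS values differ by less than a few such standard deviations would be hard to rank with confidence, which is the practical point of the uncertainty estimate.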
Assessment of body-powered upper limb prostheses by able-bodied subjects, using the Box and Blocks Test and the Nine-Hole Peg Test.

PubMed

Haverkate, Liz; Smit, Gerwin; Plettenburg, Dick H

2016-02-01

The functional performance of currently available body-powered prostheses is unknown. The goal of this study was to objectively assess and compare the functional performance of three commonly used body-powered upper limb terminal devices. Experimental trial. A total of 21 able-bodied subjects (n = 21, age = 22 ± 2) tested three different terminal devices: TRS voluntary closing Hook Grip 2S, Otto Bock voluntary opening hand and Hosmer Model 5XA hook, using a prosthesis simulator. All subjects used each terminal device nine times in two functional tests: the Nine-Hole Peg Test and the Box and Blocks Test. Significant differences were found between the different terminal devices and their scores on the Nine-Hole Peg Test and the Box and Blocks Test. The Hosmer hook scored best in both tests. The TRS Hook Grip 2S scored second best. The Otto Bock hand showed the lowest scores. This study is a first step in the comparison of functional performances of body-powered prostheses. The data can be used as a reference value, to assess the performance of a terminal device or an amputee. The measured scores enable the comparison of the performance of a prosthesis user and his or her terminal device relative to standard scores. © The International Society for Prosthetics and Orthotics 2014.
Blinded randomized controlled study of a web-based otoscopy simulator in undergraduate medical education.

PubMed

Stepniak, Camilla; Wickens, Brandon; Husein, Murad; Paradis, Josee; Ladak, Hanif M; Fung, Kevin; Agrawal, Sumit K

2017-06-01

OtoTrain is a Web-based otoscopy simulator that has previously been shown to have face and content validity. The objective of this study was to evaluate the effectiveness of this Web-based otoscopy simulator in teaching diagnostic otoscopy to novice learners. Study design: prospective, blinded randomized controlled trial. Second-year medical students were invited to participate in the study. A pretest consisted of a series of otoscopy videos followed by an open-answer format assessment pertaining to the characteristics and diagnosis of each video. Participants were then randomly divided into a control group and a simulator group. Following the pretest, both groups attended standard otology lectures, but the simulator group was additionally given unlimited access to OtoTrain for 1 week. A post-test was completed using a separate set of otoscopy videos. Tests were graded based on a comprehensive marking scheme. The pretest and post-test were anonymized, and the three evaluators were blinded to student allotment. A total of 41 medical students were enrolled in the study and randomized to the control group (n = 20) and the simulator group (n = 21). There was no significant difference between the two groups on their pretest scores. With the standard otology lectures, the control group had a 31% improvement in their post-test score (mean ± standard error of the mean, 30.4 ± 1.5) compared with their pretest score (23.3 ± 1.8) (P < .001). The simulator group had the addition of OtoTrain to the otology lectures, and their score improved by 71% on their post-test (37.8 ± 1.6) compared to their pretest (22.1 ± 1.9) (P < .001). Comparing the post-test results, the simulator group had a 24% higher score than the control group (P < .002). Inter-rater reliability between the blinded evaluators was excellent (r = 0.953, P < .001). The use of OtoTrain increased the diagnostic otoscopic performance in novice learners.
OtoTrain may be an effective teaching adjunct for undergraduate medical students. Level of evidence: 1b. Laryngoscope, 127:1306-1311, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
Reducing the number of options on multiple-choice questions: response time, psychometrics and standard setting.

PubMed

Schneid, Stephen D; Armour, Chris; Park, Yoon Soo; Yudkowsky, Rachel; Bordage, Georges

2014-10-01

Despite significant evidence supporting the use of three-option multiple-choice questions (MCQs), these are rarely used in written examinations for health professions students. The purpose of this study was to examine the effects of reducing four- and five-option MCQs to three-option MCQs on response times, psychometric characteristics, and absolute standard setting judgements in a pharmacology examination administered to health professions students. We administered two versions of a computerised examination containing 98 MCQs to 38 Year 2 medical students and 39 Year 3 pharmacy students. Four- and five-option MCQs were converted into three-option MCQs to create two versions of the examination. Differences in response time, item difficulty and discrimination, and reliability were evaluated. Medical and pharmacy faculty judges provided three-level Angoff (TLA) ratings for all MCQs for both versions of the examination to allow the assessment of differences in cut scores. Students answered three-option MCQs an average of 5 seconds faster than they answered four- and five-option MCQs (36 seconds versus 41 seconds; p = 0.008). There were no significant differences in item difficulty and discrimination, or test reliability. Overall, the cut scores generated for three-option MCQs using the TLA ratings were 8 percentage points higher (p = 0.04). The use of three-option MCQs in a health professions examination resulted in a time saving equivalent to the completion of 16% more MCQs per 1-hour testing period, which may increase content validity and test score reliability, and minimise construct under-representation. The higher cut scores may result in higher failure rates if an absolute standard setting method, such as the TLA method, is used. 
The results from this study provide a cautious indication to health professions educators that using three-option MCQs does not threaten validity and may strengthen it by allowing additional MCQs to be tested in a fixed amount of testing time with no deleterious effect on the reliability of the test scores. © 2014 John Wiley & Sons Ltd.
Integrative Examination of Motor Abilities in Dialysis Patients and Selection of Tests for a Standardized Physical Function Assessment.

PubMed

Bučar Pajek, Maja; Leskošek, Bojan; Vivoda, Tjaša; Svilan, Katarina; Čuk, Ivan; Pajek, Jernej

2016-06-01

To reduce the need for a large number of executed physical function tests, we examined inter-relations and determined predictive power for daily physical activity of the following tests: 6-min walk, 10-repetition sit-to-stand, timed up-and-go, Stork balance, handgrip strength, upper limb tapping and sitting forward bend tests. In 90 dialysis and 140 healthy control subjects we found high correlations between all tests, especially those engaging lower extremities. Sit-to-stand, forward bend and handgrip strength were selected for the test battery and composite motor performance score. The sit-to-stand test was superior in terms of sensitivity to uremia effects and association with daily physical function in adjusted analyses. There was no incremental value in calculating the composite performance score. We propose to standardize the physical function assessment of dialysis patients for cross-sectional and longitudinal observations with three simple, cheap, well-accessible and easily performed test tools: sit-to-stand test, handgrip strength and Human Activity Profile questionnaire. © 2016 International Society for Apheresis, Japanese Society for Apheresis, and Japanese Society for Dialysis Therapy.
The Impact of Setting the Standards of Health Promoting Hospitals on Hospital Indicators in Iran

PubMed Central

Amiri, Mohammad; Khosravi, Ahmad; Riyahi, Leila

2016-01-01

Hospitals play a critical role in the health promotion of society. This study aimed to determine the impact of establishing standards of health promoting hospitals on hospital indicators in Shahroud. This applied, quasi-experimental study was conducted in 2013. Standards of health promoting hospitals were established as the intervention in the Fatemiyeh hospital. Parameters of health promoting hospitals were compared in the intervention and control hospitals before and after the intervention (6 months). The data were analyzed using the chi-square test and t-test. With the establishment of standards for health promoting hospitals, standard scores in the intervention and control hospitals were 72.26 ± 4.1 and 16.26 ± 7.5, respectively. The t-test showed a significant difference between the mean scores of the hospitals under study (P = 0.001). The chi-square test also showed a significant relationship between patient satisfaction before and after the intervention, with patients’ satisfaction higher after the intervention (P = 0.001). It is difficult to comment on the short-term or long-term impact of establishing standards of health promoting hospitals on all hospital indicators, but preliminary results show a positive impact of implementing the standards in the case hospitals, which has led to the improvement of many hospital indicators. PMID:27959930
Effect of Content Knowledge on Angoff-Style Standard Setting Judgments

ERIC Educational Resources Information Center

Margolis, Melissa J.; Mee, Janet; Clauser, Brian E.; Winward, Marcia; Clauser, Jerome C.

2016-01-01

Evidence to support the credibility of standard setting procedures is a critical part of the validity argument for decisions made based on tests that are used for classification. One area in which there has been limited empirical study is the impact of standard setting judge selection on the resulting cut score. One important issue related to…
A Study of the Alignment of National Standards, State Standards, and Science Assessment.

ERIC Educational Resources Information Center

Burry-Stock, Judith A.; Casebeer, Cindy M.

Educational reform efforts are currently at the top of the nation's agenda. Policymakers are hearing increasing calls from members of the public to improve standardized test scores. These reform calls are a response to the perceived inadequacy of science teaching in our nation. Data were collected from participating states regarding the status of…
Association of MCAT scores obtained with standard vs extra administration time with medical school admission, medical student performance, and time to graduation.

PubMed

Searcy, Cynthia A; Dowd, Keith W; Hughes, Michael G; Baldwin, Sean; Pigg, Trey

2015-06-09

Individuals with documented disabilities may receive accommodations on the Medical College Admission Test (MCAT). Whether such accommodations are associated with MCAT scores, medical school admission, and medical school performance is unclear. To determine the comparability of MCAT scores obtained with standard vs extra administration time with respect to likelihood of acceptance to medical school and future medical student performance. Retrospective cohort study of applicants to US medical schools for the 2011-2013 entering classes who reported MCAT scores obtained with standard time (n = 133,962) vs extra time (n = 435), and of students who matriculated in US medical schools from 2000-2004 who reported MCAT scores obtained with standard time (n = 76,262) vs extra time (n = 449). Standard or extra administration time during MCAT. Primary outcome measures were acceptance rates at US medical schools and graduation rates within 4 or 5 years after matriculation. Secondary outcome measures were pass rates on the United States Medical Licensing Examination (USMLE) Step examinations and graduation rates within 6 to 8 years after matriculation. Acceptance rates were not significantly different for applicants who had MCAT scores obtained with standard vs extra time (44.5% [59,585/133,962] vs 43.9% [191/435]; difference, 0.6% [95% CI, -4.1 to 5.3]). Students who tested with extra time passed the Step examinations on first attempt at significantly lower rates (Step 1, 82.1% [344/419] vs 94.0% [70,188/74,668]; difference, 11.9% [95% CI, 9.6% to 14.2%]; Step 2 CK, 85.5% [349/408] vs 95.4% [70,476/73,866]; difference, 9.9% [95% CI, 7.8% to 11.9%]; Step 2 CS, 92.0% [288/313] vs 97.0% [60,039/61,882]; difference, 5.0% [95% CI, 3.1% to 6.9%]). 
They also graduated from medical school at significantly lower rates at different times (4 years, 67.2% [285/424] vs 86.1% [60,547/70,305]; difference, 18.9% [95% CI, 15.6% to 22.2%]; 5 years, 81.6% [346/424] vs 94.4% [66,369/70,305]; difference, 12.8% [95% CI, 10.6% to 15.0%]; 6 years, 85.4% [362/424] vs 95.8% [67,351/70,305]; difference, 10.4% [95% CI, 8.5% to 12.4%]; 7 years, 88.0% [373/424] vs 96.2% [67,639/70,305]; difference, 8.2% [95% CI, 6.4% to 10.1%]; 8 years, 88.4% [375/424] vs 96.5% [67,847/70,305]; difference, 8.1% [95% CI, 6.3% to 9.8%]). These differences remained after controlling for MCAT scores and undergraduate grade point averages. Among applicants to US medical schools, those with MCAT scores obtained with extra test administration time, compared with standard administration time, had no significant difference in rate of medical school admission but had lower rates of passing the USMLE Step examinations and of medical school graduation within 4 to 8 years after matriculation. These findings raise questions about the types of learning environments and support systems needed by students who test with extra time on the MCAT to enable them to succeed in medical school.
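The acceptance-rate comparison above can be illustrated with a Wald (normal-approximation) confidence interval for a difference between two independent proportions. This is only a sketch: the abstract does not say which interval method the authors used, so agreement with the reported figures is not guaranteed, and the function name is hypothetical. The counts are the ones reported for acceptance with standard versus extra time.

```python
import math


def diff_of_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Wald interval for p1 - p2 (assumed method, not stated in the abstract)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Acceptance counts from the abstract: standard time vs extra time.
diff, lower, upper = diff_of_proportions_ci(59585, 133962, 191, 435)
print(f"difference = {100 * diff:.1f} percentage points, "
      f"95% CI {100 * lower:.1f} to {100 * upper:.1f}")
```

Because the extra-time group is small (n = 435), its sampling error dominates the width of the interval, which is why an interval spanning zero is compatible with the reported lack of a significant difference in acceptance rates.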

School Performance: A Matter of Health or Socio-Economic Background? Findings from the PIAMA Birth Cohort Study

PubMed Central

Ruijsbroek, Annemarie; Wijga, Alet H.; Gehring, Ulrike; Kerkhof, Marjan; Droomers, Mariël

2015-01-01

Background: Performance in primary school is a determinant of children’s educational attainment and their socio-economic position and health inequalities in adulthood. We examined the relationship between five common childhood health conditions (asthma symptoms, eczema, general health, frequent respiratory infections, and overweight), health related school absence and family socio-economic status on children’s school performance. Methods: We used data from 1,865 children in the Dutch PIAMA birth cohort study. School performance was measured as the teacher’s assessment of a suitable secondary school level for the child, and the child’s score on a standardized achievement test (Cito Test). Both school performance indicators were standardised using Z-scores. Childhood health was indicated by eczema, asthma symptoms, general health, frequent respiratory infections, overweight, and health related school absence. Children’s health conditions were reported repeatedly between the age of one to eleven. School absenteeism was reported at age eleven. Highest attained educational level of the mother and father indicated family socio-economic status. We used linear regression models with heteroskedasticity-robust standard errors for our analyses with adjustment for sex of the child. Results: The health indicators used in our study were not associated with children’s school performance, independently from parental educational level, with the exception of asthma symptoms (-0.03 z-score / -0.04 z-score with Cito Test score after adjusting for respectively maternal and paternal education) and missing more than 5 schooldays due to illness (-0.18 z-score with Cito Test score and -0.17 z-score with school level assessment after adjustment for paternal education). The effect estimates for these health indicators were much smaller though than the effect estimates for parental education, which was strongly associated with children’s school performance.
Conclusion: Children’s school performance was affected only slightly by a number of common childhood health problems, but was strongly associated with parental education. PMID:26247468
School Performance: A Matter of Health or Socio-Economic Background? Findings from the PIAMA Birth Cohort Study.

PubMed

Ruijsbroek, Annemarie; Wijga, Alet H; Gehring, Ulrike; Kerkhof, Marjan; Droomers, Mariël

2015-01-01

Performance in primary school is a determinant of children's educational attainment and their socio-economic position and health inequalities in adulthood. We examined the relationship between five common childhood health conditions (asthma symptoms, eczema, general health, frequent respiratory infections, and overweight), health related school absence and family socio-economic status on children's school performance. We used data from 1,865 children in the Dutch PIAMA birth cohort study. School performance was measured as the teacher's assessment of a suitable secondary school level for the child, and the child's score on a standardized achievement test (Cito Test). Both school performance indicators were standardised using Z-scores. Childhood health was indicated by eczema, asthma symptoms, general health, frequent respiratory infections, overweight, and health related school absence. Children's health conditions were reported repeatedly between the age of one to eleven. School absenteeism was reported at age eleven. Highest attained educational level of the mother and father indicated family socio-economic status. We used linear regression models with heteroskedasticity-robust standard errors for our analyses with adjustment for sex of the child. The health indicators used in our study were not associated with children's school performance, independently from parental educational level, with the exception of asthma symptoms (-0.03 z-score / -0.04 z-score with Cito Test score after adjusting for respectively maternal and paternal education) and missing more than 5 schooldays due to illness (-0.18 z-score with Cito Test score and -0.17 z-score with school level assessment after adjustment for paternal education). The effect estimates for these health indicators were much smaller though than the effect estimates for parental education, which was strongly associated with children's school performance. 
Children's school performance was affected only slightly by a number of common childhood health problems, but was strongly associated with parental education.
The London handicap scale: a re-evaluation of its validity using standard scoring and simple summation.

PubMed

Jenkinson, C; Mant, J; Carter, J; Wade, D; Winner, S

2000-03-01

To assess the validity of the London handicap scale (LHS) using a simple unweighted scoring system compared with traditional weighted scoring. A total of 323 patients admitted to hospital with acute stroke were followed up by interview 6 months after their stroke as part of a trial looking at the impact of a family support organiser. Outcome measures included the six item LHS, the Dartmouth COOP charts, the Frenchay activities index, the Barthel index, and the hospital anxiety and depression scale. Patients' handicap score was calculated both using the standard procedure (with weighting) for the LHS, and using a simple summation procedure without weighting (U-LHS). Construct validity of both LHS and U-LHS was assessed by testing their correlations with the other outcome measures. Cronbach's alpha for the LHS was 0.83. The U-LHS was highly correlated with the LHS (r=0.98). Correlation of U-LHS with the other outcome measures gave very similar results to correlation of LHS with these measures. Simple summation scoring of the LHS does not lead to any change in the measurement properties of the instrument compared with standard weighted scoring. Unweighted scores are easier to calculate and interpret, so it is recommended that these are used.
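As a rough illustration of the two quantities discussed above, the sketch below computes an unweighted sum score and Cronbach's alpha using the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of totals). The six-item responses are invented for demonstration and do not follow the actual LHS response scale or weights; the function name is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 6 items, each rated 1-6.
responses = [
    [1, 2, 1, 2, 1, 2],
    [3, 3, 2, 3, 3, 4],
    [4, 5, 4, 4, 5, 5],
    [2, 2, 3, 2, 2, 3],
    [6, 5, 6, 6, 5, 6],
]

unweighted_scores = [sum(row) for row in responses]  # simple summation
alpha = cronbach_alpha(responses)
print(unweighted_scores, round(alpha, 3))
```

The unweighted score needs nothing beyond addition, which is the ease-of-use argument the abstract makes for simple summation.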
Score tests for independence in semiparametric competing risks models.

PubMed

Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul

2009-12-01

A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.
Impact of Private Secondary Schooling on Cognitive Skills: Evidence from India

ERIC Educational Resources Information Center

Azam, Mehtabul; Kingdon, Geeta; Wu, Kin Bing

2016-01-01

We examine the effect of attending private secondary school on educational achievement, as measured by students' scores in a comprehensive standardized math test, in two Indian states: Orissa and Rajasthan. We use propensity score matching (PSM) to control for any systematic differences between students attending private secondary schools and…
Predictive Efficiency of Direct, Repeated Measurement: An Analysis of Cost and Accuracy in Classification.

ERIC Educational Resources Information Center

Marston, Doug; And Others

Two studies were conducted to examine the efficacy of direct measurement, standardized achievement tests, and aptitude-achievement discrepancy scores in distinguishing learning disabled (LD) and nonlearning disabled (NLD) students in grades 3 to 6. For both reading (Study I) and written expression (Study II), students' scores on direct and…
An Introduction to Multilinear Formula Score Theory. Measurement Series 84-4.

ERIC Educational Resources Information Center

Levine, Michael V.

Formula score theory (FST) associates each multiple choice test with a linear operator and expresses all of the real functions of item response theory as linear combinations of the operator's eigenfunctions. Hard measurement problems can then often be reformulated as easier, standard mathematical problems. For example, the problem of estimating…
Measuring Teacher Effectiveness with the Pennsylvania Value-Added Assessment System

ERIC Educational Resources Information Center

Bowen, Naomi

2017-01-01

The purpose of this research was to determine if the Pennsylvania Value-Added Assessment System Average Growth Index (PVAAS AGI) scores, derived from standardized tests and calculated for Pennsylvania schools, provide a valid and reliable assessment of teacher effectiveness, as these scores are currently used to derive 15% of the annual…
Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.

ERIC Educational Resources Information Center

Braun, Henry I.; And Others

The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…
Predicting End-of-Year Achievement Test Performance: A Comparison of Assessment Methods

ERIC Educational Resources Information Center

Kettler, Ryan J.; Elliott, Stephen N.; Kurz, Alexander; Zigmond, Naomi; Lemons, Christopher J.; Kloo, Amanda; Shrago, Jacqueline; Beddow, Peter A.; Williams, Leila; Bruen, Charles; Lupp, Lynda; Farmer, Jeanie; Mosiman, Melanie

2014-01-01

Motivated by the multiple-measures clause of recent federal policy regarding student eligibility for alternate assessments based on modified academic achievement standards (AA-MASs), this study examined how scores or combinations of scores from a diverse set of assessments predicted students' end-of-year proficiency status on statewide achievement…
Correlation of Simulation Examination to Written Test Scores for Advanced Cardiac Life Support Testing: Prospective Cohort Study.

PubMed

Strom, Suzanne L; Anderson, Craig L; Yang, Luanna; Canales, Cecilia; Amin, Alpesh; Lotfipour, Shahram; McCoy, C Eric; Osborn, Megan Boysen; Langdorf, Mark I

2015-11-01

Traditional Advanced Cardiac Life Support (ACLS) courses are evaluated using written multiple-choice tests. High-fidelity simulation is a widely used adjunct to didactic content, and has been used in many specialties as a training resource as well as an evaluative tool. There are no data to our knowledge that compare simulation examination scores with written test scores for ACLS courses. To compare and correlate a novel high-fidelity simulation-based evaluation with traditional written testing for senior medical students in an ACLS course. We performed a prospective cohort study to determine the correlation between simulation-based evaluation and traditional written testing in a medical school simulation center. Students were tested on a standard acute coronary syndrome/ventricular fibrillation cardiac arrest scenario. Our primary outcome measure was correlation of exam results for 19 volunteer fourth-year medical students after a 32-hour ACLS-based Resuscitation Boot Camp course. Our secondary outcome was comparison of simulation-based vs. written outcome scores. The composite average score on the written evaluation was substantially higher (93.6%) than the simulation performance score (81.3%, absolute difference 12.3%, 95% CI [10.6-14.0%], p<0.00005). We found a statistically significant moderate correlation between simulation scenario test performance and traditional written testing (Pearson r=0.48, p=0.04), validating the new evaluation method. Simulation-based ACLS evaluation methods correlate with traditional written testing and demonstrate resuscitation knowledge and skills. Simulation may be a more discriminating and challenging testing method, as students scored higher on written evaluation methods compared to simulation.
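The significance of the reported correlation can be checked by hand with the usual t transformation of a Pearson coefficient, t = r·√(n−2)/√(1−r²), on n−2 degrees of freedom. The sketch below applies it to the reported r = 0.48 and n = 19; the function name is hypothetical, and the critical value is the tabulated two-sided 5% point of Student's t with 17 degrees of freedom rather than something computed here.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_stat = correlation_t_statistic(0.48, 19)

# Two-sided 5% critical value for 17 degrees of freedom, from standard t tables.
T_CRITICAL_DF17 = 2.110
significant_at_5_percent = abs(t_stat) > T_CRITICAL_DF17
print(round(t_stat, 3), significant_at_5_percent)
```

A t statistic only slightly above the critical value is consistent with a p-value just under 0.05, so with 19 students the correlation is detectable but estimated imprecisely.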
Agreement between clinicians' and care givers' assessment of intelligence in Nigerian children with intellectual disability: 'ratio IQ' as a viable option in the absence of standardized 'deviance IQ' tests in sub-Saharan Africa.

PubMed

Bakare, Muideen O; Ubochi, Vincent N; Okoroikpa, Ifeoma N; Aguocha, Chinyere M; Ebigbo, Peter O

2009-09-15

There may be a need to assess intelligence quotient (IQ) scores in sub-Saharan African children with intellectual disability, either for the purpose of educational needs assessment or research. However, modern intelligence scales developed in the western parts of the world are limited in their widespread use because of the influence of socio-cultural variations across the world. This study examined the agreement between IQ score estimates among Nigerian children with intellectual disability using clinicians' judgment based on International Classification of Diseases, Tenth Edition (ICD-10) criteria for mental retardation and caregivers' judgment based on 'ratio IQ' scores calculated from estimated mental age in the context of the socio-cultural milieu of the children. It proposed a viable option of IQ score assessment among sub-Saharan African children with intellectual disability, using a ratio of culture-specific estimated mental age and chronological age of the child in the absence of standardized alternatives, borne out of great diversity in the socio-cultural context of sub-Saharan Africa. Clinicians and caregivers independently assessed the children in relation to their socio-cultural background. Clinicians assessed the IQ scores of the children based on the ICD-10 diagnostic criteria for mental retardation. 'Ratio IQ' scores were calculated from the ratio of estimated mental age and chronological age of each child. The IQ scores as assessed by the clinicians were then compared with the 'ratio IQ' scores using correlation statistics. A total of forty-four (44) children with intellectual disability were assessed. There was a significant correlation between clinicians' assessed IQ scores and the 'ratio IQ' scores employing zero order correlation without controlling for the chronological age of the children (r = 0.47, df = 42, p = 0.001).
First order correlation controlling for the chronological age of the children showed higher correlation score between clinicians' assessed IQ scores and 'ratio IQ' scores (r = 0.75, df = 41, p = 0.000). Agreement between clinicians' assessed IQ scores and 'ratio IQ' scores was good. 'Ratio IQ' test would provide a viable option of assessing IQ scores in sub-Saharan African children with intellectual disability in the absence of culture-appropriate standardized intelligence scales, which is often the case because of great diversity in socio-cultural structures of sub-Saharan Africa.
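The two calculations named above can be sketched as follows. The abstract describes 'ratio IQ' only as a ratio of estimated mental age to chronological age; the multiplication by 100 is the conventional scaling and is an assumption here. The first-order partial correlation uses the textbook formula for removing one control variable; the function names and the numeric inputs in the example are hypothetical and are not the study's data.

```python
import math


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ with the conventional x100 scaling (assumed, not stated in the abstract)."""
    return 100.0 * mental_age / chronological_age


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical child: estimated mental age 6 years, chronological age 12 years.
print(ratio_iq(6, 12))

# Hypothetical correlations: x = clinician IQ, y = ratio IQ, z = chronological age.
print(partial_correlation(0.5, -0.3, 0.4))
```

A partial correlation can exceed the zero-order value when the control variable relates to the two measures in opposite directions, which is one way the increase reported above could arise.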
The use of an essay examination in evaluating medical students during the surgical clerkship.

PubMed

Smart, Blair J; Rinewalt, Daniel; Daly, Shaun C; Janssen, Imke; Luu, Minh B; Myers, Jonathan A

2016-01-01

Third-year medical students are graded according to subjective performance evaluations and standardized tests written by the National Board of Medical Examiners (NBME). Many "poor" standardized test takers believe the heavily weighted NBME does not evaluate their true fund of knowledge and would prefer a more open-ended forum to display their individualized learning experiences. Our study examined the use of an essay examination as part of the surgical clerkship evaluation. We retrospectively examined the final surgical clerkship grades of 781 consecutive medical students enrolled in a large urban academic medical center from 2005 to 2011. We examined final grades with and without the inclusion of the essay examination for all students using a paired t test and then sought any relationship between the essay and NBME using Pearson correlations. Final average with and without the essay examination was 72.2% vs 71.3% (P < .001), with the essay examination increasing average scores by .4, 1.8, and 2.5 for those receiving high pass, pass, and fail, respectively. The essay decreased the average score for those earning honors by .4. Essay scores were found to correlate positively overall with the NBME (r = .32, P < .001). The inclusion of an essay examination as part of the third-year surgical core clerkship final increased the final grade to a modest degree, especially for those with lower scores who may identify themselves as "poor" standardized test takers. A more open-ended forum may allow these students an opportunity to overcome this deficiency and reveal their true fund of surgical knowledge. Copyright © 2016 Elsevier Inc. All rights reserved.
Establishing robust cognitive dimensions for characterization and differentiation of patients with Alzheimer's disease, mild cognitive impairment, frontotemporal dementia and depression.

PubMed

Beck, Irene R; Schmid, Nicole S; Berres, Manfred; Monsch, Andreas U

2014-06-01

The diagnosis of mild cognitive impairment (MCI) and dementia requires detailed neuropsychological examinations. These examinations typically yield a large number of outcome variables, which may complicate the interpretation and communication of results. The purposes of this study were the following: (i) to reduce a large data set of interrelated neuropsychological variables to a smaller number of cognitive dimensions; (ii) to create a common metric for these dimensions (z-scores); and (iii) to study the ability of the cognitive dimensions to distinguish between groups of patients with different types of cognitive impairment. We tested 1646 patients with different forms of dementia or with a major depression with a standard (n = 632) or, if cognitively less affected, a challenging neuropsychological battery (n = 1014). To identify the underlying cognitive dimensions of the two test batteries, maximum likelihood factor analyses with a promax rotation were conducted. To interpret the sum scores of the factors as standard scores, we divided them by the standard deviation of a cognitively healthy sample (n = 1145). The factor analyses yielded seven factors for each test battery. The cognitive dimensions in both test batteries distinguished patients with different forms of dementia (MCI, Alzheimer's dementia or frontotemporal dementia) and patients with major depression. Furthermore, patients with stable MCI could be separated from patients with progressing MCI. Discriminant analyses with an independent new sample of patients (n = 306) revealed that the new dimension scores distinguished new samples of patients with MCI from patients with Alzheimer's dementia with high accuracy. These findings suggest that these cognitive dimensions may benefit neuropsychological diagnostics. © 2013 The Authors International Journal of Geriatric Psychiatry Published by John Wiley & Sons Ltd.
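The scaling step described above, expressing factor sum scores in units of a healthy sample's standard deviation, might look like the sketch below. The function name, the `center` option, and all numbers are assumptions made for illustration; the abstract states only that the scores were divided by the healthy-sample standard deviation, so centering on the healthy mean is an added assumption that can be switched off.

```python
from statistics import mean, stdev

def standard_scores(patient_scores, healthy_scores, center=True):
    """Scale scores by the standard deviation of a healthy reference sample.

    center=True also subtracts the healthy mean (an assumption, not stated
    in the abstract); center=False only divides by the healthy SD.
    """
    sd = stdev(healthy_scores)
    offset = mean(healthy_scores) if center else 0.0
    return [(x - offset) / sd for x in patient_scores]

# Hypothetical reference sample and patient sum scores.
healthy = [8.0, 10.0, 12.0]
patients = [6.0, 10.0, 13.0]
print(standard_scores(patients, healthy))
print(standard_scores(patients, healthy, center=False))
```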
Establishing robust cognitive dimensions for characterization and differentiation of patients with Alzheimer's disease, mild cognitive impairment, frontotemporal dementia and depression

PubMed Central

Beck, Irene R; Schmid, Nicole S; Berres, Manfred; Monsch, Andreas U

2014-01-01

Objective The diagnosis of mild cognitive impairment (MCI) and dementia requires detailed neuropsychological examinations. These examinations typically yield a large number of outcome variables, which may complicate the interpretation and communication of results. The purposes of this study were the following: (i) to reduce a large data set of interrelated neuropsychological variables to a smaller number of cognitive dimensions; (ii) to create a common metric for these dimensions (z-scores); and (iii) to study the ability of the cognitive dimensions to distinguish between groups of patients with different types of cognitive impairment. Methods We tested 1646 patients with different forms of dementia or with a major depression with a standard (n = 632) or, if cognitively less affected, a challenging neuropsychological battery (n = 1014). To identify the underlying cognitive dimensions of the two test batteries, maximum likelihood factor analyses with a promax rotation were conducted. To interpret the sum scores of the factors as standard scores, we divided them by the standard deviation of a cognitively healthy sample (n = 1145). Results The factor analyses yielded seven factors for each test battery. The cognitive dimensions in both test batteries distinguished patients with different forms of dementia (MCI, Alzheimer's dementia or frontotemporal dementia) and patients with major depression. Furthermore, patients with stable MCI could be separated from patients with progressing MCI. Discriminant analyses with an independent new sample of patients (n = 306) revealed that the new dimension scores distinguished new samples of patients with MCI from patients with Alzheimer's dementia with high accuracy. Conclusion These findings suggest that these cognitive dimensions may benefit neuropsychological diagnostics. PMID:24227657
Pretraining and posttraining assessment of residents' performance in the fourth accreditation council for graduate medical education competency: patient communication skills.

PubMed

Chandawarkar, Rajiv Y; Ruscher, Kimberly A; Krajewski, Aleksandra; Garg, Manish; Pfeiffer, Carol; Singh, Rekha; Longo, Walter E; Kozol, Robert A; Lesnikoski, Beth; Nadkarni, Prakash

2011-08-01

Structured communication curricula will improve surgical residents' ability to communicate effectively with patients. A prospective study approved by the institutional review board involved 44 University of Connecticut general surgery residents. Residents initially completed a written baseline survey to assess general communication skills awareness. In step 1 of the study, residents were randomized to 1 of 2 simulations using standardized patient instructors to mimic patients receiving a diagnosis of either breast or rectal cancer. The standardized patient instructors scored residents' communication skills using a case-specific content checklist and Master Interview Rating Scale. In step 2 of the study, residents attended a 3-part interactive program that comprised (1) principles of patient communication; (2) experiences of a surgeon (role as physician, patient, and patient's spouse); and (3) role-playing (3-resident groups played patient, physician, and observer roles and rated their own performance). In step 3, residents were retested as in step 1, using a crossover case design. Scores were analyzed using Wilcoxon signed rank test with a Bonferroni correction. Case-specific performance improved significantly, from a pretest content checklist median score of 8.5 (65%) to a posttest median of 11.0 (84%) (P = .005 by Wilcoxon signed rank test for paired ordinal data)(n = 44). Median Master Interview Rating Scale scores changed from 58.0 before testing (P = .10) to 61.5 after testing (P = .94). Difference between overall rectal cancer scores and breast cancer scores also were not significant. Patient communication skills need to be taught as part of residency training. With limited training, case-specific skills (herein, involving patients with cancer) are likely to improve more than general communication skills.
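As a rough illustration of the analysis named above, the sketch below computes the Wilcoxon signed-rank sums for paired scores and applies a Bonferroni-adjusted threshold to a set of P values. It is not the study's code: the function names, the scores, the P values, and the number of comparisons are all hypothetical, and no P value is derived from the rank sums here.

```python
def signed_rank_sums(before, after):
    """Return (W+, W-) for paired data, dropping zero differences."""
    diffs = [a - b for a, b in zip(after, before) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences and give each the mean rank.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each P value against alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Hypothetical pretest and posttest checklist scores.
pre = [8, 9, 7, 10, 8]
post = [11, 10, 7, 9, 12]
w_plus, w_minus = signed_rank_sums(pre, post)
flags = bonferroni_significant([0.004, 0.03, 0.20])
print(w_plus, w_minus, flags)
```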
PROTECT YOUR HEART: A CULTURE-SPECIFIC, MULTIMEDIA CARDIOVASCULAR HEALTH EDUCATION PROGRAM

PubMed Central

Shah, Amy; Clayman, Marla L.; Lauderdale, Diane S.; Khurana, Neerja; Glass, Sara; Kandula, Namratha R.

2016-01-01

Objectives South Asians (SAs), the second fastest growing racial/ethnic minority in the United States, have high rates of coronary heart disease (CHD). Few CHD prevention efforts target this population. We developed and tested a culture-specific, multimedia CHD prevention education program in English and Hindi for SAs. Methods Participants were recruited from community organizations in Chicago, IL between June-October 2011. Bilingual interviewers used questionnaires to assess participants’ knowledge and perceptions before and after the patient education program. Change from pre- to post-test score was calculated using a paired t-test. Linear regression was used to determine the association between post-test scores and education and language. Results Participants’ (n=112) average age was 41 years, 67% had more than a high school education, and 50% spoke Hindi. Participants’ mean pre-test score was 15 (Standard Deviation= 4). After the patient education program, post-test scores increased significantly among all participants (post-test score=24, SD=4), including those with limited-English proficiency. Lower education was associated with a lower post-test score (Beta-coefficient= −2.2, 95% CI= −0.68, −3.8) in adjusted regression. Conclusions A culture-specific, multimedia patient education program significantly improved knowledge and perceptions about CHD prevention among SA immigrants. Culturally-salient, multimedia education may be an effective and engaging way to deliver health information to diverse patient populations. PMID:25647363
Testing to the Top: Everything But the Kitchen Sink?

ERIC Educational Resources Information Center

Dietel, Ron

2011-01-01

Two tests intended to measure student achievement of the Common Core State Standards will face intense scrutiny, but the test makers say they will include performance assessments and other items that are not multiple-choice questions. Incorporating performance items on these tests will bring up issues over scoring, costs, and validity.
The Validity and Clinical Uses of the Pepper Visual Skills for Reading Test.

ERIC Educational Resources Information Center

Watson, G.; And Others

1990-01-01

The Pepper Visual Skills for Reading Test was assessed as a measure of reading ability with meaningful text in 38 adults with macular degeneration; scores were compared with assessment made using the Gray Oral Reading Test, a previously standardized assessment. The test's validity was confirmed. (Author/JDD)
Development and psychometric testing of an instrument designed to measure chronic pain in dogs with osteoarthritis

PubMed Central

Boston, Raymond C.; Coyne, James C.; Farrar, John T.

2010-01-01

Objective To develop and psychometrically test an owner self-administered questionnaire designed to assess severity and impact of chronic pain in dogs with osteoarthritis. Sample Population 70 owners of dogs with osteoarthritis and 50 owners of clinically normal dogs. Procedures Standard methods for the stepwise development and testing of instruments designed to assess subjective states were used. Items were generated through focus groups and an expert panel. Items were tested for readability and ambiguity, and poorly performing items were removed. The reduced set of items was subjected to factor analysis, reliability testing, and validity testing. Results Severity of pain and interference with function were 2 factors identified and named on the basis of the items contained in them. Cronbach’s α was 0.93 and 0.89, respectively, suggesting that the items in each factor could be assessed as a group to compute factor scores (ie, severity score and interference score). The test-retest analysis revealed κ values of 0.75 for the severity score and 0.81 for the interference score. Scores correlated moderately well (r = 0.51 and 0.50, respectively) with the overall quality-of-life (QOL) question, such that as severity and interference scores increased, QOL decreased. Clinically normal dogs had significantly lower severity and interference scores than dogs with osteoarthritis. Conclusions and Clinical Relevance A psychometrically sound instrument was developed. Responsiveness testing must be conducted to determine whether the questionnaire will be useful in reliably obtaining quantifiable assessments from owners regarding the severity and impact of chronic pain and its treatment on dogs with osteoarthritis. PMID:17542696
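Cronbach's α, the internal-consistency statistic reported above, can be computed from item-level responses as in the following sketch. The function name and the response matrix are hypothetical; the study's own data and software are not described in the abstract.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                    # number of items
    item_vars = [variance(col) for col in zip(*rows)]   # variance of each item
    total_var = variance([sum(r) for r in rows])        # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: four owners rating three items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values near 1 indicate that the items vary together, which is the basis for summing them into a single factor score.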

People with Parkinson Disease and Normal MMSE Score Have a Broad Range of Cognitive Performance

PubMed Central

Burdick, DJ; Cholerton, B; Watson, GS; Siderowf, A; Trojanowski, JQ; Weintraub, D; Ritz, B; Rhodes, SL; Rausch, R; Factor, SA; Wood-Siverio, C; Quinn, JF; Chung, KA; Srivatsal, S; Edwards, KL; Montine, TJ; Zabetian, CP; Leverenz, JB

2014-01-01

Background Cognitive impairment, including dementia, is common in Parkinson disease (PD). The Mini-Mental State Examination (MMSE) has been recommended as a screening tool for PDD, with values below 26 indicative of possible dementia. Using a detailed neuropsychological battery, we examined the range of cognitive impairment in PD patients with a MMSE score ≥ 26. Methods In this multi-center, cross-sectional, observational study, we performed neuropsychological testing in a sample of 788 PD patients with MMSE ≥ 26. Evaluation included tests of global cognition, executive function, language, memory, and visuospatial skills. A consensus panel reviewed results for 342 subjects and assigned a diagnosis of no cognitive impairment, mild cognitive impairment, or dementia. Results 67% of the 788 subjects performed 1.5 standard deviations below the normative mean on at least one test. On eight of the 15 tests, more than 20% of subjects scored 1.5 standard deviations or more below the normative mean. Greatest impairments were found on Hopkins Verbal Learning and Digit Symbol Coding tests. The sensitivity of the MMSE to detect dementia was 45% in a subset of participants who underwent clinical diagnostic procedures. Conclusions A remarkably wide range of cognitive impairment can be found in PD patients with a relatively high score on the MMSE, including a level of cognitive impairment consistent with dementia. Given these findings, clinicians must be aware of the limitations of the MMSE in detecting cognitive impairment, including dementia, in PD. PMID:25073717
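Sensitivity, as used in the Results above, is the share of people with the condition whom the screening test flags. A minimal sketch, with a hypothetical function name and made-up screening outcomes rather than study data:

```python
def sensitivity(screen_positive, has_condition):
    """True positives divided by all people who have the condition."""
    pairs = list(zip(screen_positive, has_condition))
    true_pos = sum(1 for s, c in pairs if s and c)
    false_neg = sum(1 for s, c in pairs if not s and c)
    return true_pos / (true_pos + false_neg)

# Hypothetical outcomes for five people.
flagged = [True, False, False, True, False]    # screening result
diagnosed = [True, True, False, False, True]   # clinical diagnosis
print(sensitivity(flagged, diagnosed))
```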
Pharmacy students' test-taking motivation-effort on a low-stakes standardized test.

PubMed

Waskiewicz, Rhonda A

2011-04-11

To measure third-year pharmacy students' level of motivation while completing the Pharmacy Curriculum Outcomes Assessment (PCOA) administered as a low-stakes test to better understand use of the PCOA as a measure of student content knowledge. Student motivation was manipulated through an incentive (ie, personal letter from the dean) and a process of statistical motivation filtering. Data were analyzed to determine any differences between the experimental and control groups in PCOA test performance, motivation to perform well, and test performance after filtering for low motivation-effort. Incentivizing students diminished the need for filtering PCOA scores for low effort. Where filtering was used, performance scores improved, providing a more realistic measure of aggregate student performance. To ensure that PCOA scores are an accurate reflection of student knowledge, incentivizing and/or filtering for low motivation-effort among pharmacy students should be considered fundamental best practice when the PCOA is administered as a low-stakes test.
A Preliminary Investigation of Dynamic Assessment With Native American Kindergartners.

PubMed

Ukrainetz, Teresa A; Harpell, Stacey; Walsh, Chandra; Coyle, Catherine

2000-04-01

This study examined dynamic assessment as a less-biased evaluation procedure for assessing the language-learning ability of Native American children. Twenty-three Arapahoe/Shoshone kindergartners were identified as stronger (n = 15) or weaker (n = 8) language learners through teacher report and examiner classroom observation. Through a test-teach-test protocol, participants were briefly taught the principles of categorization. Participant responses to learning were measured in terms of an index of modifiability and post-test categorization scores. The modifiability index, determined during the teaching phase, was a combined score reflecting the child's learning strategies, such as ability to attend, plan, and self-regulate, and the child's responses to the learning situation. Post-test scores consisted of performance on expressive and receptive subtests from a standardized categorization test after partialling out pretest score differences. Effect sizes and confidence intervals were also determined. Group and individual results indicated that modifiability and post-test scores were significantly greater for stronger than for weaker language learners. The response to modifiability components was a better discriminator than was the learner strategies components. These results provide support for the further development of dynamic assessment as a valid measure of language learning ability in minority children.
Feasibility of remote administration of the Fundamentals of Laparoscopic Surgery (FLS) skills test.

PubMed

Okrainec, Allan; Vassiliou, Melina; Kapoor, Andrew; Pitzul, Kristen; Henao, Oscar; Kaneva, Pepa; Jackson, Timothy; Ritter, E Matt

2013-11-01

Fundamentals of Laparoscopic Surgery (FLS) certification testing currently is offered at accredited test centers or at select surgical conferences. Maintaining these test centers requires considerable investment in human and financial resources. Additionally, it can be challenging for individuals outside North America to become FLS certified. The objective of this pilot study was to assess the feasibility of remotely administering and scoring the FLS examination using live videoconferencing compared with standard onsite testing. This parallel mixed-methods study used both FLS scoring data and participant feedback to determine the barriers to feasibility of remote proctoring for the FLS examination. Participants were tested at two accredited FLS testing centers. An official FLS proctor administered and scored the FLS exam remotely while another onsite proctor provided a live score of participants' performance. Participant feedback was collected during testing. Interrater reliabilities of onsite and remote FLS scoring data were compared using intraclass correlation coefficients (ICCs). Participant feedback was analyzed using modified grounded theory to identify themes for barriers to feasibility. The scores of the remote and onsite proctors showed excellent interrater reliability in the total FLS (ICC 0.995, CI [0.985-0.998]). Several barriers led to critical errors in remote scoring, but most were accompanied by a solution incorporated into the study protocol. The most common barrier was the chain of custody for exam accessories. The results of this pilot study suggest that remote administration of the FLS has the potential to decrease costs without altering test-taker scores or exam validity. Further research is required to validate protocols for remote and onsite proctors and to direct execution of these protocols in a controlled environment identical to current FLS test administration.
Using Multivariate Base Rates to Interpret Low Scores on an Abbreviated Battery of the Delis-Kaplan Executive Function System.

PubMed

Karr, Justin E; Garcia-Barrera, Mauricio A; Holdnack, James A; Iverson, Grant L

2017-05-01

Executive function consists of multiple cognitive processes that operate as an interactive system to produce volitional goal-oriented behavior, governed in large part by frontal microstructural and physiological networks. Identification of deficits in executive function in those with neurological or psychiatric conditions can be difficult because the normal variation in executive function test scores, in healthy adults when multiple tests are used, is largely unknown. This study addresses that gap in the literature by examining the prevalence of low scores on a brief battery of executive function tests. The sample consisted of 1,050 healthy individuals (ages 16-89) from the standardization sample for the Delis-Kaplan Executive Function System (D-KEFS). Seven individual test scores from the Trail Making Test, Color-Word Interference Test, and Verbal Fluency Test were analyzed. Low test scores, as defined by commonly used clinical cut-offs (i.e., ≤25th, 16th, 9th, 5th, and 2nd percentiles), occurred commonly among the adult portion of the D-KEFS normative sample (e.g., 62.8% of the sample had one or more scores ≤16th percentile, 36.1% had one or more scores ≤5th percentile), and the prevalence of low scores increased with lower intelligence and fewer years of education. The multivariate base rates (BR) in this article allow clinicians to understand the normal frequency of low scores in the general population. By use of these BRs, clinicians and researchers can improve the accuracy with which they identify executive dysfunction in clinical groups, such as those with traumatic brain injury or neurodegenerative diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
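A multivariate base rate of the kind described above is the proportion of a normative sample with at least a given number of scores at or below a percentile cut-off. The sketch below illustrates the idea; the function name, the `min_low` parameter, and the percentile values are hypothetical and are not D-KEFS data.

```python
def base_rate(percentile_rows, cutoff, min_low=1):
    """Proportion of people with at least min_low scores <= cutoff percentile."""
    hits = sum(
        1
        for person in percentile_rows
        if sum(1 for p in person if p <= cutoff) >= min_low
    )
    return hits / len(percentile_rows)

# Hypothetical percentile ranks for four people on three test scores.
sample = [
    [50, 10, 80],
    [30, 40, 60],
    [5, 3, 70],
    [20, 90, 95],
]
print(base_rate(sample, cutoff=16))             # one or more scores <= 16th
print(base_rate(sample, cutoff=5))              # one or more scores <= 5th
print(base_rate(sample, cutoff=16, min_low=2))  # two or more scores <= 16th
```

Because each added test is another chance to score low, the base rate of "one or more low scores" rises with battery length, which is why single-test cut-offs can overstate impairment.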
The Paradox of Educational Testing.

ERIC Educational Resources Information Center

Ebel, Robert L.

1976-01-01

There is currently a conflict between educational accountability and the distrust of standardized testing. Concern for the quality of education is based on evidence of students' academic deficiencies, the decline in test scores, and increasing education costs and school taxes. As a result of criticism, states have mandated, or are considering,…
Automated Psychological Testing: Method of Administration, Need for Approval, and Measures of Anxiety.

ERIC Educational Resources Information Center

Davis, Caroline; Cowles, Michael

1989-01-01

Computerized and paper-and-pencil versions of four standard personality inventories administered to 147 undergraduates were compared for: (1) test-retest reliability; (2) scores; (3) trait anxiety; (4) interaction between method and social desirability; and (5) preferences concerning method of testing. Doubts concerning the efficacy of…
State Test Results Are Predictable

ERIC Educational Resources Information Center

Tienken, Christopher H.

2014-01-01

Out-of-school, community demographic and family-level variables have an important influence on student achievement as measured by large-scale standardized tests. Studies described here demonstrated that about half of the test score is accounted for by variables outside the control of teachers and school administrators. The results from these…
Clarifying the Consensus Definition of Validity

ERIC Educational Resources Information Center

Newton, Paul E.

2012-01-01

The 1999 "Standards for Educational and Psychological Testing" defines validity as the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Although quite explicit, there are ways in which this definition lacks precision, consistency, and clarity. The history of validity has taught us…
Wake Forest U. Joins Ranks of Test-Optional Colleges

ERIC Educational Resources Information Center

Hoover, Eric; Supiano, Beckie

2008-01-01

Wake Forest University will no longer require applicants to submit standardized test scores, the university announced last week. The move makes Wake Forest, in Winston-Salem, North Carolina, one of the most prominent institutions with a "test optional" admissions policy. The university's decision reveals the increasing complexity of the…
Evaluating Test Validity: Reprise and Progress

ERIC Educational Resources Information Center

Shepard, Lorrie A.

2016-01-01

The AERA, APA, NCME Standards define validity as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests". A century of disagreement about validity does not mean that there has not been substantial progress. This consensus definition brings together interpretations and use so that it…
EXAMINATIONS AND THE ADVANCEMENT OF TEACHING.

ERIC Educational Resources Information Center

BENSON, ARTHUR L.

FUNCTIONS OF PROFESSIONALLY PREPARED, OBJECTIVE, STANDARDIZED TESTS FOR TEACHERS AND PROSPECTIVE TEACHERS, FOR NATIONWIDE USE, ARE DESCRIBED. THESE ARE (1) ADVANCING THE PRESERVICE PREPARATION OF TEACHERS BY DEMONSTRATING SIGNIFICANT TEST-SCORE DIFFERENCES AMONG STUDENTS FROM VARIOUS INSTITUTIONS, (2) IMPROVING CERTIFICATION OR LICENSING POLICIES…
Poorer right ventricular systolic function and exercise capacity in women after repair of tetralogy of fallot: a sex comparison of standard deviation scores based on sex-specific reference values in healthy control subjects.

PubMed

Sarikouch, Samir; Boethig, Dietmar; Peters, Brigitte; Kropf, Siegfried; Dubowy, Karl-Otto; Lange, Peter; Kuehne, Titus; Haverich, Axel; Beerbaum, Philipp

2013-11-01

In repaired congenital heart disease, there is increasing evidence of sex differences in cardiac remodeling, but there is a lack of comparable data for specific congenital heart defects such as in repaired tetralogy of Fallot. In a prospective multicenter study, a cohort of 272 contemporary patients (158 men; mean age, 14.3±3.3 years [range, 8-20 years]) with repaired tetralogy of Fallot underwent cardiac magnetic resonance for ventricular function and metabolic exercise testing. All data were transformed to standard deviation scores according to the Lambda-Mu-Sigma method by relating individual values to their respective 50th percentile (standard deviation score, 0) in sex-specific healthy control subjects. No sex differences were observed in age at repair, type of repair conducted, or overall hemodynamic results. Relative to sex-specific controls, repaired tetralogy of Fallot in women had larger right ventricular end-systolic volumes (standard deviation scores: women, 4.35; men, 3.25; P=0.001), lower right ventricular ejection fraction (women, -2.83; men, -2.12; P=0.011), lower right ventricular muscle mass (women, 1.58; men 2.45; P=0.001), poorer peak oxygen uptake (women, -1.65; men, -1.14; P<0.001), higher VE/VCO2 (ventilation per unit of carbon dioxide production) slopes (women, 0.88; men 0.58; P=0.012), and reduced peak heart rate (women, -2.16; men -1.74; P=0.017). Left ventricular parameters did not differ between sexes. Relative to their respective sex-specific healthy control subjects, derived standard deviation scores in repaired tetralogy of Fallot suggest that women perform poorer than men in terms of right ventricular systolic function as tested by cardiac magnetic resonance and exercise capacity. This effect cannot be explained by selection bias. Further outcome data are required from longitudinal cohort studies.
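The Lambda-Mu-Sigma (LMS) transformation mentioned above converts a measurement into a standard deviation score using three reference parameters: L (skewness power), M (median), and S (coefficient of variation). The sketch below shows the standard LMS formula; the function name and the parameter values are hypothetical, and real values would come from the sex-specific healthy reference data.

```python
import math

def lms_z(value, L, M, S):
    """Standard deviation score by the LMS method.

    z = ((value / M) ** L - 1) / (L * S) when L != 0,
    z = ln(value / M) / S                when L == 0.
    """
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1) / (L * S)

# Hypothetical reference parameters (not from the study).
print(lms_z(10.0, L=1.0, M=10.0, S=0.1))  # a value at the median
print(lms_z(12.0, L=1.0, M=10.0, S=0.1))
print(lms_z(12.0, L=0.0, M=10.0, S=0.1))
```

A value equal to the reference median M maps to a score of 0, matching the abstract's description of the 50th percentile.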
English Cross-Cultural Translation and Validation of the Neuromuscular Score: A System for Motor Function Classification in Patients With Neuromuscular Diseases

PubMed Central

Vuillerot, Carole; Meilleur, Katherine G.; Jain, Minal; Waite, Melissa; Wu, Tianxia; Linton, Melody; Datsgir, Jahannaz; Donkervoort, Sandra; Leach, Meganne E.; Rutkowski, Anne; Rippert, Pascal; Payan, Christine; Iwaz, Jean; Hamroun, Dalil; Bérard, Carole; Poirot, Isabelle; Bönnemann, Carsten G.

2016-01-01

Objective To develop and validate an English version of the Neuromuscular (NM)-Score, a classification for patients with NM diseases in each of the 3 motor function domains: D1, standing and transfers; D2, axial and proximal motor function; and D3, distal motor function. Design Validation survey. Setting Patients seen at a medical research center between June and September 2013. Participants Consecutive patients (N = 42) aged 5 to 19 years with a confirmed or suspected diagnosis of congenital muscular dystrophy. Interventions Not applicable. Main Outcome Measures An English version of the NM-Score was developed by a 9-person expert panel that assessed its content validity and semantic equivalence. Its concurrent validity was tested against criterion standards (Brooke Scale, Motor Function Measure [MFM], activity limitations for patients with upper and/or lower limb impairments [ACTIVLIM], Jebsen Test, and myometry measurements). Informant agreement between patient/caregiver (P/C)-reported and medical doctor (MD)-reported NM scores was measured by weighted kappa. Results Significant correlation coefficients were found between NM scores and criterion standards. The highest correlations were found between NM-score D1 and MFM score D1 (ρ = −.944, P<.0001), ACTIVLIM (ρ = −.895, P<.0001), and hip abduction strength by myometry (ρ = −.811, P<.0001). Informant agreement between P/C-reported and MD-reported NM scores was high for D1 (κ = .801; 95% confidence interval [CI], .701–.914) but moderate for D2 (κ = .592; 95% CI, .412–.773) and D3 (κ = .485; 95% CI, .290–.680). Correlation coefficients between the NM scores and the criterion standards did not significantly differ between P/C-reported and MD-reported NM scores. Conclusions Patients and physicians completed the English NM-Score easily and accurately. The English version is a reliable and valid instrument that can be used in clinical practice and research to describe the functional abilities of patients with NM diseases. PMID:24862765
Effect of two additional interventions, test and reflection, added to standard cardiopulmonary resuscitation training on seventh grade students' practical skills and willingness to act: a cluster randomised trial.

PubMed

Nord, Anette; Hult, Håkan; Kreitz-Sandberg, Susanne; Herlitz, Johan; Svensson, Leif; Nilsson, Lennart

2017-06-23

The aim of this research is to investigate if two additional interventions, test and reflection, after standard cardiopulmonary resuscitation (CPR) training facilitate learning by comparing 13-year-old students' practical skills and willingness to act. Seventh grade students in council schools of two municipalities in south-east Sweden. School classes were randomised to CPR training only (O), CPR training with a practical test including feedback (T) or CPR training with reflection and a practical test including feedback (RT). Measures of practical skills and willingness to act in a potential life-threatening situation were studied directly after training and at 6 months using a digital reporting system and a survey. A modified Cardiff test was used to register the practical skills, where scores in each of 12 items resulted in a total score of 12-48 points. The study was conducted in accordance with current European Resuscitation Council guidelines during December 2013 to October 2014. 29 classes for a total of 587 seventh grade students were included in the study. The total score of the modified Cardiff test at 6 months was the primary outcome. Secondary outcomes were the total score directly after training, the 12 individual items of the modified Cardiff test and willingness to act. At 6 months, the T and O groups scored 32 (3.9) and 30 (4.0) points, respectively (p<0.001), while the RT group scored 32 (4.2) points (not significant when compared with T). There were no significant differences in willingness to act between the groups after 6 months. A practical test including feedback directly after training improved the students' acquisition of practical CPR skills. Reflection did not increase further CPR skills. At 6-month follow-up, no intervention effect was found regarding willingness to make a life-saving effort. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. 
Effect of two additional interventions, test and reflection, added to standard cardiopulmonary resuscitation training on seventh grade students’ practical skills and willingness to act: a cluster randomised trial

PubMed Central

Nord, Anette; Hult, Håkan; Kreitz-Sandberg, Susanne; Herlitz, Johan; Svensson, Leif; Nilsson, Lennart

2017-01-01

Objectives The aim of this research is to investigate if two additional interventions, test and reflection, after standard cardiopulmonary resuscitation (CPR) training facilitate learning by comparing 13-year-old students’ practical skills and willingness to act. Settings Seventh grade students in council schools of two municipalities in south-east Sweden. Design School classes were randomised to CPR training only (O), CPR training with a practical test including feedback (T) or CPR training with reflection and a practical test including feedback (RT). Measures of practical skills and willingness to act in a potential life-threatening situation were studied directly after training and at 6 months using a digital reporting system and a survey. A modified Cardiff test was used to register the practical skills, where scores in each of 12 items resulted in a total score of 12–48 points. The study was conducted in accordance with current European Resuscitation Council guidelines during December 2013 to October 2014. Participants 29 classes for a total of 587 seventh grade students were included in the study. Primary and secondary outcome measures The total score of the modified Cardiff test at 6 months was the primary outcome. Secondary outcomes were the total score directly after training, the 12 individual items of the modified Cardiff test and willingness to act. Results At 6 months, the T and O groups scored 32 (3.9) and 30 (4.0) points, respectively (p<0.001), while the RT group scored 32 (4.2) points (not significant when compared with T). There were no significant differences in willingness to act between the groups after 6 months. Conclusions A practical test including feedback directly after training improved the students’ acquisition of practical CPR skills. Reflection did not increase further CPR skills. At 6-month follow-up, no intervention effect was found regarding willingness to make a life-saving effort. PMID:28645953
Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

PubMed

Liu, Zhihai; Su, Minyi; Han, Li; Liu, Jie; Yang, Qifan; Li, Yan; Wang, Renxiao

2017-02-21

In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions have been developed. Regardless of their technical difference, scoring functions all need data sets combining protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in terms of size and quality. On the other hand, standard metrics for evaluating scoring function used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. On the first project, we have created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interaction and various subjects. In particular, it has become a major data resource for scoring function development. On the second project, we have established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so scoring functions can be tested in a relatively pure context to reflect their quality. 
In our latest work on this track, i.e. CASF-2013, the performance of a scoring function was quantified in four aspects, including "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions were tested as demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress that has been made so far, the performance of today's scoring functions still does not meet people's expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some obstacles underlying scoring function development so that the researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark in the future to keep them as useful community resources.
Affirm VPIII microbial identification test can be used to detect Gardnerella vaginalis, Candida albicans and Trichomonas vaginalis microbial infections in Korean women.

PubMed

Byun, Seung Won; Park, Yeon Joon; Hur, Soo Young

2016-04-01

The aim of this study was to compare Affirm VPIII Microbial Identification Test results for Korean women to those obtained for Gardnerella vaginalis through Nugent score, Candida albicans based on vaginal culture and Trichomonas vaginalis based on wet smear diagnostic standards. Study participants included 195 women with symptomatic or asymptomatic vulvovaginitis under hospital obstetric or gynecologic care. A definite diagnosis was made based on Nugent score for Gardnerella, vaginal culture for Candida and wet prep for Trichomonas vaginalis. Affirm VPIII Microbial Identification Test results were then compared to diagnostic standard results. Of the 195 participants, 152 were symptomatic, while 43 were asymptomatic. Final diagnosis revealed 68 (34.87%) cases of Gardnerella, 29 (14.87%) cases of Candida, one (0.51%) case of Trichomonas, and 10 (5.10%) cases of mixed infections. The detection rates achieved by each detection method (Affirm assay vs diagnostic standard) for Gardnerella and Candida were not significantly different (33.33% vs 34.8% for Gardnerella, 13.33% vs 14.87% for Candida, respectively). The sensitivity and specificity of the Affirm test for Gardnerella compared to the diagnostic standard were 75.0% and 88.98%, respectively. For Candida, the sensitivity and specificity of the Affirm test compared to the diagnostic standard were 82.76% and 98.80%, respectively. The number of Trichomonas cases was too small (1 case) to be statistically analyzed. The Affirm test is a quick tool that can help physicians diagnose and treat patients with infectious vaginitis at the point of care. © 2016 Japan Society of Obstetrics and Gynecology.
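As a reading aid for the sensitivity and specificity figures in the record above, the following minimal sketch shows how the two quantities are usually computed from a 2x2 table against a reference standard. The function name and the counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 counts.

    tp/fn: reference-positive cases the index test detected / missed.
    tn/fp: reference-negative cases the index test cleared / flagged.
    """
    sensitivity = tp / (tp + fn)  # share of true cases detected
    specificity = tn / (tn + fp)  # share of non-cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only (not the study's data).
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=90, fp=10)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

Both ratios are conditioned on the reference standard, so they do not by themselves say how often a positive test result is correct in a given population.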
Impact of Gadget Based Learning of Grammar in English at Standard II

ERIC Educational Resources Information Center

Singaravelu, G.

2014-01-01

The study examines the impact of Gadget Based Learning of English Grammar at standard II. The objectives of the study are to find out the learning problems of the students of standard II in learning English grammar in Shri Vani Vilas Middle School and to find whether there is any significant difference in achievement mean score between pre test of…
Are overreferrals on developmental screening tests really a problem?

PubMed

Glascoe, F P

2001-01-01

Developmental screening tests, even those meeting standards for screening test accuracy, produce numerous false-positive results for 15% to 30% of children. This is thought to produce unnecessary referrals for diagnostic testing or special services and increase the cost of screening programs. To explore whether children who pass screening tests differ in important ways from those who do not and to determine whether children overreferred for testing benefit from the scrutiny of diagnostic testing and treatment planning. Subjects were a national sample of 512 parents and their children (age range of the children, 7 months to 8 years) who participated in validation studies of various screening tests. Psychological examiners adhering to standardized directions obtained informed consent and administered at least 2 developmental screening measures (the Brigance Screens, the Battelle Developmental Inventory Screening Test, the Denver-II, and the Parents' Evaluations of Developmental Status) and a concurrent battery of diagnostic measures, including tests of intelligence, language, and academic achievement (for children aged 2½ years and older). The performance on diagnostic measures of children who failed screening but were not found to have a disability (false positives) was compared with that of children who passed screening and did not have a disability on diagnostic testing (true negatives). Children with false-positive scores performed significantly (P<.001) lower on diagnostic measures than did children with true-negative scores. The false-positive group had scores in adaptive behavior, language, intelligence, and academic achievement that were 9 to 14 points lower than the scores of those in the true-negative group.
When viewing the likelihood of scoring below the 25th percentile on diagnostic measures, children with false-positive scores had a relative risk of 2.6 in adaptive behavior (95% confidence interval [CI], 1.67-4.21), 3.1 in language skills (95% CI, 1.90-5.20), 6.7 on intelligence tests (95% CI, 3.28-13.50), and 4.9 on academic measures (95% CI, 2.61-9.28). Overall, 151 (70%) of the children with false-positive results scored below the 25th percentile on 1 or more diagnostic measures (the point at which most children have difficulty benefiting from typical classroom instruction) in contrast with 64 (29%) of the children with true-negative scores (odds ratio, 5.6; 95% CI, 3.73-8.49). Children with false-positive scores were also more likely to be nonwhite and to have parents who had not graduated from high school. Performance differences between children with true-negative scores and children with false-positive scores continued to be significant (P<.001) even after adjusting for sociodemographic differences between groups. Children overreferred for diagnostic testing by developmental screens perform substantially lower than children with true-negative scores on measures of intelligence, language, and academic achievement, the 3 best predictors of school success. These children also carry more psychosocial risk factors, such as limited parental education and minority status. Thus, children with false-positive screening results are an at-risk group for whom diagnostic testing may not be an unnecessary expense but rather a beneficial and needed service that can help focus intervention efforts. Although such testing will not indicate a need for special education placement, it can be useful in identifying children's needs for other programs known to improve language, cognitive, and academic skills, such as Head Start, Title I services, tutoring, private speech-language therapy, and quality day care.

Medical ethical standards in dermatology: an analytical study of knowledge, attitudes and practices.

PubMed

Mostafa, W Z; Abdel Hay, R M; El Lawindi, M I

2015-01-01

Dermatology practice has not been ethically justified at all times. The objective of the study was to find out dermatologists' knowledge about medical ethics, their attitudes towards regulatory measures and their practices, and to study the different factors influencing the knowledge, the attitude and the practices of dermatologists. This is a cross-sectional comparative study conducted among 214 dermatologists, from five Academic Universities and from participants in two conferences. A 54-item structured anonymous questionnaire was designed to describe the demographical characteristics of the study group as well as their knowledge, attitude and practices regarding the medical ethics standards in clinical and research settings. Five scoring indices were estimated regarding knowledge, attitude and practice. Inferential statistics were used to test differences between groups as indicated. The Student's t-test and analysis of variance were carried out for quantitative variables. The chi-squared test was conducted for qualitative variables. The results were considered statistically significant at P < 0.05. Analysis of the possible factors having impact on the overall scores revealed that the highest knowledge scores were among dermatologists who practice in an academic setting plus an additional place; however, this difference was statistically non-significant (P = 0.060). Female dermatologists showed a higher attitude score compared to males (P = 0.028). The highest significant attitude score (P = 0.019) regarding clinical practice was recorded among those practicing cosmetic dermatology. The different studied groups of dermatologists revealed a significant impact on the attitude score (P = 0.049), and the evidence-practice score (P < 0.001). Ethical practices will improve the quality and integrity of dermatology research. © 2014 European Academy of Dermatology and Venereology.
High Noon for High Stakes: Alfie Kohn at Middlebury College.

ERIC Educational Resources Information Center

Barna, Ed

2002-01-01

The tougher standards movement has five fatal flaws. An emphasis on scores limits student willingness to experiment and be challenged. The "basic skills" approach to teaching--pouring knowledge down student throats--has never worked well. Standardized testing necessarily creates winners and losers. Accountability is coercive and…
Establishing Proficiency Standards for High School Graduation.

ERIC Educational Resources Information Center

Herron, Marshall D.

The Oregon State Board of Education has rejected the use of cut-off scores on a proficiency test to establish minimum performance standards for high school graduation. Instead, each school district is required to specify--by local board adoption--minimum competencies in reading, writing, listening, speaking, analyzing, and computing. These…
Prediction of true test scores from observed item scores and ancillary data.

PubMed

Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

2015-05-01

In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
Sealant retention is better assessed through colour photographs than through the replica and the visual examination methods.

PubMed

Hu, Xuan; Fan, Mingwan; Rong, Wensheng; Lo, Edward C M; Bronkhorst, Ewald; Frencken, Jo E

2014-08-01

The aim of this study was to test the hypothesis that the colour photograph method has a higher level of validity for assessing sealant retention than the visual clinical examination and replica methods. Sealed molars were assessed by two evaluators. The scores for the three methods were compared against consensus scores derived through assessing retention from scanning electron microscopy images (reference standard). The presence/absence (survival) of retained sealants on occlusal surfaces was determined according to the traditional and modified categorizations of retention. Sensitivity, specificity, and Youden-index scores were calculated. Sealant retention assessment scores for visual clinical examinations and for colour photographs were compared with those of the reference standard on 95 surfaces, and sealant retention assessment scores for replicas were compared with those of the reference standard on 33 surfaces. The highest mean Youden-index score for the presence/absence of sealant material was observed for the colour photograph method, followed by that for the replica method; the visual clinical examination method scored lowest. The mean Youden-index score for the survival of retained sealants was highest for the colour photograph method for both the traditional (0.882) and the modified (0.768) categories of sealant retention, whilst the visual clinical examination method had the lowest Youden-index score for these categories (0.745 and 0.063, respectively). The colour photograph method had a higher validity than the replica and the visual examination methods for assessing sealant retention. © 2014 Eur J Oral Sci.
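The Youden index reported in the record above is conventionally defined as sensitivity plus specificity minus one. The short sketch below illustrates that definition; the function name and input values are hypothetical and do not reproduce the study's data.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: 0 means no better than chance, 1 means perfect agreement
    with the reference standard."""
    return sensitivity + specificity - 1.0


# Hypothetical values for illustration only.
j = youden_index(sensitivity=0.90, specificity=0.80)
print(round(j, 3))
```

Because J weights sensitivity and specificity equally, two methods with the same index can still differ in which kind of error they make more often.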
Urdu translation of the Hamilton Rating Scale for Depression: Results of a validation study

PubMed Central

Hashmi, Ali M.; Naz, Shahana; Asif, Aftab; Khawaja, Imran S.

2016-01-01

Objective: To develop a standardized validated version of the Hamilton Rating Scale for Depression (HAM-D) in Urdu. Methods: After translation of the HAM-D into the Urdu language following standard guidelines, the final Urdu version (HAM-D-U) was administered to 160 depressed outpatients. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation between HAM-D-U scores at baseline and after a 2-week interval was evaluated for test-retest reliability. Moreover, scores of two clinicians on HAM-D-U were compared for inter-rater reliability. For establishing concurrent validity, scores of HAM-D-U and BDI-U were compared by using Spearman correlation coefficient. The study was conducted at Mayo Hospital, Lahore, from May to December 2014. Results: The Cronbach alpha for HAM-D-U was 0.71. Composite scores for HAM-D-U at baseline and after a 2-week interval were also highly correlated with each other (Spearman correlation coefficient 0.83, p-value < 0.01) indicating good test-retest reliability. Composite scores for HAM-D-U and BDI-U were positively correlated with each other (Spearman correlation coefficient 0.85, p < 0.01) indicating good concurrent validity. Scores of two clinicians for HAM-D-U were also positively correlated (Spearman correlation coefficient 0.82, p-value < 0.01) indicating good inter-rater reliability. Conclusion: The HAM-D-U is a valid and reliable instrument for the assessment of Depression. It shows good inter-rater and test-retest reliability. The HAM-D-U can be a tool either for clinical management or research. PMID:28083049
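For readers unfamiliar with the Cronbach alpha mentioned in the record above, here is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the tiny data set are hypothetical; the study's item-level data are not available in this listing.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (k >= 2)."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents, 3 items.
responses = [[2, 3, 3], [1, 1, 2], [3, 3, 4], [0, 1, 1]]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as a measure of internal consistency.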
Urdu translation of the Hamilton Rating Scale for Depression: Results of a validation study.

PubMed

Hashmi, Ali M; Naz, Shahana; Asif, Aftab; Khawaja, Imran S

2016-01-01

To develop a standardized validated version of the Hamilton Rating Scale for Depression (HAM-D) in Urdu. After translation of the HAM-D into the Urdu language following standard guidelines, the final Urdu version (HAM-D-U) was administered to 160 depressed outpatients. Inter-item correlation was assessed by calculating Cronbach alpha. Correlation between HAM-D-U scores at baseline and after a 2-week interval was evaluated for test-retest reliability. Moreover, scores of two clinicians on HAM-D-U were compared for inter-rater reliability. For establishing concurrent validity, scores of HAM-D-U and BDI-U were compared by using Spearman correlation coefficient. The study was conducted at Mayo Hospital, Lahore, from May to December 2014. The Cronbach alpha for HAM-D-U was 0.71. Composite scores for HAM-D-U at baseline and after a 2-week interval were also highly correlated with each other (Spearman correlation coefficient 0.83, p-value < 0.01) indicating good test-retest reliability. Composite scores for HAM-D-U and BDI-U were positively correlated with each other (Spearman correlation coefficient 0.85, p < 0.01) indicating good concurrent validity. Scores of two clinicians for HAM-D-U were also positively correlated (Spearman correlation coefficient 0.82, p-value < 0.01) indicating good inter-rater reliability. The HAM-D-U is a valid and reliable instrument for the assessment of Depression. It shows good inter-rater and test-retest reliability. The HAM-D-U can be a tool either for clinical management or research.
Rhythm Perception and Its Role in Perception and Learning of Dysrhythmic Speech.

PubMed

Borrie, Stephanie A; Lansford, Kaitlin L; Barrett, Tyson S

2017-03-01

The perception of rhythm cues plays an important role in recognizing spoken language, especially in adverse listening conditions. Indeed, this has been shown to hold true even when the rhythm cues themselves are dysrhythmic. This study investigates whether expertise in rhythm perception provides a processing advantage for perception (initial intelligibility) and learning (intelligibility improvement) of naturally dysrhythmic speech, dysarthria. Fifty young adults with typical hearing participated in 3 key tests, including a rhythm perception test, a receptive vocabulary test, and a speech perception and learning test, with standard pretest, familiarization, and posttest phases. Initial intelligibility scores were calculated as the proportion of correct pretest words, while intelligibility improvement scores were calculated by subtracting this proportion from the proportion of correct posttest words. Rhythm perception scores predicted intelligibility improvement scores but not initial intelligibility. On the other hand, receptive vocabulary scores predicted initial intelligibility scores but not intelligibility improvement. Expertise in rhythm perception appears to provide an advantage for processing dysrhythmic speech, but a familiarization experience is required for the advantage to be realized. Findings are discussed in relation to the role of rhythm in speech processing and shed light on processing models that consider the consequence of rhythm abnormalities in dysarthria.
Stroop Color-Word Interference Test: Normative data for Spanish-speaking pediatric population.

PubMed

Rivera, D; Morlett-Paredes, A; Peñalver Guia, A I; Irías Escher, M J; Soto-Añari, M; Aguayo Arelis, A; Rute-Pérez, S; Rodríguez-Lorenzana, A; Rodríguez-Agudelo, Y; Albaladejo-Blázquez, N; García de la Cadena, C; Ibáñez-Alfonso, J A; Rodriguez-Irizarry, W; García-Guerrero, C E; Delgado-Mejía, I D; Padilla-López, A; Vergara-Moragues, E; Barrios Nevado, M D; Saracostti Schwartzman, M; Arango-Lasprilla, J C

2017-01-01

To generate normative data for the Stroop Word-Color Interference test in Spanish-speaking pediatric populations. The sample consisted of 4,373 healthy children from nine countries in Latin America (Chile, Cuba, Ecuador, Guatemala, Honduras, Mexico, Paraguay, Peru, and Puerto Rico) and Spain. Each participant was administered the Stroop Word-Color Interference test as part of a larger neuropsychological battery. The Stroop Word, Stroop Color, Stroop Word-Color, and Stroop Interference scores were normed using multiple linear regressions and standard deviations of residual values. Age, age², sex, and mean level of parental education (MLPE) were included as predictors in the analyses. The final multiple linear regression models showed main effects for age on all scores, except on Stroop Interference for Guatemala, such that scores increased linearly as a function of age. Age² affected Stroop Word scores for all countries, Stroop Color scores for Ecuador, Mexico, Peru, and Spain; Stroop Word-Color scores for Ecuador, Mexico, and Paraguay; and Stroop Interference scores for Cuba, Guatemala, and Spain. MLPE affected Stroop Word scores for Chile, Mexico, and Puerto Rico; Stroop Color scores for Mexico, Puerto Rico, and Spain; Stroop Word-Color scores for Ecuador, Guatemala, Mexico, Puerto Rico and Spain; and Stroop-Interference scores for Ecuador, Mexico, and Spain. Sex affected Stroop Word scores for Spain, Stroop Color scores for Mexico, and Stroop Interference for Honduras. This is the largest Spanish-speaking pediatric normative study in the world, and it will allow neuropsychologists from these countries to have a more accurate approach to interpret the Stroop Word-Color Interference test in pediatric populations.
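Regression-based norming of the kind described in the record above is commonly applied by predicting a child's expected score from the demographic predictors and dividing the residual by the residual standard deviation. The sketch below illustrates that idea only. The coefficient values, the 0/1 coding of sex and MLPE, and the function name are all hypothetical assumptions; the study's actual country-specific coefficients and any centering of age are not given in this abstract.

```python
def normed_z(raw, age, sex, mlpe, coef, sd_residual):
    """Standardized residual: (observed - predicted) / SD of residuals."""
    predicted = (
        coef["intercept"]
        + coef["age"] * age
        + coef["age2"] * age ** 2  # quadratic age term
        + coef["sex"] * sex        # assumed 0/1 coding
        + coef["mlpe"] * mlpe      # assumed 0/1 coding
    )
    return (raw - predicted) / sd_residual


# Hypothetical coefficients and child, for illustration only.
coef = {"intercept": 5.0, "age": 4.0, "age2": -0.1, "sex": 1.0, "mlpe": 0.5}
z = normed_z(raw=40.0, age=10, sex=1, mlpe=1, coef=coef, sd_residual=7.0)
print(round(z, 2))
```

A z value near zero means the child scored about as expected for children with the same demographic profile; it can then be converted to a percentile if the residuals are approximately normal.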
Title I Schools: The Student-Based Impact of Online, On-Demand Professional Development on Educators

ERIC Educational Resources Information Center

Shaha, Steven; Glassett, Kelly; Copas, Aimee; Ellsworth, Heather

2015-01-01

Title I students remain among the most challenging population for achieving significant gains in academic performance and standardized test scores. This multi-state, quasi-experimental, pre-versus-post study reflects the comparative Title I gains for math and reading scores for teachers participating in an online, on-demand professional…
Incorporating Learning Characteristics into Automatic Essay Scoring Models: What Individual Differences and Linguistic Features Tell Us about Writing Quality

ERIC Educational Resources Information Center

Crossley, Scott A.; Allen, Laura K.; Snow, Erica L.; McNamara, Danielle S.

2016-01-01

This study investigates a novel approach to automatically assessing essay quality that combines natural language processing approaches that assess text features with approaches that assess individual differences in writers such as demographic information, standardized test scores, and survey results. The results demonstrate that combining text…
Qualitative Analysis of the Performance of Introverts and Extraverts on Standard Progressive Matrices

ERIC Educational Resources Information Center

Mohan, Vidhu; Kumar, Dalip

1976-01-01

Does measurement of intelligence through a consolidated score imply that two or more subjects obtaining the same score are also undergoing the same mental process? Introverts are supposed to opt for accuracy and extraverts for speed. Attempts to investigate the qualitative differences between extraverts and introverts on an intelligence test.…
The American Education Diet: Can U.S. Students Survive on Junk Food?

ERIC Educational Resources Information Center

DeSchryver, Dave

U.S. student scores in science compare unfavorably with those of other nations, and other standardized test scores by U.S. students are also comparatively lower. Polls of parents indicate dissatisfaction with U.S. education. College teachers and employers feel that high school graduates are weak in skills. This document offers negative perceptions…
Academic and Nonacademic Characteristics as Predictors of Persistence in an Associate Degree Nursing Program. AIR Forum 1981 Paper.

ERIC Educational Resources Information Center

Donsky, Aaron P.; Judge, Albert J., Jr.

Academic and nonacademic variables that may predict persistence in the nursing program at Lakeland Community College, Ohio, were studied. The academic variables included American College Testing program standard scores, National League for Nursing (NLN) rank scores, high school grade point average, and previous college grade point average. The…
Turkish Students' Scientific Literacy Scores: A Multilevel Analysis of Data from Program for International Student Assessment

ERIC Educational Resources Information Center

Yilmaz, Haci Bayram

2009-01-01

A vast majority of the studies exploring the associations between student and school related factors and standardized test scores were conducted in developed countries. On the other hand, research suggests that the generalization of the findings of those studies to developing countries often leads to incorrect conclusions. The purpose of this…
The Relationship of Religious Involvement to a Variety of Indicators of Cognitive Ability and Achievement in College Students.

ERIC Educational Resources Information Center

Zern, David S.

1987-01-01

Undergraduates reported anonymously their degree of religiousness, their Scholastic Aptitude Test (SAT) scores, and their grade point averages (GPAs). Found religiousness negatively related to ability, and not related to achievement. The students' capacity to maximize their potential, measured by the standard score difference between GPA and SAT,…
Double oral esomeprazole after a 3-day intravenous esomeprazole infusion reduces recurrent peptic ulcer bleeding in high-risk patients: a randomised controlled study.

PubMed

Cheng, Hsiu-Chi; Wu, Chung-Tai; Chang, Wei-Lun; Cheng, Wei-Chun; Chen, Wei-Ying; Sheu, Bor-Shyang

2014-12-01

Patients with high Rockall scores have increased risk of ulcer rebleeding after 3-day esomeprazole infusions. To investigate whether double oral esomeprazole given after a 3-day esomeprazole infusion decreases ulcer rebleeding for patients with high Rockall scores. We prospectively enrolled 293 patients with peptic ulcer bleeding who had achieved endoscopic haemostasis. After a 3-day esomeprazole infusion, patients with Rockall scores ≥6 were randomised into the oral double-dose group (n=93) or the oral standard-dose group (n=94) to receive 11 days of oral esomeprazole 40 mg twice daily or once daily, respectively. The patients with Rockall scores <6 served as controls (n=89); they received 11 days of oral esomeprazole 40 mg once daily. Thereafter, all patients received oral esomeprazole 40 mg once daily for two more weeks until the end of the 28-day study period. The primary end point was peptic ulcer rebleeding. Among patients with Rockall scores ≥6, the oral double-dose group had a higher cumulative rebleeding-free proportion than the oral standard-dose group (p=0.02, log-rank test). The proportion of patients free from recurrent bleeding during the 4th-28th day in the oral double-dose group remained lower than that of the group with Rockall scores <6 (p=0.03, log-rank test). Among patients with Rockall scores ≥6, the rebleeding rate was lower in the oral double-dose group than in the oral standard-dose group (4th-28th day: 10.8% vs 28.7%, p=0.002). Double oral esomeprazole at 40 mg twice daily after esomeprazole infusion reduced recurrent peptic ulcer bleeding in high-risk patients with Rockall scores ≥6. NCT01591083.
Synthesizing Information From Language Samples and Standardized Tests in School-Age Bilingual Assessment

PubMed Central

Pham, Giang

2017-01-01

Purpose Although language samples and standardized tests are regularly used in assessment, few studies provide clinical guidance on how to synthesize information from these testing tools. This study extends previous work on the relations between tests and language samples to a new population—school-age bilingual speakers with primary language impairment—and considers the clinical implications for bilingual assessment. Method Fifty-one bilingual children with primary language impairment completed narrative language samples and standardized language tests in English and Spanish. Children were separated into younger (ages 5;6 [years;months]–8;11) and older (ages 9;0–11;2) groups. Analysis included correlations with age and partial correlations between language sample measures and test scores in each language. Results Within the younger group, positive correlations with large effect sizes indicated convergence between test scores and microstructural language sample measures in both Spanish and English. There were minimal correlations in the older group for either language. Age related to English but not Spanish measures. Conclusions Tests and language samples complement each other in assessment. Wordless picture-book narratives may be more appropriate for ages 5–8 than for older children. We discuss clinical implications, including a case example of a bilingual child with primary language impairment, to illustrate how to synthesize information from these tools in assessment. PMID:28055056
Language of administration and neuropsychological test performance in neurologically intact Hispanic American bilingual adults.

PubMed

Gasquoine, Philip Gerard; Croyle, Kristin L; Cavazos-Gonzalez, Cynthia; Sandoval, Omar

2007-11-01

This study compared the performance of Hispanic American bilingual adults on Spanish and English language versions of a neuropsychological test battery. Language achievement test scores were used to divide 36 bilingual, neurologically intact, Hispanic Americans from south Texas into Spanish-dominant, balanced, and English-dominant bilingual groups. They were administered the eight subtests of the Bateria Neuropsicologica and the Matrix Reasoning subtest of the WAIS-III in Spanish and English. Half the participants were tested in Spanish first. Balanced bilinguals showed no significant differences in test scores between Spanish and English language administrations. Spanish and/or English dominant bilinguals showed significant effects of language of administration on tests with higher language compared to visual perceptual weighting (Woodcock-Munoz Language Survey-Revised, Letter Fluency, Story Memory, and Stroop Color and Word Test). Scores on tests with higher visual-perceptual weighting (Matrix Reasoning, Figure Memory, Wisconsin Card Sorting Test, and Spatial Span), were not significantly affected by language of administration, nor were scores on the Spanish/California Verbal Learning Test, and Digit Span. A problem was encountered in comparing false positive rates in each language, as Spanish norms fell below English norms, resulting in a much higher false positive rate in English across all bilingual groupings. Use of a comparison standard (picture vocabulary score) reduced false positive rates in both languages, but the higher false positive rate in English persisted.
The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing

ERIC Educational Resources Information Center

Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi

2013-01-01

Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be "tailored" to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends…

76 FR 26853 - Commercial Driver's License Testing and Commercial Learner's Permit Standards

Federal Register 2010, 2011, 2012, 2013, 2014

2011-05-09

... b. Pre-Trip Inspection c. Skills Test Banking Prohibition d. Gross Vehicle Weight Rating (GVWR... electronic method of transmitting test scores works best for them. At least one State currently has an... Issuing a CLP a. Passing the General Knowledge Test To Obtain a CLP b. Requiring the CLP To Be a Separate...
Letter Imperfect

ERIC Educational Resources Information Center

Kramer, Stephen

2003-01-01

In this essay, the author, a 5th-grade teacher, questions how well a standardized test can measure his students. This article presents a letter he wrote for the Washington state science test scorer regarding his students' test scores. He shares stories about some of the students in his class. He points out that tests can turn out to be more like…
The Effect of Eliminating Time Restraints on a Standardized Test with American Indian Adults.

ERIC Educational Resources Information Center

Immerman, Michael A.

To investigate the effect of time restraints on the diagnostic test scores of Native American students entering Bureau of Indian Affairs schools, two groups of students at Southwestern Indian Polytechnic Institute (SIPI) in Albuquerque, New Mexico, were given the Stanford Diagnostic Reading Test, (Blue Level), 1977 edition. The test scores…
Economic Literacy, Teacher Instruction, and Preparation for the World of Work.

ERIC Educational Resources Information Center

Walstad, William B.; Soper, John C.

This paper analyzes the economic knowledge of high school students based on national data from 8,000 students who took the revised "Test of Economic Literacy," a nationally normed and standardized achievement test in economics. First, the validity and reliability features of the test are presented and then the test scores are broken down…
Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme.

PubMed

Haley, Stephen M; Fragala-Pinkham, Maria; Ni, Pengsheng

2006-07-01

To examine the relative sensitivity to detect functional mobility changes with a full-length parent questionnaire compared with a computerized adaptive testing version of the questionnaire after a 16-week group fitness programme. Prospective, pre- and posttest study with a 16-week group fitness intervention. Three community-based fitness centres. Convenience sample of children (n = 28) with physical or developmental disabilities. A 16-week group exercise programme held twice a week in a community setting. A full-length (161 items) paper version of a mobility parent questionnaire based on the Pediatric Evaluation of Disability Inventory, but expanded to include expected skills of children up to 15 years old was compared with a 15-item computer adaptive testing version. Both measures were administered at pre- and posttest intervals. Both the full-length Pediatric Evaluation of Disability Inventory and the 15-item computer adaptive testing version detected significant changes between pre- and posttest scores, had large effect sizes, and standardized response means, with a modest decrease in the computer adaptive test as compared with the 161-item paper version. Correlations between the computer adaptive and paper formats across pre- and posttest scores ranged from r = 0.76 to 0.86. Both functional mobility test versions were able to detect positive functional changes at the end of the intervention period. Greater variability in score estimates was generated by the computerized adaptive testing version, which led to a relative reduction in sensitivity as defined by the standardized response mean. Extreme scores were generally more difficult for the computer adaptive format to estimate with as much accuracy as scores in the mid-range of the scale. However, the reduction in accuracy and sensitivity, which did not influence the group effect results in this study, is counterbalanced by the large reduction in testing burden.
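The abstract above defines sensitivity in terms of the standardized response mean (SRM). As a minimal sketch, assuming the usual definition (mean pre-to-post change divided by the standard deviation of the change scores), the calculation might look like the following. The function name and the sample scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def standardized_response_mean(pre, post):
    """SRM = mean(change) / SD(change), with change = post - pre.

    Assumes paired pre/post scores in the same order; uses the
    sample (n - 1) standard deviation of the change scores.
    """
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)

# Hypothetical pre/post mobility scores for four children (illustration only).
pre_scores = [10, 12, 14, 16]
post_scores = [12, 15, 15, 20]
print(round(standardized_response_mean(pre_scores, post_scores), 3))
```

Under this definition, a noisier score estimate inflates the denominator, which is consistent with the abstract's point that greater variability in the adaptive version lowers the SRM.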
Training improves laparoscopic tasks performance and decreases operator workload.

PubMed

Hu, Jesse S L; Lu, Jirong; Tan, Wee Boon; Lomanto, Davide

2016-05-01

It has been postulated that increased operator workload during task performance may increase fatigue and surgical errors. The National Aeronautics and Space Administration-Task Load Index (NASA-TLX) is a validated tool for self-assessment for workload. Our study aims to assess the relationship of workload and performance of novices in simulated laparoscopic tasks of different complexity levels before and after training. Forty-seven novices without prior laparoscopic experience were recruited in a trial to investigate whether training improves task performance as well as mental workload. The participants were tested on three standard tasks (ring transfer, precision cutting and intracorporeal suturing) in increasing complexity based on the Fundamentals of Laparoscopic Surgery (FLS) curriculum. Following a period of training and rest, participants were tested again. Test scores were computed from time taken and time penalties for precision errors. Test scores and NASA-TLX scores were recorded pre- and post-training and analysed using paired t tests. One-way repeated measures ANOVA was used to analyse differences in NASA-TLX scores between the three tasks. NASA-TLX score was lowest with ring transfer and highest with intracorporeal suturing. This was statistically significant in both pre-training (p < 0.001) and post-training (p < 0.001). NASA-TLX scores mirror the changes in test scores for the three tasks. Workload scores decreased significantly after training for all three tasks (ring transfer = 2.93, p < 0.001, precision cutting = 3.74, p < 0.001, intracorporeal suturing = 2.98, p < 0.001). NASA-TLX score is an accurate reflection of the complexity of simulated laparoscopic tasks in the FLS curriculum. This also correlates with the relationship of test scores between the three tasks. Simulation training improves both performance score and workload score across the tasks.
Sentence level auditory comprehension treatment program for aphasic adults.

PubMed

Naeser, M A; Haas, G; Mazurski, P; Laughlin, S

1986-06-01

The purpose of this study was to investigate whether a newly developed sentence level auditory comprehension (SLAC) treatment program could be used to improve language comprehension test scores in adults with chronic aphasia. Results indicate that the SLAC treatment program can be used with chronic patients; performance on a standardized test (the Token Test) was improved after treatment; and improved performance could not be predicted from either anatomic CT scan lesion sites or pretreatment test scores. One advantage to the SLAC treatment program is that the patient can practice listening independently with a tape recorder device (Language Master) and earphones either in the hospital or at home.
The Effect of Extended Test Time for Students with Attention-Deficit Hyperactivity Disorder

ERIC Educational Resources Information Center

Wadley, M. Nichole; Liljequist, Laura

2013-01-01

The purpose of the present study was to investigate whether a specific testing accommodation (extended time) affects test scores for college students with and without ADHD. College students with ADHD (N = 61) and without ADHD (N = 68) took a math test, after having been told they had either standard time or extended time to complete the test.…
Providing Test Performance Feedback That Bridges Assessment and Instruction: The Case of Two Standardized English Language Tests in Japan

ERIC Educational Resources Information Center

Sawaki, Yasuyo; Koizumi, Rie

2017-01-01

This small-scale qualitative study considers feedback and results reported for two major large-scale English language tests administered in Japan: the Global Test of English Communication for Students (GTECfS) and the Eiken Test in Practical English Proficiency (Eiken). Specifically, it examines current score-reporting practices in student and…
Timely diagnosis of dairy calf respiratory disease using a standardized scoring system.

PubMed

McGuirk, Sheila M; Peek, Simon F

2014-12-01

Respiratory disease of young dairy calves is a significant cause of morbidity, mortality, economic loss, and animal welfare concern but there is no gold standard diagnostic test for antemortem diagnosis. Clinical signs typically used to make a diagnosis of respiratory disease of calves are fever, cough, ocular or nasal discharge, abnormal breathing, and auscultation of abnormal lung sounds. Unfortunately, routine screening of calves for respiratory disease on the farm is rarely performed and until more comprehensive, practical and affordable respiratory disease-screening tools such as accelerometers, pedometers, appetite monitors, feed consumption detection systems, remote temperature recording devices, radiant heat detectors, electronic stethoscopes, and thoracic ultrasound are validated, timely diagnosis of respiratory disease can be facilitated using a standardized scoring system. We have developed a scoring system that attributes severity scores to each of four clinical parameters; rectal temperature, cough, nasal discharge, ocular discharge or ear position. A total respiratory score of five points or higher (provided that at least two abnormal parameters are observed) can be used to distinguish affected from unaffected calves. This can be applied as a screening tool twice-weekly to identify pre-weaned calves with respiratory disease thereby facilitating early detection. Coupled with effective treatment protocols, this scoring system will reduce post-weaning pneumonia, chronic pneumonia, and otitis media.
Cross-cultural adaptation and validation of Persian Achilles tendon Total Rupture Score.

PubMed

Ansari, Noureddin Nakhostin; Naghdi, Soofia; Hasanvand, Sahar; Fakhari, Zahra; Kordi, Ramin; Nilsson-Helander, Katarina

2016-04-01

To cross-culturally adapt the Achilles tendon Total Rupture Score (ATRS) to the Persian language and to preliminarily evaluate the reliability and validity of the Persian ATRS. A cross-sectional and prospective cohort study was conducted to translate and cross-culturally adapt the ATRS to the Persian language (ATRS-Persian) following steps described in guidelines. Thirty patients with total Achilles tendon rupture and 30 healthy subjects participated in this study. Psychometric properties of floor/ceiling effects (responsiveness), internal consistency reliability, test-retest reliability, standard error of measurement (SEM), smallest detectable change (SDC), construct validity, and discriminant validity were tested. Factor analysis was performed to determine the ATRS-Persian structure. There were no floor or ceiling effects, which supports the content and responsiveness of the ATRS-Persian. Internal consistency was high (Cronbach's α 0.95). Item-total correlations exceeded the acceptable standard of 0.3 for all items (0.58-0.95). The test-retest reliability was excellent [(ICC)agreement 0.98]. SEM and SDC were 3.57 and 9.9, respectively. Construct validity was supported by a significant correlation between the ATRS-Persian total score and the Persian Foot and Ankle Outcome Score (PFAOS) total score and PFAOS subscales (r = 0.55-0.83). The ATRS-Persian significantly discriminated between patients and healthy subjects. Exploratory factor analysis revealed 1 component. The ATRS was cross-culturally adapted to Persian and demonstrated to be a reliable and valid instrument to measure functional outcomes in Persian patients with Achilles tendon rupture. II.
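The reported SEM and SDC values are consistent with a commonly used relation, SDC = 1.96 × √2 × SEM. The abstract does not state which formula the authors used, so the sketch below is an assumption; the function name is hypothetical.

```python
import math

def smallest_detectable_change(sem, z=1.96):
    """Individual-level SDC, assuming SDC = z * sqrt(2) * SEM.

    z = 1.96 corresponds to a 95% confidence level; sqrt(2) reflects
    the measurement error of two measurements (test and retest).
    """
    return z * math.sqrt(2.0) * sem

# SEM of 3.57 as reported in the abstract above.
print(round(smallest_detectable_change(3.57), 1))  # 9.9
```

With the reported SEM of 3.57, this relation gives approximately 9.9, which agrees with the SDC in the abstract.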
Visual-Motor Integration in Children With Mild Intellectual Disability: A Meta-Analysis.

PubMed

Memisevic, Haris; Djordjevic, Mirjana

2018-01-01

Visual-motor integration (VMI) skills, defined as the coordination of fine motor and visual perceptual abilities, are a very good indicator of a child's overall level of functioning. Research has clearly established that children with intellectual disability (ID) have deficits in VMI skills. This article presents a meta-analytic review of 10 research studies involving 652 children with mild ID for which a VMI skills assessment was also available. We measured the standardized mean difference (Hedges' g) between scores on VMI tests of these children with mild ID and either typically developing children's VMI test scores in these studies or normative mean values on VMI tests used by the studies. While mild ID is defined in part by intelligence scores that are two to three standard deviations below those of typically developing children, the standardized mean difference of VMI differences between typically developing children and children with mild ID in this meta-analysis was 1.75 (95% CI [1.11, 2.38]). Thus, the intellectual and adaptive skill deficits of children with mild ID may be greater (perhaps especially due to their abstract and conceptual reasoning deficits) than their relative VMI deficits. We discuss the possible meaning of this relative VMI strength among children with mild ID and suggest that their stronger VMI skills may be a target for intensive academic interventions as a means of attenuating problems in adaptive functioning.
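For readers unfamiliar with the effect size used above, the following is a minimal sketch of Hedges' g for two independent groups, assuming the common small-sample correction J = 1 − 3/(4·df − 1). The function name and the example inputs are hypothetical and do not come from the meta-analysis.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with small-sample bias correction.

    Computes Cohen's d from the pooled standard deviation, then
    multiplies by the approximate correction J = 1 - 3 / (4*df - 1),
    where df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return correction * d

# Hypothetical groups: typically developing vs. mild ID (illustration only).
print(round(hedges_g(100, 15, 30, 85, 15, 30), 3))
```

Because g is expressed in pooled standard deviation units, the reported value of 1.75 can be read directly against the two to three standard deviation gap in intelligence scores mentioned in the abstract.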
Greater power and computational efficiency for kernel-based association testing of sets of genetic variants.

PubMed

Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer

2014-11-15

Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test (a score test) with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more associations), whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene-gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact: heckerma@microsoft.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Public Perception of the Burden of Microtia.

PubMed

Byun, Stephanie; Hong, Paul; Bezuhly, Michael

2016-10-01

Microtia is associated with psychosocial burden and stigma. The authors' objective was to determine the potential impact of being born with microtia by using validated health state utility assessment measures. An online utility assessment using visual analogue scale, time tradeoff, and standard gamble was used to determine utilities for microtia with or without ipsilateral deafness, monocular blindness, and binocular blindness from a prospective sample of the general population. Utility scores were compared between health states using Wilcoxon and Kruskal-Wallis tests. Univariate regression was performed using sex, age, race, and education as independent predictors of utility scores. Over a 6-month enrollment period, 104 participants were included in the analysis. Visual analogue scale (median 0.80, interquartile range [0.72-0.85]), time tradeoff (0.88 [0.77-0.91]), and standard gamble (0.91 [0.84-0.97]) scores for microtia with ipsilateral deafness were higher (P <0.01) than those of binocular blindness (visual analogue scale, 0.30 [0.20-0.45]; time tradeoff, 0.42 [0.17-0.67]; and standard gamble, 0.52 [0.36-0.78]). Time trade-off scores for microtia with deafness were not different from monocular blindness (0.83 [0.67-0.91]). Higher level of education was associated with higher time tradeoff and standard gamble scores for microtia with or without deafness (P <0.05). Using objective health state utility scores, the current study demonstrates that the perceived burden of microtia with or without deafness is no different or less than monocular blindness. Given high utility scores for microtia, delaying autologous reconstruction beyond school entrance age may be justified.
Comparing the MMPI-2 scale scores of parents involved in parental competency and child custody assessments.

PubMed

Resendes, John; Lecci, Len

2012-12-01

MMPI-2 scores from a parent competency sample (N = 136 parents) are compared with a previously published data set of MMPI-2 scores for child custody litigants (N = 508 parents; Bathurst et al., 1997). Independent samples t tests yielded significant and in some cases substantial differences on the standard MMPI-2 clinical scales (especially Scales 4, 8, 2, and 0), with the competency sample obtaining higher clinical scores as well as higher scores on F, FB, VRIN, TRIN, and L, but lower scores on K, relative to the custody sample. Despite the higher scores in the competency sample, MMPI-2 mean scores did not exceed the clinical cutoff (T > 65). Moreover, the present competency sample essentially replicates the MMPI-2 scores of a previously published competency sample, suggesting that the present findings are representative of that population. The present findings suggest that separate reference groups be used when conducting child custody vs. parental competency evaluations, as these appear to be distinct populations despite there being similarities in the testing circumstances.
External validation of the HIT Expert Probability (HEP) score.

PubMed

Joseph, Lee; Gomes, Marcelo P V; Al Solaiman, Firas; St John, Julie; Ozaki, Asuka; Raju, Manjunath; Dhariwal, Manoj; Kim, Esther S H

2015-03-01

The diagnosis of heparin-induced thrombocytopenia (HIT) can be challenging. The HIT Expert Probability (HEP) Score has recently been proposed to aid in the diagnosis of HIT. We sought to externally and prospectively validate the HEP score. We prospectively assessed pre-test probability of HIT for 51 consecutive patients referred to our Consultative Service for evaluation of possible HIT between August 1, 2012 and February 1, 2013. Two Vascular Medicine fellows independently applied the 4T and HEP scores for each patient. Two independent HIT expert adjudicators rendered a diagnosis of HIT likely or unlikely. The median (interquartile range) of 4T and HEP scores were 4.5 (3.0, 6.0) and 5 (3.0, 8.5), respectively. There were no significant differences between area under receiver-operating characteristic curves of 4T and HEP scores against the gold standard, confirmed HIT [defined as positive serotonin release assay and positive anti-PF4/heparin ELISA] (0.74 vs 0.73, p = 0.97). HEP score ≥ 2 was 100 % sensitive and 16 % specific for determining the presence of confirmed HIT while a 4T score > 3 was 93 % sensitive and 35 % specific. In conclusion, the HEP and 4T scores are excellent screening pre-test probability models for HIT, however, in this prospective validation study, test characteristics for the diagnosis of HIT based on confirmatory laboratory testing and expert opinion are similar. Given the complexity of the HEP scoring model compared to that of the 4T score, further validation of the HEP score is warranted prior to widespread clinical acceptance.
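The sensitivity and specificity figures quoted above come from comparing each score's cutoff against the reference diagnosis. As a minimal sketch of that calculation, the function below works from the four cells of a 2×2 table. The abstract does not report the underlying counts, so the numbers in the example are purely hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 table counts.

    sensitivity = TP / (TP + FN): share of true cases flagged.
    specificity = TN / (TN + FP): share of non-cases correctly cleared.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts for illustration; not the study's data.
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=30, fp=10)
print(sens, spec)
```

A cutoff that is lowered to catch every true case pushes sensitivity toward 100% at the cost of specificity, which matches the pattern the abstract reports for the HEP score.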
Measuring What Matters: Robert Sternberg's Enlightened Approach to Admissions Testing

ERIC Educational Resources Information Center

Grace, Catherine O'Neill

2011-01-01

Psychologist Robert J. Sternberg's conviction that American standardized testing does not accurately reflect a child's intelligence or potential is far from theoretical. As an elementary school student in the 1950s, he scored poorly on the ubiquitous IQ test of the time, freezing up when the school psychologist entered the room. Thankfully for…
Assessment of American Indian Children as Measured by the SON-R and WISC-III.

ERIC Educational Resources Information Center

Curran, Lisa; And Others

A major criticism of standardized intelligence tests is their improper use in measuring the intellectual competence of culturally diverse children. Factors which complicate the issue are the definition of intelligence, content bias in intelligence tests, and the interpretation of test scores between white middle class children and children of…
Appropriateness Measurement with Polychotomous Item Response Models and Standardized Indices. Measurement Series, 84-1.

ERIC Educational Resources Information Center

Drasgow, Fritz; And Others

The test scores of some examinees on a multiple-choice test may not provide adequate measures of their abilities. The goal of appropriateness measurement is to identify such individuals. Earlier theoretical and experimental work considered examinees answering all, or almost all, test items. This article reports research that extends…
Test-Taking Strategy as a Mediator between Race and Academic Performance

ERIC Educational Resources Information Center

Dollinger, Stephen J.; Clark, M. H.

2012-01-01

The issue of race differences in standardized test scores and academic achievement continues to be a vexing one for behavioral scientists and society at large. Ellis and Ryan (2003) suggested that a portion of the cognitive-ability test performance differences between White/Caucasian-American and Black/African-American college students could be…

The Spelling Project. Technical Report 1992-2.

ERIC Educational Resources Information Center

Green, Kathy E.; Schroeder, David H.

Results of an analysis of a newly developed spelling test and several related measures are reported. Information about the reliability of a newly developed spelling test; its distribution of scores; its relationship with the standard battery of aptitude tests of the Johnson O'Connor Research Foundation; and its relationships with sex, age,…
The Effects of a Translation Bias on the Scores for the "Basic Economics Test"

ERIC Educational Resources Information Center

Hahn, Jinsoo; Jang, Kyungho

2012-01-01

International comparisons of economic understanding generally require a translation of a standardized test written in English into another language. Test results can differ based on how researchers translate the English written exam into one in their own language. To confirm this hypothesis, two differently translated versions of the "Basic…
Rationale and Use of Content-Relevant Achievement Tests for the Evaluation of Instructional Programs.

ERIC Educational Resources Information Center

Patalino, Marianne

Problems in current course evaluation methods are discussed and an alternative method is described for the construction, analysis, and interpretation of a test to evaluate instructional programs. The method presented represents a different approach to the traditional overreliance on standardized achievement tests and the total scores they provide.…
An Examination of Teachers' Effects on High, Middle, and Low Aptitude Students' Performance on a Standardized Achievement Test

ERIC Educational Resources Information Center

Good, Thomas L.; Beckerman, Terrill M.

1978-01-01

Teacher effectiveness was defined by students' mathematics score on the Iowa Test of Basic Skills while achievement was measured by the Cognitive Abilities Test. Relatively effective teachers generally produced achievement gains from all aptitude levels. Similarly, relatively ineffective teachers did not disproportionately depress achievement for…
Differential Gender Performance on the Major Field Test-Business

ERIC Educational Resources Information Center

Bielinska-Kwapisz, Agnieszka; Brown, F. William

2013-01-01

The Major Field Test in Business (MFT-B), a standardized assessment test of business knowledge among undergraduate business seniors, is widely used to measure student achievement. Many previous studies analyzing scores on the MFT-B report gender differences on the exam even after controlling for student's aptitude, general intellectual ability,…
Comparison of mortality prediction models and validation of SAPS II in critically ill burns patients.

PubMed

Pantet, O; Faouzi, M; Brusselaers, N; Vernay, A; Berger, M M

2016-06-30

Specific burn outcome prediction scores such as the Abbreviated Burn Severity Index (ABSI), Ryan, Belgian Outcome of Burn Injury (BOBI) and revised Baux scores have been extensively studied. Validation studies of the critical care score SAPS II (Simplified Acute Physiology Score) have included burns patients but not addressed them as a cohort. The study aimed at comparing their performance in a Swiss burns intensive care unit (ICU) and to observe whether they were affected by a standardized definition of inhalation injury. We conducted a retrospective cohort study, including all consecutive ICU burn admissions (n=492) between 1996 and 2013: 5 epochs were defined by protocol changes. As required for SAPS II calculation, stays <24h were excluded. Data were collected on age, gender, total body surface area burned (TBSA) and inhalation injury (systematic standardized diagnosis since 2006). Study epochs were compared (χ2 test, ANOVA). Score performance was assessed by receiver operating characteristic curve analysis. SAPS II performed well (AUC 0.89), particularly in burns <40% TBSA (AUC 0.93). Revised Baux and ABSI scores were not affected by the standardized diagnosis of inhalation injury and showed the best performance (AUC 0.92 and 0.91 respectively). In contrast, the accuracy of the BOBI and Ryan scores was lower (AUC 0.84 and 0.81) and reduced after 2006. The excellent predictive performance of the classic scores (revised Baux score and ABSI) was confirmed. SAPS II was nearly as accurate, particularly in burns <40% TBSA. Ryan and BOBI scores were least accurate, as they heavily weight inhalation injury.
Comparison of mortality prediction models and validation of SAPS II in critically ill burns patients

PubMed Central

Pantet, O.; Faouzi, M.; Brusselaers, N.; Vernay, A.; Berger, M.M.

2016-01-01

Summary Specific burn outcome prediction scores such as the Abbreviated Burn Severity Index (ABSI), Ryan, Belgian Outcome of Burn Injury (BOBI) and revised Baux scores have been extensively studied. Validation studies of the critical care score SAPS II (Simplified Acute Physiology Score) have included burns patients but not addressed them as a cohort. The study aimed at comparing their performance in a Swiss burns intensive care unit (ICU) and to observe whether they were affected by a standardized definition of inhalation injury. We conducted a retrospective cohort study, including all consecutive ICU burn admissions (n=492) between 1996 and 2013: 5 epochs were defined by protocol changes. As required for SAPS II calculation, stays <24h were excluded. Data were collected on age, gender, total body surface area burned (TBSA) and inhalation injury (systematic standardized diagnosis since 2006). Study epochs were compared (χ2 test, ANOVA). Score performance was assessed by receiver operating characteristic curve analysis. SAPS II performed well (AUC 0.89), particularly in burns <40% TBSA (AUC 0.93). Revised Baux and ABSI scores were not affected by the standardized diagnosis of inhalation injury and showed the best performance (AUC 0.92 and 0.91 respectively). In contrast, the accuracy of the BOBI and Ryan scores was lower (AUC 0.84 and 0.81) and reduced after 2006. The excellent predictive performance of the classic scores (revised Baux score and ABSI) was confirmed. SAPS II was nearly as accurate, particularly in burns <40% TBSA. Ryan and BOBI scores were least accurate, as they heavily weight inhalation injury. PMID:28149234
Demographically Corrected Normative Standards for the Spanish Language Version of the NIH Toolbox Cognition Battery.

PubMed

Casaletto, Kaitlin B; Umlauf, Anya; Marquine, Maria; Beaumont, Jennifer L; Mungas, Daniel; Gershon, Richard; Slotkin, Jerry; Akshoomoff, Natacha; Heaton, Robert K

2016-03-01

Hispanics are the fastest growing ethnicity in the United States, yet there are limited well-validated neuropsychological tools in Spanish, and an even greater paucity of normative standards representing this population. The Spanish NIH Toolbox Cognition Battery (NIHTB-CB) is a novel neurocognitive screener; however, the original norms were developed combining Spanish- and English-versions of the battery. We developed normative standards for the Spanish NIHTB-CB, fully adjusting for demographic variables and based entirely on a Spanish-speaking sample. A total of 408 Spanish-speaking neurologically healthy adults (ages 18-85 years) and 496 children (ages 3-7 years) completed the NIH Toolbox norming project. We developed three types of scores: uncorrected based on the entire Spanish-speaking cohort, age-corrected, and fully demographically corrected (age, education, sex) scores for each of the seven NIHTB-CB tests and three composites (Fluid, Crystallized, Total Composites). Corrected scores were developed using polynomial regression models. Demographic factors demonstrated medium-to-large effects on uncorrected NIHTB-CB scores in a pattern that differed from that observed on the English NIHTB-CB. For example, in Spanish-speaking adults, education was more strongly associated with Fluid scores, but showed the strongest association with Crystallized scores among English-speaking adults. Demographic factors were no longer associated with fully corrected scores. The original norms were not successful in eliminating demographic effects, overestimating children's performances, and underestimating adults' performances on the Spanish NIHTB-CB. The disparate pattern of demographic associations on the Spanish versus English NIHTB-CB supports the need for distinct normative standards developed separately for each population. Fully adjusted scores presented here will aid in more accurately characterizing acquired brain dysfunction among U.S. Spanish-speakers.
Validation of a Detailed Scoring Checklist for Use During Advanced Cardiac Life Support Certification

PubMed Central

McEvoy, Matthew D.; Smalley, Jeremy C.; Nietert, Paul J.; Field, Larry C.; Furse, Cory M.; Blenko, John W.; Cobb, Benjamin G.; Walters, Jenna L.; Pendarvis, Allen; Dalal, Nishita S.; Schaefer, John J.

2012-01-01

Introduction: Defining valid, reliable, defensible, and generalizable standards for the evaluation of learner performance is a key issue in assessing both baseline competence and mastery in medical education. However, prior to setting these standards of performance, the reliability of the scores yielded by a grading tool must be assessed. Accordingly, the purpose of this study was to assess the reliability of scores generated from a set of grading checklists used by non-expert raters during simulations of American Heart Association (AHA) MegaCodes. Methods: The reliability of scores generated from a detailed set of checklists, when used by four non-expert raters, was tested by grading team leader performance in eight MegaCode scenarios. Videos of the scenarios were reviewed and rated by trained faculty facilitators and by a group of non-expert raters. The videos were reviewed “continuously” and “with pauses.” Two content experts served as the reference standard for grading, and four non-expert raters were used to test the reliability of the checklists. Results: Our results demonstrate that non-expert raters are able to produce reliable grades when using the checklists under consideration, demonstrating excellent intra-rater reliability and agreement with a reference standard. The results also demonstrate that non-expert raters can be trained in the proper use of the checklist in a short amount of time, with no discernible learning curve thereafter. Finally, our results show that a single trained rater can achieve reliable scores of team leader performance during AHA MegaCodes when using our checklist in continuous mode, as measures of agreement in total scoring were very strong (Lin’s Concordance Correlation Coefficient = 0.96; Intraclass Correlation Coefficient = 0.97).
Discussion: We have shown that our checklists can yield reliable scores, are appropriate for use by non-expert raters, and can be employed during continuous assessment of team leader performance during the review of a simulated MegaCode. This checklist may be more appropriate for use by Advanced Cardiac Life Support (ACLS) instructors during MegaCode assessments than current tools provided by the AHA. PMID:22863996
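The agreement measure quoted in this record, Lin's concordance correlation coefficient, can be illustrated with a minimal sketch. The function name `lins_ccc` and the rater totals below are hypothetical and are not taken from the study; the formula is the usual definition with divide-by-n moments.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (divide-by-n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic bias between raters.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical checklist totals: a reference rater and one non-expert rater.
reference = [18, 22, 25, 30, 27, 21, 24, 29]
non_expert = [17, 23, 25, 29, 28, 20, 24, 30]
ccc = lins_ccc(reference, non_expert)
print(round(ccc, 3))
```

Unlike a plain Pearson correlation, the coefficient drops below 1 when one rater scores consistently higher than the other, even if the two sets of scores are perfectly linearly related.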
Using Data-Informed Instruction to Drive Education: Keeping Catholic Education a Viable and Educationally Sound Option in Challenging Times

ERIC Educational Resources Information Center

Niemeyer, Kristen; Casey, Laura B.; Williamson, Robert; Casey, Cort; Elswick, Susan E.; Black, Tom; Winsor, Denise

2016-01-01

Teachers in Catholic schools are not immune from pressures to improve students' scores on high stakes tests, and standards-based education is not new to Catholic schools. Nationally, many public school systems have moved to implement Common Core State Standards (CCSS) or other similar standards. Assessment, in turn, has been tied to these…
Composite Reliability and Standard Errors of Measurement for a Seven-Subtest Short Form of the Wechsler Adult Intelligence Scale-Revised.

ERIC Educational Resources Information Center

Schretlen, David; And Others

1994-01-01

Composite reliability and standard errors of measurement were computed for prorated Verbal, Performance, and Full-Scale intelligence quotient (IQ) scores from a seven-subtest short form of the Wechsler Adult Intelligence Scale-Revised. Results with 1,880 adults (standardization sample) indicate that this form is as reliable as the complete test.…
Reliability of visual acuity measurements taken with a notebook and a tablet computer in participants who were illiterate to Roman characters.

PubMed

Ruamviboonsuk, Paisan; Sudsakorn, Napitchareeya; Somkijrungroj, Thanapong; Engkagul, Chayanee; Tiensuwan, Montip

2012-03-01

Electronic measurement of visual acuity (VA) has been proposed and adopted as a method of determining VA scores in clinical research. Characters (optotypes) are displayed on a monitor screen and the examinee selects a match and inputs his choice to another electronic device. Unfortunately, the optotypes, called Sloan letters, in the standard protocol are 10 Roman characters. This limits their practicability for measuring VA of patients who are illiterate to these characters. The authors introduced a method of displaying the Sloan letters one by one on a notebook and all 10 Sloan letters on a tablet computer screen. The former is for testing the patients whereas the latter is for them to input their responses by tapping on a letter that matches the one on the notebook screen. The objective was to assess the test-retest reliability of VA scores determined with this method. Participants without ocular abnormality were recruited to have their right eyes measured with the same VA measurement method twice, one week apart. Those who were illiterate to Roman characters were enrolled for the aforementioned method for measuring their VA (Tablet group). A 15-inch display notebook computer and a 9-inch display tablet computer (iPad) communicated via a local wireless data network provided by a Wi-Fi router. Those who understood Roman characters were enrolled to have measurements with a 17-inch desktop computer and an infrared wireless keyboard (Keyboard group). Both methods used the same protocols and software for VA measurements. Reliability of VA scores obtained from each group was assessed by the confidence interval (CI) of the difference of the scores from the test and retest. The t test was used to analyze differences in mean VA scores between the test and retest in each group with p < 0.05 determined as statistically significant. There were 49 and 50 participants in the Tablet and Keyboard groups, respectively.
The 95% CI of the difference between the scores from the test and retest in each group was 2 letters. Approximately 95% of participants in each group had an absolute difference of the scores between the test and retest of 7 letters. The mean of VA scores from the first test was significantly different from that of the second test in the Keyboard group (one-letter difference, p = 0.049); there was no significant difference between these scores in the Tablet group (0.1-letter difference, p = 0.86). Tablet computers may be used to assist patients who are illiterate to Roman characters in having their VA measured with the standard electronic protocol. This preliminary study suggested that the proposed method should be useful for reliably measuring VA outcomes in multicenter international clinical trials without encountering a language barrier.
A Preliminary Investigation into the Effect of Standards-Based Grading on the Academic Performance of African-American Students

NASA Astrophysics Data System (ADS)

Bradbury-Bailey, Mary

With the implementation of No Child Left Behind came a wave of educational reform intended for those working with student populations whose academic performance seemed to indicate an alienation from the educational process. Central to these reforms was the implementation of standards-based instruction and its accompanying standardized assessments; however, in one area reform seemed nonexistent: the teacher's gradebook (Erickson, 2010; Marzano, 2006; Scriffiny, 2008). Given the link between the grading process and achievement motivation, Ames (1992) suggested the use of practices that promote mastery goal orientation. The purpose of this study was to examine the impact of a standards-based grading system as a factor contributing to mastery goal orientation on the academic performance of urban African American students. To determine the degree of impact, this study first compared the course content averages and End-of-Course-Test (EOCT) scores for science classes using a traditional grading system to those using a standards-based grading system by employing an Analysis of Covariance (ANCOVA). While there was an increase in all grading areas, two showed a significant difference: the Physical Science course content average (p = 0.024) and the Biology EOCT scores (p = 0.0876). These gains suggest that standards-based grading can have a positive impact on the academic performance of African American students. Secondly, this study examined the correlation between the course content averages and the EOCT scores for both the traditional and standards-based grading system; for both Physical Science and Biology, there was a stronger correlation between these two scores for the standards-based grading system.
ThOMAs: the other means of assessment.

PubMed

Williams, Andrew N; Debelle, Geoffrey D; Davies, Paul; Barrett, Tim G

2005-02-01

We describe a pilot study to investigate whether drawing "Thomas the Tank Engine" could be as effective a measure of developmental progress as the Goodenough-Harris Draw A Man test against the ThOMAs test (The Other Means of Assessment), with internal validation. The study included 95 children aged between 3 and 11 years of age, including a subgroup of 13 children with registered special needs from community and general pediatric clinics within Birmingham, UK, as a means of validation. There was no significant evidence that ThOMAs was either culturally or sex biased. Using regression analysis, nine items were found to correlate highly with actual age, and their total score gave a correlation of 0.563 with age. Adding further items did not increase this. After being converted into age-standardized scores, ThOMAs was as sensitive and specific as the Draw A Man test, and more so above a defined age-standardized threshold. This pilot study suggests that drawing Thomas the Tank Engine would appear to be as sensitive and specific a means of identifying children with special needs as the Goodenough-Harris Draw A Man test. The relatively small sample size means that further research is necessary to further define the age standardizations and to refine the ThOMAs test.
Greater power and computational efficiency for kernel-based association testing of sets of genetic variants

PubMed Central

Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer

2014-01-01

Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test—a score test—with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene–gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test—up to 23 more associations—whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene–gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact: heckerma@microsoft.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25075117
The relationship between participation in student-centered discussions and the academic achievement of fifth-grade science students

NASA Astrophysics Data System (ADS)

Mathues, Patricia Kelly

Although the social constructivist theory proposed by Vygotsky states the value of discourse as a contribution to the ability of the learner to create meaning, student-led discussions have often been relegated to the language arts classroom. The standards created by the National Council of Teachers of English and the International Reading Association have long recognized that learners create meaning in a social context. The National Science Education Standards have also challenged science teachers to facilitate discourse. However, the science standards document provides no specific structure through which such discourse should be taught. This study investigated the effectiveness of a discussion strategy provided by Shoop and Wright for teaching and conducting student-centered discussions (SCD). Fifth graders in one school were randomly selected and randomly assigned to one of two science classes; 22 students in one class learned and applied the SCD strategies while a second class with 19 students learned the same science concepts from a teacher using traditional methods as described by Cazden. This study used a pretest-posttest design to test the hypothesis that participation in SCDs would effect a difference in fifth-graders' abilities to comprehend science concepts. Results of independent-samples t-tests showed that while there was no significant difference between the mean ability scores of the two groups of subjects as measured by a standardized mental abilities test, the mean pretest score of the traditional group was significantly higher than the SCD group's mean pretest score. ANCOVA procedures demonstrated that the SCD group's mean posttest score was significantly higher than the mean posttest score of the traditional group. Data analysis supported the rejection of the null hypothesis. The investigator concluded that the SCD methodology contributed to students' understanding of the science concepts.
Results of this study challenge content area teachers to provide direct instruction of the SCD strategies and to encourage students to engage in the construction of knowledge through such discourse. Future research should focus on the application of the SCD strategies in other settings and for various durations of time.
New Mexico Standards Based Assessment (NMSBA) Technical Report: 2006 Spring Administration

ERIC Educational Resources Information Center

Griph, Gerald W.

2006-01-01

The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
The Effects of Education Accountability on Teachers: Are Policies Too Stress-Provoking for Their Own Good?

ERIC Educational Resources Information Center

Berryhill, Joseph; Linney, Jean Ann; Fromewick, Jill

2009-01-01

Education policies in the United States and other nations have established academic standards and made teachers accountable for improved standardized test scores. Because policies can have unintended effects, in this study we investigated U.S. elementary school teachers' perceptions of their state's accountability policy, particularly its effect…
Quantitative Analysis of Standardized Dress Code and Minority Academic Achievement

ERIC Educational Resources Information Center

Proctor, J. R.

2013-01-01

This study was designed to investigate if a statistically significant variance exists in African American and Hispanic students' attendance and Texas Assessment of Knowledge and Skills test scores in mathematics before and after the implementation of a standardized dress code. For almost two decades supporters and opponents of public school…
NBPTS Upgrades Profession, Most Agree, Despite Test-Score Letdown

ERIC Educational Resources Information Center

Keller, Bess

2006-01-01

Back when the National Board for Professional Teaching Standards was launched in 1987, most of the talk in its favor cited one overarching problem: the weakness of the teaching profession. If professional standards were better defined, if professional rewards were greater, the argument went, schools and learning would improve. These days…

Effects of Enhanced Anchored Instruction on Skills Aligned to Common Core Math Standards

ERIC Educational Resources Information Center

Bottge, Brian A.; Cho, Sun-Joo

2013-01-01

This study compared how students with learning difficulties in math (MLD) who were randomly assigned to two instructional conditions answered items on problem solving tests aligned to the Common Core State Standards Initiative for Mathematics. Posttest scores showed improvement in the math performance of students receiving Enhanced Anchored…
Mathematics Awareness through Technology, Teamwork, Engagement, and Rigor

ERIC Educational Resources Information Center

James, Laurie

2016-01-01

The purpose of this two-year observational study was to determine if the use of technology and intervention groups affected fourth-grade math scores. Specifically, the desire was to identify the percentage of students who met or exceeded grade-level standards on the state standardized test. This study indicated possible reasons that enhanced…
Hypothesis Testing Using Factor Score Regression: A Comparison of Four Methods

ERIC Educational Resources Information Center

Devlieger, Ines; Mayer, Axel; Rosseel, Yves

2016-01-01

In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and…
The Relationship between the Use of Spaced Repetition Software with a TOEIC Word List and TOEIC Score Gains

ERIC Educational Resources Information Center

Bower, Jack Victor; Rutson-Griffiths, Arthur

2016-01-01

A strong relationship between L2 vocabulary knowledge and L2 reading and listening comprehension is well established. However, less research has been conducted to explore correlations between pedagogic interventions to increase vocabulary knowledge and score gains on standardized L2 proficiency tests. This study addresses this gap in the research…
The Impact of SIM on FCAT Reading Scores of Special Education and At-Risk Students

ERIC Educational Resources Information Center

Matyo-Cepero, Jude

2013-01-01

The purpose of this study was to determine if special education and at-risk students educated exclusively in a school-within-a-school setting showed improved high-stakes standardized reading test scores after learning the strategic instruction model (SIM) inference strategy. This study was focused on four groups of eighth-grade students attending…
Fasting during Pregnancy and Children's Academic Performance. CEE DP 134

ERIC Educational Resources Information Center

Almond, Douglas; Mazumder, Bhashkar; van Ewijk, Reyn

2012-01-01

We consider the effects of daytime fasting by pregnant women during the lunar month of Ramadan on their children's test scores at age seven. Using English register data, we find that scores are 0.05 to 0.08 standard deviations lower for Pakistani and Bangladeshi students exposed to Ramadan in early pregnancy. These estimates are downward biased to…
A Computer Assisted Reading Program, Targeting Struggling Readers in a Title I Elementary School: A Program Evaluation

ERIC Educational Resources Information Center

Roy, Jon F.

2016-01-01

This study examined the effectiveness of READ 180, a computer-based reading intervention, on Lexile reading levels, Standardized Test of Reading Achievement (STAR) scores, and attendance rates of students in grades 3-5 who participated in the READ 180 program. The study compared the Scholastic Reading Inventory (SRI) scores of each group administered…
Empirical Implications of Matching Children with Specific Language Impairment to Children with Typical Development on Nonverbal IQ

ERIC Educational Resources Information Center

Earle, F. Sayako; Gallinat, Erica L.; Grela, Bernard G.; Lehto, Alexa; Spaulding, Tammie J.

2017-01-01

This study determined the effect of matching children with specific language impairment (SLI) and their peers with typical development (TD) for nonverbal IQ on the IQ test scores of the resultant groups. Studies published between January 2000 and May 2012 reporting standard nonverbal IQ scores for SLI and age-matched TD controls were categorized…
Fasting during Pregnancy and Children's Academic Performance. NBER Working Paper No. 17713

ERIC Educational Resources Information Center

Almond, Douglas; Mazumder, Bhashkar; van Ewijk, Reyn

2011-01-01

We consider the effects of daytime fasting by pregnant women during the lunar month of Ramadan on their children's test scores at age seven. Using English register data, we find that scores are 0.05 to 0.08 standard deviations lower for Pakistani and Bangladeshi students exposed to Ramadan in early pregnancy. These estimates are downward biased to…
Achievement of Elementary School Students and Attendance in Preschool Programs in Johnson County, Tennessee

ERIC Educational Resources Information Center

South, Emogene

2014-01-01

The purpose of this study was to determine if a difference in achievement scores exists between students who attended the Johnson County School System preschool program and those who did not as measured by standardized TCAP achievement test Reading/Language Arts and Math scores of students in the third and fourth grades. The variables of grade…
Out-of-School Time Program Test Score Impact for Black Children of Single-Parents

ERIC Educational Resources Information Center

Nagle, Barry T.

2013-01-01

Out-of-School Time programs and their impact on standardized college entrance exam scores for black or African-American children of single parents who have applied for a competitive college scholarship program is the study focus. Study importance is supported by the large percentage of black children raised by single parents, the large percentage…
Monitoring scale scores over time via quality control charts, model-based approaches, and time series techniques.

PubMed

Lee, Yi-Hsuan; von Davier, Alina A

2013-07-01

Maintaining a stable score scale over time is critical for all standardized educational assessments. Traditional quality control tools and approaches for assessing scale drift either require special equating designs, or may be too time-consuming to be considered on a regular basis with an operational test that has a short time window between an administration and its score reporting. Thus, the traditional methods are not sufficient to catch unusual testing outcomes in a timely manner. This paper presents a new approach for score monitoring and assessment of scale drift. It involves quality control charts, model-based approaches, and time series techniques to accommodate the following needs of monitoring scale scores: continuous monitoring, adjustment of customary variations, identification of abrupt shifts, and assessment of autocorrelation. Performance of the methodologies is evaluated using manipulated data based on real responses from 71 administrations of a large-scale high-stakes language assessment.
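One of the monitoring tools named in this record, the quality control chart, can be sketched in its simplest (Shewhart) form. This is only an illustration under stated assumptions: the function name `shewhart_flags`, the three-standard-deviation limits, and the scale-score means are hypothetical and are not the authors' procedure or data.

```python
from statistics import mean, stdev


def shewhart_flags(baseline_means, new_means, k=3.0):
    """Return (index, value) pairs for new means outside center +/- k * SD.

    The center line and spread are estimated from the baseline administrations.
    """
    center = mean(baseline_means)
    spread = stdev(baseline_means)  # sample standard deviation
    lower, upper = center - k * spread, center + k * spread
    return [(i, m) for i, m in enumerate(new_means) if not lower <= m <= upper]


# Hypothetical mean scale scores per administration.
baseline = [500.2, 499.1, 501.0, 500.5, 498.7, 500.9, 499.6, 500.0]
recent = [500.3, 499.0, 504.8, 500.1]
flags = shewhart_flags(baseline, recent)
print(flags)
```

A chart of this kind only catches abrupt shifts. Gradual drift and autocorrelation, which the abstract also lists as monitoring needs, call for the model-based and time series techniques the paper describes.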
Formation and assessment of a novel surgical video atlas for thyroidectomy.

PubMed

Tarpada, Sandip P; Hsueh, Wayne D; Newman, Seth B; Gibber, Marc J

2017-01-01

Within surgery, interactive media have previously been used to educate medical students and residents. Here, we develop and assess the efficacy of a novel surgical video atlas in teaching surgically relevant head and neck anatomy to medical students. A total thyroidectomy was recorded intraoperatively and subsequently narrated to develop a video atlas. Medical students were recruited and randomly assigned to one of the two interventions. One group was assigned to the video atlas, while the other was supplied with a traditional textbook atlas. Both groups underwent pre- and post-tests to evaluate anatomical knowledge and satisfaction. Thirty-seven students completed the study, with 18 students in the experimental group and 19 students as control. In the video atlas arm, mean pre- and post-test scores were 57.2% and 84.5%, respectively. In the traditional textbook arm, the mean pre- and post-test scores were 55.3% and 76.51%, respectively. Students with the video atlas had a mean post-test score 8.07 percentage points higher than those without (p = .035). Overall, students were significantly more satisfied with the surgical video atlas than with the standard traditional textbook. A surgical video atlas was shown to more effectively teach head and neck anatomy to medical students compared to standard textbook atlases.
Agreement between clinicians' and care givers' assessment of intelligence in Nigerian children with intellectual disability: 'ratio IQ' as a viable option in the absence of standardized 'deviance IQ' tests in sub-Saharan Africa

PubMed Central

Bakare, Muideen O; Ubochi, Vincent N; Okoroikpa, Ifeoma N; Aguocha, Chinyere M; Ebigbo, Peter O

2009-01-01

Background: There may be a need to assess intelligence quotient (IQ) scores in sub-Saharan African children with intellectual disability, either for the purpose of educational needs assessment or research. However, modern intelligence scales developed in the western parts of the world suffer limitation of widespread use because of the influence of socio-cultural variations across the world. This study examined the agreement between IQ scores estimation among Nigerian children with intellectual disability using clinicians' judgment based on International Classification of Diseases, tenth Edition (ICD - 10) criteria for mental retardation and caregivers' judgment based on 'ratio IQ' scores calculated from estimated mental age in the context of socio-cultural milieu of the children. It proposed a viable option of IQ score assessment among sub-Saharan African children with intellectual disability, using a ratio of culture-specific estimated mental age and chronological age of the child in the absence of standardized alternatives, borne out of great diversity in socio-cultural context of sub-Saharan Africa. Methods: Clinicians and care-givers independently assessed the children in relation to their socio-cultural background. Clinicians assessed the IQ scores of the children based on the ICD - 10 diagnostic criteria for mental retardation. 'Ratio IQ' scores were calculated from the ratio of estimated mental age and chronological age of each child. The IQ scores as assessed by the clinicians were then compared with the 'ratio IQ' scores using correlation statistics. Results: A total of forty-four (44) children with intellectual disability were assessed. There was a significant correlation between clinicians' assessed IQ scores and the 'ratio IQ' scores employing zero order correlation without controlling for the chronological age of the children (r = 0.47, df = 42, p = 0.001).
First order correlation controlling for the chronological age of the children showed higher correlation score between clinicians' assessed IQ scores and 'ratio IQ' scores (r = 0.75, df = 41, p = 0.000). Conclusion: Agreement between clinicians' assessed IQ scores and 'ratio IQ' scores was good. 'Ratio IQ' test would provide a viable option of assessing IQ scores in sub-Saharan African children with intellectual disability in the absence of culture-appropriate standardized intelligence scales, which is often the case because of great diversity in socio-cultural structures of sub-Saharan Africa. PMID:19754953
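The two calculations this record relies on, a 'ratio IQ' and a first order (partial) correlation controlling for chronological age, can be sketched as follows. The scaling by 100 is the conventional ratio-IQ definition and is assumed here, since the abstract only mentions "the ratio". All function names and the small data set are hypothetical.

```python
from math import sqrt


def ratio_iq(mental_age, chronological_age):
    """Conventional ratio IQ: 100 * mental age / chronological age (assumed scaling)."""
    return 100.0 * mental_age / chronological_age


def pearson_r(x, y):
    """Zero order (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """First order correlation between x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical child: estimated mental age 6 years, chronological age 12 years.
print(ratio_iq(6, 12))

# Hypothetical scores; z plays the role of chronological age.
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
z = [1, -1, -1, 1]
print(pearson_r(x, y), partial_r(x, y, z))
```

In this toy data the control variable is uncorrelated with both scores, so the partial and zero order correlations coincide. In the study, controlling for age raised the correlation from 0.47 to 0.75, and the degrees of freedom dropped from 42 to 41 because one more variable was conditioned on.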
Methodology for the development of normative data for Spanish-speaking pediatric populations.

PubMed

Rivera, D; Arango-Lasprilla, J C

2017-01-01

To describe the methodology utilized to calculate reliability and the generation of norms for 10 neuropsychological tests for children in Spanish-speaking countries. The study sample consisted of over 4,373 healthy children from nine countries in Latin America (Chile, Cuba, Ecuador, Guatemala, Honduras, Mexico, Paraguay, Peru, and Puerto Rico) and Spain. Inclusion criteria for all countries were to be between 6 and 17 years of age, to have an Intelligence Quotient of ≥80 on the Test of Non-Verbal Intelligence (TONI-2), and to have a score of <19 on the Children's Depression Inventory. Participants completed 10 neuropsychological tests. Reliability and norms were calculated for all tests. Test-retest analysis showed excellent or good reliability on all tests (r's>0.55; p's<0.001) except M-WCST perseverative errors, whose coefficient magnitude was fair. All scores were normed using multiple linear regressions and standard deviations of residual values. Age, age², sex, and mean level of parental education (MLPE) were included as predictors in the models by country. The non-significant variables (p > 0.05) were removed and the analyses were run again. This is the largest normative study of Spanish-speaking children and adolescents in the world. For the generation of normative data, the method based on linear regression models and the standard deviation of residual values was used. This method allows determination of the specific variables that predict test scores, helps identify and control for collinearity of predictive variables, and generates continuous and more reliable norms than those of traditional methods.
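The norming approach described in this record (predict the expected score by regression, then scale the residual by the residual standard deviation) can be sketched in a deliberately simplified form. The study used multiple regression with age, age², sex, and MLPE; the sketch below assumes a single predictor (age) so it stays self-contained, and every name and data value is hypothetical.

```python
from math import sqrt


def fit_norm_model(ages, scores):
    """Fit score = intercept + slope * age and return (intercept, slope, residual SD)."""
    n = len(ages)
    mean_a = sum(ages) / n
    mean_s = sum(scores) / n
    sxx = sum((a - mean_a) ** 2 for a in ages)
    sxy = sum((a - mean_a) * (s - mean_s) for a, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_a
    residuals = [s - (intercept + slope * a) for a, s in zip(ages, scores)]
    # Residual SD with n - 2 degrees of freedom (two estimated coefficients).
    resid_sd = sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, resid_sd


def normed_z(score, age, model):
    """Z-score of an observed score relative to the regression-predicted score."""
    intercept, slope, resid_sd = model
    return (score - (intercept + slope * age)) / resid_sd


# Hypothetical normative sample: ages in years and raw test scores.
ages = [6, 8, 10, 12, 14, 16]
scores = [13, 14, 21, 25, 26, 33]
model = fit_norm_model(ages, scores)
z = normed_z(23, 10, model)
print(model, z)
```

Because the expected score varies continuously with the predictor, a child is compared with the prediction for their exact age rather than with a coarse age band, which is the sense in which the abstract calls the resulting norms continuous.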
Self-Discipline Gives Girls the Edge: Gender in Self-Discipline, Grades, and Achievement Test Scores

ERIC Educational Resources Information Center

Duckworth, Angela Lee; Seligman, Martin E. P.

2006-01-01

Throughout elementary, middle, and high school, girls earn higher grades than boys in all major subjects. Girls, however, do not outperform boys on achievement or IQ tests. To date, explanations for the underprediction of girls' GPAs by standardized tests have focused on gender differences favoring boys on such tests. The authors' investigation…
Racial Differences in Test Preparation Strategies: A Commentary on "Shadow Education, American Style: Test Preparation, the SAT and College Enrollment"

ERIC Educational Resources Information Center

Alon, Sigal

2010-01-01

Claudia Buchmann, Dennis Condron and Vincent Roscigno's study, titled "Shadow Education, American Style: Test Preparation, the SAT and College Enrollment," demonstrates that vigorous use of expensive test preparation tools, such as private classes and tutors, significantly boosts scores on standardized exams such as the SAT or ACT. This…
The Nature of the Average Difference Between Whites and Blacks on Psychometric Tests: Spearman's Hypothesis.

ERIC Educational Resources Information Center

Jensen, Arthur R.

Charles Spearman originally suggested in 1927 that the varying magnitudes of the mean differences between whites and blacks in standardized scores on a variety of mental tests are directly related to the size of the tests' loadings on g, the general factor common to all complex tests of mental ability. Several independent large-scale studies…
Confirmatory Factor Analysis of the Test of Gross Motor Development-2

ERIC Educational Resources Information Center

Wong, Ka Yee Allison; Cheung, Siu Yin

2010-01-01

The purpose of this study was to examine the underlying structure of the second edition of the Test of Gross Motor Development-2 (Ulrich, 2000) as applied to Chinese children. The Test of Gross Motor Development-2 was administered to 626 Hong Kong Chinese children. The outlier test with standard scoring was utilized. After data screening, a total…
Why Has High-Stakes Testing So Easily Slipped into Contemporary American Life?

ERIC Educational Resources Information Center

Nichols, Sharon L.; Berliner, David C.

2008-01-01

High-stakes testing is the practice of attaching important consequences to standardized test scores, and it is the engine that drives the No Child Left Behind (NCLB) Act. The rationale for high-stakes testing is that the promise of rewards and the threat of punishments will cause teachers to work more effectively, students to be more motivated,…
