Chevalier, Shirley A.
In conventional practice, most educators and educational researchers score cognitive tests using a dichotomous right-wrong scoring system. Although simple and straightforward, this method does not take into consideration other factors, such as partial knowledge or guessing tendencies and abilities. This paper discusses alternative scoring models:…
Matton, Nadine; Vautier, Stephane; Raufaste, Eric
Mean gain scores for cognitive ability tests between two sessions in a selection setting are now a robust finding, yet not fully understood. Many authors do not attribute such gain scores to an increase in the target abilities. Our approach consists of testing a longitudinal SEM model suitable to this view. We propose to model the scores' changes…
Hansen, Karsten; Heckman, James J.; Mullen, Kathleen J.
This study developed two methods for estimating the effect of schooling on achievement test scores that control for the endogeneity of schooling by postulating that both schooling and test scores are generated by a common unobserved latent ability. The methods were applied to data on schooling and test scores. Estimates from the two methods are in…
Strand, Steve; Deary, Ian J.; Smith, Pauline
Background and aims: There is uncertainty about the extent or even existence of sex differences in the mean and variability of reasoning test scores (Jensen, 1998; Lynn, 1994; Mackintosh, 1996). This paper analyses the Cognitive Abilities Test (CAT) scores of a large and representative sample of UK pupils to determine the extent of any sex…
Lowry, Stephen R.
The effects of luck and misinformation on the ability of multiple-choice test scores to estimate examinee ability were investigated. Two measures of examinee ability were defined. Misinformation was shown to have little effect on the ability of raw scores, but a substantial effect on the ability of corrected-for-guessing scores, to estimate examinee ability.…
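The abstract does not say which correction for guessing was used. A minimal sketch, assuming the classical formula score R - W/(k - 1), where R is the number of items answered correctly, W the number answered incorrectly, and k the number of options per item; omitted items are neither rewarded nor penalized. The function name is hypothetical.

```python
def corrected_for_guessing(right: int, wrong: int, n_options: int) -> float:
    """Classical formula score R - W/(k - 1): each wrong answer costs 1/(k - 1) of a point.

    Omitted items are simply not counted, so they are neither rewarded nor penalized.
    """
    if n_options < 2:
        raise ValueError("each item needs at least two options")
    return right - wrong / (n_options - 1)


# Hypothetical examinee: 40 right, 12 wrong, 8 omitted on a 60-item, 4-option test.
print(corrected_for_guessing(40, 12, 4))  # 36.0
```

The penalty is chosen so that blind guessing gains nothing on average: a pure guesser on k-option items expects one right answer for every k - 1 wrong ones, which nets to zero.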
Walton, Gregory M; Spencer, Steven J
Past research has assumed that group differences in academic performance entirely reflect genuine differences in ability. In contrast, extending research on stereotype threat, we suggest that standard measures of academic performance are biased against non-Asian ethnic minorities and against women in quantitative fields. This bias results not from the content of performance measures, but from the context in which they are assessed: from psychological threats in common academic environments, which depress the performances of people targeted by negative intellectual stereotypes. Like the time of a track star running into a stiff headwind, such performances underestimate the true ability of stereotyped students. Two meta-analyses, combining data from 18,976 students in five countries, tested this latent-ability hypothesis. Both meta-analyses found that, under conditions that reduce psychological threat, stereotyped students performed better than nonstereotyped students at the same level of past performance. We discuss implications for the interpretation of and remedies for achievement gaps. PMID:19656335
Rich, John D., Jr.; Fullard, William; Overton, Willis
One hundred and twelve Latino students from Philadelphia participated in this study, which examined the development of deductive reasoning across adolescence, and the relation of reasoning to test anxiety and standardized test scores. As predicted, 11th and ninth graders demonstrated significantly more advanced reasoning than seventh graders.…
Ramos, Erica; Alfonso, Vincent C.; Schermerhorn, Susan M.
The interpretation of cognitive test scores often leads to decisions concerning the diagnosis, educational placement, and types of interventions used for children. Therefore, it is important that practitioners administer and score cognitive tests without error. This study assesses the frequency and types of examiner errors that occur during the…
Jones, Tracy Anne
Researchers are increasingly aware of the role of spatial skills in preparing children for future mathematics achievement (National Mathematics Advisory Panel, 2008). In addition, sex differences have been consistently documented showing boys score higher than girls in assessments of spatial ability, particularly mental rotation (Linn & Peterson,…
Jones, Gwen E.; Ree, Malcolm James
This study tested the specificity-generality hypothesis regarding moderation of aptitude test validity by job ability requirement differences using 24,482 Air Force enlistees in 37 jobs. Moderating effects due to job differences were not found, and job ability differences did not moderate the relationship between the amount of "g" measured by a…
Khasu, Denis S.; Williams, Thomas O., Jr.
In this brief article, the reliability of scores for the Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults (DAP: IQ; Reynolds & Hickman, 2004) was examined through several analyses with a sample of 147 children from rural Malawi, Africa, using a Chichewa translation of instructions. Cronbach alpha coefficients for…
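Cronbach's alpha for k items is k/(k - 1) * (1 - sum of item variances / variance of total scores). A minimal Python sketch of that textbook formula, not the authors' analysis code; the function name and the respondent-by-item list layout are assumptions.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha, where scores[i][j] is respondent i's score on item j."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Using sample variances for both the items and the total keeps the ratio consistent; population variances would give the same alpha.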
Legg, Sue M.; Ware, William B.
Student and test characteristics were examined by multiple regression analysis and discriminant function analysis to explain why 171 political science undergraduates scored differently on essay versus objective final examinations. Student characteristics included: (1) patterns of creative, crystallized, and fluid abilities as measured by the…
Hofer, Manfred; Kuhnle, Claudia; Kilian, Britta; Fries, Stefan
The predictive power of cognitive ability and self-control strength for self-reported grades and an achievement test were studied. It was expected that the variables use of time structure, academic procrastination, and motivational interference during learning further aid in predicting students' achievement because they are operative in situations…
Jones, Dorothy L.
A verbal concept-learning task permitting the externalizing and quantifying of learning behavior and 16 ability tests were administered to female graduate students. Data were analyzed by alpha factor analysis and incomplete image analysis. Six alpha factors and 12 image factors were extracted and orthogonally rotated. Four areas of cognitive…
Berry, Christopher M; Zhao, Peng
Predictive bias studies have generally suggested that cognitive ability test scores overpredict job performance of African Americans, meaning these tests are not predictively biased against African Americans. However, at least 2 issues call into question existing over-/underprediction evidence: (a) a bias identified by Aguinis, Culpepper, and Pierce (2010) in the intercept test typically used to assess over-/underprediction and (b) a focus on the level of observed validity instead of operational validity. The present study developed and utilized a method of assessing over-/underprediction that draws on the math of subgroup regression intercept differences, does not rely on the biased intercept test, allows for analysis at the level of operational validity, and can use meta-analytic estimates as input values. Therefore, existing meta-analytic estimates of key parameters, corrected for relevant statistical artifacts, were used to determine whether African American job performance remains overpredicted at the level of operational validity. African American job performance was typically overpredicted by cognitive ability tests across levels of job complexity and across conditions wherein African American and White regression slopes did and did not differ. Because the present study does not rely on the biased intercept test and because appropriate statistical artifact corrections were carried out, the present study's results are not affected by the 2 issues mentioned above. The present study represents strong evidence that cognitive ability tests generally overpredict job performance of African Americans. PMID:25150378
Allen, Denise A.
Little empirical evidence suggested that independent reading abilities of students enrolled in biology predicted their performance on the Biology I Graduation End-of-Course Assessment (ECA). An archival study was conducted at an urban public high school in Indianapolis, Indiana, by examining existing educational assessment data to test…
Bing, Mark N.; Stewart, Susan M.; Davison, H. Kristl
Handheld calculators have been used on the job for more than 30 years, yet the degree to which these devices can affect performance on employment tests of mathematical ability has not been thoroughly examined. This study used a within-subjects research design (N = 167) to investigate the effects of calculator use on test score reliability, test…
Allen, S; Ragab, S
Clinical observations have shown that some older patients are unable to learn to use a metered dose inhaler (MDI) despite having a normal abbreviated mental test (AMT) score, possibly because of dyspraxia or unrecognised cognitive impairment. Thirty inhaler-naive inpatients (age 76–94) with an AMT score of 8–10 (normal) were studied. Standard MDI training was given and the level of competence reached was scored (inhalation score). A separate observer performed the minimental test (MMT), Barthel index, geriatric depression score (GDS), ideational dyspraxia test (IDT), and ideomotor dyspraxia test (IMD). No correlative or threshold relationship was found between inhalation score and Barthel index, GDS, or IDT. However, a significant correlation was found between inhalation score and IMD (r = 0.45, p = 0.039) and MMT (r = 0.48, p = 0.032) and threshold effects emerged in that no subject with a MMT score of less than 23/30 had an inhalation score of 5/10 or more (adequate technique requires 6/10 or more), and all 17/18 with an inhalation score of 6/10 or more had an IMD of 14/20 or more. The three patients with a MMT >22 and inhalation score <6 had abnormal IMD scores. Inability to learn an adequate inhaler technique in subjects with a normal AMT score appears to be due to unrecognised cognitive impairment or dyspraxia. The MMT is probably a more useful screening test than the AMT score in this context. PMID:11796871
A study of 1,283 academically talented junior high students found that males had higher scores on three of the four subtests of the Spatial Test Battery of the Institute for the Academic Advancement of Youth. Females scored higher on the visual memory test and spent more time on the tests. (Author/CR)
Weigle, Sara Cushing
Automated scoring has the potential to dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders. This study approaches validity by comparing human and automated scores on responses to…
Crowson, H. Michael; DeBacker, Teresa K.; Thoma, Stephen J.
Emler, Renwick, and Malone (1983) argued against a developmental interpretation of the Defining Issues Test (DIT), suggesting instead that it actually measures a social psychological phenomenon: political identification. On the other hand, Sanders, Lubinski, and Benbow (1995) have argued that DIT scores measure intellectual ability. In this study,…
Reeve, Charlie L.; Bonaccio, Silvia
Claims of changes in the validity coefficients associated with general mental ability (GMA) tests due to the passage of time (i.e., temporal validity degradation) have been the focus of an on-going debate in applied psychology. To evaluate whether and, if so, under what conditions this degradation may occur, we integrate evidence from multiple…
de la Torre, Jimmy
Recent work has shown that multidimensionally scoring responses from different tests can provide better ability estimates. For educational assessment data, applications of this approach have been limited to binary scores. Of the different variants, the de la Torre and Patz model is considered more general because implementing the scoring procedure…
In order to establish the feasibility of a cut-off score for entrance into teacher education programs at North Texas State University, scores of 1,346 students who either placed above the 80th percentile (N=672) or below the 20th percentile (N=674) on either the School and College Ability Test or the Watson-Glaser Test of Critical Thinking were…
Moody, M. Suzanne
Whether or not fluctuations in spatial ability as measured by S. G. Vandenberg's Mental Rotations Test occur during the menstrual cycle was studied with 133 female students from 9 undergraduate educational psychology and nursing classes. For comparison, 28 male students also took the test. Scores from 55 females fell into the relevant menstrual…
van der Linden, Wim J.
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…
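Equipercentile equating maps a score on one form to the score on the other form that has the same percentile rank. The sketch below is a bare illustration under simplifying assumptions: raw empirical distributions, a mid-rank percentile convention, linear interpolation between observed scores, and none of the presmoothing or continuization that operational equating of discrete scores normally applies. It does not implement the local methods or the TCF-based method evaluated in the study; all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores: list[float], x: float) -> float:
    """Share of scores below x plus half the share equal to x (mid-rank convention)."""
    below = bisect_left(sorted_scores, x)
    equal = bisect_right(sorted_scores, x) - below
    return (below + 0.5 * equal) / len(sorted_scores)


def inverse_rank(sorted_scores: list[float], p: float) -> float:
    """Score whose mid-rank is p, interpolating linearly between sorted scores."""
    n = len(sorted_scores)
    # Under the mid-rank convention the i-th sorted score sits at rank (i + 0.5) / n.
    pos = min(max(p * n - 0.5, 0.0), n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return sorted_scores[lo] + (pos - lo) * (sorted_scores[hi] - sorted_scores[lo])


def equipercentile(x: float, scores_x: list[float], scores_y: list[float]) -> float:
    """Map score x on form X to the form-Y score with the same percentile rank."""
    return inverse_rank(sorted(scores_y), percentile_rank(sorted(scores_x), x))
```

Because the forward and inverse steps share the same rank convention, equating a distribution to itself returns each observed score unchanged, and a constant shift between forms is carried through exactly.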
What strategies can improve test scores? Amrein and Berliner, who studied 18 states with high-stakes testing, concluded that students did not necessarily score higher and often remained at the level they had reached before high-stakes testing was introduced. In other research done by Carnoy and Loeb, their…
Education Digest: Essential Readings Condensed for Quick Review, 2004
This article presents an adaptation of an article from School Board News, January 6, 2004 edition. The article describes efforts by officials of South Side High School, in Rockville Centre, New York, and Noble High School, in North Berwick, Maine, to de-track students of varying ability levels. Officials from both schools say that the…
McIntosh, James; Munk, Martin D.
Latent class Poisson count models are used to analyse a sample of Danish test score results from a cohort of individuals born in 1954-1955, tested in 1968, and followed until 2011. The procedure takes account of unobservable effects as well as excessive zeros in the data. We show that the test scores measure manifest or measured ability as it has…
Hopkins, W G; Green, J R
Simulation was used to investigate the validities of nine measures of ability derived from scores of two or more competitive events. The measures were: raw means and least-squares means of raw scores, z scores, and normal scores; two measures derived from ranked scores; and the "personal-best" raw score. Simulations were performed for different numbers of competitors, events, and event entries, each for a range of validity of performance in a single event. A complete set of simulations was repeated for each of the following conditions: normal distribution of competitors' ability; skewed distribution of ability; event validity related to ability; validity, ability, and spread of scores differing between events; and events differing in difficulty. The raw mean of raw scores was generally the most valid measure. The personal best was comparable to the mean only when the number of entries approached one per competitor. The least-squares mean of raw scores had highest validity when events differed substantially in difficulty; it should therefore be used when events differ in length, or when event scores are affected by environmental conditions, judging bias, or by uneven matching of competitors in match-play sports. PMID:7791592
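To make two of the nine measures concrete, here is a sketch assuming a hypothetical results[event][competitor] = raw score layout in which not every competitor enters every event. It computes each competitor's raw mean of raw scores and mean within-event z score; the least-squares means, rank-based measures, and personal best from the study are not shown, and the function name is an assumption.

```python
from statistics import mean, stdev


def ability_measures(results: dict[str, dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Return competitor -> (raw mean of raw scores, mean of within-event z scores).

    results[event][competitor] holds a raw score; each event needs at least two
    entrants with differing scores so that its standard deviation is nonzero.
    """
    raw: dict[str, list[float]] = {}
    zs: dict[str, list[float]] = {}
    for event_scores in results.values():
        values = list(event_scores.values())
        m, s = mean(values), stdev(values)  # event mean and sample SD
        for competitor, score in event_scores.items():
            raw.setdefault(competitor, []).append(score)
            zs.setdefault(competitor, []).append((score - m) / s)
    # Average only over the events each competitor actually entered.
    return {c: (mean(raw[c]), mean(zs[c])) for c in raw}


# Hypothetical results for two events.
print(ability_measures({"e1": {"a": 10, "b": 20}, "e2": {"a": 30, "b": 50}}))
```

The raw mean keeps the original scoring scale, while the z-score mean rescales each event to mean 0 and SD 1 before averaging.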
Chuderski, Adam; Andrelczyk, Krzysztof
Several existing computational models of working memory (WM) have predicted a positive relationship (later confirmed empirically) between WM capacity and the individual ratio of theta to gamma oscillatory band lengths. These models assume that each gamma cycle represents one WM object (e.g., a binding of its features), whereas the theta cycle integrates such objects into the maintained list. As WM capacity strongly predicts reasoning, it might be expected that this ratio also predicts performance in reasoning tasks. However, no computational model has yet explained how the differences in the theta-to-gamma ratio found among adult individuals might contribute to their scores on a reasoning test. Here, we propose a novel model of how WM capacity constrains figural analogical reasoning, aimed at explaining inter-individual differences in reasoning scores in terms of the characteristics of oscillatory patterns in the brain. In the model, the gamma cycle encodes the bindings between objects/features and the roles they play in the relations processed. Asynchrony between consecutive gamma cycles results from lateral inhibition between oscillating bindings. Computer simulations showed that achieving the highest WM capacity required reaching the optimal level of inhibition. When too strong, this inhibition eliminated some bindings from WM, whereas, when inhibition was too weak, the bindings became unstable and fell apart or became improperly grouped. The model aptly replicated several empirical effects and the distribution of individual scores, as well as the patterns of correlations found in the 100-person sample attempting the same reasoning task. Most importantly, the model's reasoning performance strongly depended on its theta-to-gamma ratio in the same way as the performance of human participants depended on their WM capacity. The data suggest that proper regulation of oscillations in the theta and gamma bands may be crucial for both high WM capacity and effective complex…
Floyd, Randy G.; Bergeron, Renee; McCormack, Allison C.; Anderson, Janice L.; Hargrove-Owens, Gabrielle L.
Many school psychologists use the Cattell-Horn-Carroll (CHC) theory of cognitive abilities to guide their interpretation of scores from intelligence test batteries. Some may frequently assume that composite scores purported to measure the same CHC broad abilities should be relatively similar for individuals no matter what subtests or batteries…
Park, Wan Beom; Kang, Seok Hoon; Lee, Yoon-Seong
Background: Clinical reasoning ability is an important factor in a physician's competence and thus should be taught and tested in medical schools. Medical schools generally use objective structured clinical examinations (OSCE) to measure the clinical competency of medical students. However, it is unknown whether OSCE can also evaluate clinical reasoning ability. In this study, the authors investigated whether OSCE scores reflected students' clinical reasoning abilities. Methods: Sixty-five fourth-year medical students participated in this study. Medical students completed the OSCE with 4 cases using standardized patients. For assessment of clinical reasoning, students were asked to list differential diagnoses and the findings that were compatible or not compatible with each diagnosis. The OSCE score (score of patient encounter), diagnostic accuracy score, clinical reasoning score, clinical knowledge score and grade point average (GPA) were obtained for each student, and correlation analysis was performed. Results: Clinical reasoning score was significantly correlated with diagnostic accuracy and GPA (correlation coefficient = 0.258 and 0.380; P = 0.038 and 0.002, respectively) but not with OSCE score or clinical knowledge score (correlation coefficient = 0.137 and 0.242; P = 0.276 and 0.052, respectively). Total OSCE score was not significantly correlated with clinical knowledge test score, clinical reasoning score, diagnostic accuracy score or GPA. Conclusions: OSCE score from patient encounters did not reflect the clinical reasoning abilities of the medical students in this study. The evaluation of medical students' clinical reasoning abilities through OSCE should be strengthened. PMID:25647834
Levin, Henry M.
Around the world we hear considerable talk about creating world-class schools. Usually the term refers to schools whose students get very high scores on the international comparisons of student achievement such as PISA or TIMSS. The practice of restricting the meaning of exemplary schools to the narrow criterion of achievement scores is usually…
Quereshi, M. Y.; Veeser, William R.
Investigates the influence of various scoring cutoffs on mental test performance as measured by the Michell General Ability Test (MGAT) and develops a rationale for selecting the optimum cutoff based on raw scores, internal consistency, stability, parallel-form reliability and concurrent validity estimates. (MB)
Kolen, Michael J.
Estimation/smoothing methods that are flexible enough to fit a wide variety of test score distributions are reviewed: the kernel method, the strong true-score model-based method, and a method that uses polynomial log-linear models. Applications of these methods include describing/comparing test score distributions, estimating norms, and estimating…
Notenboom, Kim; Vromans, Herman; Schipper, Maarten; Leufkens, Hubert G. M.; Bouvy, Marcel L.
Background: Practical problems with the use of medicines, such as difficulties with breaking tablets, are an often overlooked cause for non-adherence. Tablets frequently break in uneven parts and loss of product can occur due to crumbling and powdering. Health characteristics, such as the presence of peripheral neuropathy, decreased grip strength and manual dexterity, can affect a patient's ability to break tablets. As these impairments are associated with aging and age-related diseases, such as Parkinson's disease and arthritis, difficulties with breaking tablets could be more prevalent among older adults. The objective of this study was to investigate the relationship between age and the ability to break scored tablets. Methods: A comparative study design was chosen. Thirty-six older adults and 36 young adults were systematically observed while breaking scored tablets. Twelve different tablets were included. All participants were asked to break each tablet by three techniques: in between the fingers with the use of nails, in between the fingers without the use of nails and pushing the tablet downward with one finger on a solid surface. It was established whether a tablet was broken or not, and if broken, whether the tablet was broken accurately or not. Results: The older adults had more difficulty breaking tablets than the young adults. On average, the older persons broke 38.1% of the tablets, of which 71.0% were broken accurately. The young adults broke 78.2% of the tablets, of which 77.4% were broken accurately. Further analysis by mixed effects logistic regression revealed that age was associated with the ability to break tablets, but not with the accuracy of breaking. Conclusions: Breaking scored tablets by hand is less successful in an elderly population compared to a group of young adults. Health care providers should be aware that tablet breaking is not appropriate for all patients and for all drugs. In case tablet breaking is unavoidable, a…
Trimble, Susan; Gay, Anne; Matthews, Jan
Advances in technology available to access test data coupled with the challenges of No Child Left Behind (NCLB) are pushing schools to grapple with the complexities of test score data. With the current frenzy to raise test scores, there is little attention being paid to teacher development in learning to use data to improve learning. For the past…
Pandiani, John A.; Simon, Monica M.; Banks, Steven M.
This paper reports on an ongoing effort of the Vermont Mental Health Performance Indicator Project (PIP) to examine the relevance and utility of standardized test scores for evaluating community mental health programs. This analysis is of test scores from Vermont's first four years of statewide testing. The study is examining anonymous…
Hunter, William J.
An essential function of the school guidance worker is the translation of test results into plain language and/or concrete recommendations. To do so requires a thorough understanding of the various test scores publishers provide. (Author)
The paper investigates whether the provision of financial incentives has an impact on the performance of students in educational tests. The analysis is based on data from an experiment with high school students who answered multiple-choice items from the Third International Mathematics and Science Study (TIMSS). As in TIMSS, the setup did not…
Whittaker, Tiffany A.; Williams, Natasha J.; Dodd, Barbara G.
This study assessed the interpretability of scaled scores based on either number correct (NC) scoring for a paper-and-pencil test or one of two methods of scoring computer-based tests: an item pattern (IP) scoring method and a method based on equated NC scoring. The equated NC scoring method for computer-based tests was proposed as an alternative…
Weinstein, Lawrence; Laverghetta, Antonio; Alexander, Ralph; Stewart, Megan
The current study is an extension of a previous investigation dealing with teacher greetings to students. The present investigation used teacher greetings with college students and academic performance (test scores). We report data using university students and in-class test performance. Students in introductory psychology who received teachers'…
Śmieja, Magdalena; Orzechowski, Jarosław; Stolarski, Maciej S.
The Test of Emotional Intelligence (TIE) is a new ability scale based on a theoretical model that defines emotional intelligence as a set of skills responsible for the processing of emotion-relevant information. Participants are provided with descriptions of emotional problems, and asked to indicate which emotion is most probable in a given situation, or to suggest the most appropriate action. Scoring is based on the judgments of experts: professional psychotherapists, trainers, and HR specialists. The validation study showed that the TIE is a reliable and valid test, suitable for both scientific research and individual assessment. Its internal consistency measures were as high as .88. In line with the theoretical model of emotional intelligence, the results of the TIE shared about 10% of common variance with a general intelligence test, and were independent of major personality dimensions. PMID:25072656
Miller, Steven C.
The Wyoming Department of Education (WDE) has invested time and money developing standardized achievement test score reports designed to give teachers data about each of their students' levels of mastery of particular concepts in order to differentiate their instruction. The purpose of this study was to determine the extent to which…
Singapore students have scored exceedingly well on international tests in mathematics. In response, there has been a desire in the United States, both at the policy level and at the school level, to emulate Singapore. Because what can be identified most easily about Singapore's school mathematics can be gleaned from curriculum documents from the…
Bracey, Gerald W.
Examines correlation between national test scores in mathematics from the Third International Mathematics and Science Study (TIMSS) and the Current Competitiveness Index (CCI). Finds, for example, that while the United States ranks 29th in TIMSS mathematics, it ranks second in competitiveness on the CCI. Korea ranks 3rd in mathematics, but 27th in…
Smith, Vernon G.; Szymanski, Antonia
This article is for practicing or aspiring school administrators. The demand for excellence in public education has led to an emphasis on standardized test scores. This article explores the development of a professional enhancement program designed to prepare teachers to teach higher order thinking skills. Higher order thinking is the primary…
Jencks, Christopher, Ed.; Phillips, Meredith, Ed.
The 15 chapters of this book address issues related to the continuing test score gap between black and white students. The editors argue against traditional explanations which emphasize differences in economic resources and demographic factors, and they urge that more emphasis be put on psychological and cultural factors. The book suggests studies…
According to the 2004 National Assessment of Educational Progress, males who have made it through 12 years of school have significantly poorer reading skills than their female peers. In every age group, boys have been scoring lower than girls annually for more than three decades on U.S. Department of Education reading tests. The longer boys are in…
The prevalence of childhood overweight and obesity increased dramatically in the United States during the past three decades. This increase has adverse public health implications, but its implication for children's academic outcomes is less clear. This paper uses data from five waves of the Early Childhood Longitudinal Study-Kindergarten to examine how children's weight is related to their scores on standardized tests and to their teachers' assessments of their academic ability. The results indicate that children's weight is more negatively related to teacher assessments of their academic performance than to test scores. PMID:24014932
Margolis, Amy; Bansal, Ravi; Hao, Xuejun; Algermissen, Molly; Erickson, Cole; Klahr, Kristin W; Naglieri, Jack A; Peterson, Bradley S
The underlying neural determinants of general intelligence have been studied intensively, and seem to derive from the anatomical and functional characteristics of a frontoparietal network. Little is known, however, about the underlying neural correlates of domain-specific cognitive abilities, the other factors hypothesized to explain individual performance on intelligence tests. Previous preliminary studies have suggested that spatially distinct neural structures do not support domain-specific cognitive abilities. To test whether differences between abilities that affect performance on verbal and performance tasks derive instead from the morphological features of a single anatomical network, we assessed in two independent samples of healthy human participants (N=83 and N=58; age range, 5-57 years) the correlation of cortical thickness with the magnitude of the verbal intelligence quotient (VIQ)-performance intelligence quotient (PIQ) discrepancy. We operationalized the VIQ-PIQ discrepancy by regressing VIQ onto PIQ (VIQ-regressed-on-PIQ score), and by regressing PIQ onto VIQ (PIQ-regressed-on-VIQ score). In both samples, a progressively thinner cortical mantle in anterior and posterior regions bilaterally was associated with progressively greater (more positive) VIQ-regressed-on-PIQ scores. A progressively thicker cortical mantle in anterior and posterior regions bilaterally was associated with progressively greater (more positive) PIQ-regressed-on-VIQ scores. Variation in cortical thickness in these regions accounted for a large portion of the overall variance in magnitude of the VIQ-PIQ discrepancy. The degree of hemispheric asymmetry in cortical thickness accounted for a much smaller but statistically significant portion of variance in VIQ-PIQ discrepancy. PMID:23986248
Dykiert, Dominika; Deary, Ian J
In order to assess the degree of cognitive decline resulting from a pathological state, such as dementia, or from a normal aging process, it is necessary to know or to have a valid estimate of premorbid (or prior) cognitive ability. The National Adult Reading Test (NART; Nelson & Willison, 1991) and the Wechsler Test of Adult Reading (WTAR; Psychological Corporation, 2001) are 2 tests developed to estimate premorbid or prior ability. Due to the rarity of actual prior ability data, validation studies usually compare NART/WTAR performance with measures of current abilities in pathological and nonpathological groups. In this study, we validate the use of WTAR scores and extend the validation of the use of NART scores as estimates of prior ability, vis-à-vis the actual prior (childhood) cognitive ability. We do this in a large sample of healthy older people, the Lothian Birth Cohort 1936 (Deary, Gow, Pattie, & Starr, 2012; Deary et al., 2007). Both NART and WTAR scores were correlated with cognitive ability tested in childhood (r = .66-.68). Scores on both the NART and the WTAR had high stability over a period of 3 years in old age (r in excess of .90) and high interrater reliability. The NART accounted for more unique variance in childhood intelligence than did the WTAR. PMID:23815111
Chapelle, Carol A.; Chung, Yoo-Ree; Hegelheimer, Volker; Pendar, Nick; Xu, Jing
This study piloted test items that will be used in a computer-delivered and scored test of productive grammatical ability in English as a second language (ESL). Findings from research on learners' development of morphosyntactic, syntactic, and functional knowledge were synthesized to create a framework of grammatical features. We outline the…
Austin, Elizabeth J
Emotional intelligence (EI) has attracted considerable interest amongst both individual differences researchers and those in other areas of psychology who are interested in how EI relates to criteria such as well-being and career success. Both trait (self-report) and ability EI measures have been developed; the focus of this paper is on ability EI. The associations of two new ability EI tests with psychometric intelligence, emotion perception, and the Mayer-Salovey-Caruso EI test (MSCEIT) were examined. The new EI tests were the Situational Test of Emotion Management (STEM) and the Situational Test of Emotional Understanding (STEU). Only the STEU and the MSCEIT Understanding Emotions branch were significantly correlated with psychometric intelligence, suggesting that only understanding emotions can be regarded as a candidate new intelligence component. These understanding emotions tests were also positively correlated with emotion perception tests, and STEM and STEU scores were positively correlated with MSCEIT total score and most branch scores. Neither the STEM nor the STEU was significantly correlated with trait EI tests, confirming the distinctness of trait and ability EI. Taking the present results as a starting-point, approaches to the development of new ability EI tests and models of EI are suggested. PMID:19843352
Schrader, William B.
This report provides information on test development, test administration, and score interpretation for the Graduate Management Admission Test (GMAT). The GMAT, first administered in 1954, provides objective measures of an applicant's abilities for use in admissions decisions by graduate management schools. It is currently composed of five…
The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…
Spencer, Bruce D.
Because test scores are ordinal, not cardinal, attributes, the average test score often is a misleading way to summarize the scores of a group of individuals. Similarly, correlation coefficients may be misleading summary measures of association between test scores. Proper, readily interpretable, summary statistics are developed from a theory of…
Kane, Michael T.
To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…
Napier, John D.
The report describes two experiments involving the ability of preservice social studies teachers to stage score moral thought statements. Stage scoring is defined as keeping a record of statements in accordance with the stages of moral development originated by psychologist Lawrence Kohlberg. The two experiments involved the use of three stage…
Napier, John D.
The study examined (1) whether 60 elementary school teachers could score moral thought statements into Kohlberg's moral stages by receiving special training and using a rater manual, and (2) what factors were related to their stage-scoring ability. Major conclusion was that the rater manual and training were ineffective. (Author/ND)
Sachar, Jane; Suppes, Patrick
The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)
Kuentzel, Jeffrey G.; Hetterscheidt, Lesley A.; Barnett, Douglas
The rigors of standardized testing make for numerous opportunities for examiner error, including simple computational mistakes in scoring. Although experts recommend that test scoring be double-checked, the extent to which independent double-checking would reduce scoring errors is not known. A double-checking procedure was established at a…
According to 1999 data from the Centers for Disease Control and Prevention, traumatic brain injuries (TBI) caused by motor vehicle accidents, firearms, and falls are recorded as a leading cause of death and lifelong disability for young adults in the United States. Researchers have investigated if correlations exist between variables in the acute stage of injury and outcome measures in TBI patients. The Glasgow Coma Scale (GCS) score is one variable that was extensively studied for its ability to predict outcome in TBI patients. However, the use of different designs and methodologies in these studies makes the interpretation of the cumulative findings difficult. Therefore the purpose of this review was to provide a summary of the research findings on the ability of the GCS scores to predict outcome in TBI patients. A search was done on MEDLINE and CINAHL to identify studies that investigated the predictive ability of the GCS score. Studies that used the GCS as a variable in predicting outcome with adult patients who had sustained some type of head injury were included. GCS scores are most accurate at predicting outcome in head-injured patients when they are combined with patient age and pupillary response and when broad outcome categories are used. The motor component of the GCS yields similar prediction rates as the summed GCS score, and better prediction occurs with very high or very low GCS scores. Information about the cumulative research findings on the predictive ability of GCS scores aids nurses in providing support and education to family members during the acute stage of injury, and in coordinating the services of members of the healthcare team, which could result in improved outcomes for both patient and family. PMID:17477220
Gaddis, S Michael; Lauen, Douglas Lee
Since at least the 1960s, researchers have closely examined the respective roles of families, neighborhoods, and schools in producing the black-white achievement gap. Although many researchers minimize the ability of schools to eliminate achievement gaps, the No Child Left Behind Act (NCLB) increased pressure on schools to do so by 2014. In this study, we examine the effects of NCLB's subgroup-specific accountability pressure on changes in black-white math and reading test score gaps using a school-level panel dataset on all North Carolina public elementary and middle schools between 2001 and 2009. Using difference-in-difference models with school fixed effects, we find that accountability pressure reduces black-white achievement gaps by raising mean black achievement without harming mean white achievement. We find no differential effects of accountability pressure based on the racial composition of schools, but schools with more affluent populations are the most successful at reducing the black-white math achievement gap. Thus, our findings suggest that school-based interventions have the potential to close test score gaps, but differences in school composition and resources play a significant role in the ability of schools to reduce racial inequality. PMID:24468431
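The study's models use a school-level panel with school fixed effects. The 2x2 comparison of group means below is only the simplest special case of the difference-in-differences logic, shown as a hedged illustration. The function name and the gap values (in standard-deviation units) are hypothetical and are not estimates from the North Carolina data.

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Classic 2x2 difference-in-differences on group means.

    The change in the treated group minus the change in the comparison group;
    the comparison group's change stands in for what would have happened anyway.
    """
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean black-white gaps for schools facing subgroup accountability
# pressure (treated) versus schools not facing it (control).
gap_change = did_estimate(treat_pre=0.80, treat_post=0.70,
                          ctrl_pre=0.78, ctrl_post=0.76)
print(round(gap_change, 2))  # -0.08
```

A negative value means the gap narrowed more in the treated schools than in the comparison schools. The panel version generalizes this by differencing out each school's own fixed level.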
Schoeman, Scarpa; Chandratilake, Madawa
The assessment of students' ability in gross anatomy is a complex process as it involves the measurement of multiple facets. In this work, the authors developed and introduced the Anatomy Competence Score (ACS), which incorporates the three domains of anatomy teaching and assessment namely: theoretical knowledge, practical 3D application of the…
Byrd, C J; Main, R P; Makagon, M M
Gait scoring is the most popular method for assessing the walking ability of poultry species. Although inexpensive and easy to implement, gait scoring systems are often criticized for being subjective. Using a treadmill performance test we assessed whether observable differences in Pekin duck walking ability identified using a gait scoring system translated to differences in walking performance. One hundred and eighty ducks were selected using a three-category gait scoring system (GS0 = smooth gait, n = 55; GS0.5 = labored walk without easily identifiable impediment, n = 56; GS1 = obvious impediment, n = 59) and the amount of time each duck was able to sustain walking on a treadmill at a speed of 0.31 m/s was evaluated. The walking test ended when each duck met one of three elimination criteria: (1) The duck walked for a maximum time of ten minutes, (2) the duck required support from the observer's hand for more than three seconds in order to continue walking on the treadmill, or (3) the duck sat down on the treadmill and made no attempt to stand despite receiving assistance from the observer. Data were analyzed in SAS 9.4 using PROC GLM. Tukey's multiple comparison test was used to compare differences in time spent walking between gait scores. Significant differences were found between all gait scores (P < 0.05). Behavioral correlates of walking performance were investigated. Video recorded during the treadmill test was analyzed for counts of sitting, standing, and leaning behaviors. Data were analyzed in SAS 9.4 using a negative binomial model for count data. No differences were found between gait scores for counts of sitting, standing, and leaning behaviors (P > 0.05). In conclusion, the amount of time spent walking on the treadmill corresponded to gait score and was an effective measurement for quantifying Pekin duck walking ability. The test could be a valuable tool for assessing the development of walking issues or the effectiveness of
This article reports an empirical study that examined the pattern of test preparation for College English Test Band 4 (CET4) and the differential effects of test preparation practices on its scores, thereby drawing implications for CET4 score validity. Data collection involved 1,003 test takers of CET4. A pretest was administered at the beginning…
Cypress, Beulah K.
The potential of the Rasch model to develop scores, on a ratio scale, suitable for interindividual comparisons, from intact groups with disparate distribution characteristics was investigated. The specific problems studied were: (1) the effects of skewed test score distributions on the ability parameter of the Rasch measurement model; (2) the…
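For readers unfamiliar with the model under study, here is a minimal Python sketch of the Rasch item response function and a maximum-likelihood ability estimate obtained by Newton-Raphson, assuming known item difficulties. The item difficulties and function names are hypothetical, and the sketch does not reproduce the study's analysis of skewed distributions.

```python
import math

def rasch_probability(theta, b):
    """P(correct) under the Rasch model; theta = ability, b = item difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_raw_score(theta, difficulties):
    """Expected number-correct score at ability theta."""
    return sum(rasch_probability(theta, b) for b in difficulties)

def estimate_theta(raw_score, difficulties, iterations=50):
    """ML ability estimate for a raw score via Newton-Raphson.

    Finite only for 0 < raw_score < number of items.
    """
    if not 0 < raw_score < len(difficulties):
        raise ValueError("ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw_score - sum(probs)               # d log-likelihood / d theta
        information = sum(p * (1 - p) for p in probs)   # test information at theta
        theta += gradient / information
    return theta

items = [-1.0, 0.0, 0.5, 1.5]  # hypothetical item difficulties
print(rasch_probability(0.0, 0.0))  # 0.5
theta_hat = estimate_theta(2, items)
print(round(theta_hat, 3))
```

Under the Rasch model the raw score is a sufficient statistic for ability, so every examinee with the same raw score on the same items receives the same estimate.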
Biswas, Ajoy Kumar
This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
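The abstract builds its ordinal reliability measure on Kendall's tau-a. The Python sketch below computes only tau-a itself between two sets of scores for the same examinees, for example two parallel forms. It is not the author's reliability estimator, and the score vectors are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant but stay in the
    denominator, which is what distinguishes tau-a from tau-b.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two examinees")
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            s += 1      # concordant pair
        elif prod < 0:
            s -= 1      # discordant pair
    return s / (n * (n - 1) / 2)

# Hypothetical scores of five examinees on two parallel forms.
form_a = [12, 15, 9, 20, 17]
form_b = [11, 16, 10, 18, 19]
print(kendall_tau_a(form_a, form_b))  # 0.8
```

Here nine of the ten pairs are ordered the same way on both forms and one pair is reversed, giving (9 - 1) / 10.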
Wise, Vicki L.; Wise, Steven L.; Bhola, Dennison S.
Accountability for educational quality is a priority at all levels of education. Low-stakes testing is one way to measure the quality of education that students receive and make inferences about what students know and can do. Aggregate test scores from low-stakes testing programs are suspect, however, to the degree that these scores are influenced…
Sachar, Jane; Suppes, Patrick
It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
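As a baseline for the kind of estimate the abstract describes, here is the simplest approach: prorating the proportion correct on the administered subset to the full test length. This is an illustrative assumption. The abstract does not say whether simple proration was among the six methods compared, and it ignores the item content structure that two of those methods use. The function name is hypothetical.

```python
def prorate_total(subset_correct, n_administered, n_total):
    """Estimate a total-test score by scaling the subset proportion correct
    up to the full test length (simple proration; hypothetical helper)."""
    if not 0 < n_administered <= n_total:
        raise ValueError("n_administered must be in (0, n_total]")
    if not 0 <= subset_correct <= n_administered:
        raise ValueError("subset_correct must be in [0, n_administered]")
    return subset_correct / n_administered * n_total

# A student answers 45 of 60 administered items correctly on a 110-item test.
print(prorate_total(45, 60, 110))  # 82.5
```

Proration implicitly assumes the administered items are as difficult, on average, as the omitted ones. Methods that use content structure can relax that assumption.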
Anandpara, Vivek; Dingman, Andrew; Jakobsson, Markus; Liu, Debin; Roinestad, Heather
We argue that phishing IQ tests fail to measure susceptibility to phishing attacks. We conducted a study where 40 subjects were asked to answer a selection of questions from existing phishing IQ tests in which we varied the portion (from 25% to 100%) of the questions that corresponded to phishing emails. We did not find any correlation between the actual number of phishing emails and the number of emails that the subjects indicated were phishing. Therefore, the tests did not measure the ability of the subjects. To further confirm this, we exposed all the subjects to existing phishing education after they had taken the test, after which each subject was asked to take a second phishing test, with the same design as the first one, but with different questions. The number of stimuli that were indicated as being phishing in the second test was, again, independent of the actual number of phishing stimuli in the test. However, a substantially larger portion of stimuli was indicated as being phishing in the second test, suggesting that the only measurable effect of the phishing education (from the point of view of the phishing IQ test) was an increased concern—not an increased ability.
Thompson, Simon M; Salmon, Lucy J; Webb, Justin M; Pinczewski, Leo A; Roe, Justin P
Consecutive patients undergoing knee arthroplasty completed three questionnaires, the FJS, the Knee Injury and Osteoarthritis Outcome Score (KOOS), and the WOMAC Score (mean 39 months after surgery), and were mailed a repeat questionnaire after 4 to 6 weeks. Test-retest reliability was almost perfect for the FJS (ICC = 0.97) and its subdomains (ICC > 0.8). For convergent construct validity, the FJS correlated with the KOOS subscores for Quality of Life (0.63, P = 0.001), Symptom (0.33, P = 0.001), Pain (0.68, P = 0.001), and ADL (0.66, P = 0.001), and with the total WOMAC score (0.70, P = 0.001). The FJS demonstrates high test-retest reliability and construct validity compared with the normalised WOMAC and KOOS subscales. Because the FJS does not show the ceiling effect of the WOMAC or KOOS pain scores, it may have greater discriminatory ability following TKR. PMID:26027525
Center on Education Policy, 2010
This paper profiles Washington's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) decreased in grade 4 reading. In grade 4 math, the percentage scoring proficient on the state test decreased…
Brown, Steven M.; Walberg, Herbert J.
To examine the effect of motivational manipulated conditions on students' mathematics scores, elementary students received either ordinary standardized test instructions or special instructions (do as well as possible for themselves, parents, and teachers). Those given special instructions scored significantly higher in the test, implying that…
Integrating mathematics with family and consumer sciences (FCS) has enabled youth to pass the Minnesota 8th Grade Math Basic Skills test. The test focuses on the eight content areas: (1) problem solving with whole numbers and fractions; (2) problem solving with percentage/ratio; (3) number sense; (4) estimation; (5) measurement; (6) tables and…
Blai, Boris, Jr.
Reading test results and their interpretation are stressed because of their importance in student achievement. The Nelson-Denny Reading Test used at Harcum Junior College is a useful measuring instrument for predicting academic achievement, screening students, and diagnosing reading and learning problems. General hints for interpretation of the…
Sireci, Stephen G.; Han, Kyung T.; Wells, Craig S.
In the United States, when English language learners (ELLs) are tested, they are usually tested in English and their limited English proficiency is a potential cause of construct-irrelevant variance. When such irrelevancies affect test scores, inaccurate interpretations of ELLs' knowledge, skills, and abilities may occur. In this article, we…
de la Torre, Jimmy
For one reason or another, various sources of information, namely, ancillary variables and correlational structure of the latent abilities, which are usually available in most testing situations, are ignored in ability estimation. A general model that incorporates these sources of information is proposed in this article. The model has a general…
Hsu, Wen-Chuin; Chu, Yi-Chuan; Fung, Hon-Chung; Wai, Yau-Yau; Wang, Jiun-Jie; Lee, Jiann-Der; Chen, Yi-Chun
Mounting evidence shows that hyperhomocysteinemia is a risk factor for cognitive decline. This study enrolled subjects with normal serum levels of B12 and folate and performed thorough neuropsychological assessments to illuminate the independent role of homocysteine on cognitive functions. Participants between ages 50 and 85 were enrolled with Modified Hachinski ischemic score of <4, adequate visual and auditory acuity to allow neuropsychological testing, and good general health. Subjects with cognitive impairment resulting from secondary causes were excluded. Each of the participants completed evaluations of general intellectual function, including the Mini-Mental State Examination, Cognitive Abilities Screening Instrument, Clinical Dementia Rating, and a battery of neuropsychological assessments. This study enrolled 225 subjects (90 subjects younger than 65 years and 135 subjects aged 65 years or older). The sex proportion was similar between the 2 age groups. Years of education were significantly fewer in the elderly (7.49 ± 5.40 years) than in the young (9.76 ± 4.39 years, P = 0.001). There was no significant difference in body mass index or levels of vitamin B12 and folate between the 2 age groups. Homocysteine levels were significantly higher in the elderly group compared to the younger group (10.8 ± 2.7 vs. 9.5 ± 2.5 μmol/L, respectively, P = 0.0006). After adjusting for age, sex, and education, only the Digit Symbol Substitution (DSS) score was significantly lower in subjects with hyperhomocysteinemia (homocysteine >12 μmol/L) than those with homocysteine ≤12 μmol/L in the elderly group (DSS score: 7.1 ± 2.7 and 9.0 ± 3.0, respectively, beta = -1.6, 95% confidence interval [CI] = -2.8∼-0.5, P = 0.001) and borderline significance was noted in the combined age group (beta = -1.1, 95% CI = -2.1∼-0.1, P = 0.04). We did not find an association between hyperhomocysteinemia and other
Lohman, David F.; Lakin, Joni M.
Background: Strand, Deary, and Smith (2006) reported an analysis of sex differences on the Cognitive Abilities Test (CAT) for over 320,000 UK students 11-12 years old. Although mean differences were small, males were overrepresented at the upper and lower extremes of the score distributions on the quantitative and non-verbal batteries and at the…
As a public school English teacher, the author observes standardized testing season each year with a sort of grim fascination. "So this is it," she thinks as she paces around her silent classroom, peering over kids' shoulders at articles about parasailing. Line graphs tracking the rainfall in Tulsa. Parts of speech. Functions of "x." "These are…
Turnipseed, Stephan; Darling-Hammond, Linda
The number one quality business leaders look for in employees is creativity and yet the U.S. education system undermines the development of the higher-order skills that promote creativity by its dogged focus on multiple-choice tests. Stephan Turnipseed and Linda Darling-Hammond discuss the kind of rich accountability system that will help students…
Recent calls for an increase in educational accountability in K-16 resulted in an uptick of low-stakes testing and, consequently, an increased need for ensuring that students' test scores are reliable and valid representations of their true ability. Focusing on accountability testing in higher education, the current program of research was…
Tanner, John R.
Scores from state tests administered for accountability purposes are regularly used to adjust instruction in nuanced ways. This is no accident: No Child Left Behind demanded that students' scores be returned quickly to teachers in order that this might be the case, and the idea of data-driven decision making continues as one way the promise of education…
Spencer, Harry E.
Explores the comparative performance of various segments of the student sample in general chemistry courses relative to their scores on the mathematical SAT test. Results indicate that mathematical skill measured by the SAT scores is an important factor in determining grades while factors that are not important in determining grades are gender…
Huret, J L
A sperm nuclear decondensation ability test using 1% SDS + 6 mM EDTA was used to evaluate: a system of classification and nomenclature for the decondensation of nuclear chromatin; the progress of decondensation as a function of the duration of exposure to SDS/EDTA; the residual variance, or "scoring error;" the within-subject variance (N = 5); and the between-subject variance (N = 10). The process of chromatin decondensation was found to be a continuous phenomenon, but a scheme of nomenclature using four categories along with a system of data analysis using class weightings were developed. A 5-min exposure to SDS/EDTA resulted in a minimum scoring error (8.34%). The within- and between-subject variances were not significantly different from each other, but both were individually different (p less than 0.001) from the residual variance. PMID:6414391
Lyman, Howard B.
The first edition of this book was written to give information about testing to people whose work gave them access to test results, but whose training included little or nothing about the use and interpretation of tests. Later editions have been intended for a broader audience as the need for understanding what test scores really mean has…
Haberman, Shelby J; Yao, Lili; Sinharay, Sandip
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. PMID:25773314
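The classical-test-theory result the abstract applies can be sketched as follows, assuming the means and covariances are already known. The best linear predictor of a true score T from observed scores y is mu_T + Cov(T, y) Cov(y, y)^-1 (y - mu_y). With a single observed score this reduces to Kelley's formula, T_hat = rho * X + (1 - rho) * mu. The paper's actual contribution, estimating error variances and covariances from repeat examinees, is not shown. All numbers, including the assumption that a second score is the same true score plus independent error, are hypothetical.

```python
import numpy as np

def best_linear_predictor(y, mu_y, cov_yy, cov_ty, mu_t):
    """BLP of a true score T from observed vector y:
    T_hat = mu_t + cov_ty' inv(cov_yy) (y - mu_y)   (hypothetical helper)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    mu_y = np.atleast_1d(np.asarray(mu_y, dtype=float))
    cov_yy = np.atleast_2d(np.asarray(cov_yy, dtype=float))
    cov_ty = np.atleast_1d(np.asarray(cov_ty, dtype=float))
    weights = np.linalg.solve(cov_yy, cov_ty)   # regression weights on observed scores
    return float(mu_t + weights @ (y - mu_y))

# Single score: reliability 0.8, observed variance 100, so true variance 80.
kelley = best_linear_predictor(70, 50, 100, 80, 50)   # 0.8*70 + 0.2*50 = 66

# Two scores assumed to be T plus independent errors (error variances 20 and 10).
two = best_linear_predictor([70, 70], [50, 50],
                            [[100, 80], [80, 90]], [80, 80], 50)
print(kelley, two)
```

Adding a second, partly independent measurement raises the total weight on the observed deviation, so the prediction is shrunk less toward the mean.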
Annell, Stefan; Sjöberg, Anders; Sverke, Magnus
Single scores from limited and unbalanced test batteries of cognitive ability can be ambiguous to interpret theoretically. In this study, a limited verbally and knowledge-loaded cognitive test battery, from applicants to the Swedish police academies (N = 1,344), was examined to provide foundations for the use and interpretation of test scores. Three measurement models were compared: one single factor model and two bifactor models, which decomposed the variance of the battery into orthogonal components. The models were evaluated by fit indices and omega coefficients, and then applied to the prediction of academic performance. The overall prediction of all models was similar, although specific abilities also were found to provide substantial predictive validity over and above general intelligence (g). The findings provide support for the use of single scores in applied settings (selection), but suggest that it may be more appropriate to interpret such scores as composites of substantive components, and not just as measures of g. PMID:25040205
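The omega coefficients mentioned above can be computed from standardized bifactor loadings. Here is a minimal Python sketch assuming orthogonal factors and a unit-weighted total score. Omega total is the share of total-score variance due to all common factors; omega hierarchical is the share due to g alone. The loadings and group assignments are hypothetical, not the study's estimates.

```python
import numpy as np

def omega_coefficients(general, specific, groups, uniqueness):
    """Omega total and omega hierarchical from a bifactor solution.

    general: loadings on g; specific: loadings on each test's specific factor;
    groups: which specific factor each test loads on; uniqueness: residual variances.
    """
    general = np.asarray(general, dtype=float)
    specific = np.asarray(specific, dtype=float)
    groups = np.asarray(groups)
    uniqueness = np.asarray(uniqueness, dtype=float)
    g_var = general.sum() ** 2                                  # variance due to g
    s_var = sum(specific[groups == k].sum() ** 2 for k in np.unique(groups))
    total = g_var + s_var + uniqueness.sum()                    # total-score variance
    return (g_var + s_var) / total, g_var / total

# Hypothetical four-test battery with two specific factors.
g_load = np.array([0.7, 0.6, 0.6, 0.5])
s_load = np.array([0.3, 0.4, 0.4, 0.3])
uniq = 1 - g_load**2 - s_load**2
omega_t, omega_h = omega_coefficients(g_load, s_load, [0, 0, 1, 1], uniq)
print(round(omega_t, 3), round(omega_h, 3))
```

The gap between the two coefficients is the reliable variance that the single composite carries from specific abilities rather than from g.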
Creighton, Susan Dabney
There is no consensus regarding the most reliable and valid scoring methods for the assessment of higher order thinking skills. Most of the research on alternative formats has focused on the scoring of writing ability. This study examined the value of different types of performance assessment scoring guides on state mandated science and social studies tests. A proportional stratified sample of raters was randomly assigned to one of four scoring groups: checklist, analytic rubric, holistic rubric, and generic rubrics. A fifth method, the weighted analytic rubric, was included by applying an algorithmic formula to the scores assigned by raters using the analytic rubric. A comparison of the mean scores for the five scoring groups suggests that there may be a difference in the way raters applied the rubric for each group. Although the literature suggests that high levels of inter-rater reliability are achievable, phi coefficients of moderate strength were obtained across forms of scoring for three of the four constructed-response items. Comparison of results across scoring groups indicated that item complexity may affect the level of inter-rater reliability and the selection of the most reliable rubric for each discipline. Analytic rubrics appear to achieve more reliable results with less complex items. A multitrait-multimethod approach was utilized to investigate the external validity of the social studies and science tasks. As expected, there tended to be a stronger association between the PACT science constructed-response scores and science multiple-choice scores than between the science constructed-response scores and the writing ability subtest scores. A similar pattern was seen with social studies items. These results provide some evidence for the validity of the performance assessments. A post study survey completed by raters provided qualitative information regarding their thought processes and their primary focus during the
Gorney, Barbara; Maury, Marcia
The purpose of this study was to determine if a relationship exists between scores and the times that medical students choose to take a computer-administered test. The results indicate that students who choose to take a test later within a given time period tend to perform less well than students who take the test earlier. Although the magnitude…
Kopriva, Rebecca J.; Thurlow, Martha L.; Perie, Marianne; Lazarus, Sheryl S.; Clark, Amy
This article argues that test takers are as integral to determining validity of test scores as defining target content and conditioning inferences on test use. A principled sustained attention to how students interact with assessment opportunities is essential, as is a principled sustained evaluation of evidence confirming the validity or calling…
As more colleges move to "test optional" admissions policies, the debate over the utility and interpretation of standardized-test scores continues. In this article, the author interviews Daniel Koretz, a professor of education at Harvard University and author of "Measuring Up: What Educational Testing Really Tells Us". Koretz shares his thoughts…
van der Linden, Wim J.; Luecht, Richard M.
Derives a set of linear conditions of item-response functions that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly. An example illustrates the use of the model for an item pool from the Law School Admissions Test (LSAT). (SLD)
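To make "observed-score distribution" concrete: at a fixed ability, the distribution of number-correct scores follows from the item success probabilities via the Lord-Wingersky recursion, assuming local independence. This Python sketch illustrates that quantity only. It does not implement the authors' linear constraints or their test-assembly model, and the item probabilities are hypothetical.

```python
def observed_score_distribution(probs):
    """Number-correct score distribution at a fixed ability (Lord-Wingersky).

    probs: P(correct) for each item at that ability; items assumed locally independent.
    Returns a list whose index is the score and whose value is its probability.
    """
    dist = [1.0]  # before any item: score 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1 - p)     # item answered incorrectly
            new[score + 1] += mass * p       # item answered correctly
        dist = new
    return dist

# Two hypothetical forms whose items have the same success probabilities
# at this ability, in a different order.
form_a = [0.9, 0.6, 0.3]
form_b = [0.3, 0.9, 0.6]
print(observed_score_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
print(observed_score_distribution(form_a))
print(observed_score_distribution(form_b))
```

Because the recursion depends only on the collection of item probabilities at each ability, forms that match on those probabilities produce identical observed-score distributions.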
Carroll, Alexander M; Schuster, Gregory M
The aim of this study was to investigate whether there was a statistically significant positive correlation between dental students' Dental Admission Test (DAT) scores, particularly on the Perceptual Ability Test (PAT), and their performance on a dental school's competency exam. Scores from the written and clinical competency exam administered in the fall quarter of the fourth year of the curriculum at Midwestern University College of Dental Medicine-Arizona were compared to DAT scores of all 216 members of the graduating classes of 2012 and 2013. It was hypothesized that students who performed highly on one or more sections of the DAT would perform highly on the competency exam. Backward stepwise regression analyses were used to analyze the data. The results showed that the PAT scores were most strongly correlated with the competency exam scores and were a positive predictor for all three clinical sections of the exam (operative dentistry, periodontics, and endodontics). Positive predictors for the written portion of the exam were total DAT score for patient assessment and treatment planning and the DAT reading comprehension score for prosthodontics; there were no predictors for periodontics. The total variance explained by the results ranged from 4% to 15%. While statistically significant relationships were found between the students' PAT scores and clinical performance, DAT scores explained relatively little variance in the competency exam scores. According to these findings, neither the PAT nor any of the DAT components contributed to predicting these students' clinical performance. PMID:26522638
Cornelius, Marie D.; Goldschmidt, Lidush; De Genna, Natacha M.; Richardson, Gale A.; Leech, Sharon L.; Day, Richard
Objective: This study investigates change in IQ scores among 290 children born to teenage mothers and identifies social, economic, and environmental variables that may be associated with change in intelligence test performance. Methods: The children of 290 teenage mothers (72% African American and 28% European American) were assessed with the Stanford-Binet Intelligence Scale-4th Edition (SBIS) at ages 6 and 10. Results: The mean composite score at age 6 was 84.8 and was 91.2 at age 10, an improvement of 6.4 points. Significant cross-sectional predictors at both ages 6 and 10 of higher SBIS scores were maternal cognitive ability, school grade, Caucasian ethnicity, and caregiver education. Having more children in the household significantly predicted lower SBIS scores at age 6. Higher satisfaction with maternal social support predicted higher SBIS scores at age 10. Change in IQ scores was not related to maternal socioeconomic status, social support, home environment, ethnicity, or family interactions. Custodial stability was associated with an improvement in IQ scores, while increase in caregiver depression was related to decline in IQ scores. Conclusions: Our findings suggest that improvement in IQ scores of offspring of teenage mothers may be related to stability of maternal custody. More research is needed to determine the impact of the maturation of adolescent mothers' parenting and the role of early education on improvement in cognitive abilities. PMID:20495472
Elliott, Colin D.; Hale, James B.; Fiorello, Catherine A.; Dorvil, Cledicianne; Moldovan, Jaime
This study investigated the effects of broad cognitive abilities derived from the Cattell-Horn-Carroll (CHC) taxonomy, together with the effect of the general factor ("g"), on Wechsler Individual Achievement Test, Second Edition (WIAT-II) reading achievement. Structural equation modeling (SEM) and commonality analyses were applied to the…
Cottrell, Jonathan M; Newman, Daniel A; Roisman, Glenn I
In understanding the causes of adverse impact, a key parameter is the Black-White difference in cognitive test scores. To advance theory on why Black-White cognitive ability/knowledge test score gaps exist, and on how these gaps develop over time, the current article proposes an inductive explanatory model derived from past empirical findings. According to this theoretical model, Black-White group mean differences in cognitive test scores arise from the following racially disparate conditions: family income, maternal education, maternal verbal ability/knowledge, learning materials in the home, parenting factors (maternal sensitivity, maternal warmth and acceptance, and safe physical environment), child birth order, and child birth weight. Results from a 5-wave longitudinal growth model estimated on children in the NICHD Study of Early Child Care and Youth Development from ages 4 through 15 years show significant Black-White cognitive test score gaps throughout early development that did not grow significantly over time (i.e., significant intercept differences, but not slope differences). Importantly, the racially disparate conditions listed above can account for the relation between race and cognitive test scores. We propose a parsimonious 3-Step Model that explains how cognitive test score gaps arise, in which race relates to maternal disadvantage, which in turn relates to parenting factors, which in turn relate to cognitive test scores. The model and results help fill a need for theory on the etiology of the Black-White ethnic group gap in cognitive test scores and attempt to address a missing link in the theory of adverse impact. PMID:25867168
Center on Education Policy, 2010
This paper profiles Maryland's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased at grades 4 and 8 in both reading and math. Average annual gains were larger on the state test than…
Center on Education Policy, 2010
This paper profiles Pennsylvania's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 8 reading and math. Average annual gains were larger on the state test than on NAEP in…
The uses and misuses of standardized test results used for program evaluation as seen by a staff member of an Elementary Secondary Education Act (ESEA) Title I Technical Assistance Center are described. In ESEA Title I, test scores are used to select students for the program. Although federal requirements do not require using standardized test…
A widely held view is that good schools are essential to a nation's international economic success and that high test scores on international tests of academic skills and knowledge indicate how good a nation's schools are. The widespread belief that good schools are an important contributor to a nation's economic success in the world is supported…
Wise, Steven L.
Whenever the purpose of measurement is to inform an inference about a student's achievement level, it is important that we be able to trust that the student's test score accurately reflects what that student knows and can do. Such trust requires the assumption that a student's test event is not unduly influenced by construct-irrelevant factors…
Brown, Sarah Lee
The researcher interviewed two groups of eleventh-grade students in a rural Appalachian setting who tended to score low on the state-mandated high-stakes/low-stakes test, to discover how much effort they put into the test, specifically in reading, and to obtain their opinions concerning the effects of a specific incentive or consequence. Before the eleventh…
Xie, Yan; Xian, Hong; Chandiramani, Pooja; Bainter, Emily; Wan, Leping; Martin, Wade H
Objective: Arm exercise stress testing may be an equivalent or better predictor of mortality outcome than pharmacological stress imaging for the ≥50% of patients unable to perform leg exercise. Thus, our objective was to develop an arm exercise ECG stress test scoring system, analogous to the Duke Treadmill Score, for predicting outcome in these individuals. Methods: In this retrospective observational cohort study, arm exercise ECG stress tests were performed in 443 consecutive veterans aged 64.1 (11.1) years (mean (SD)) between 1997 and 2002. From multivariate Cox models, arm exercise scores were developed for prediction of 5-year and 12-year all-cause and cardiovascular mortality and 5-year cardiovascular mortality or myocardial infarction (MI). Results: Arm exercise capacity in resting metabolic equivalents (METs), 1 min heart rate recovery (HRR) and ST segment depression ≥1 mm were the stress test variables independently associated with all-cause and cardiovascular mortality by stepwise Cox analysis (all p<0.01). A score based on the relation HRR (bpm) + 7.3×METs − 10.5×ST depression (0=no; 1=yes) prognosticated 5-year cardiovascular mortality with a C-statistic of 0.81 before and 0.88 after adjustment for significant demographic and clinical covariates. Arm exercise scores for the other outcome end points yielded C-statistic values of 0.77–0.79 before and 0.82–0.86 after adjustment for significant covariates, versus 0.64–0.72 for best-fit pharmacological myocardial perfusion imaging models in a cohort of 1730 veterans who were evaluated over the same time period. Conclusions: Arm exercise scores, analogous to the Duke Treadmill Score, have good power for prediction of mortality or MI in patients who cannot perform leg exercise. PMID:26835142
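The reported relation can be evaluated directly. This is a minimal sketch that assumes ST depression is coded 1 when it is at least 1 mm, following the abstract's variable definition, and 0 otherwise. The abstract gives no risk cut points, so the function (a hypothetical name) returns only the raw score.

```python
def arm_exercise_score(hrr_bpm: float, mets: float, st_depression: bool) -> float:
    """Linear relation reported for 5-year cardiovascular mortality:
    HRR (bpm) + 7.3 * METs - 10.5 * ST depression (1 if >= 1 mm, else 0)."""
    st = 1 if st_depression else 0
    return hrr_bpm + 7.3 * mets - 10.5 * st


# Hypothetical patient: 1-min heart rate recovery of 20 bpm, 4 METs, ST depression present.
print(round(arm_exercise_score(20, 4.0, True), 1))  # 38.7
```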
Center on Education Policy, 2010
This paper profiles Massachusetts' test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 4 reading and math and grade 8 math. Average annual gains were larger on the state test…
Center on Education Policy, 2010
This paper profiles Texas' test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in reading at grades 4 and 8 and in math at grade 8. In grade 4 math, however, the percentage scoring…
Weiss, David J.
Three and one-half years of research on computerized ability testing are summarized. The original objectives of the research were: (1) to develop and implement the stratified computer-based ability test; (2) to compare, on psychometric criteria, the various approaches to computer-based ability testing, including the stratified computerized test,…
Petrilli, Michael J.; Wright, Brandon L.
At a time when the national conversation is focused on lagging upward mobility, it is no surprise that many educators point to poverty as the explanation for mediocre test scores among U.S. students compared to those of students in other countries. If American teachers in struggling U.S. schools taught in Finland, says Finnish educator Pasi…
The justification for national standards is that test scores predict a nation's future economic success. There is no evidence that supports this assumption. There is evidence that it is wrong. For more than half a century, reformers have been trying to fix our schools with little success. The obvious conclusion is that something that can't be…
Klein, Stephen P.; Hamilton, Laura S.; McCaffrey, Daniel F.; Stecher, Brian M.
Compared results on the Texas Assessment of Academic Skills (TAAS) to Texas score changes on the National Assessment of Educational Progress (NAEP). Texas fourth graders did improve significantly more on the NAEP mathematics test than their counterparts nationally, but this gain was smaller than their TAAS gains, and a similar gain was not seen…
Fahle, Erin; Reardon, Sean
Describing the variation in test scores between and within school districts is critical for: (1) policy-related and descriptive work that investigates the sorting of students among districts and the differential effectiveness of those districts; and (2) methodological work planning future experiments or interventions. Intraclass…
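The excerpt breaks off at intraclass correlations. One common estimator is the one-way ANOVA ICC(1), which measures the share of score variance lying between groups (here, districts) rather than within them. The sketch below assumes equal group sizes and uses hypothetical scores. The excerpt does not show the paper's own estimation approach.

```python
from statistics import mean


def icc_oneway(groups):
    """ANOVA estimator of ICC(1): (MSB - MSW) / (MSB + (k - 1) * MSW).

    groups: one list of student scores per district, all of size k
    (equal sizes are a simplifying assumption of this sketch).
    """
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    g = len(groups)
    grand = mean(x for grp in groups for x in grp)
    means = [mean(grp) for grp in groups]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Three hypothetical districts with three students each.
districts = [[48, 50, 52], [58, 60, 62], [68, 70, 72]]
print(round(icc_oneway(districts), 3))
```

The ANOVA estimator can be negative when between-group variation is smaller than chance would produce. Such values are usually reported as zero.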
To achieve perpetually better test results each year as mandated by the No Child Left Behind Act (NCLB), teachers in successful schools such as Leroy Anderson Elementary in San Jose, California, will "try anything" to raise scores, as the school's principal stated in an interview with "The San Jose Mercury News." In schools across California for…
Kposowa, Augustine J.; Valdez, Amanda D.
Objectives: The primary objective of the study was to investigate the relationship between ubiquitous laptop use and academic achievement. It was hypothesized that students with ubiquitous laptops would score on average higher on standardized tests than those without such computers. Methods: Data were obtained from two sources. First, demographic…
Homeschooling, one of the fastest growing educational alternatives, is enjoying increasing respect from educators and parents alike. This is partly because homeschooling children score as well and often better on standardized tests than their publicly schooled counterparts. However, the vast majority of homeschooled students come from the…
Grissom, Jason A.; Kalogrides, Demetra; Loeb, Susanna
Expansion of the use of student test score data to measure teacher performance has fueled recent policy interest in using those data to measure the effects of school administrators as well. However, little research has considered the capacity of student performance data to uncover principal effects. Filling this gap, this article identifies…
Bender, Robert C.
Because most counselors have experienced a significant amount of success, they often have difficulty understanding the impact of test scores on persons who do not perform well. Counselor educators must develop experiential awareness in an area normally outside the realm of their students. To provide such an experience, 25 counselor trainees took…
Report on Education Research, 1983
A new study by a pair of Harvard University researchers discounts earlier findings that coaching can substantially improve student performance on the Scholastic Aptitude Test (SAT). "There is simply insufficient evidence that large score increases are a result of a coaching program," write Rebecca…
Brennan, Robert L.
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas
This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
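Two of the single-administration coefficients named above have closed forms in terms of the item covariance matrix. Cronbach's alpha uses the item variances and the total-score variance. Guttman's lambda-2 adds a term based on the squared inter-item covariances. The sketch below uses the textbook formulas and hypothetical item scores. The latent class reliability coefficient itself requires fitting a latent class model and is not reproduced here.

```python
from math import sqrt


def covariance_matrix(scores):
    """Sample covariance matrix; scores is a list of persons, each a list of k item scores."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n - 1)
             for j in range(k)] for i in range(k)]


def alpha_and_lambda2(scores):
    """Cronbach's alpha and Guttman's lambda-2 from an item score matrix."""
    c = covariance_matrix(scores)
    k = len(c)
    total_var = sum(sum(row) for row in c)   # variance of the sum score
    item_var = sum(c[i][i] for i in range(k))
    off_sq = sum(c[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    alpha = k / (k - 1) * (1 - item_var / total_var)
    lambda2 = (total_var - item_var + sqrt(k / (k - 1) * off_sq)) / total_var
    return alpha, lambda2


# Hypothetical item scores: 5 persons x 3 items.
scores = [[1, 2, 3], [2, 2, 4], [3, 4, 4], [4, 3, 5], [5, 5, 6]]
alpha, lam2 = alpha_and_lambda2(scores)
print(f"alpha = {alpha:.3f}, lambda-2 = {lam2:.3f}")
```

Lambda-2 is never smaller than alpha, so the second printed value should be at least as large as the first.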
Holland, Paul W.; Thayer, Dorothy T.
Applied the theory of exponential families of distributions to the problem of fitting the univariate histograms and discrete bivariate frequency distributions that often arise in the analysis of test scores. Considers efficient computation of the maximum likelihood estimates of the parameters using Newton's Method and computationally efficient…
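The univariate case can be illustrated with a polynomial log-linear model for a score histogram, fit by maximum likelihood with Newton's method. This is a minimal sketch. Its assumptions are the Poisson likelihood formulation, the rescaling of score points to [0, 1], the starting values, and the hypothetical counts. The paper's computationally efficient variants are not reproduced.

```python
from math import exp, log


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loglinear_smooth(counts, degree=2, tol=1e-10, max_iter=100):
    """Fit log m_j = b0 + b1*z_j + ... + b_T*z_j**T to a score histogram by
    Newton's method on the Poisson log-likelihood; z_j = j / K rescales the
    score points 0..K to [0, 1] for numerical stability."""
    K = len(counts) - 1
    X = [[(j / K) ** t for t in range(degree + 1)] for j in range(K + 1)]
    beta = [log(sum(counts) / len(counts))] + [0.0] * degree  # flat start
    for _ in range(max_iter):
        m = [exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Gradient X'(n - m) and information matrix X' diag(m) X.
        grad = [sum((n - mj) * row[t] for n, mj, row in zip(counts, m, X))
                for t in range(degree + 1)]
        info = [[sum(mj * row[s] * row[t] for mj, row in zip(m, X))
                 for t in range(degree + 1)] for s in range(degree + 1)]
        step = solve(info, grad)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return [exp(sum(b * x for b, x in zip(beta, row))) for row in X]


# Hypothetical number-correct histogram for an 8-item test.
counts = [2, 5, 11, 20, 25, 18, 10, 6, 3]
fitted = loglinear_smooth(counts, degree=2)
print([round(f, 2) for f in fitted])
```

At the maximum likelihood solution, the fitted frequencies reproduce the total count and the first `degree` moments of the observed histogram. This gives a convenient check on convergence.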
Dougherty, Jack; Harelson, Jeffrey; Maloney, Laura; Murphy, Drew; Smith, Russell; Snow, Michael; Zannoni, Diane
Home buyers exercise school choice when shopping for a private residence due to its location in a public school district or attendance area. In this quantitative study of one Connecticut suburban district, we measure the effect of elementary school test scores and racial composition on home buyers' willingness to purchase single-family homes over…
Rangvid, Beatrice Schindler
We combine data from three studies for Denmark in the PISA 2000 framework to investigate differences in the native-immigrant test score gap by country of origin. In addition to the controls available from PISA data sources, we use student-level data on home background and individual migration histories linked from administrative registers. We find…
Matejko, Anna A; Price, Gavin R; Mazzocco, Michèle M M; Ansari, Daniel
Mathematical skills are of critical importance, both academically and in everyday life. Neuroimaging research has primarily focused on the relationship between mathematical skills and functional brain activity. Comparatively few studies have examined which white matter regions support mathematical abilities. The current study uses diffusion tensor imaging (DTI) to test whether individual differences in white matter predict performance on the math subtest of the Preliminary Scholastic Aptitude Test (PSAT). Grades 10 and 11 PSAT scores were obtained from 30 young adults (ages 17-18) with wide-ranging math achievement levels. Tract-based spatial statistics was used to examine the correlation between PSAT math scores, fractional anisotropy (FA), radial diffusivity (RD) and axial diffusivity (AD). FA in left parietal white matter was positively correlated with math PSAT scores (specifically in the left superior longitudinal fasciculus, left superior corona radiata, and left corticospinal tract) after controlling for chronological age and same-grade PSAT critical reading scores. Furthermore, RD, but not AD, was correlated with PSAT math scores in these white matter microstructures. The negative correlation with RD further suggests that participants with higher PSAT math scores have greater white matter integrity in this region. Individual differences in FA and RD may reflect variability in experience-dependent plasticity over the course of learning and development. These results are the first to demonstrate that individual differences in white matter are associated with mathematical abilities on a nationally administered scholastic aptitude measure. PMID:23108272
Bishop, N. Scott
This study examined the effects of different test administration conditions on reading comprehension test scores. Evidence of performance differences across district testing conditions might imply that the meanings and interpretations associated with the corresponding test scores have limited generalizability (i.e., knowing how well one reads…
Marder, M.; Bansal, D.
We apply visualization and modeling methods for convective and diffusive flows to public school mathematics test scores from Texas. We obtain plots that show the most likely future and past scores of students, the effects of random processes such as guessing, and the rate at which students appear in and disappear from schools. We show that student outcomes depend strongly upon economic class, and identify the grade levels where flows of different groups diverge most strongly. Changing the effectiveness of instruction in one grade naturally leads to strongly nonlinear effects on student outcomes in subsequent grades. PMID:19805049
Zain, Zakiyah; Aziz, Nazrina; Ahmad, Yuhaniz
In clinical trials, the main purpose is often to compare efficacy between experimental and control treatments. Treatment comparisons often involve multiple endpoints, and this situation further complicates the analysis of survival data. In the case of tumor patients, endpoints concerning survival times include: times from tumor removal until the first, the second and the third tumor recurrences, and time to death. For each patient, these endpoints are correlated, and the estimation of the correlation between two score statistics is fundamental in the derivation of the overall treatment advantage. In this paper, the bivariate survival analysis method using the global score test methodology is extended to the multivariate setting.
Jiao, Hong; Liu, Junhui; Haynie, Kathleen; Woo, Ada; Gorham, Jerry
This study explored the impact of partial credit scoring of one type of innovative items (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test…
Deng, Qiangyu; Tang, Bihan; Xue, Chen; Liu, Yuan; Liu, Xu; Lv, Yipeng; Zhang, Lulu
Background: Description of the anatomical severity of injuries in trauma patients is important. While the Injury Severity Score has been regarded as the “gold standard” since its creation, several studies have indicated that the New Injury Severity Score is better. Therefore, we aimed to systematically evaluate and compare the accuracy of the Injury Severity Score and the New Injury Severity Score in predicting mortality. Methods: Two researchers independently searched the PubMed, Embase, and Web of Science databases and included studies from which the exact number of true-positive, false-positive, false-negative, and true-negative results could be extracted. Quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies checklist criteria. The meta-analysis was performed using Meta-DiSc. Meta-regression, subgroup analyses, and sensitivity analyses were conducted to determine the source(s) of heterogeneity and factor(s) affecting the accuracy of the New Injury Severity Score and the Injury Severity Score in predicting mortality. Results: The heterogeneity of the 11 relevant studies (total n = 11,866) was high (I² > 80%). The meta-analysis using a random-effects model resulted in sensitivity of 0.64, specificity of 0.93, positive likelihood ratio of 5.11, negative likelihood ratio of 0.27, diagnostic odds ratio of 27.75, and area under the summary receiver operating characteristic curve of 0.9009 for the Injury Severity Score; and sensitivity of 0.71, specificity of 0.87, positive likelihood ratio of 5.22, negative likelihood ratio of 0.20, diagnostic odds ratio of 24.74, and area under the summary receiver operating characteristic curve of 0.9095 for the New Injury Severity Score. Conclusion: The New Injury Severity Score and the Injury Severity Score have similar abilities in predicting mortality. Further research is required to determine the appropriate use of the Injury Severity Score or the New Injury Severity Score based on specific
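The accuracy measures reported above all derive from the 2x2 counts (true/false positives and negatives) extracted from each study. The sketch below computes them for a single table with hypothetical counts. Pooled estimates from separate random-effects models need not satisfy these identities exactly. For example, pooled LR+ divided by pooled LR- need not equal the pooled diagnostic odds ratio.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures for one 2x2 table (positive = score predicts death)."""
    sensitivity = tp / (tp + fn)      # deaths correctly flagged
    specificity = tn / (tn + fp)      # survivors correctly not flagged
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    dor = lr_pos / lr_neg             # algebraically equal to (tp * tn) / (fp * fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "LR+": lr_pos, "LR-": lr_neg, "DOR": dor}


# Hypothetical counts for one study (not taken from the meta-analysis).
result = diagnostic_accuracy(tp=64, fp=70, fn=36, tn=930)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```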
Lovasi, Gina S; Eldred-Skemp, Nicolia; Quinn, James W; Chang, Hsin-Wen; Rauh, Virginia A; Rundle, Andrew; Orjuela, Manuela A; Perera, Frederica P
Childhood cognitive and test-taking abilities have long-term implications for educational achievement and health, and may be influenced by household environmental exposures and neighborhood contexts. This study evaluates whether age 5 scores on the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R, administered in English) are associated with polycyclic aromatic hydrocarbon (PAH) exposure and neighborhood context variables including poverty, low educational attainment, low English language proficiency, and inadequate plumbing. The Columbia Center for Children's Environmental Health enrolled African-American and Dominican-American New York City women during pregnancy, and conducted follow-up for subsequent childhood health outcomes including cognitive test scores. Individual outcomes were linked to data characterizing 1-km network buffers around prenatal addresses, home observations, interviews, and prenatal PAH exposure data from personal air monitors. Prenatal PAH exposure above the median predicted 3.5 point lower total WPPSI-R scores and 3.9 point lower verbal scores; the association was similar in magnitude across models with adjustments for neighborhood characteristics. Neighborhood-level low English proficiency was independently associated with 2.3 point lower mean total WPPSI-R score, 1.2 point lower verbal score, and 2.7 point lower performance score per standard deviation. Low neighborhood-level educational attainment was also associated with 2.0 point lower performance scores. In models examining effect modification, neighborhood associations were similar or diminished among the high PAH exposure group, as compared with the low PAH exposure group. Early-life personal PAH exposure or selected neighborhood-level social contexts may predict lower cognitive test scores. However, these results may reflect limited geographic exposure variation and limited generalizability. PMID:24994947
Standardized tests of writing ability have individual and shared limitations and deficiencies that should be acknowledged by test designers and users. Most institutions use the portions of standardized tests that test ability to proofread and edit, but they do not use the optional essay sections that actually require students to write. To assure…
Kim, Anita; Berry, Christopher M
This study investigates the personality processes involved in the debate surrounding the use of cognitive ability tests in college admissions. In Study 1, 108 undergraduates (mean age = 18.88 years, 60 women, 80 Whites) completed measures of social dominance orientation (SDO), testing self-efficacy, and attitudes regarding the use of cognitive ability tests in college admissions; SAT/ACT scores were collected from the registrar. Sixty-seven undergraduates (mean age = 19.06 years, 39 women, 49 Whites) completed the same measures in Study 2, along with measures of endorsement of commonly presented arguments about test use. In Study 3, 321 American adults (mean age = 35.58 years, 180 women, 251 Whites) completed the same measures used in Study 2; half were provided with facts about race and validity issues surrounding cognitive ability tests. Individual differences in SDO significantly predicted support for the use of cognitive ability tests in all samples, after controlling for SAT/ACT scores and test self-efficacy and also among participants who read facts about cognitive ability tests. Moreover, arguments for and against test use mediated this effect. The present study sheds new light on an old debate by demonstrating that individual differences in beliefs about hierarchy play a key role in attitudes toward cognitive ability test use. PMID:24219574
Miner, Claire Usher; Osborne, W. Larry; Jaeger, Richard M.
Uses regression analysis on career development measures to examine whether career maturity indicators are predictive of interest consistency, differentiation, and score elevation. Results indicate that interest consistency and score elevation were weakly predicted by the measure; no relationship existed between the attitudinal and cognitive…
Martin, John D.; And Others
The degree of relationship between scores on the Barron Ego Strength Scale and the scores on the Bender-Gestalt Test was investigated on a sample of college students. Correlations were moderate to low. Racial differences were observed on the Bender-Gestalt Test. (Author/JKS)
Deckel, A W
1. Previous work reported that tests of executive functioning (EF) predict the risk of alcoholism in subject populations selected for a "high density" of a family history of alcoholism and/or the presence of sociopathic traits. The current experiment examined the ability of EF tests to predict the risk of alcoholism, as measured by the MacAndrew Alcoholism Scale (MAC), in outpatient subjects referred to a general neuropsychological testing service. 2. Sixty-eight male and female subjects referred for neuropsychological testing were assessed for their past drinking histories and administered the Wisconsin Card Sorting Test, the Wechsler Adult Intelligence Scale-Revised, the Trails (Part B) Test, and the MAC. Principal components analysis (PCA) reduced the number of EF tests to two measures, including one that loaded on the WCST, and one that loaded on the Similarities, Picture Arrangement, and Trails tests. Multiple hierarchical regression first removed the variance from demographic variables, alcohol consumption, and verbal (i.e., Vocabulary) and non-verbal (i.e., Block Design) IQ, and then entered the executive functioning factors into the prediction of the MAC. 3. Seventy-six percent of the subjects were classified as either light, infrequent, or non-drinkers on the Quantity-Frequency-Variability scale. The factor derived from the WCST on PCA significantly added to the prediction of risk on the MAC (p = .0063), as did scores on Block Design (p = .033). Relatively more impaired scores on the WCST factor and Block Design were predictive of higher scores on the MAC. The other factors were not associated with MAC scores. 4. These results support the hypothesis that decrements in EF are associated with risk factors for alcoholism, even in populations where the density of alcoholic behaviors is not unusually high. When taken in conjunction with other findings, these results implicate EF test scores, and prefrontal brain functioning, in the neurobiology of the risk for
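The hierarchical regression step above asks how much variance the executive-functioning factors add once the earlier blocks are in the model. The sketch below computes the R-squared change and the corresponding F-change statistic for two nested OLS blocks. The data, variable names, and two-block simplification are hypothetical. P-values are omitted because the Python standard library has no F distribution.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r2(y, predictors):
    """R-squared of an OLS fit of y on an intercept plus the predictor columns,
    solved through the normal equations (adequate for a small illustration)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def r2_change(y, step1, step2):
    """R-squared for step 1, for steps 1 + 2, and the F statistic for the change."""
    r2_1 = ols_r2(y, step1)
    r2_2 = ols_r2(y, step1 + step2)
    q = len(step2)
    df_resid = len(y) - len(step1) - q - 1
    f_change = ((r2_2 - r2_1) / q) / ((1 - r2_2) / df_resid)
    return r2_1, r2_2, f_change


# Hypothetical data for 10 examinees (not from the study).
mac = [22, 25, 19, 28, 24, 30, 21, 26, 27, 23]                      # outcome scale score
age = [34, 45, 29, 51, 38, 47, 31, 40, 55, 36]                      # step 1 covariate
block_design = [12, 9, 13, 7, 10, 8, 12, 9, 8, 11]                  # step 1 covariate
ef_factor = [-0.5, 0.3, -1.1, 1.2, 0.1, 1.5, -0.8, 0.4, 0.9, -0.2]  # step 2 factor

r2_1, r2_2, f_change = r2_change(mac, [age, block_design], [ef_factor])
print(f"R2 step 1 = {r2_1:.3f}, R2 step 2 = {r2_2:.3f}, F change = {f_change:.2f}")
```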
McFarland, Dennis J.
Purpose: Factor analysis is a useful technique to aid in organizing multivariate data characterizing speech, language, and auditory abilities. However, knowledge of the limitations of factor analysis is essential for proper interpretation of results. The present study used simulated test scores to illustrate some characteristics of factor…
Balboni, Giulia; Naglieri, Jack A.; Cubelli, Roberto
The concurrent and predictive validities of the Naglieri Nonverbal Ability Test (NNAT) and Raven's Colored Progressive Matrices (CPM) were investigated in a large group of Italian third- and fifth-grade students with different sociocultural levels evaluated at the beginning and end of the school year. CPM and NNAT scores were related to math and…
A path analytic model of state test anxiety was tested in 169 college students who were enrolled in statistics courses. Variables in the model included gender, mathematics ability, trait test anxiety (trait worry and trait emotionality as separate variables), statistics course anxiety, statistics achievement (scores on midterm examinations),…
Walter, Richard Barry
This study investigated the relationship between instructional level scores as determined by a cloze test and instructional level scores as determined by an informal reading inventory (IRI). Fifty male and 50 female subjects were randomly selected from the total fifth grade population of five schools chosen from a total of 22 midwestern elementary…
Alloway, Tracy Packiam; Gregory, David
Literacy problems are highly prevalent and can persist into adulthood. Yet most research on how cognitive skills predict literacy has focused on developmental and adolescent populations. The aim of the present study was to extend existing research to investigate the roles of IQ scores and Working Memory…
Martin, John D.; And Others
The relationship between Elizur's Hostility Scoring on the Rorschach Test and the Acting-Out Score on the Hand Test was examined. Correlations between the two measures (using several scoring procedures) ranged from .40 to .64. (JKS)
Kobrosly, Roni W; Seplaki, Christopher L; Jones, Courtney M; van Wijngaarden, Edwin
Objective: To investigate the relationship between a measure of cumulative physiologic dysfunction and specific domains of cognitive function. Methods: We examined a summary score measuring physiological dysfunction, a multisystem measure of the body’s ability to effectively adapt to physical and psychological demands, in relation to cognitive function deficits in a population of 4511 adults aged 20 to 59 who participated in the third National Health and Nutrition Examination Survey (1988–1994). Measures of cognitive function comprised three domains: working memory, visuomotor speed, and perceptual-motor speed. ‘Physiologic dysfunction’ scores summarizing measures of cardiovascular, immunologic, kidney, and liver function were explored. We used multiple linear regression models to estimate associations between cognitive function measures and physiological dysfunction scores, adjusting for socioeconomic factors, test conditions, and self-reported health factors. Results: We noted a dose-response relationship between physiologic dysfunction and working memory (coefficient = 0.207, 95% CI = (0.066, 0.348), p < 0.0001) that persisted after adjustment for all covariates (p = 0.03). We did not observe any significant relationships between dysfunction scores and visuomotor (p = 0.37) or perceptual-motor ability (p = 0.33). Conclusions: Our findings suggest that multisystem physiologic dysfunction is associated with working memory. Future longitudinal studies are needed to clarify the underlying mechanisms and explore the persistency of this association into later life. We suggest that such studies should incorporate physiologic data, neuroendocrine parameters, and a wide range of specific cognitive domains. PMID:22155941
Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J.
Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as is the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…
This study examined the relationship between teacher education students' scores on basic skills admission tests and graduating seniors' scores on the National Teacher Examinations (NTE) at Eastern Kentucky University. The 1981-82 basic skills test scores for 262 teacher education students were compared with their NTE scores taken in 1984-85 during…
Bracey, Gerald W.
Three former secretaries of education--William Bennett, Lauro Cavazos, and Terrel Bell--have touted state-level SAT scores as proof that educational financing does not matter. Recently, Brian Powell and Lala Carr Steelman adjusted scores for participation rate and detected a very strong relationship between expenditures and SAT scores. Bigger…
Stevens, Charlotte Bethany Rains
Nationwide, efforts to provide productive science and math education in today's schools increasingly center on the technology used in classrooms. In this age of digital technology, educational software and calculator-based laboratories (CBL) have become significant tools for teaching science and math in many states across the United States. Among these technologies, the Texas Instruments graphing calculator and the Vernier LabPro interface are calculator-based laboratories that are becoming increasingly popular among middle and high school science and math teachers in many school districts across the country. In Tennessee, however, this type of technology is reportedly not used regularly at the student level in most high school science classrooms, especially in Physical Science (Vernier, 2006). This research explored the effect of calculator-based laboratory instruction on standardized test scores. The purpose of this study was to determine the effect of traditional teaching methods versus graphing calculator teaching methods on the state-mandated End-of-Course (EOC) Physical Science exam with respect to ability, gender, and ethnicity. The sample included 187 tenth- and eleventh-grade physical science students, 101 of whom belonged to a control group and 87 of whom belonged to the experimental group. Physical Science End-of-Course scores obtained from the Tennessee Department of Education for spring 2005 and spring 2006 were used to examine the hypotheses. The findings suggested that the type of teaching method, traditional or calculator-based, did not affect standardized test scores. However, the students' ability level, as demonstrated on the End-of-Course test, had a significant effect on End-of-Course test scores. This study focused on a limited population of high school physical science students in the middle Tennessee
Foreman, Jennifer L.; Gubbins, E. Jean
Teacher nominations of students are commonly used in gifted and talented identification systems to supplement psychometric measures of reasoning ability. In this study, second grade teachers were requested to nominate approximately one fourth of their students as having high learning potential in the year prior to the students' participation…
Discusses the ACTFL/ETS Oral Proficiency Interview (OPI) in relation to current models of communicative skills and argues that the OPI fails to measure important aspects of communicative ability. Two Situation Tests, one written and one oral, are described as alternative measures of communicative ability. Examples are given in Appendices.…
Zimmerman, Donald W.
Results of this study indicate that the correlation between half-test scores over repeated splits, over persons, and over repeated testings resulting in different sets of observed scores, is given by Kuder-Richardson Formula 21. (RF)
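KR-21 needs only the number of items k, the mean M, and the variance s² of the total scores: KR-21 = (k / (k - 1)) · (1 - M(k - M) / (k·s²)). A minimal Python sketch, assuming dichotomously scored items and the population variance of total scores; the function name and the example scores are hypothetical.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson Formula 21 reliability estimate.

    total_scores: number-right total scores, one per examinee
    n_items: number k of dichotomously scored items
    Uses the population variance of the total scores; some texts use the
    sample (n - 1) variance instead, which gives a slightly different value.
    """
    k = n_items
    m = mean(total_scores)
    var = pvariance(total_scores)
    if k < 2 or var == 0:
        raise ValueError("KR-21 needs at least two items and nonzero score variance")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


scores = [12, 15, 9, 18, 14, 11, 16, 13]  # hypothetical totals on a 20-item test
print(round(kr21(scores, 20), 3))  # 0.416
```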
Reynolds, Matthew R
The linear loadings of intelligence test composite scores on a general factor (g) have been investigated recently in factor analytic studies. Spearman's law of diminishing returns (SLODR), however, implies that the g loadings of test scores likely decrease in magnitude as g increases, or they are nonlinear. The purpose of this study was to (a) investigate whether the g loadings of composite scores from the Differential Ability Scales (2nd ed.) (DAS-II, C. D. Elliott, 2007a, Differential Ability Scales (2nd ed.). San Antonio, TX: Pearson) were nonlinear and (b) if they were nonlinear, to compare them with linear g loadings to demonstrate how SLODR alters the interpretation of these loadings. Linear and nonlinear confirmatory factor analysis (CFA) models were used to model Nonverbal Reasoning, Verbal Ability, Visual Spatial Ability, Working Memory, and Processing Speed composite scores in four age groups (5-6, 7-8, 9-13, and 14-17) from the DAS-II norming sample. The nonlinear CFA models provided better fit to the data than did the linear models. In support of SLODR, estimates obtained from the nonlinear CFAs indicated that g loadings decreased as g level increased. The nonlinear portion for the nonverbal reasoning loading, however, was not statistically significant across the age groups. Knowledge of general ability level informs composite score interpretation because g is less likely to produce differences, or is measured less, in those scores at higher g levels. One implication is that it may be more important to examine the pattern of specific abilities at higher general ability levels. PMID:23506024
Urushihata, Toshiya; Kinugasa, Takashi; Soma, Yuki; Miyoshi, Hirokazu
Balance impairment is one of the biggest risk factors for falls, which lead to inactivity and can result in the need for nursing care. Balance ability is therefore crucial for maintaining independence in the activities of daily living among older adults. Many tests to assess balance ability have been developed. However, few reports reveal the structure underlying balance performance test results in young versus older adults. Covariance structure analysis is a tool used to test statistically whether a factorial structure fits the data. This study examined aging effects on the factorial structure underlying balance performance tests. Participants comprised 60 healthy young women aged 22 ± 3 years (young group) and 60 community-dwelling older women aged 69 ± 5 years (older group). Six balance tests were employed: postural sway, one-leg standing, functional reach, timed up and go (TUG), gait, and the EquiTest. Exploratory factor analysis extracted three clearly interpretable factors in the young group. The first factor had high loadings on the EquiTest and was interpreted as 'Reactive'. The second factor had high loadings on the postural sway test and was interpreted as 'Static'. The third factor had high loadings on the TUG and gait tests and was interpreted as 'Dynamic'. Similarly, three interpretable factors were extracted in the older group. The first factor had high loadings on the postural sway test and the EquiTest and was therefore interpreted as 'Static and Reactive'. The second factor, which had high loadings on the EquiTest, was interpreted as 'Reactive'. The third factor, which had high loadings on the TUG and gait tests, was interpreted as 'Dynamic'. A covariance structure model was applied to the test data: the second-order factor was balance ability, and the first-order factors were static, dynamic, and reactive factors assumed to be measured by the six balance tests. The goodness-of-fit index (GFI) values of the models were acceptable (young group, GFI
Zenisky, April L.; Hambleton, Ronald K.
Test scores matter these days. Test-takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles by which most people get the bulk of this information. Historically, score reports have not always met the examinees' information or usability needs, but this is clearly changing…
This study aimed to compare the accuracy of scores on the Test of English Proficiency (TOEP) administered as a paper-and-pencil test (PPT) versus a computer-based test (CBT). Using participants' responses to the PPT documented from 2008-2010 and CBT TOEP data documented in 2013-2014 for sets 1A, 2A, and 3A of the Listening and…
This handbook provides guidelines for teaching test-taking skills to students of all grade levels to help the students raise their standardized test scores. Topics covered include: understanding instructions and following directions, efficient use of time, intelligent guessing, and application of special strategies for multiple-choice and…
Molenaar, Dylan; Borsboom, Denny
Measurement invariance is an important prerequisite for the adequate comparison of group differences in test scores. In psychology, measurement invariance is typically investigated by means of linear factor analyses of subtest scores. These subtest scores typically result from summing the item scores. In this paper, we discuss 4 possible problems…
Severo, Milton; Gaio, A. Rita; Povo, Ana; Silva-Pereira, Fernanda; Ferreira, Maria Amélia
In theory the formula scoring methods increase the reliability of multiple-choice tests in comparison with number-right scoring. This study aimed to evaluate the impact of the formula scoring method in clinical anatomy multiple-choice examinations, and to compare it with that from the number-right scoring method, hoping to achieve an…
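Formula scoring is conventionally defined as FS = R - W / (k - 1), where R is the number right, W the number wrong, and k the number of options per item; omitted items are neither rewarded nor penalized. The sketch below contrasts it with number-right scoring on one hypothetical examinee; it illustrates the classical correction for guessing and is not necessarily the exact scoring rule used in the study.

```python
def number_right(responses, key):
    """Count correct answers; None marks an omitted item."""
    return sum(1 for given, correct in zip(responses, key)
               if given is not None and given == correct)


def formula_score(responses, key, n_options):
    """Classical correction for guessing: R - W / (k - 1).

    Omitted items (None) count as neither right nor wrong.
    """
    right = number_right(responses, key)
    wrong = sum(1 for given, correct in zip(responses, key)
                if given is not None and given != correct)
    return right - wrong / (n_options - 1)


key = ["A", "B", "C", "D", "A", "B", "C", "D", "A", "B"]
responses = ["A", "B", "C", "A", None, "B", "D", "D", None, "B"]  # hypothetical examinee
print(number_right(responses, key))      # 6
print(formula_score(responses, key, 5))  # 5.5
```

With five options, each wrong answer costs a quarter point, so blind guessing has an expected formula-score gain of zero.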
Handeland, Katina; Kjellevold, Marian; Wik Markhus, Maria; Eide Graff, Ingvild; Frøyland, Livar; Lie, Øyvind; Skotheim, Siv; Stormark, Kjell Morten; Dahl, Lisbeth; Øyen, Jannike
Assessment of adolescents' dietary habits is challenging. Reliable instruments to monitor dietary trends are required to promote healthier behaviours in this group. The purpose of this cross-sectional study was to assess adolescents' adherence to Norwegian dietary recommendations with a diet score and to report results from, and the test-retest reliability of, the score. The diet score involved seven food groups and one physical activity indicator, and was applied to answers from a semi-quantitative food frequency questionnaire (FFQ) administered twice. Reproducibility of the score was assessed with Cohen's kappa (κ statistics) at an interval of three months. The setting was eight lower-secondary schools in Hordaland County, Norway, and the subjects were adolescents (n = 472) aged 14-15 years and their caregivers. Results showed that the proportion of adolescents consistently classified by the diet score was 87.6% (κ = 0.465). For food groups, proportions ranged from 74.0% to 91.6% (κ = 0.249 to κ = 0.573). Fewer than 40% of the participants adhered to the recommended frequencies for fruits, vegetables, added sugar, and fish. The highest compliance with recommendations was seen for choosing water as a beverage and limiting the intake of red meat. The score was associated with parental socioeconomic status. The diet score was found to be reproducible at an acceptable level. Health-promoting work targeting adolescents should emphasize increasing the intake of recommended foods to approach nutritional guidelines. PMID:27483312
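The gap between 87.6% consistent classification and κ = 0.465 comes from kappa's chance correction, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e the agreement expected from the marginal proportions alone. A minimal sketch with hypothetical "meets"/"misses" classifications from two administrations; the labels and data are illustrative only.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Cohen's kappa for two categorical ratings of the same subjects."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement: product of each category's marginal proportions.
    expected = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications from two FFQ administrations three months apart
first = ["meets", "meets", "misses", "misses", "meets",
         "misses", "misses", "misses", "meets", "misses"]
second = ["meets", "misses", "misses", "misses", "meets",
          "misses", "meets", "misses", "meets", "misses"]
print(round(cohens_kappa(first, second), 3))  # 0.583, despite 80% raw agreement
```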
Hageman, Barbara H.; Sigman, Clayton B.; Koslosky, John T.
A Test/Score/Report capability is currently being developed for the Transportable Payload Operations Control Center (TPOCC) Advanced Spacecraft Simulator (TASS) system. It will automate testing of the Goddard Space Flight Center (GSFC) Payload Operations Control Center (POCC) and Mission Operations Center (MOC) software in three areas: telemetry decommutation, spacecraft command processing, and spacecraft memory load and dump processing. Automated computer control of the acceptance test process is one of the primary goals of a test team. With the proper simulation tools and user interface, acceptance testing, regression testing, and the repetition of specific test procedures for a ground data system can all become simpler. Ideally, complete automation would mean plugging the operational deliverable into the simulator, pressing the start button, executing the test procedure, accumulating and analyzing the data, scoring the results, and reporting them to the test team along with a go/no-go recommendation. In practice, this may not be possible because of inadequate test tools, schedule pressures, limited resources, and similar constraints. Most tests are accomplished using a certain degree of automation and test procedures that are labor intensive. This paper discusses some simulation techniques that can improve the automation of the test process. The TASS system tests the POCC/MOC software and provides a score based on the test results. The TASS system displays statistics on the success of the POCC/MOC system processing in each of the three areas, as well as event messages pertaining to the Test/Score/Report processing. The TASS system also provides formatted reports documenting each step performed during the tests and the results of each step. A prototype of the Test/Score/Report capability is available and is currently being used to test some POCC/MOC software deliveries. When this capability is fully operational it should greatly reduce the time necessary
Esau, Helmut; Yost, Carlson
This paper describes an experiment that was undertaken to examine the usefulness of the cloze test as an objective measure of a native speaker's writing ability. A modified version of the cloze test used by Oller and others to measure integrative language skills in non-native speakers was given to 100 freshman English students. The test…
Ott, Brian R.; Heindel, William C.; Whelihan, William M.; Caron, Mark. D.; Piatt, Andrea L.; DiCarlo, Margaret A.
A battery of standard neuropsychological tests examining various features of executive function, attention, and visual perception was administered to 27 subjects with questionable to mild dementia and compared to a 4-point caregiver rating scale of driving ability. Based on the results of this study, a computerized maze task, employing 10 mazes, was administered to a second sample of 40 normal elders and questionable to moderately demented drivers. Comparison was made to the same caregiver rating scale as well as to crash frequency. In the first study of neuropsychological tests, errors on Porteus Mazes emerged as the only significant predictor of driving ability in a stepwise regression analysis. In the follow-up study employing the computerized mazes, all 10 mazes were significantly related to driving ability ratings. Computerized tests of maze performance offer promise as a screening tool to identify potential driving impairment among cognitively impaired elderly and demented drivers. PMID:12967057
Fujimoto, Shinichiro; Kondo, Takeshi; Yamamoto, Hideya; Yokoyama, Naoyuki; Tarutani, Yasuhiro; Takamura, Kazuhisa; Urabe, Yoji; Konno, Kumiko; Nishizaki, Yuji; Shinozaki, Tomohiro; Kihara, Yasuki; Daida, Hiroyuki; Isshiki, Takaaki; Takase, Shinichi
Existing methods to calculate the pre-test probability of obstructive coronary artery disease (CAD) were established using selected high-risk patients who were referred for conventional coronary angiography. The purpose of this study is to develop and validate a new method for estimating the pre-test probability of obstructive CAD using patients who underwent coronary CT angiography (CTA), which could be applicable to a wider patient population. Using 4137 consecutive patients with suspected CAD who underwent coronary CTA at our institution, we built a multivariate logistic regression model with clinical factors as covariates to calculate the pre-test probability (K-score) of obstructive CAD as determined by coronary CTA. The K-score was compared with the Duke clinical score using the area under the receiver-operating characteristic curve (AUC). External validation was performed in an independent sample of 319 patients. The final model included eight significant predictors: age, gender, coronary risk factors (hypertension, diabetes mellitus, dyslipidemia, smoking), history of cerebral infarction, and chest symptom. The AUC of the K-score was significantly greater than that of the Duke clinical score for both the derivation (0.736 vs. 0.699) and validation (0.714 vs. 0.688) data sets. Among Japanese patients who underwent coronary CTA, the newly developed K-score showed better pre-test prediction of obstructive CAD than the Duke clinical score. PMID:24770610
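The AUC used to compare the K-score with the Duke clinical score can be read as the probability that a randomly chosen patient with obstructive CAD receives a higher score than a randomly chosen patient without it (the Mann-Whitney formulation). The K-score coefficients are not given here, so the sketch uses hypothetical predicted probabilities for six patients.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 = obstructive CAD present, 0 = absent. Ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pre-test probabilities for six patients
probs = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
disease = [1, 1, 1, 0, 0, 0]
print(round(auc(probs, disease), 3))  # 0.889
```

The pairwise loop is O(n·m) and is meant for clarity; a rank-based computation gives the same value more efficiently on samples of several thousand patients.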
Pope, Gregory A.; Wentzel, Carolyn; Braden, Brigitta; Anderson, Jordan
The purpose of this study was to investigate statistical relationships between gender and Alberta Achievement Testing Program scores. Achievement test scores from grades 3, 6, and 9 in all subject areas were investigated during a four-year period. Results showed statistically significant positive correlations between gender and scores in most…
Journal of Blacks in Higher Education, 2003
Discusses the racial scoring gap on tests for admission to medical, business, law, and other graduate programs, noting that in the highest-scoring brackets on the Medical College Admission Test (MCAT), the racial gap is even larger. Whites are five times, twelve times, and seven times more likely, respectively, to score higher on the MCAT, Law…
Spencer, Harry E.
The relationships between mathematical SAT scores (SAT-M) and grades earned by students in eight consecutive years of first- and second-semester general chemistry courses at Oberlin College are reported. The academic years surveyed are 1987-1988 through 1994-1995. SAT-M scores are grouped within seven ranges from 450 and less to 710-800. Within any range of scores, students in both courses earned a wide variety of grades, but those within the higher ranges tended to earn higher grades and fewer failures relative to students in the lower ranges. For all students within each range of SAT-M scores, the fraction earning each grade is calculated. These fractions, along with the numbers of students and their SAT-M scores in a subset, are used to calculate the grades expected for that subset. In the first-semester course, the expected and actual grades for subsets of males, females, first-year students, non-first-year students, Asians, Blacks, and Latinos are not significantly different. Those who eventually majored in chemistry or biochemistry attained grades very significantly higher than expected. Most students tended to achieve grades in the second-semester course that were similar to those earned in the first-semester course.
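The expected-grade calculation described above can be sketched as: compute, for every SAT-M range, the fraction of all students earning each grade; then average, over the members of a subset, the grade points those fractions imply for each member's range. The grade-point mapping, range labels, and data below are hypothetical assumptions, not values from the study.

```python
from collections import Counter, defaultdict

# Assumed grade-point mapping; the study's own scale is not given here.
POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def grade_fractions(records):
    """records: (sat_m_range, grade) pairs for all students.

    Returns, for each SAT-M range, the fraction of students earning each grade.
    """
    counts = defaultdict(Counter)
    for score_range, grade in records:
        counts[score_range][grade] += 1
    return {
        rng: {g: c / sum(cnt.values()) for g, c in cnt.items()}
        for rng, cnt in counts.items()
    }


def expected_mean_grade(fractions, subset_ranges):
    """Mean grade expected for a subset, given each member's SAT-M range."""
    per_student = [
        sum(frac * POINTS[g] for g, frac in fractions[rng].items())
        for rng in subset_ranges
    ]
    return sum(per_student) / len(per_student)


all_students = [("<=450", "C"), ("<=450", "C"), ("<=450", "B"), ("<=450", "D"),
                ("710-800", "A"), ("710-800", "A"), ("710-800", "B"), ("710-800", "A")]
fractions = grade_fractions(all_students)
subset = ["710-800", "710-800", "<=450"]  # hypothetical subset of three students
print(round(expected_mean_grade(fractions, subset), 3))  # 3.167
```

Comparing this expected value with the subset's actual mean grade is what allows a statement such as "grades very significantly higher than expected".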
To meet the goals of No Child Left Behind, standardized testing has become the sole indicator of whether states across America demonstrate adequate yearly progress in improving student achievement in literacy education. This book will help teachers and parents raise children's scores on standardized…
Viliūnas, V; Lukauskiene, R; Svegzda, A; Zukauskas, A
The scoring artefact in the Farnsworth-Munsell 100-Hue test that arises from grouping the caps into four boxes was investigated. The traditional scoring method, performed with the numbers of the anchor caps disregarded, was compared with an alternative method performed with the numbers of the anchor caps employed. With the traditional method, we found that the error score of the outside (end-box) caps increased when the total error score was above 240. In contrast, when scoring was performed with the numbers of the anchor caps employed, the difference between the error score of the outside caps and the average error per cap was not significant. To mitigate the end-box artefact and improve the reliability of the Farnsworth-Munsell 100-Hue test, corrections to the traditional scoring method are proposed. PMID:17040422
This guide was prepared to facilitate the practitioner's selection of formal tests for evaluating communicative behavior in clinical infant populations during the first year of life. Clinical instruments with particular emphasis on communication and emerging language and speech abilities were identified in terms of publishers' recommended…
Michaelides, Marcos A.; Parpa, Koulla M.; Thompson, Jerald; Brown, Barry
The purpose of this project was to identify the relationships between various fitness parameters such as upper body muscular endurance, upper and lower body strength, flexibility, body composition and performance on an ability test (AT) that included simulated firefighting tasks. A second intent was to create a regression model that would predict…
Rutkowski, Leslie; Vasterling, Jennifer J.; Proctor, Susan P.; Anderson, Carolyn J.
Given the widespread use and high-stakes nature of educational standardized assessments, understanding factors that affect test-taking ability in young adults is vital. Although scholarly attention has often focused on demographic factors (e.g., gender and race), sufficiently prevalent acquired characteristics may also help explain widespread…
Blackburn, McKinley L.
Previous research has suggested that skills reflected in test-score performance on tests such as the Armed Forces Qualification Test (AFQT) can account for some of the racial differences in average wages. I use a more complete set of test scores available with the National Longitudinal Survey of Youth 1979 Cohort to reconsider this evidence, and…
21 CFR (Food and Drugs), vol. 8, editions of 2011-04-01, 2012-04-01, 2013-04-01, and 2014-04-01. Immunological Test Systems, § 866.6050 Ovarian adnexal mass assessment score test system. (a) Identification. An ovarian/adnexal mass assessment test system is a device that measures one or more proteins in serum...
Dorans, Neil J.; Moses, Tim P.; Eignor, Daniel R.
Score equating is essential for any testing program that continually produces new editions of a test and for which the expectation is that scores from these editions have the same meaning over time. Particularly in testing programs that help make high-stakes decisions, it is extremely important that test equating be done carefully and accurately.…
van der Linden, Wim J.
A constrained computerized adaptive testing (CAT) algorithm is presented that automatically equates the number-correct scores on adaptive tests. The algorithm can be used to equate number-correct scores across different administrations of the same adaptive test as well as to an external reference test. The constraints are derived from a set of…
Research documents that transient students who change schools frequently often suffer from low academic achievement. This article investigates standardized group measures by disentangling elementary achievement scores. Located in a highly transient area outside of Philadelphia, Pennsylvania, Main Street School had its fifth-grade Pennsylvania…
Gentry, Ruben; Stokes, Dorothy
Many African Americans were imbued with the cliché that they must work twice as hard as others to be a success in life. Entering college, students with this belief put extensive effort into earning top grades to ensure quality preparation for their chosen career; yet, some fail to earn top scores. Why? This is the million dollar question, but the…
Cope, Ronald T.; Kolen, Michael J.
This study compared five density estimation techniques applied to samples from a population of 272,244 examinees' ACT English Usage and Mathematics Usage raw scores. Unsmoothed frequencies, kernel method, negative hypergeometric, four-parameter beta compound binomial, and Cureton-Tukey methods were applied to 500 replications of random samples of…
Devena, Sarah E.; Watkins, Marley W.
The Wechsler Intelligence Scale for Children-Fourth Edition General Abilities Index and Cognitive Proficiency Index have been advanced as possible diagnostic markers of attention deficit hyperactivity disorder. This hypothesis was tested with a hospital sample with attention deficit hyperactivity disorder (n = 78), a referred but nondiagnosed…
Wang, Xiang-Bo; Harris, Vincent; Roussos, Louis
Multidimensionality is known to affect the accuracy of item parameter and ability estimations, which subsequently influences the computation of item characteristic curves (ICCs) and true scores. By judiciously combining sections of a Law School Admission Test (LSAT), 11 sections of varying degrees of uni- and multidimensional structures are used…
Carraway, Cassandra T.
A study was conducted to determine whether participation in a test-taking strategy seminar significantly decreased test anxiety in first-year nursing students. The study also sought to compare nursing test scores of first-year nursing students who participated in the seminar with those who did not. The sample consisted of 30 first-year nursing…
Rosselli, M; Ardila, A; Bateman, J R; Guzmán, M
Limited information is currently available about the performance of Spanish-speaking children on different neuropsychological tests. This study was designed to (a) analyze the effects of age and sex on different neuropsychological test scores of a randomly selected sample of Spanish-speaking children, (b) analyze the value of neuropsychological test scores for predicting school performance, and (c) describe the neuropsychological profile of Spanish-speaking children with learning disabilities (LD). Two hundred ninety (141 boys, 149 girls) 6- to 11-year-old children were selected from a school in Bogotá, Colombia. Three age groups were distinguished: 6- to 7-, 8- to 9-, and 10- to 11-year-olds. Performance was measured using the following neuropsychological tests: Seashore Rhythm Test, Finger Tapping Test (FTT), Grooved Pegboard Test, Children's Category Test (CCT), California Verbal Learning Test-Children's Version (CVLT-C), Benton Visual Retention Test (BVRT), and Batería Woodcock Psicoeducativa en Español (Woodcock, 1982). Normative scores were calculated. The age effect was significant for most of the test scores. A significant sex effect was observed for 3 test scores. Intercorrelations were computed between neuropsychological test scores and academic areas (science, mathematics, Spanish, social studies, and music). In a post hoc analysis, children presenting very low scores on the reading, writing, and arithmetic achievement scales of the Woodcock battery were identified in the sample, and their neuropsychological test scores were compared with a matched normal group. Finally, a comparison was made between Colombian and American norms. PMID:11827093
Troll, Lillian E.; And Others
After seven years, a group (N=32) of originally nonemployed poverty-level older people (over 60) now employed as foster grandparents were retested with the WAIS. Three subtest scores showed stability and Digit Span showed a statistically significant drop. Neither age nor initial level of health or WAIS scores was related to test-score changes over…
Cross, Lawrence H.; And Others
A new scoring procedure for multiple choice tests attempts to assess partial knowledge and to restrict guessing. It is a variant of Coombs' elimination scoring method, adapted for use with the carbon-shield answer sheets commonly used with answer-until-correct scoring. Examinees are directed to erase the carbon shields of choices they are certain…
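The details of the Cross et al. variant are truncated here, so the sketch below shows only the basic Coombs elimination rule as it is commonly described: one point for each distractor eliminated, and a penalty of k - 1 points if the keyed answer is eliminated. Treat the rule as an assumption, not the variant's exact scoring; the items and eliminations are hypothetical.

```python
def coombs_item_score(eliminated, correct, n_options):
    """Basic Coombs elimination score for one item (assumed rule).

    +1 for each distractor eliminated; -(n_options - 1) if the keyed
    answer itself was eliminated.
    """
    score = sum(1 for option in eliminated if option != correct)
    if correct in eliminated:
        score -= n_options - 1
    return score


# Hypothetical four-option items, each keyed "C"
examples = [
    {"A", "B", "D"},  # full knowledge: all distractors eliminated -> 3
    {"A"},            # partial knowledge -> 1
    {"A", "C"},       # misinformation: keyed answer eliminated -> 1 - 3 = -2
    set(),            # no eliminations -> 0
]
print([coombs_item_score(e, "C", 4) for e in examples])  # [3, 1, -2, 0]
```

On a four-option item, eliminating one option at random has an expected score of (3/4)(+1) + (1/4)(-3) = 0, which is how the penalty restricts guessing while still crediting partial knowledge.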
This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason for asking whether machine scoring of writing requires…
Over the past five years, both DC Public Schools (DCPS) and public charter schools (PCS) have seen significant growth in secondary reading and math scores on the state test known as the District of Columbia Comprehensive Assessment System (DC CAS). However, scores have not improved as much at the elementary level. Reading and math scores for DCPS…
Zwick, Rebecca; Zapata-Rivera, Diego; Hegarty, Mary
Research has shown that many educators do not understand the terminology or displays used in test score reports and that measurement error is a particularly challenging concept. We investigated graphical and verbal methods of representing measurement error associated with individual student scores. We created four alternative score reports, each…
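A common way to express measurement error in a score report is a band of plus or minus z standard errors of measurement (SEM) around the observed score. This uses the classical test theory formula SEM = SD × sqrt(1 - reliability). The Python sketch below computes such a band. It is not one of the four report designs the study tested. The SD of 15 and reliability of .91 are illustrative values, and the function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Symmetric band observed +/- z * SEM around an observed score."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return observed - half_width, observed + half_width


low, high = score_band(100, sd=15, reliability=0.91)
print(f"{low:.1f} to {high:.1f}")
```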
Gavett, Brandon E
The base rates of abnormal test scores in cognitively normal samples have been a focus of recent research. The goal of the current study is to illustrate how Bayes' theorem uses these base rates--along with the same base rates in cognitively impaired samples and prevalence rates of cognitive impairment--to yield probability values that are more useful for making judgments about the absence or presence of cognitive impairment. Correlation matrices, means, and standard deviations were obtained from the Wechsler Memory Scale--4th Edition (WMS-IV) Technical and Interpretive Manual and used in Monte Carlo simulations to estimate the base rates of abnormal test scores in the standardization and special groups (mixed clinical) samples. Bayes' theorem was applied to these estimates to identify probabilities of normal cognition based on the number of abnormal test scores observed. Abnormal scores were common in the standardization sample (65.4% scoring below a scaled score of 7 on at least one subtest) and more common in the mixed clinical sample (85.6% scoring below a scaled score of 7 on at least one subtest). Probabilities varied according to the number of abnormal test scores, base rates of normal cognition, and cutoff scores. The results suggest that interpretation of base rates obtained from cognitively healthy samples must also account for data from cognitively impaired samples. Bayes' theorem can help neuropsychologists answer questions about the probability that an individual examinee is cognitively healthy based on the number of abnormal test scores observed. PMID:25784058
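The Bayesian step can be illustrated with the two base rates the abstract reports for "at least one subtest below a scaled score of 7": 65.4% in the standardization sample and 85.6% in the mixed clinical sample. The Python sketch below is minimal. It assumes the mixed clinical sample stands in for cognitively impaired examinees, the priors are illustrative, and the function name is hypothetical. The study itself conditions on the number of abnormal scores estimated by Monte Carlo simulation, and this sketch does not reproduce that.

```python
def prob_normal_given_pattern(p_pattern_normal, p_pattern_impaired, base_rate_normal):
    """Posterior P(cognitively normal | observed abnormal-score pattern), by Bayes' theorem.

    p_pattern_normal: P(pattern | normal), e.g. from a standardization sample
    p_pattern_impaired: P(pattern | impaired), e.g. from a clinical sample
    base_rate_normal: prior probability that the examinee is cognitively normal
    """
    numerator = p_pattern_normal * base_rate_normal
    denominator = numerator + p_pattern_impaired * (1.0 - base_rate_normal)
    return numerator / denominator


# "At least one subtest below a scaled score of 7" (rates from the abstract).
for prior in (0.9, 0.7, 0.5):
    posterior = prob_normal_given_pattern(0.654, 0.856, prior)
    print(f"prior {prior:.1f} -> posterior {posterior:.3f}")
```

Because abnormal scores are common in both samples, a single low subtest lowers the probability of normal cognition only modestly relative to the prior.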
Carroll, John B.
The problem of determining relative weights for quantity and quality in scoring foreign language speaking and writing fluency tests is studied. French speaking and writing fluency tests were administered to students of French in several schools in England. Data from these tests were analyzed to support the suggestion that scoring formulas should…
Cornwell, Christopher; Mustard, David B.; Van Parys, Jessica
Using data from the 1998-99 ECLS-K cohort, we show that the grades awarded by teachers are not aligned with test scores. Girls in every racial category outperform boys on reading tests, while boys score at least as well on math and science tests as girls. However, boys in all racial categories across all subject areas are not represented in…
Ho, Andrew D.
State test score trends are widely interpreted as indicators of educational improvement. To validate these interpretations, state test score trends are often compared to trends on other tests such as the National Assessment of Educational Progress (NAEP). These comparisons raise serious technical and substantive concerns. Technically, the most…
The relationship between the National League for Nursing (NLN) achievement test scores and performance on the State Board Test Pool Examination (SBTPE) was studied with 166 graduates of a diploma degree school of nursing between 1976 and 1978. It was found that NLN achievement test scores had a highly significant correlation with SBTPE results.…
Klesch, Heather S.
The reporting of scores on educational tests is at times misunderstood, misinterpreted, and potentially confusing to examinees and other stakeholders who may need to interpret test scores. In reporting test results to examinees, there is a need for clarity in the message communicated. As pressure rises for students to demonstrate performance at a…
Das, Jishnu; Dercon, Stefan; Habyarimana, James; Krishnan, Pramila; Muralidharan, Karthik; Sundararaman, Venkatesh
Empirical studies of the relationship between school inputs and test scores typically do not account for the fact that households will respond to changes in school inputs. We present a dynamic household optimization model relating test scores to school and household inputs, and test its predictions in two very different low-income country…
Jelínek, Martin; Květon, Petr; Vobořil, Dalibor
Despite the expectations that emerged as computer technology advanced during the last decade of the twentieth century, the scientific literature contains few relevant references to the development and use of innovative items in psychological testing. Our study presents and evaluates two novel item types. One item type is derived from a standard schematic test item used for the assessment of the spatial perception aspect of spatial ability, enhanced by an interactive response module. The performance on this item type is correlated with the performance on its paper-and-pencil counterpart. The other innovative item type uses complex stimuli in the form of a short video of a ride through a city presented in an on-route perspective, which is intended to measure navigation skills and the ability to keep oneself oriented in space. In this case, the scores were related to the capacity of visuo-spatial working memory and also to the overall score in the paper-and-pencil test of spatial ability. The second relationship was moderated by gender. PMID:25362549
Warnimont, Chad S.
The purpose of this quantitative study was to examine the relationship between students' performance on the Cognitive Abilities Test (CogAT) and the fourth and fifth grade Reading and Math Achievement Tests in Ohio. The sample utilized students from a suburban school district in Northwest Ohio. Third grade CogAT scores (2006-2007 school year), 4th…
Delaware State Dept. of Education, Dover. Assessment and Accountability Branch.
This guide is intended to help parents understand the Delaware Student Testing Program (DSTP) and the reports it generates. The DSTP tests are administered to provide an accurate measure of how well students are doing relative to Delaware's rigorous content standards. DSTP tests are administered in reading, writing, mathematics, science, and…
Zou, Xiao-Ling; Chen, Yan-Min
This study comprehensively explored the effects of computer and paper test media on the writing scores and cognitive writing processes of EFL test-takers with differing levels of computer familiarity, from the learners' perspective and on the basis of related theory and practice. The results indicate significant differences in test scores among the…
Ghabaee, Mojdeh; Zandieh, Ali; Mohebbi, Shahrzad; Fakhri, Mohammad; Sadeghian, Homa; Divani, Fatemeh; Amirifard, Hamed; Mousavi-Mirkala, Mohammadreza; Ghaffarpour, Majid
We aimed to compare the association of high-sensitivity C-reactive protein (CRP) and National Institutes of Health Stroke Scale (NIHSS) score with mortality risk and to determine the optimal threshold of CRP for prediction of mortality in ischemic-stroke patients. A series of 162 patients with first-ever ischemic stroke admitted within 24 h after onset of symptoms was enrolled. CRP and NIHSS score were estimated on admission, and their predictive abilities for mortality at 7 days were determined by logistic regression analyses. Receiver operating characteristic (ROC) curves were plotted to identify the optimal cut-off of CRP, using the maximum Youden index and the shortest-distance methods. Deceased patients had higher levels of CRP and NIHSS on admission (8.87 ± 7.11 vs. 2.20 ± 4.71 mg/l for CRP, and 17.31 ± 6.36 vs. 8.70 ± 4.85 U for NIHSS, respectively, P < 0.01). CRP and NIHSS were correlated with each other (r² = 0.39, P < 0.001) and were also independently associated with increased risk of mortality [odds ratios (95% confidence interval) of 1.16 (1.05-1.28) and 1.20 (1.07-1.35) for CRP and NIHSS, respectively, P < 0.01]. The areas under the ROC curves of CRP and NIHSS for mortality were 0.82 and 0.84, respectively. The CRP value of 2.2 mg/l was identified as the optimal cut-off value for prediction of mortality within 7 days (sensitivity: 0.81, specificity: 0.80). Thus, CRP, an independent predictor of mortality following ischemic stroke, is comparable with NIHSS, and the value of 2.2 mg/l yields the optimum sensitivity and specificity for mortality prediction. PMID:23975559
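The two cut-off rules named in this abstract can be sketched directly from an ROC table. The maximum Youden index picks the cutoff that maximizes J = sensitivity + specificity - 1. The shortest-distance method picks the ROC point nearest the ideal corner (sensitivity = 1, specificity = 1). The Python sketch below is minimal and assumes the rule "CRP at or above the cutoff predicts death". The CRP values and outcomes are invented, not the study's patient data, and the function names are hypothetical.

```python
import math


def roc_points(values, died):
    """(cutoff, sensitivity, specificity) for the rule 'value >= cutoff predicts death',
    trying every observed value as a candidate cutoff."""
    positives = sum(1 for d in died if d)
    negatives = len(died) - positives
    points = []
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, died) if d and v >= cutoff)
        tn = sum(1 for v, d in zip(values, died) if not d and v < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


def youden_cutoff(values, died):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(values, died), key=lambda p: p[1] + p[2] - 1)[0]


def shortest_distance_cutoff(values, died):
    """Cutoff whose ROC point lies closest to the ideal corner (sens = 1, spec = 1)."""
    return min(roc_points(values, died),
               key=lambda p: math.hypot(1 - p[1], 1 - p[2]))[0]


# Invented admission CRP values (mg/l) and 7-day outcomes (1 = died).
crp_demo = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 6.0, 7.0, 9.0, 12.0]
died_demo = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(youden_cutoff(crp_demo, died_demo), shortest_distance_cutoff(crp_demo, died_demo))
```

The two rules often agree, as they do on this toy data. They can diverge when the ROC curve is asymmetric.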
DuBois, Philip H.; Wientge, King M.
This preliminary manual outlines content, administrative and scoring procedures, antecedent research, and available norm data for the Test of Adult College Aptitude (TACA). The TACA, a combined test and answer sheet adapted for visual scoring by an optical scanner, consists of 22 items on biographical data (age, sex, occupation, family and marital…
Xi, Xiaoming; Mollaun, Pam
We investigated the scoring of the Speaking section of the Test of English as a Foreign Language[TM] Internet-based (TOEFL iBT[R]) test by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…
Correlational evidence suggests that high school GPA is better than admission test scores in predicting first-year college GPA, although test scores have incremental predictive validity. The usefulness of a selection variable in making admission decisions depends in part on its predictive validity, but also on institutions' selectivity and…
Cech, Scott J.
More students are taking Advanced Placement tests, but the proportion of tests receiving what is deemed a passing score has dipped, and the mean score is down for the fourth year in a row. Data released here this week by the New York City-based nonprofit organization that owns the AP brand shows that a greater-than-ever proportion of students…
Pellicer-Sanchez, Ana; Schmitt, Norbert
Despite a number of research studies investigating the Yes-No vocabulary test format, one main question remains unanswered: What is the best scoring procedure to adjust for testee overestimation of vocabulary knowledge? Different scoring methodologies have been proposed based on the inclusion and selection of nonwords in the test. However, there…
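Two adjustments for overestimation that appear in the Yes/No literature are easy to state. Both use the hit rate h (real words claimed as known) and the false-alarm rate f (nonwords claimed as known). The first is h - f. The second is the "correction for guessing" (h - f) / (1 - f). The Python sketch below is minimal. It is not necessarily the procedure this study recommends, the counts are invented, and the function name is hypothetical.

```python
def yes_no_scores(real_yes, real_total, nonword_yes, nonword_total):
    """Hit rate, false-alarm rate, and two corrected scores for a Yes/No test.

    h = proportion of real words the testee claims to know
    f = proportion of nonwords the testee claims to know (false alarms)
    """
    h = real_yes / real_total
    f = nonword_yes / nonword_total
    # "Correction for guessing" rescales h - f; it is undefined when f == 1.
    cfg = (h - f) / (1 - f) if f < 1 else float("nan")
    return {"h": h, "f": f, "h_minus_f": h - f, "cfg": cfg}


# Invented testee: 48 of 60 real words and 4 of 20 nonwords marked "yes".
result = yes_no_scores(48, 60, 4, 20)
print(result)
```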
Frary, Robert B.
Six scoring methods for assigning weights to right or wrong responses according to various instructions given to test takers are analyzed with respect to expected change scores and the effect of various levels of information and misinformation. Three of the methods provide feedback to the test taker. (Author/CTM)
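The best-known rights-and-wrongs weighting is the correction-for-guessing formula R - W/(k - 1), where k is the number of options. The Python sketch below covers that one formula, not all six methods analyzed here. It also shows why misinformation and partial information matter: a blind guess has zero expected gain, but ruling out even one distractor makes guessing pay. The function names are hypothetical.

```python
from fractions import Fraction


def formula_score(rights, wrongs, n_options):
    """Correction for guessing: R - W / (k - 1); omitted items score zero."""
    return Fraction(rights) - Fraction(wrongs, n_options - 1)


def expected_gain_from_guess(n_options, n_remaining=None):
    """Expected change in formula score from guessing on one item instead of omitting.

    n_remaining: options still considered plausible after ruling some out with
    partial information (defaults to all k options, i.e. a blind guess).
    """
    n_remaining = n_options if n_remaining is None else n_remaining
    p_right = Fraction(1, n_remaining)
    return p_right - (1 - p_right) * Fraction(1, n_options - 1)


print(formula_score(30, 12, 4))         # 30 - 12/3 = 26
print(expected_gain_from_guess(4))      # blind guess on a 4-option item: 0
print(expected_gain_from_guess(4, 3))   # one distractor ruled out: 1/9
```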
There are many reasons to align tests with curricular standards, but this alignment is not sufficient to protect against score inflation. This report explains the relationship between alignment and score inflation by clarifying what is meant by inappropriate test preparation. It provides a concrete, hypothetical example that illustrates a process…
Ebuoh, Casmir N.; Ezeudu, S. A.
The study investigated the effects of scoring by section, use of independent scorers, and conventional patterns on scorer reliability in Biology essay tests. A review of the literature revealed that the conventional pattern of scoring all items at once in essay tests had been criticized as unreliable. The study was a true experimental study…
Brender, John R.
This research investigated the effects of homework completion on test scores for 401 undergraduate students, 94 percent African American, at an urban university in 2 levels of introductory Spanish, all with the same instructor. Five to six teacher-generated exams were administered during the course; the lowest test score for each student was…
Simpson, Robert G.
Occasionally, differences in test scores seem to indicate that a student performs much better in one reading area than in another when, in reality, the differences may not be statistically significant. The author presents a table in which statistically significant differences between Woodcock test standard scores are identified. (Author)
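Tables of significant differences like this one are usually built from the standard error of the difference between two scores on the same scale, SE_diff = SD × sqrt(2 - r_aa - r_bb). A difference is significant when it exceeds z × SE_diff. The Python sketch below is minimal and assumes standard scores with SD 15. The reliabilities of .90 are illustrative, not the values behind the author's table, and the function names are hypothetical.

```python
import math


def critical_difference(sd, rel_a, rel_b, z=1.96):
    """Smallest difference between two standard scores significant at the level
    implied by z, using SE_diff = SD * sqrt(2 - r_aa - r_bb)."""
    se_diff = sd * math.sqrt(2.0 - rel_a - rel_b)
    return z * se_diff


def is_significant(score_a, score_b, sd, rel_a, rel_b, z=1.96):
    """True if the two scores differ by at least the critical difference."""
    return abs(score_a - score_b) >= critical_difference(sd, rel_a, rel_b, z)


print(round(critical_difference(15, 0.90, 0.90), 2))
print(is_significant(110, 98, 15, 0.90, 0.90))
```

On these assumptions, a 12-point gap between two reading areas would not be statistically significant.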
Cascallar, Alicia S.; Dorans, Neil J.
This study compares two methods commonly used (concordance and prediction) to establish linkages between scores from tests of similar content given in different languages. Score linkages between the Verbal and Math sections of the SAT I and the corresponding sections of the Spanish-language admissions test, the Prueba de Aptitud Academica (PAA),…
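The practical difference between the two linkage methods appears even in a purely linear setting. Concordance matches the means and SDs of the two score distributions and is symmetric. Regression prediction shrinks toward the mean by the correlation r, so it cannot be inverted. The Python sketch below is minimal. The study may use equipercentile rather than linear concordance, the paired scores are invented, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def linear_concordance(x_scores, y_scores):
    """Mean-sigma linear linking: a score on X maps to the Y score with the same z."""
    mx, my = fmean(x_scores), fmean(y_scores)
    sx, sy = pstdev(x_scores), pstdev(y_scores)
    return lambda x: my + (sy / sx) * (x - mx)


def regression_prediction(x_scores, y_scores):
    """Least-squares prediction of Y from X: the concordance slope shrunk by r."""
    mx, my = fmean(x_scores), fmean(y_scores)
    sx, sy = pstdev(x_scores), pstdev(y_scores)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x_scores, y_scores))
    r = cov / (sx * sy)
    return lambda x: my + r * (sy / sx) * (x - mx)


# Invented paired scores for the same examinees on two tests.
x_demo = [400, 450, 500, 550, 600]
y_demo = [420, 430, 520, 540, 590]

conc, pred = linear_concordance(x_demo, y_demo), regression_prediction(x_demo, y_demo)
print(round(conc(600), 2), round(pred(600), 2))
# Linking back from Y to X recovers 600 only for the concordance.
print(round(linear_concordance(y_demo, x_demo)(conc(600)), 2),
      round(regression_prediction(y_demo, x_demo)(pred(600)), 2))
```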
Silles, Mary A.
This article, using longitudinal data from the National Child Development Study, presents new evidence on the effects of family size and birth order on test scores and behavioral development at ages 7, 11, and 16. Sibling size is shown to have an adverse causal effect on test scores and behavioral development. For any given family size, first-borns…
Hanson, Bradley A.
Three methods of estimating test score distributions that may improve on using the observed frequencies (OBFs) as estimates of a population test score distribution are considered: the kernel method (KM); the polynomial method (PM); and the four-parameter beta binomial method (FPBBM). The assumption each method makes about the smoothness of the…
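Hanson's four-parameter beta-binomial adds lower and upper bounds to the beta distribution of true proportion-correct scores. The simpler two-parameter version below still conveys the core idea: replace noisy observed frequencies with frequencies from a fitted smooth model. This is a minimal Python sketch fitted by the method of moments, assuming number-correct scores on an n-item test. It is not the four-parameter, kernel, or polynomial method the study compares. The scores are invented, and the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_beta_binomial(scores, n_items):
    """Method-of-moments (alpha, beta) for a two-parameter beta-binomial model of
    number-correct scores on an n_items test."""
    mu = fmean(scores) / n_items
    ratio = pvariance(scores) / (n_items * mu * (1 - mu))  # variance vs. plain binomial
    if not 1 < ratio < n_items:
        raise ValueError("moment estimates fall outside the beta-binomial range")
    s = (n_items - ratio) / (ratio - 1)  # s = alpha + beta
    return mu * s, (1 - mu) * s


def beta_binomial_pmf(x, n_items, alpha, beta):
    """C(n, x) * B(x + alpha, n - x + beta) / B(alpha, beta), computed in log space."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.comb(n_items, x) * math.exp(
        log_beta(x + alpha, n_items - x + beta) - log_beta(alpha, beta))


# Invented number-correct scores of 12 examinees on a 10-item test.
scores_demo = [0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
alpha, beta = fit_beta_binomial(scores_demo, 10)
smoothed = [len(scores_demo) * beta_binomial_pmf(x, 10, alpha, beta) for x in range(11)]
print([round(f, 2) for f in smoothed])
```

By construction, the fitted distribution reproduces the observed mean and variance while spreading probability smoothly across all possible scores.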
... HUMAN SERVICES Food and Drug Administration 21 CFR Part 866 Medical Devices; Ovarian Adnexal Mass... regulation classifying ovarian adnexal mass assessment score test systems to restrict these devices so that a... mass assessment score test system into class II (special controls). DATES: Submit either electronic...
Pankratz, Mary; Morrison, Andrea; Plante, Elena
Differences in the standard scores for the Peabody Picture Vocabulary Test-Revised (PPVT-R; L. M. Dunn & L. M. Dunn, 1981) and the PPVT-Third Edition (PPVT-III; Dunn & Dunn, 1997b) are known to exist for children, with typically higher scores occurring on the PPVT-III. However, these tests are administered into adulthood as well, and score…
Changes in academic aptitude and achievement test scores of pupils attending public schools in disadvantaged areas in New York City were investigated. An attempt was made to determine whether varying degrees of mobility were associated with variation in changes in test scores. The cumulative record cards of sixth-grade pupils were examined to…
Lockwood, J. R.; McCaffrey, Daniel F.
A common strategy for estimating treatment effects in observational studies using individual student-level data is analysis of covariance (ANCOVA) or hierarchical variants of it, in which outcomes (often standardized test scores) are regressed on pretreatment test scores, other student characteristics, and treatment group indicators. Measurement…
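The truncated abstract does not say which remedy the authors propose, but the underlying problem is easy to demonstrate. When the pretest is measured with error, the ANCOVA slope is attenuated. If the groups differ on the true pretest, part of that difference then leaks into the treatment estimate. The Python sketch below is a minimal simulation, and every parameter value is invented. It relies on a standard identity: in OLS of y on [1, treat, x], the treatment coefficient equals the mean outcome difference minus the pooled within-group slope times the mean pretest difference. The function names are hypothetical.

```python
import random
from statistics import fmean


def ancova_treatment_effect(x0, y0, x1, y1):
    """Treatment coefficient from OLS of y on [1, treat, x] (classic ANCOVA).

    Computed as the mean outcome difference minus the pooled within-group
    slope times the mean covariate difference, which equals the OLS coefficient.
    """
    mx0, my0, mx1, my1 = fmean(x0), fmean(y0), fmean(x1), fmean(y1)
    sxy = (sum((a - mx0) * (b - my0) for a, b in zip(x0, y0))
           + sum((a - mx1) * (b - my1) for a, b in zip(x1, y1)))
    sxx = sum((a - mx0) ** 2 for a in x0) + sum((a - mx1) ** 2 for a in x1)
    return (my1 - my0) - (sxy / sxx) * (mx1 - mx0)


def simulate(n_per_group=20000, error_sd=1.0, seed=1):
    """Return (estimate using true pretest, estimate using error-prone pretest).

    The true treatment effect is zero; the treated group starts 0.5 SD higher.
    """
    rng = random.Random(seed)
    data = []
    for group_mean in (0.0, 0.5):  # control, treated
        true_pre = [rng.gauss(group_mean, 1.0) for _ in range(n_per_group)]
        observed_pre = [t + rng.gauss(0.0, error_sd) for t in true_pre]
        post = [t + rng.gauss(0.0, 0.5) for t in true_pre]  # no treatment term
        data.append((true_pre, observed_pre, post))
    (t0, o0, y0), (t1, o1, y1) = data
    return (ancova_treatment_effect(t0, y0, t1, y1),
            ancova_treatment_effect(o0, y0, o1, y1))


est_true, est_observed = simulate()
print(f"true pretest: {est_true:.3f}, error-prone pretest: {est_observed:.3f}")
```

With error_sd = 1.0, the within-group reliability of the observed pretest is 0.5. The slope is therefore roughly halved, and the estimated "effect" lands near 0.25 instead of the true value of 0.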
Springer, Matthew G.
Previous research on the effect of accountability programs on the distribution of student test score gains is decidedly mixed. This study examines the issue by estimating an educational production function in which test score gains are a function of the incentives schools have to focus instruction on below-proficient students. NCLB's threat of…
Increasing standardized test scores in reading and math is of high importance to the California Department of Education in meeting the requirements of the No Child Left Behind (NCLB) Act of 2001. More research is needed to understand the best ways to improve test scores to address the concerns of the NCLB Act. The purpose of the study was to evaluate…
This paper assesses the magnitude of the non-indigenous/indigenous test-score gap for third-year and fourth-year primary school pupils in Peru, in relation to the main family, school and peer inputs contributing to the test-score gap using the estimation method of feasible generalized least squares. The article then decomposes the gap into its…
A substantial body of evidence has shown large academic test score gaps between black and white students in early childhood. These gaps remain, and probably grow, as students progress through school. Many researchers have sought to explain these persistent test score gaps, and particularly, to understand the role of students' socio-economic status…
We apply a quantile version of the Oaxaca-Blinder decomposition to estimate the counterfactual distribution of the test scores of Black students. In the Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 (ECLS-K), we find that the gap initially appears only at the top of the distribution of test scores. As children age, however,…
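The quantile decomposition used here extends the classic mean Oaxaca-Blinder decomposition. The mean version splits a gap into two parts. The "explained" part comes from differences in characteristics, valued at group A's coefficients. The "unexplained" part comes from differences in intercepts and coefficients. The Python sketch below is minimal, with one covariate and invented data. The quantile version works with counterfactual distributions and is not shown. The function names are hypothetical.

```python
from statistics import fmean


def ols_simple(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def oaxaca_blinder(x_a, y_a, x_b, y_b):
    """Two-fold decomposition of mean(y_a) - mean(y_b), with group A's coefficients
    as the reference. Returns (gap, explained, unexplained)."""
    alpha_a, beta_a = ols_simple(x_a, y_a)
    alpha_b, beta_b = ols_simple(x_b, y_b)
    mean_xa, mean_xb = fmean(x_a), fmean(x_b)
    explained = beta_a * (mean_xa - mean_xb)
    unexplained = (alpha_a - alpha_b) + (beta_a - beta_b) * mean_xb
    return fmean(y_a) - fmean(y_b), explained, unexplained


# Invented data: x is a family-background index, y a test score.
gap, explained, unexplained = oaxaca_blinder(
    [1, 2, 3, 4, 5], [52, 55, 59, 60, 64],
    [0, 1, 2, 3, 4], [45, 49, 50, 54, 57],
)
print(round(gap, 2), round(explained, 2), round(unexplained, 2))
```

Because OLS fits pass through the group means, the two parts always sum exactly to the raw gap.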
Chen, Shiu-Sheng; Luoh, Ming-Ching
Using data from the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS), we investigate the link between test scores (mathematics and science) and cross-country income differences. We would like to know whether test scores are good indicators of labor-force quality. The…
Berends, Mark; Lucas, Samuel R.; Penaloza, Roberto V.
Through several decades of research, a great deal has been written about trends in black-white test scores and the factors that may explain the gaps in different subject areas. Only a few studies have examined the changing relationships between gaps in students' test scores and family and school measures in nationally representative data over…
Berends, Mark; Penaloza, Roberto V.
Background/Context: Although there has been progress in closing the test score gaps among student groups over past decades, that progress has stalled. Many researchers have speculated why the test score gaps closed between the early 1970s and the early 1990s, but only a few have been able to empirically study how changes in school factors and…
Chambliss, Catherine; Cattai, Ashley; Benton, Peter; Elghawy, Ahmed; Fan, Madde; Thompson, Kayleigh; Scavicchio, Daniel; Tanenbaum, Joshua
The Freudenfreude and Schadenfreude Test (FAST) had moderate test-retest reliability in an undergraduate sample. Freudenfreude scores were lower and Schadenfreude scores were higher among mildly depressed than nondepressed students. Distinctive reactions to personal success and failure were associated with depression. Responses to others' success and failure may also be related to depression. PMID:23045853
Jancarík, Antonín; Kostelecká, Yvona
Electronic testing has become a regular part of online courses. Most learning management systems offer a wide range of tools that can be used in electronic tests. With respect to time demands, the most efficient tools are those that allow automatic assessment. The presented paper focuses on one of these tools: matching questions in which one…
Berry, Christopher M.; Clark, Malissa A.; McClure, Tara K.
The correlation between cognitive ability test scores and performance was separately meta-analyzed for Asian, Black, Hispanic, and White racial/ethnic subgroups. Compared to the average White observed correlation (r̄ = 0.33, N = 903,779), average correlations were lower for Black samples (r̄ = 0.24, N = 112,194) and…
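Average observed correlations in meta-analyses like this one are commonly weighted by sample size, as in a bare-bones Hunter-Schmidt analysis. The truncated abstract does not say whether further corrections were applied. The Python sketch below is minimal and computes the weighted mean and the weighted observed variance of r. The study values are invented, and the function names are hypothetical.

```python
def weighted_mean_r(studies):
    """Sample-size-weighted mean correlation: sum(N_i * r_i) / sum(N_i).

    studies: iterable of (r, n) pairs
    """
    studies = list(studies)
    total_n = sum(n for _, n in studies)
    return sum(r * n for r, n in studies) / total_n


def observed_variance_r(studies):
    """Sample-size-weighted variance of observed correlations around the weighted mean."""
    studies = list(studies)
    r_bar = weighted_mean_r(studies)
    total_n = sum(n for _, n in studies)
    return sum(n * (r - r_bar) ** 2 for r, n in studies) / total_n


demo_studies = [(0.30, 100), (0.40, 300)]  # invented (r, N) pairs
print(weighted_mean_r(demo_studies), observed_variance_r(demo_studies))
```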
Dwyer, Andrew C.
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Klein, Stephen P.; Hamilton, Laura S.; McCaffrey, Daniel F.; Stecher, Brian M.
Texas students have made extraordinarily large gains on statewide achievement tests, the Texas Assessment of Academic Skills (TAAS), gains so dramatic that they have been dubbed the "Texas miracle." There is general agreement that these gains are attributable to the high stakes accountability system in Texas, but there is some question about what…
Rural School and Community Trust, Washington, DC.
A number of studies suggest that the small size of many rural schools gives their students, especially the poorest, a leg up on academic achievement. This notion is supported by the standardized test results presented in this report, from a sample of the primarily small schools participating in the Rural School and Community Trust, a national…
Hetzler, Ronald K.; Stickley, Christopher D.; Kimura, Iris F.
In this study, we developed allometric exponents for scaling Wingate anaerobic test (WAnT) power data that effectively control for body mass (BM) and lean body mass (LBM), and we established a normative WAnT data set for college-age women. One hundred women completed a standard WAnT. Allometric exponents and percentile ranks for peak (PP)…
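Allometric exponents are commonly estimated by fitting power = a × mass^b as a straight line on log-log axes. Dividing power by mass^b then removes the mass dependence. Whether this study used exactly this estimator is an assumption. The Python sketch below is minimal, uses synthetic data generated from a known exponent, and uses hypothetical function names.

```python
import math
from statistics import fmean


def allometric_exponent(body_mass, power):
    """Fit power = a * mass**b by least squares on log(power) vs. log(mass)."""
    lx = [math.log(m) for m in body_mass]
    ly = [math.log(p) for p in power]
    mx, my = fmean(lx), fmean(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


def scaled_power(power, body_mass, b):
    """Power per allometrically scaled mass, e.g. W / kg**b."""
    return power / body_mass ** b


# Synthetic data generated from power = 20 * mass**0.67.
masses = [50, 55, 60, 65, 70, 75]
powers = [20 * m ** 0.67 for m in masses]
a_hat, b_hat = allometric_exponent(masses, powers)
print(round(a_hat, 3), round(b_hat, 3))
```

On real data, a good exponent leaves the scaled power essentially uncorrelated with body mass. Ratio scaling (W/kg) implicitly assumes b = 1.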
Berson, Barry L.
The purpose of this memo is to present tests that comprise the test battery used to select Navy personnel to train marine mammals, and to describe the scoring procedures of the tests. The test battery consists of: Biosystems General Information Test (BGIT), Personnel History Questionnaire (PHQ), Gordon Personal Inventory, Gordon Personal Profile,…
Giessman, Jacob A.; Gambrell, James L.; Stebbins, Molly S.
The Naglieri Nonverbal Ability Test, Second Edition (NNAT2), is used widely to screen students for possible inclusion in talent development programs. The NNAT2 claims to provide a more culturally neutral evaluation of general ability than tests such as Form 6 of the Cognitive Abilities Test (CogAT6), which has Verbal and Quantitative batteries in…
Beketayev, Kenes; Runco, Mark A
Divergent thinking (DT) tests are useful for the assessment of creative potential. This article reports the semantics-based algorithmic (SBA) method for assessing DT. This algorithm is fully automated: examinees receive DT questions on a computer or mobile device, and their ideas are immediately compared with norms and semantic networks. This investigation compared the scores generated by the SBA method with the traditional methods of scoring DT (i.e., fluency, originality, and flexibility). Data were collected from 250 examinees using the "Many Uses Test" of DT. The most important finding involved the flexibility scores from both scoring methods. This was critical because semantic networks are based on conceptual structures, and thus a high SBA score should be highly correlated with the traditional flexibility score from DT tests. Results confirmed this correlation (r = .74). This supports the use of algorithmic scoring of DT. The nearly immediate computation time required by the SBA method may make it the method of choice, especially for moderate- and large-scale DT assessment investigations. Correlations between SBA scores and GPA were nonsignificant, providing evidence of the discriminant and construct validity of SBA scores. Limitations of the present study and directions for future research are offered. PMID:27298632
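The three traditional scores the SBA method was compared against can be sketched from their usual definitions:

- **Fluency:** the number of distinct ideas.
- **Flexibility:** the number of distinct conceptual categories.
- **Originality:** the number of ideas that are rare in a norm sample.

The Python sketch below is minimal. It assumes a hand-coded idea-to-category mapping and a 5% rarity threshold, both hypothetical conventions that vary across studies. The responses and counts are invented, and the function name is hypothetical.

```python
def score_divergent_thinking(responses, category_of, sample_counts, n_respondents,
                             rare=0.05):
    """Fluency, flexibility, and originality for one examinee's ideas.

    responses: the examinee's ideas, already normalized to canonical strings
    category_of: idea -> conceptual category (a hypothetical coding scheme)
    sample_counts: idea -> number of norm-sample respondents who gave it
    rare: ideas given by at most this fraction of respondents count as original
    """
    ideas = list(dict.fromkeys(responses))  # drop repeated ideas, keep order
    fluency = len(ideas)
    flexibility = len({category_of[idea] for idea in ideas})
    originality = sum(1 for idea in ideas
                      if sample_counts.get(idea, 0) / n_respondents <= rare)
    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Invented responses to an invented "uses for a brick" item.
demo = score_divergent_thinking(
    ["doorstop", "paperweight", "build a wall", "doorstop", "grind into pigment"],
    {"doorstop": "weight", "paperweight": "weight",
     "build a wall": "construction", "grind into pigment": "art"},
    {"doorstop": 120, "paperweight": 90, "build a wall": 200, "grind into pigment": 3},
    n_respondents=250,
)
print(demo)
```

Flexibility depends on a category scheme, which is why it is the traditional score most naturally compared with a semantic-network method.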
Lentz, Christine A.
The purpose of this mixed-methods study was to examine the alignment of the written, enacted, and tested curricula of the Ocean City High School science course sequencing and its impact on student achievement. This study also examined the school's ability to predict student scores on the science portion of the High School Proficiency Assessment (HSPA). Data collected for science achievement included the science portion of the Grade Eight Proficiency Assessment (GEPA) as a pretest and the scores for the science portion of the HSPA as a posttest. Data collected for curriculum alignment included an examination of teacher-generated course curriculum maps to determine their alignment with the New Jersey Core Curriculum Content Standards and the HSPA Test Specifications Directory. The quantitative data were analyzed with a series of paired-samples t-tests; Pearson product-moment correlations were used to examine relationships between variables; and an ANCOVA and a stepwise regression analysis were also completed. Based on the findings of the data analysis, the following conclusions were drawn: (1) the alignment of the enacted curriculum with the tested and written curricula affected science achievement; (2) GEPA scores are significantly related to HSPA scores; and (3) GEPA scores and enrollment in the science sequence whose curriculum was aligned with the written and tested curricula met the requirements of a predictor of scores on the HSPA exam. It is expected that educational leadership will use the results of this research to inform practice and drive decision-making with respect to student placement into course sequences. It is hoped that the results will not only increase support for the district's curriculum development plan but also add to the overall body of knowledge surrounding science program effectiveness in relation to the No Child Left Behind standards.
Vidulich, M. A.; Tsang, P. S.
Most real-world operators are required to perform multiple tasks simultaneously. In some cases, such as flying a high-performance aircraft or troubleshooting a failing nuclear power plant, the operator's ability to time-share or "process in parallel" can be driven to extremes. This has created interest in selection tests of cognitive abilities. Two tests that have been suggested are the Dichotic Listening Task and the Cognitive Failures Questionnaire. Correlations between these test results and time-sharing performance were obtained, and the validity of these tests was examined. The primary task was a tracking task with dynamically varying bandwidth. This was performed either alone or concurrently with either another tracking task or a spatial transformation task. The results were: (1) an unexpected negative correlation was detected between the two tests; (2) the lack of correlation between either test and task performance made the predictive utility of the test scores appear questionable; and (3) pilots made more errors on the Dichotic Listening Task than college students.
Salleh, N M; Fueki, K; Garrett, N R; Ohyama, T
The aim of this study was to compare the objective and subjective hardness of selected common foods with that of a wax cube used as a test item in a mixing ability test. Objective hardness was determined for 11 foods (cream cheese, boiled fish paste, boiled beef, apple, raw carrot, peanut, soft/hard rice cracker, jelly, plain chocolate and chewing gum) and the wax cube. Peak force (N) to compress each item was obtained from force-time curves generated with the Tensipresser. Perceived hardness ratings of each item were made by 30 dentate subjects (mean age 26.9 years) using a visual analogue scale (100 mm). These subjective assessments were given twice with a 1-week interval. High intraclass correlation coefficients (ICCs) for test-retest reliability were seen for all foods (ICC > 0.68; P < 0.001). One-way ANOVA found a significant effect of food type on both the objective hardness score and the subjective hardness rating (P < 0.001). The wax cube showed a significantly lower objective hardness score (32.6 N) and subjective hardness rating (47.7) than peanut (45.3 N, 63.5) and raw carrot (82.5 N, 78.4) [P < 0.05; Ryan-Einot-Gabriel-Welsch (REGW)-F]. A significant semilogarithmic relationship was found between the logarithm of objective hardness scores and subjective hardness ratings across the twelve test items (r = 0.90; P < 0.001). These results suggest that the wax cube has a softer texture than test foods traditionally used in masticatory performance tests, such as peanut and raw carrot. The hardness of the wax cube could be modified to simulate a range of test foods by changing the mixture ratio of soft and hard paraffin wax. PMID:17302945
Zimmerman, Donald W.; Zumbo, Bruno D.
Educational and psychological testing textbooks typically warn of the inappropriateness of performing arithmetic operations and statistical analysis on percentiles instead of raw scores. This seems inconsistent with the well-established finding that transforming scores to ranks and using nonparametric methods often improves the validity and power…
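One observation bears on this argument. Within a single sample, mid-rank percentile ranks, 100 × (count below + 0.5 × count equal) / n, are an exact linear function of the average ranks used by nonparametric tests: PR = 100 × (rank - 0.5) / n. The Python sketch below is minimal. The mid-rank convention is one of several in use, percentiles read from an external norm table do not share this exact property, and the function names are hypothetical.

```python
def percentile_ranks(scores):
    """Mid-rank percentile rank: 100 * (count below + 0.5 * count equal) / n."""
    n = len(scores)
    return [100 * (sum(s < x for s in scores) + 0.5 * sum(s == x for s in scores)) / n
            for x in scores]


def midranks(scores):
    """1-based ranks, with tied scores sharing the average of their positions."""
    return [sum(s < x for s in scores) + (sum(s == x for s in scores) + 1) / 2
            for x in scores]


demo_scores = [10, 20, 20, 30]
print(percentile_ranks(demo_scores), midranks(demo_scores))
```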
Harris, Robert V.; King, Stephanie B.
The purpose of this study was to see if a relationship existed between American College Testing (ACT) scores (i.e., English, reading, mathematics, science reasoning, and composite) and student success in a computer applications course at a Mississippi community college. The study showed that while the ACT scores were excellent predictors of…
Generalizability of writing scores has long been a concern in L2 writing assessment. A number of studies have been conducted to investigate this topic during the last two decades. However, with the introduction of new test methods, such as reading-to-write tasks, generalizability studies need to focus on the score accuracy of…
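In the simplest generalizability (G) study, a fully crossed persons × raters design, variance components are estimated from two-way ANOVA mean squares:

- var_p = (MS_p - MS_pr) / n_r
- var_pr,e = MS_pr

The relative G coefficient is then var_p / (var_p + var_pr,e / n_r). The Python sketch below is minimal and assumes one score per person-rater cell. Designs with tasks as a second facet, such as reading-to-write tasks, would need a p × t × r extension. The scores are invented, and the function name is hypothetical.

```python
from statistics import fmean


def g_study_pxr(scores):
    """Variance components and relative G coefficient for a crossed persons x raters design.

    scores: one row per person, each row holding one score per rater.
    Returns (var_person, var_residual, g_coefficient) for the observed number of raters.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    person_means = [fmean(row) for row in scores]
    rater_means = [fmean(row[j] for row in scores) for j in range(n_r)]
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((v - grand) ** 2 for row in scores for v in row)
    ss_pr = ss_tot - ss_p - ss_r  # person x rater interaction, confounded with error
    ms_p = ss_p / (n_p - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    return var_p, ms_pr, var_p / (var_p + ms_pr / n_r)


# Invented essay scores: 4 examinees, 2 raters each.
print(g_study_pxr([[3, 4], [5, 4], [2, 3], [4, 5]]))
```

A rater who is uniformly harsher or more lenient does not lower this relative coefficient. Only inconsistent rank ordering of examinees across raters does.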
Bridgeman, Brent; Powers, Donald; Stone, Elizabeth; Mollaun, Pamela
Scores assigned by trained raters and by an automated scoring system (SpeechRater[TM]) on the speaking section of the TOEFL iBT[TM] were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language…