Kane, Thomas J.; Staiger, Douglas O.
By the spring of 2000, forty states had begun using student test scores to rate school performance. Twenty states have gone a step further and are attaching explicit monetary rewards or sanctions to a school's test performance. In this paper, the authors focus on accountability programs in which states measure the effectiveness of individual…
Meijer, Rob R.
This book discusses how to obtain test scores and, in particular, how to obtain test scores from tests that consist of a combination of multiple choice and open-ended questions. The strength of the book is that scoring solutions are presented for a diversity of real world scoring problems. (SLD)
Silles, Mary A.
This article, using longitudinal data from the National Child Development Study, presents new evidence on the effects of family size and birth order on test scores and behavioral development at ages 7, 11, and 16. Sibling size is shown to have an adverse causal effect on test scores and behavioral development. For any given family size, first-borns…
Della-Piana, Gabriel Mario; Gardner, Michael
Background: Professional standards for validity of achievement tests have long reflected a consensus that validity is the degree to which evidence and theory support interpretations of test scores entailed by the intended uses of tests. Yet there are convincing lines of evidence that the standards are not adequately followed in practice, that…
Hollingsworth, Mary Ann
This study examined the relationship between dimensions of wellness and academic performance for 634 third through fifth grade students in Title One schools in rural Mississippi, using composites of the Five Factor Wellness Inventory for Elementary Children and Reading, Language, and Math Scores of the Mississippi Curriculum Test (a state level…
Recent calls for an increase in educational accountability in K-16 resulted in an uptick of low-stakes testing and, consequently, an increased need for ensuring that students' test scores are reliable and valid representations of their true ability. Focusing on accountability testing in higher education, the current program of research was…
Zhao, Sihai Dave; Li, Yi
Variable screening has emerged as a crucial first step in the analysis of high-throughput data, but existing procedures can be computationally cumbersome, difficult to justify theoretically, or inapplicable to certain types of analyses. Motivated by a high-dimensional censored quantile regression problem in multiple myeloma genomics, this paper makes three contributions. First, we establish a score test-based screening framework, which is widely applicable, extremely computationally efficient, and relatively simple to justify. Secondly, we propose a resampling-based procedure for selecting the number of variables to retain after screening according to the principle of reproducibility. Finally, we propose a new iterative score test screening method which is closely related to sparse regression. In simulations we apply our methods to four different regression models and show that they can outperform existing procedures. We also apply score test screening to an analysis of gene expression data from multiple myeloma patients using a censored quantile regression model to identify high-risk genes. PMID:25124197
Zenisky, April L.; Hambleton, Ronald K.; Sireci, Stephen G.
How a testing agency approaches score reporting can have a significant impact on the perception of that assessment and the usefulness of the information among intended users and stakeholders. Too often, important decisions about reporting test data are left to the end of the test development cycle, but by considering the audience(s) and the kinds…
Questions use of value-added assessment of student achievement to solve problems of accountability. Discusses three problems associated with value-added assessment: (1) limited accuracy of testing to measure student gains; (2) factors other than teacher or school quality possibly attributable to gains; and (3) lack of gain comparators for students…
Dearman, Nancy B.; Plisko, Valena White
Looks at four sources for measuring national student performance: (1) the National Assessment of Educational Progress study of basic skills; (2) competency testing in reading, writing, and arithmetic; (3) college entrance examination scores; and (4) rates of educational attainment by sex, race, ability level, and socioeconomic status. (SK)
Green, Donald Ross
Uses of the variety of scores generated by standardized achievement tests are discussed. Desirable characteristics of scales, raw score scales, percent of correct items, percentile ranks, grade equivalents, normal curve equivalents, and scale scores are considered. The various meanings and purposes of each type of score are discussed. It is…
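As a small illustration of how two of the score types named above relate, the sketch below converts a percentile rank to a normal curve equivalent (NCE) using the conventional definition NCE = 50 + 21.06z, where z is the standard normal deviate for that percentile. This is a minimal sketch under the assumption that scores are normally distributed; the function name is hypothetical and is not taken from the source.

```python
from statistics import NormalDist

def percentile_rank_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE.

    Assumes normally distributed scores; 21.06 is the conventional
    scaling constant that makes NCEs of 1, 50, and 99 line up with
    percentile ranks of 1, 50, and 99.
    """
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return 50.0 + 21.06 * z

# Hypothetical percentile ranks, for illustration only
nce_median = percentile_rank_to_nce(50)
nce_high = percentile_rank_to_nce(99)
```

Unlike percentile ranks, NCEs form an equal-interval scale, which is one reason they are sometimes preferred for averaging.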
Paul, Clyde; Rosenkoetter, John
Total scores from a series of classroom examinations compared with the order in which students completed the tests showed a relationship between completion time and test score. The first half of the students to finish scored significantly higher than the last half. (DS)
A User's Guide to BRILLIANT! Test Scoring and Item Analysis (August 2008). Sections listed: Program Brilliant!: Test; Test Scoring Enhancements; Scoring different test forms
Green, Donald Ross
Explains achievement test scores, focusing on types, uses, meanings, and relative importance. Describes currently used scales, normal distribution curves, percentile ranks, grade equivalents, and other rating systems. Advises inclusion of more than one kind of standardized test score, since each provides different information. Includes three…
Haberman, Shelby J; Yao, Lili; Sinharay, Sandip
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. PMID:25773314
Whittaker, Tiffany A.; Williams, Natasha J.; Dodd, Barbara G.
This study assessed the interpretability of scaled scores based on either number correct (NC) scoring for a paper-and-pencil test or one of two methods of scoring computer-based tests: an item pattern (IP) scoring method and a method based on equated NC scoring. The equated NC scoring method for computer-based tests was proposed as an alternative…
van der Linden, Wim J.
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…
Traditionally, the test score represented by the number of items answered correctly was taken as an indicator of the examinee's ability level. Researchers still tend to think that the number-correct score is a way of ordering individuals with respect to the latent trait. The objective of this study is to depict the benefits of using ability…
The paper investigates whether the provision of financial incentives has an impact on the performance of students in educational tests. The analysis is based on data from an experiment with high school students who answered multiple-choice items from the Third International Mathematics and Science Study (TIMSS). As in TIMSS, the setup did not discourage students from guessing. Students with a
Weinstein, Lawrence; Laverghetta, Antonio; Alexander, Ralph; Stewart, Megan
The current study is an extension of a previous investigation dealing with teacher greetings to students. The present investigation used teacher greetings with college students and academic performance (test scores). We report data using university students and in-class test performance. Students in introductory psychology who received teachers'…
Current thinking on validity suggests that educational institutions and individuals should evaluate their uses of test scores in the context of their fundamental goals. Regression coefficients and other traditional criterion-related validity statistics provide relevant information, but often do not, by themselves, address the fundamental reasons…
The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…
Kane, Michael T.
To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…
Verret, Erik Phillip
accurately and quickly. . . . Unless the whole program is carefully planned, there is danger that the scoring of tests will be allowed to drag over a period of several months until the faculty and administration, as well as the students, have[3] lost... for Development of Computer Test Grading and Computer Maintained Course Gradebook", p. 1. [3] Grossman, Alvin and Howe, Robert L., Data Processing for Educators, pp. 152-153. [4] Op. cit., Hedges and Hope, p. 2. Associate Professor of Chemistry. When...
Karsten T. Hansen; James J. Heckman; Kathleen J. Mullen
This paper develops two methods for estimating the effect of schooling on achievement test scores that control for the endogeneity of schooling by postulating that both schooling and test scores are generated by a common unobserved latent ability. These methods are applied to data on schooling and test scores. Estimates from the two methods are in close agreement. We find
Kuentzel, Jeffrey G.; Hetterscheidt, Lesley A.; Barnett, Douglas
The rigors of standardized testing make for numerous opportunities for examiner error, including simple computational mistakes in scoring. Although experts recommend that test scoring be double-checked, the extent to which independent double-checking would reduce scoring errors is not known. A double-checking procedure was established at a…
Hieronymus, A. N.; Stroud, James B.
Attempts to fill research gap on testing by obtaining comparisons of deviation scores, at grade levels four, seven, and ten, from the California Test of Mental Maturity, Henmon-Nelson Tests, and Lorge-Thorndike Intelligence tests. Results tabulated. (CJ)
Sackett, P R; Wilk, S L
Various forms of score adjustment have been suggested and used when mean differences by gender, race, or ethnicity are found using preemployment tests. This article examines the rationales for score adjustment and describes and compares different forms of score adjustment, including within-group norming, bonus points, separate cutoffs, and banding. It reviews the legal environment for personnel selection and the circumstances leading to the passage of the Civil Rights Act of 1991. It examines score adjustment in the use of cognitive ability tests, personality inventories, interest inventories, scored biographical data, and physical ability tests and outlines the implications for testing practice of various interpretations of the Civil Rights Act of 1991. PMID:7985886
Levine, Michael V.; Rubin, Donald B.
Appropriateness indexes (statistical formulas) for detecting suspiciously high or low scores on aptitude tests were presented, based on a simulation of the Scholastic Aptitude Test (SAT) with 3,000 simulated scores--2,800 normal and 200 suspicious. The traditional index--marginal probability--uses a model for the normal examinee's test-taking…
Wise, Vicki L.; Wise, Steven L.; Bhola, Dennison S.
Accountability for educational quality is a priority at all levels of education. Low-stakes testing is one way to measure the quality of education that students receive and make inferences about what students know and can do. Aggregate test scores from low-stakes testing programs are suspect, however, to the degree that these scores are influenced…
van der Linden, Wim J.
Presents a constrained computerized adaptive testing (CAT) algorithm that can be used to equate CAT number-correct scores to a reference test. Used an item bank from the Law School Admission Test to compare results of the algorithm with those for equipercentile observed-score equating. Discusses advantages of the approach. (SLD)
Feldt, Leonard S.
In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
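To make the idea of a differentially weighted composite concrete, the sketch below uses one standard classical-test-theory formulation: if measurement errors of the parts are uncorrelated, the error variance of the composite is the sum of w_i² s_i² (1 − r_i), and reliability is one minus the ratio of that quantity to the composite variance. This is an illustrative assumption, not necessarily the estimation method described in the article; the function name and the numbers are hypothetical.

```python
def weighted_composite_reliability(weights, cov, reliabilities):
    """Reliability of the composite sum_i w_i * X_i.

    weights:       list of k weights applied to the parts
    cov:           k x k observed covariance matrix of the part scores
    reliabilities: list of k reliability coefficients, one per part
    Assumes measurement errors are uncorrelated across parts.
    """
    k = len(weights)
    # Observed variance of the weighted composite: w' * Cov * w
    composite_var = sum(
        weights[i] * weights[j] * cov[i][j]
        for i in range(k)
        for j in range(k)
    )
    # Error variance: each part contributes w_i^2 * var_i * (1 - r_i)
    error_var = sum(
        weights[i] ** 2 * cov[i][i] * (1.0 - reliabilities[i])
        for i in range(k)
    )
    return 1.0 - error_var / composite_var

# Made-up two-part example: part 1 is weighted twice as heavily as part 2
cov = [[100.0, 30.0], [30.0, 64.0]]
rho = weighted_composite_reliability([2.0, 1.0], cov, [0.90, 0.80])
```

With these made-up inputs the composite variance is 584 and the error variance is 52.8, so the composite reliability is about 0.91.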
Levine, Michael V.; Rubin, Donald B.
A student may be so unlike other students that his/her aptitude test score fails to be a completely appropriate measure. We consider the problem of using the student's pattern of multiple-choice aptitude test answers to decide whether his/her score is an appropriate ability measure. (Author/CTM)
Schagen, I. P.
A model for the age standardization of test scores is presented, which is fitted to the percentile points of the raw score distribution and assumes a linear trend of each percentile with age. The model's applications in standardizing tests and diagnostic plots produced by a computer program--STANEW--are described. (SLD)
Elizabeth Burleigh; Ian Reeves; Christine McAlpine; James Davie
Objectives: the abbreviated mental test is widely used in the assessment of cognitive impairment in elderly patients. However, many doctors do not administer the full 10 questions, preferring to estimate the patient's score instead. We have studied the accuracy of doctors in predicting patients' abbreviated mental test scores. Methods: we assessed 102 patients in the geriatric unit. We asked doctors
a course in English 101; nor can I earn test credit for English 102 if I have completed a course in English 102 or ENGL 112. I understand… Receiving Credit for English Composition Based on Test Scores. Updated 4
Center on Education Policy, 2010
This paper profiles Wisconsin's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in math at grades 4 and 8 and in reading at grade 8. In grade 4 reading, the percentage scoring…
J. R. Crawford; Paul H. Garthwaite
Neuropsychologists often need to estimate the abnormality of an individual patient’s test score, or test score discrepancies, when the normative or control sample against which the patient is compared is modest in size. Crawford and Howell [The Clinical Neuropsychologist 12 (1998) 482] and Crawford et al. [Journal of Clinical and Experimental Neuropsychology 20 (1998) 898] presented methods for obtaining point
It is challenging for parents and the general public to make sense of the reports on test scores that appear in the mass media. This article offers some things for readers to consider as they bring a critical eye to what is read in the papers. Usually reports on test scores in the media are quite short and focus on one or two aspects of test…
Sireci, Stephen G.; Talento-Miller, Eileen
Admissions data and first-year grade point average (GPA) data from 11 graduate management schools were analyzed to evaluate the predictive validity of Graduate Management Admission Test® (GMAT®) scores and the extent to which predictive validity held across sex and race/ethnicity. The results indicated GMAT verbal and quantitative scores had…
Powers, Donald E.
After adjusting for different background characteristics of students, effects on test scores were related to the length and type of test coaching programs offered. The data suggest that the test item types in the Graduate Record Examination General Test appear to show little susceptibility to formal coaching experiences. (Author/DWH)
There is often little correlation between objective tests of writing or writing components and grading by teachers. The technology that can be applied to student writing evaluation lags behind a reasoned rhetorical explanation of test results. Evaluations of writing are inadequate unless they are interpreted within a rhetorical context that…
Integrating mathematics with family and consumer sciences (FCS) has enabled youth to pass the Minnesota 8th Grade Math Basic Skills test. The test focuses on eight content areas: (1) problem solving with whole numbers and fractions; (2) problem solving with percentage/ratio; (3) number sense; (4) estimation; (5) measurement; (6) tables and…
This article presents three strategies for teaching students who are taking the IELTS speaking test. The first strategy is aimed at improving confidence and uses a variety of self-help materials from the field of popular psychology. The second encourages students to think critically and invokes a range of academic perspectives. The third strategy…
Feeney, M. Patrick
The study evaluated a distinctive feature scoring technique for List 1 of the California Consonant Test for the purpose of improving test reliability in this test used to identify errors in speech recognition made by adult listeners (N=50) with high frequency sensorineural hearing loss. (DB)
As more colleges move to "test optional" admissions policies, the debate over the utility and interpretation of standardized-test scores continues. In this article, the author interviews Daniel Koretz, a professor of education at Harvard University and author of "Measuring Up: What Educational Testing Really Tells Us". Koretz shares his thoughts…
Center on Education Policy, 2010
This paper profiles Maryland's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased at grades 4 and 8 in both reading and math. Average annual gains were larger on the state test than…
Beatriz U Ramirez (Universidad de Santiago de Chile)
After a sudden increase in most of the individual grades in a multiple-choice test, students were asked to rank the three most relevant factors responsible for this outcome. Among eight others, the availability of a test for self-assessment before the final test was by far the most frequently mentioned (82.4% of the students). Questions applied during different course activities did not have the same effect on student scores as the "online" self-assessment test.
This article argues that so-called 'objective', scientifically 'valid and reliable' tests of aptitude such as ASAT (Australian Scholastic Aptitude Test, used in Queensland as the scaling mechanism for producing a state-wide ranked order of student merit prior to the allocation of tertiary entrance scores), in fact operate to reinforce existing biases in the education system. Drawing on an
Brown, Sarah Lee
The researcher interviewed two groups of eleventh grade students, in a rural Appalachian setting, who tended to score low on the state mandated high stakes/low stakes test to discover their efforts on the test, specifically in reading, and to obtain their opinions concerning the effects of a specific incentive or consequence. Before the eleventh…
Wise, Steven L.
Whenever the purpose of measurement is to inform an inference about a student's achievement level, it is important that we be able to trust that the student's test score accurately reflects what that student knows and can do. Such trust requires the assumption that a student's test event is not unduly influenced by construct-irrelevant factors…
A widely held view is that good schools are essential to a nation's international economic success and that high test scores on international tests of academic skills and knowledge indicate how good a nation's schools are. The widespread belief that good schools are an important contributor to a nation's economic success in the world is supported…
Muller, Jorg M.
A new test index is defined as the probability of obtaining two randomly selected test scores (PDTS) as statistically different. After giving a concept definition of the test index, two simulation studies are presented. The first analyzes the influence of the distribution of test scores, test reliability, and sample size on PDTS within classical…
Eldred-Skemp, Nicolia; Quinn, James W.; Chang, Hsin-wen; Rauh, Virginia A.; Rundle, Andrew; Orjuela, Manuela A.; Perera, Frederica P.
Childhood cognitive and test-taking abilities have long-term implications for educational achievement and health, and may be influenced by household environmental exposures and neighborhood contexts. This study evaluates whether age 5 scores on the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R, administered in English) are associated with polycyclic aromatic hydrocarbon (PAH) exposure and neighborhood context variables including poverty, low educational attainment, low English language proficiency, and inadequate plumbing. The Columbia Center for Children’s Environmental Health enrolled African-American and Dominican-American New York City women during pregnancy, and conducted follow-up for subsequent childhood health outcomes including cognitive test scores. Individual outcomes were linked to data characterizing 1-km network buffers around prenatal addresses, home observations, interviews, and prenatal PAH exposure data from personal air monitors. Prenatal PAH exposure above the median predicted 3.5 point lower total WPPSI-R scores and 3.9 point lower verbal scores; the association was similar in magnitude across models with adjustments for neighborhood characteristics. Neighborhood-level low English proficiency was independently associated with 2.3 point lower mean total WPPSI-R score, 1.2 point lower verbal score, and 2.7 point lower performance score per standard deviation. Low neighborhood-level educational attainment was also associated with 2.0 point lower performance scores. In models examining effect modification, neighborhood associations were similar or diminished among the high PAH exposure group, as compared with the low PAH exposure group. Early life exposure to personal PAH exposure or selected neighborhood-level social contexts may predict lower cognitive test scores. However, these results may reflect limited geographic exposure variation and limited generalizability. PMID:24994947
Bishop, N. Scott
This study examined the effects of different test administration conditions on reading comprehension test scores. Evidence of performance differences across district testing conditions might imply that the meanings and interpretations associated with the corresponding test scores have limited generalizability (i.e., knowing how well one reads…
Brennan, Robert L.
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
Report on Education Research, 1983
THE FOLLOWING IS THE FULL TEXT OF THIS DOCUMENT: A new study by a pair of Harvard University researchers discounts earlier findings that coaching can substantially improve student performance on the Scholastic Aptitude Test (SAT). "There is simply insufficient evidence that large score increases are a result of a coaching program," write Rebecca…
van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas
This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
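For orientation, Cronbach's alpha, the most familiar of the single-administration methods named above, can be computed directly from an examinee-by-item score matrix as α = k/(k − 1) · (1 − Σ item variances / total-score variance). The sketch below is a minimal illustration of that textbook formula only; it does not implement the latent class approach the study proposes, and the function name and data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of examinee rows, each with k item scores.

    Uses sample variances (n - 1 denominator) for both the items
    and the total scores.
    """
    k = len(item_scores[0])
    # Variance of each item across examinees
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the total (sum) score across examinees
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Made-up responses: 4 examinees, 3 dichotomously scored items
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
```

For this made-up matrix the item variances sum to 5/6 and the total-score variance is 5/3, giving an alpha of 0.75.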
Holland, Paul W.; Thayer, Dorothy T.
Applied the theory of exponential families of distributions to the problem of fitting the univariate histograms and discrete bivariate frequency distributions that often arise in the analysis of test scores. Considers efficient computation of the maximum likelihood estimates of the parameters using Newton's Method and computationally efficient…
van der Linden, Wim J.; Wiberg, Marie
For traditional methods of observed-score equating with anchor-test designs, such as chain and poststratification equating, it is difficult to satisfy the criteria of equity and population invariance. Their equatings are therefore likely to be biased. The bias in these methods was evaluated against a simple local equating method in which the…
Grissom, Jason A.; Kalogrides, Demetra; Loeb, Susanna
Expansion of the use of student test score data to measure teacher performance has fueled recent policy interest in using those data to measure the effects of school administrators as well. However, little research has considered the capacity of student performance data to uncover principal effects. Filling this gap, this article identifies…
Marder, M.; Bansal, D.
We apply visualization and modeling methods for convective and diffusive flows to public school mathematics test scores from Texas. We obtain plots that show the most likely future and past scores of students, the effects of random processes such as guessing, and the rate at which students appear in and disappear from schools. We show that student outcomes depend strongly upon economic class, and identify the grade levels where flows of different groups diverge most strongly. Changing the effectiveness of instruction in one grade naturally leads to strongly nonlinear effects on student outcomes in subsequent grades. PMID:19805049
This paper by Stephen P. Klein, et al., was at the center of the Presidential campaign last week as Al Gore seized on its conclusion that the great disparity in Texas between student scores on state (Texas Assessment of Academic Skills) vs. federal (NAEP) tests suggested that the improvements claimed by Governor Bush in the state's education system were in fact inflated, possibly due to a policy of teachers teaching to the Texas tests.
Helm, Denise Muesch
The principal goal of acceptance criteria is to select candidates who will graduate and transition into professional practice. However, in an attempt to increase the diversity of their student populations, educators are anxious to make changes to the traditional acceptance criteria, such as standardized test scores. Yet data indicate that standardized testing biases against certain populations of students (i.e., female, culturally diverse, and those from lower socioeconomic backgrounds). Fairer assessment measures should continue to be sought. PMID:18847114
Wu, Brad C.
The additive and response patterns scoring methods within and between multiple true-false (MTF) items were examined using data for 5,000 students for each of 2 years from the mathematics portion of the national college entrance examination in Taiwan. For additive scoring at item level, response to each option was scored dichotomously and added up…
Chafetz, Michael D; Matthews, Lee H
A New interference calculation method for the Stroop test was developed based upon a neuropsychological model of the suppression of word reading in favor of color naming. Polynomial regression equations show a significant relationship between word reading and the New interference score that closely fits the underlying prediction of the New model, while the Golden [Stroop Color and Word Test, Stoelting Co., IL, Wood Dale, 1978] model (Old) produces only a random relationship. Constructs of developmental maturation and lateralized brain damage are supported by the New but not the Old method. The New compared to the Old method also gives a significant reduction in scores in a small sample of demented patients. It would be advisable to use this New model in both cognitive and neuropsychological comparisons of different lesions or different stimulus and response demands. The New model will also help promote finer clinical inferences when an understanding relative to the patient's own baselines is necessary. PMID:15163456
Martin, John D.; And Others
The relationship between Elizur's Hostility Scoring on the Rorschach Test and the Acting-Out Score on the Hand Test was examined. Correlations between the two measures (using several scoring procedures) ranged from .40 to .64. (JKS)
Hatcher, Donald L.
In this article, after describing one approach for teaching critical thinking (CT) that was in place at Baker University from 1990 to 2008, the author describes their experience assessing CT using three standardized exams and shows why the choice of a standardized CT test can be problematic and the results misleading. These results can be…
Wang, Ting; Merkle, Edgar C.; Zeileis, Achim
In this paper, we consider a family of recently-proposed measurement invariance tests that are based on the scores of a fitted model. This family can be used to test for measurement invariance w.r.t. a continuous auxiliary variable, without pre-specification of subgroups. Moreover, the family can be used when one wishes to test for measurement invariance w.r.t. an ordinal auxiliary variable, yielding test statistics that are sensitive to violations that are monotonically related to the ordinal variable (and less sensitive to non-monotonic violations). The paper is specifically aimed at potential users of the tests who may wish to know (1) how the tests can be employed for their data, and (2) whether the tests can accurately identify specific model parameters that violate measurement invariance (possibly in the presence of model misspecification). After providing an overview of the tests, we illustrate their general use via the R packages lavaan and strucchange. We then describe two novel simulations that provide evidence of the tests' practical abilities. As a whole, the paper provides researchers with the tools and knowledge needed to apply these tests to general measurement invariance scenarios. PMID:24936190
Friedman, A F; Wakefield, J A; Sasek, J; Schroeder, D
A new scoring procedure to be used with Spraings' technique for administering the Bender-Gestalt test in a multiple choice format is presented. Scoring weights are used instead of simply scoring each item right or wrong. The evidence presented suggests that this method of scoring would increase the value of Spraings' test in the diagnosis of perceptual deficits. PMID:833302
Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J.
Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as are the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…
This study examined the relationship between teacher education students' scores on basic skills admission tests and graduating seniors' scores on the National Teacher Examinations (NTE) at Eastern Kentucky University. The 1981-82 basic skills test scores for 262 teacher education students were compared with their NTE scores taken in 1984-85 during…
Alpay, S. Pamir
Test results in HuskyCT. Step 1: Click on "My Grades". Step 2: Click on the test title. Step 3: Click on the test score. Instructors apply settings that determine the extent of the feedback that students see after taking a test in HuskyCT and when that information becomes available. Minimal…
To determine usefulness of current and previous test-day somatic cell score (SCS) in predicting test-day milk yield, test-day records from Holstein first and second calvings between 1995 and 2002 were examined. Initial selection required that cows have at least the first four test days with recorde...
Zimmerman, Donald W.
Results of this study indicate that the correlation between half-test scores over repeated splits, over persons, and over repeated testings resulting in different sets of observed scores, is given by Kuder-Richardson Formula 21. (RF)
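For reference, Kuder-Richardson Formula 21 in its standard textbook form is given below. The symbols are notation chosen here rather than taken from the abstract: k is the number of dichotomously scored items, \bar{X} is the mean total score, and s_X^2 is the variance of the total scores.

```latex
\[
  r_{\mathrm{KR21}}
  = \frac{k}{k-1}\left(1 - \frac{\bar{X}\,(k - \bar{X})}{k\, s_X^{2}}\right)
\]
```

The formula needs only the test length and the mean and variance of total scores, because it treats all items as equally difficult.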
Meijer, Rob R.
Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
Carr, Nathan T.; Xi, Xiaoming
This article examines how the use of automated scoring procedures for short-answer reading tasks can affect the constructs being assessed. In particular, it highlights ways in which the development of scoring algorithms intended to apply the criteria used by human raters can lead test developers to reexamine and even refine the constructs they…
Branberg, Kenny; And Others
Effects of sex, education, and age on total test score on the Swedish Scholastic Aptitude Test, a college entrance examination, are studied using applicants aged over 25 with 1 to 4 years' work experience. About 10,000 applicants have taken the test annually since 1977. Genuine differences appear in each variable studied. (SLD)
Zenisky, April L.; Hambleton, Ronald K.
Test scores matter these days. Test-takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles by which most people get the bulk of this information. Historically, score reports have not always met the examinees' information or usability needs, but this is clearly changing…
Lin, Miao-Hsiang; Hsiung, Chao A.
Two simple empirical approximate Bayes estimators are introduced for estimating domain scores under binomial and hypergeometric distributions respectively. Criteria are established regarding use of these functions over maximum likelihood estimation counterparts. (SLD)
Hernandez, Barbara L. Michiels; Ward, Susan; Strickland, George
Legislative mandates and reforms hold universities accountable for student certification test performance. The purpose of this investigation was to determine if cumulative grade point average scores and the preprofessional academic skills test scores predict performance on elementary certification test (professional development) scores of…
Rich, John D., Jr.; Fullard, William; Overton, Willis
One hundred and twelve Latino students from Philadelphia participated in this study, which examined the development of deductive reasoning across adolescence, and the relation of reasoning to test anxiety and standardized test scores. As predicted, 11th and ninth graders demonstrated significantly more advanced reasoning than seventh graders.…
Feldt, Leonard S.
Develops formulas to cope with the situation in which the reliability of test scores must be approximated even though no examinee has taken the complete instrument. Develops different estimators for part tests that are judged to be classically parallel, tau-equivalent, or congeneric. Proposes standards for differentiating among these three models.…
Pankratz, Mary; Morrison, Andrea; Plante, Elena
Differences in the standard scores for the Peabody Picture Vocabulary Test-Revised (PPVT-R; L. M. Dunn & L. M. Dunn, 1981) and the PPVT-Third Edition (PPVT-III; Dunn & Dunn, 1997b) are known to exist for children, with typically higher scores occurring on the PPVT-III. However, these tests are administered into adulthood as well, and score…
Matton, Nadine; Vautier, Stephane; Raufaste, Eric
Mean gain scores for cognitive ability tests between two sessions in a selection setting are now a robust finding, yet not fully understood. Many authors do not attribute such gain scores to an increase in the target abilities. Our approach consists of testing a longitudinal SEM model suitable to this view. We propose to model the scores' changes…
Journal of Blacks in Higher Education, 2003
Discusses the racial scoring gap on tests for admission to medical, business, law, and other graduate programs, noting that in the highest-scoring brackets on the Medical College Admission Test (MCAT), the racial gap is even larger. Whites are five times, twelve times, and seven times more likely, respectively, to score higher on the MCAT, Law…
Pope, Gregory A.; Wentzel, Carolyn; Braden, Brigitta; Anderson, Jordan
The purpose of this study was to investigate statistical relationships between gender and Alberta Achievement Testing Program scores. Achievement test scores from grades 3, 6, and 9 in all subject areas were investigated during a four-year period. Results showed statistically significant positive correlations between gender and scores in most…
Niu, Sunny X.; Tienda, Marta
Using administrative data for five Texas universities that differ in selectivity, this study evaluates the relative influence of two key indicators for college success: high school class rank and standardized tests. Empirical results show that class rank is the superior predictor of college performance and that test score advantages do not insulate lower ranked students from academic underperformance. Using the UT-Austin campus as a test case, we conduct a simulation to evaluate the consequences of capping students admitted automatically using both achievement metrics. We find that using class rank to cap the number of students eligible for automatic admission would have roughly uniform impacts across high schools, but imposing a minimum test score threshold on all students would have highly unequal consequences by greatly reducing the admission eligibility of the highest performing students who attend poor high schools while not jeopardizing admissibility of students who attend affluent high schools. We discuss the implications of the Texas admissions experiment for higher education in Europe. PMID:23788828
Hageman, Barbara H.; Sigman, Clayton B.; Koslosky, John T.
A Test/Score/Report capability is currently being developed for the Transportable Payload Operations Control Center (TPOCC) Advanced Spacecraft Simulator (TASS) system which will automate testing of the Goddard Space Flight Center (GSFC) Payload Operations Control Center (POCC) and Mission Operations Center (MOC) software in three areas: telemetry decommutation, spacecraft command processing, and spacecraft memory load and dump processing. Automated computer control of the acceptance test process is one of the primary goals of a test team. With the proper simulation tools and user interface, acceptance testing, regression testing, and repeating specific test procedures for a ground data system can become simpler. Ideally, the goal for complete automation would be to plug the operational deliverable into the simulator, press the start button, execute the test procedure, accumulate and analyze the data, score the results, and report the results to the test team along with a go/no-go recommendation. In practice, this may not be possible because of inadequate test tools, pressures of schedules, limited resources, etc. Most tests are accomplished using a certain degree of automation and test procedures that are labor intensive. This paper discusses some simulation techniques that can improve the automation of the test process. The TASS system tests the POCC/MOC software and provides a score based on the test results. The TASS system displays statistics on the success of the POCC/MOC system processing in each of the three areas as well as event messages pertaining to the Test/Score/Report processing. The TASS system also provides formatted reports documenting each step performed during the tests and the results of each step. A prototype of the Test/Score/Report capability is available and currently being used to test some POCC/MOC software deliveries.
When this capability is fully operational it should greatly reduce the time necessary to test a POCC/MOC software delivery, as well as improve the quality of the test process.
The Effects of Handwriting, Spelling, and T-units on Holistic Scoring with Implications... December 2009. Major Subject: Curriculum and Instruction. Committee chair: Mark Sadoski; committee members: M. Joshi, Bruce Thompson, and Dennie Smith (Head of Department).
The relationship between selected standardized test scores and performance in advanced placement math and science exams: Analyzing the differential effectiveness of scores for course identification and placement
Urbina, Josue N.
There is a national need to increase the STEM-related workforce. Factors leading towards STEM careers include the number of advanced high school mathematics and science courses students complete. Florida's enrollment patterns in STEM-related Advanced Placement (AP) courses, however, reveal that only a small percentage of students enroll in these classes. Therefore, screening tools are needed to find more students for these courses who are academically ready yet have not been identified. The purpose of this study was to investigate the extent to which scores from a national standardized test, the Preliminary Scholastic Assessment Test/National Merit Scholarship Qualifying Test (PSAT/NMSQT), in conjunction with and compared to a state-mandated standardized test, the Florida Comprehensive Assessment Test (FCAT), are related to selected AP exam performance in Seminole County Public Schools. An ex post facto correlational study was conducted using 6,189 student records from the 2010 - 2012 academic years. Multiple regression analyses using simultaneous Full Model testing showed differential moderate to strong relationships between scores in eight of the nine AP courses (i.e., Biology, Environmental Science, Chemistry, Physics B, Physics C Electrical, Physics C Mechanical, Statistics, Calculus AB and BC) examined. For example, the significant unique contribution to overall variance in AP scores was a linear combination of PSAT Math (M), Critical Reading (CR) and FCAT Reading (R) for Biology and Environmental Science. Moderate relationships for Chemistry included a linear combination of PSAT M, W (Writing) and FCAT M; a combination of FCAT M and PSAT M was most significantly associated with Calculus AB performance. These findings have implications for both research and practice. FCAT scores, in conjunction with PSAT scores, can potentially be used for specific STEM-related AP courses, as part of a systematic approach towards AP course identification and placement.
For courses with moderate to strong relationships, validation studies and development of expectancy tables, which estimate the probability of successful performance on these AP exams, are recommended. Also, findings established a need to examine other related research issues including, but not limited to, extensive longitudinal studies and analyses of other available or prospective standardized test scores.
Stocking, Martha; And Others
For two tests measuring the same trait, the program, BIV20, equates the scores using the two true score distributions estimated by the univariate method 20 program (see Wingersky, Lees, Lennon, and Lord, 1969) and, with these equated true scores and their distributions, estimates the bivariate distribution scores and the relative efficiency of the…
Dorans, Neil J.; Moses, Tim P.; Eignor, Daniel R.
Score equating is essential for any testing program that continually produces new editions of a test and for which the expectation is that scores from these editions have the same meaning over time. Particularly in testing programs that help make high-stakes decisions, it is extremely important that test equating be done carefully and accurately.…
Blackburn, McKinley L.
Previous research has suggested that skills reflected in test-score performance on tests such as the Armed Forces Qualification Test (AFQT) can account for some of the racial differences in average wages. I use a more complete set of test scores available with the National Longitudinal Survey of Youth 1979 Cohort to reconsider this evidence, and…
van der Linden, Wim J.
A constrained computerized adaptive testing (CAT) algorithm is presented that automatically equates the number-correct scores on adaptive tests. The algorithm can be used to equate number-correct scores across different administrations of the same adaptive test as well as to an external reference test. The constraints are derived from a set of…
The purpose of this study was to determine if there was a difference in Tennessee Comprehensive Assessment Program Modified Academic Achievement Standards (TCAP MAAS) achievement test scores for special education students who receive their instruction in the resource classroom or in an inclusion classroom. The study involved third, fourth, and…
PhD Harry R. Goldberg (Johns Hopkins University Zanvyl Krieger Mind/Brain Institute and Department of Biology)
A study showing that students learn well and score higher on exams in a "Virtual Learning Environment" in which they are presented the same material that is traditionally presented in lecture.
Over the past five years, both DC Public Schools (DCPS) and public charter schools (PCS) have seen significant growth in secondary reading and math scores on the state test known as the District of Columbia Comprehensive Assessment System (DC CAS). However, scores have not improved as much at the elementary level. Reading and math scores for DCPS…
This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason for asking whether machine scoring of writing requires…
Smith, Richard M.; Mitchell, Virginia P.
To improve the accuracy of college placement, Rasch scoring and person-fit statistics on the Comparative Guidance and Placement test (CGP) were compared to the traditional right-only scoring. Correlations were calculated between English and mathematics course grades and scores of 1,448 entering freshmen on the reading, writing, and mathematics…
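As background on the Rasch scoring mentioned above, the sketch below shows the standard dichotomous Rasch model, in which the probability of a correct response depends only on the difference between a person's ability and the item's difficulty. It is not the procedure used in the study; the function names and the item difficulties are hypothetical and chosen only for illustration.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def expected_number_correct(theta, difficulties):
    """Expected number-right score for ability theta over a set of items."""
    return sum(rasch_probability(theta, b) for b in difficulties)


# Hypothetical item difficulties in logits (assumed values, not from the study).
item_difficulties = [-1.0, 0.0, 1.0]

for theta in (-1.0, 0.0, 1.0):
    score = expected_number_correct(theta, item_difficulties)
    print(f"ability {theta:+.1f} -> expected number correct {score:.3f}")
```

Person-fit statistics of the kind the abstract refers to compare an examinee's observed responses with probabilities like these; that step is not sketched here.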
Gavett, Brandon E
The base rates of abnormal test scores in cognitively normal samples have been a focus of recent research. The goal of the current study is to illustrate how Bayes' theorem uses these base rates--along with the same base rates in cognitively impaired samples and prevalence rates of cognitive impairment--to yield probability values that are more useful for making judgments about the absence or presence of cognitive impairment. Correlation matrices, means, and standard deviations were obtained from the Wechsler Memory Scale--4th Edition (WMS-IV) Technical and Interpretive Manual and used in Monte Carlo simulations to estimate the base rates of abnormal test scores in the standardization and special groups (mixed clinical) samples. Bayes' theorem was applied to these estimates to identify probabilities of normal cognition based on the number of abnormal test scores observed. Abnormal scores were common in the standardization sample (65.4% scoring below a scaled score of 7 on at least one subtest) and more common in the mixed clinical sample (85.6% scoring below a scaled score of 7 on at least one subtest). Probabilities varied according to the number of abnormal test scores, base rates of normal cognition, and cutoff scores. The results suggest that interpretation of base rates obtained from cognitively healthy samples must also account for data from cognitively impaired samples. Bayes' theorem can help neuropsychologists answer questions about the probability that an individual examinee is cognitively healthy based on the number of abnormal test scores observed. PMID:25784058
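The Bayes' theorem step described above can be sketched as follows. This is a minimal illustration, not the study's Monte Carlo procedure: it uses only the two rates quoted in the abstract (65.4% and 85.6% scoring below a scaled score of 7 on at least one subtest), while the function name and the base rates of normal cognition are assumptions made here.

```python
def prob_normal(p_obs_given_normal, p_obs_given_impaired, base_rate_normal):
    """Posterior probability of normal cognition given an observation (Bayes' theorem)."""
    if not 0.0 <= base_rate_normal <= 1.0:
        raise ValueError("base_rate_normal must lie in [0, 1]")
    joint_normal = p_obs_given_normal * base_rate_normal
    joint_impaired = p_obs_given_impaired * (1.0 - base_rate_normal)
    total = joint_normal + joint_impaired
    if total == 0.0:
        raise ValueError("the observation has zero probability in both groups")
    return joint_normal / total


# Rates of "at least one subtest below a scaled score of 7" quoted in the abstract.
P_ABNORMAL_NORMAL = 0.654    # standardization sample
P_ABNORMAL_CLINICAL = 0.856  # mixed clinical sample

# Hypothetical base rates of normal cognition (assumed, for illustration only).
for base_rate in (0.9, 0.5, 0.2):
    posterior = prob_normal(P_ABNORMAL_NORMAL, P_ABNORMAL_CLINICAL, base_rate)
    print(f"base rate {base_rate:.1f} -> P(normal | at least one abnormal score) = {posterior:.3f}")
```

Because at least one abnormal score is common in both samples, this observation alone shifts the probability of normal cognition only modestly away from the base rate, which is consistent with the abstract's point that data from impaired samples must be taken into account.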
Das, Jishnu; Dercon, Stefan; Habyarimana, James; Krishnan, Pramila; Muralidharan, Karthik; Sundararaman, Venkatesh
Empirical studies of the relationship between school inputs and test scores typically do not account for the fact that households will respond to changes in school inputs. We present a dynamic household optimization model relating test scores to school and household inputs, and test its predictions in two very different low-income country…
Klesch, Heather S.
The reporting of scores on educational tests is at times misunderstood, misinterpreted, and potentially confusing to examinees and other stakeholders who may need to interpret test scores. In reporting test results to examinees, there is a need for clarity in the message communicated. As pressure rises for students to demonstrate performance at a…
K. Das; B. C. Sutradhar
A general nonlinear regression model for repeated measures data is considered. Neyman's partial score tests are derived for the significance of regression parameters as well as overdispersion components of the model. Neyman's score test is asymptotically locally optimal, and the test statistic asymptotically has a χ² distribution under the null hypothesis, with m degrees of freedom, where m is the
Sandal, Gro M.; Musson, Dave; Helmreich, Robert. L.; Gravdal, Lene
The assessment of personality is recognized by space agencies as an approach to identify candidates likely to perform optimally during spaceflights. In the use of personality scales for selection, the impact of social desirability (SD) has been cited as a concern. Study 1 addressed the impact of SD on responses to the Personality Characteristic Inventory (PCI) and NEO-FFI. This was achieved by contrasting scores from active astronauts (N=65) with scores of successful astronaut applicants (N=63), and scores of pilot applicants (N=1271) with scores of pilot research subjects (N=120). Secondly, personality scores were correlated with scores on the Marlowe-Crowne Social Desirability Scale among applicants to managerial positions (N=120). The results indicated that SD inflated scores on PCI scales assessing negative interpersonal characteristics, and affected four of five scales in the NEO-FFI. Still, the effect sizes were small or moderate. Study 2 addressed performance implications of SD during an assessment of males applying to work as rescue personnel operations in the North Sea (N=22). The results showed that SD correlated negatively with cognitive test performance, and positively with discrepancy in performance ratings between self and two observers. In conclusion, caution is needed in interpreting personality scores in applicant populations. SD may be a negative predictor for performance under stress.
Cantrell, D. Dean
Fear of poor teaching and low national test scores have spawned a back to basics movement and a shift from the use of tests as predictors and models to that of assessment and achievement. This movement may have positive impact on the teaching of English, which previously has not lent itself well to standardized testing. Although many English…
Cope, Ronald T.; Kolen, Michael J.
This study compared five density estimation techniques applied to samples from a population of 272,244 examinees' ACT English Usage and Mathematics Usage raw scores. Unsmoothed frequencies, kernel method, negative hypergeometric, four-parameter beta compound binomial, and Cureton-Tukey methods were applied to 500 replications of random samples of…
We apply a quantile version of the Oaxaca-Blinder decomposition to estimate the counterfactual distribution of the test scores of Black students. In the Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 (ECLS-K), we find that the gap initially appears only at the top of the distribution of test scores. As children age, however,…
Mertler, Craig A.
This book is designed to help K-12 teachers and administrators understand the nature of standardized tests and, in particular, the scores that result from them. This useful manual helps teachers develop the skills necessary to incorporate these test scores into various types of instructional decision making--a process known as "data-driven…
Increasing standardized test scores in reading and math is of high importance to the California Department of Education to meet requirements mandated by the No Child Left Behind (NCLB) Act of 2001. More research is needed to understand the best ways to improve test scores to meet concerns of the NCLB Act. The purpose of the study was to evaluate…
There are many reasons to align tests with curricular standards, but this alignment is not sufficient to protect against score inflation. This report explains the relationship between alignment and score inflation by clarifying what is meant by inappropriate test preparation. It provides a concrete, hypothetical example that illustrates a process…
Ramos, Erica; Alfonso, Vincent C.; Schermerhorn, Susan M.
The interpretation of cognitive test scores often leads to decisions concerning the diagnosis, educational placement, and types of interventions used for children. Therefore, it is important that practitioners administer and score cognitive tests without error. This study assesses the frequency and types of examiner errors that occur during the…
Lord, Frederic M.
Given any observed number-right score on a test, a method is described for obtaining a prediction interval for the corresponding number-right score on a randomly parallel form of the same test. The interval can be written down directly from published tables of the hypergeometric distribution. (Author)
Eleonora Patacchini; Yves Zenou
We investigate the racial gap in test scores between white and non-white students in Britain both in levels and differences across the school years. We find that there is a substantial racial gap in test scores, especially between ages 7 and 11, and a less severe one between ages 11 and 16. It thus seems that nonwhites are losing ground
Roland G. Fryer Jr; Steven D. Levitt
In previous research, a substantial gap in test scores between white and black students persists, even after controlling for a wide range of observable characteristics. Using a newly available data set (the Early Childhood Longitudinal Study), we demonstrate that in stark contrast to earlier studies, the black-white test score gap among incoming kindergartners disappears when we control for a small
Roland G. Fryer Jr.; Steven D. Levitt
In previous research, a substantial gap in test scores between White and Black students persists, even after controlling for a wide range of observable characteristics. Using a newly available data set (Early Childhood Longitudinal Study), we demonstrate that in stark contrast to earlier studies, the Black-White test score gap among incoming kindergartners disappears when we control for a small number
Lockwood, J. R.; McCaffrey, Daniel F.
A common strategy for estimating treatment effects in observational studies using individual student-level data is analysis of covariance (ANCOVA) or hierarchical variants of it, in which outcomes (often standardized test scores) are regressed on pretreatment test scores, other student characteristics, and treatment group indicators. Measurement…
Berends, Mark; Penaloza, Roberto V.
Background/Context: Although there has been progress in closing the test score gaps among student groups over past decades, that progress has stalled. Many researchers have speculated why the test score gaps closed between the early 1970s and the early 1990s, but only a few have been able to empirically study how changes in school factors and…
Cascallar, Alicia S.; Dorans, Neil J.
This study compares two methods commonly used (concordance and prediction) to establish linkages between scores from tests of similar content given in different languages. Score linkages between the Verbal and Math sections of the SAT I and the corresponding sections of the Spanish-language admissions test, the Prueba de Aptitud Academica (PAA),…
Bergeron, Renee; Floyd, Randy G.
This study examined the group- and individual-level part score profiles of children with intellectual disability (ID) who participated in clinical validity studies supporting three individually administered intelligence tests. Across tests, children with ID produced group-level profiles comprising mean part scores that fell in the Low to Very Low…
Chen, Shiu-Sheng; Luoh, Ming-Ching
Using data from the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS), we investigate the link between test scores (mathematics and science) and cross-country income differences. We would like to know whether test scores are good indicators of labor-force quality. The…
Pellicer-Sanchez, Ana; Schmitt, Norbert
Despite a number of research studies investigating the Yes-No vocabulary test format, one main question remains unanswered: What is the best scoring procedure to adjust for testee overestimation of vocabulary knowledge? Different scoring methodologies have been proposed based on the inclusion and selection of nonwords in the test. However, there…
Strand, Steve; Deary, Ian J.; Smith, Pauline
Background and aims: There is uncertainty about the extent or even existence of sex differences in the mean and variability of reasoning test scores (Jensen, 1998; Lynn, 1994; Mackintosh, 1996). This paper analyses the Cognitive Abilities Test (CAT) scores of a large and representative sample of UK pupils to determine the extent of any sex…
Cech, Scott J.
More students are taking Advanced Placement tests, but the proportion of tests receiving what is deemed a passing score has dipped, and the mean score is down for the fourth year in a row. Data released here this week by the New York City-based nonprofit organization that owns the AP brand shows that a greater-than-ever proportion of students…
Leslie Rescorla; Adena S. Rosenthal
Growth in Test of Cognitive Skills (TCS) scores and Comprehensive Tests of Basic Skills (CTBS) reading, math, and total achievement scores from 3rd to 10th grade was studied in 328 public school students in a middle-class suburban community. Surprisingly, groups differing in ability and achievement in 3rd grade made parallel progress over time, and some…
Tamerah N. Hunt; Michael S. Ferrara; L. Stephen Miller; Stephen Macciocchi
Objective: Poor effort on baseline neuropsychological tests is expected to influence interpretation of post-concussion assessment scores. Our study examined effort in an athletic population to determine if poor effort affects neuropsychological test performance.
Ginther, April; Dimova, Slobodanka; Yang, Rui
Information provided by examination of the skills that underlie holistic scores can be used not only as supporting evidence for the validity of inferences associated with performance tests but also as a way to improve the scoring rubrics, descriptors, and benchmarks associated with scoring scales. As fluency is considered a critical, perhaps…
Lowe, Patricia A.; Papanastasiou, Elena C.; DeRuyck, Kimberly A.; Reynolds, Cecil R.
In this study, the authors investigated the temporal stability and construct validity of the Adult Manifest Anxiety Scale-College Version (AMAS-C; C. R. Reynolds, B. O. Richmond, & P. A. Lowe, 2003b) scores. Results indicated that the AMAS-C scores had adequate to excellent test score stability, and evidence supported the construct validity of the…
Berson, Barry L.
The purpose of this memo is to present tests that comprise the test battery used to select Navy personnel to train marine mammals, and to describe the scoring procedures of the tests. The test battery consists of: Biosystems General Information Test (BGIT), Personnel History Questionnaire (PHQ), Gordon Personal Inventory, Gordon Personal Profile,…
The educational implications of criterion-referenced tests are demonstrated. It is the hypothesis of the author that criterion-referenced tests have little educational impact unless carefully constructed around rigorous domain specifications. The paper details the process and problems of construction of a series of history tests presently being…
Marco, Gary L.; And Others
Data from the verbal portion of the College Entrance Examination Board Scholastic Aptitude Tests were used in an experimental test of the accuracy of equating for a variety of models in three categories: linear equating, equipercentile equating, and item characteristic curve equating. The models were tested for both mean squared error and bias.…
Zimmerman, Donald W.; Zumbo, Bruno D.
Educational and psychological testing textbooks typically warn of the inappropriateness of performing arithmetic operations and statistical analysis on percentiles instead of raw scores. This seems inconsistent with the well-established finding that transforming scores to ranks and using nonparametric methods often improves the validity and power…
Peng, Pai; Hochweber, Jan; Klieme, Eckhard
Outcome-oriented evaluation of school effectiveness is often based on student test scores in certain critical examinations. This study provides another method of evaluation--value-added--which is based on student achievement progress. This paper introduces the method of estimating the value-added score of schools in multi-level models. Based on…
Binqing Q. Wei; Walter A. Baase; Larry H. Weaver; Brian W. Matthews; Brian K. Shoichet
Prediction of interaction energies between ligands and their receptors remains a major challenge for structure-based inhibitor discovery. Much effort has been devoted to developing scoring schemes that can successfully rank the affinities of a diverse set of possible ligands to a binding site for which the structure is known. To test these scoring functions, well-characterized experimental systems can be very
Wilcox, Rand R.
A procedure is described for determining the minimal length of a mastery test given certain constraints. The procedure assumes that the testor is indifferent to misclassifying some testees who score within a specified range about the passing score. An example and table are provided. (JKS)
Dashfield, A. K.; Lambert, A. W.; Campbell, J. K.; Wilkins, D. C.
We have investigated the correlation between the scores attained on a computerised psychometric test measuring psychomotor aptitude and the learning of surgical reef knot tying. Fifteen surgical trainees performed a test of psychomotor aptitude (ADTRACK 2) from the MICROPAT testing system. They then performed a simple test of their ability to tie a surgical reef knot and were assessed by a panel of experts prior to embarking on a standardised course of instruction and practice session. The knot-tying test was repeated at the end of the day and the differences in average scores recorded. There was a significant correlation between the means of the differences in knot tying scores and ADTRACK 2 scores (r = -0.533, P < 0.05). Psychomotor abilities appear to be determinants of trainees' initial proficiency in learning to tie a surgical reef knot. PMID:11320926
Elena C. Papanastasiou; Mark D. Reckase
Because of the increased popularity of computerized adaptive testing (CAT), many admissions tests, as well as certification and licensure examinations, have been transformed from their paper-and-pencil versions to computerized adaptive versions. A major difference between paper-and-pencil tests and CAT from an examinee's point of view is that in many cases examinees are not allowed to revise their answers on CAT.
Thompson, Andrew; Wennike, Nic
The Royal College of Physicians' Acute care toolkit 10 has recommended the use of the AMB score as an aid to determining patients suitable for ambulatory care. As this score has only been previously validated in one centre, the present study calculated the score of 200 patients referred to the medical take to see if it successfully identified patients who had a length of stay of less than 12 hours. In our test centre, the score was found to have a reduced sensitivity compared with the original centre (88 vs 96%) and a positive predictive value of 39%. Therefore in our hospital this was not a useful scoring system, and other trusts need to be aware that the AMB score may not be as effective as the original study suggested. PMID:26031968
Paul, Clyde A.; Rosenkoetter, John
Questions whether higher achieving students complete an achievement test faster than lower achievers, reviews relevant research, and poses various hypotheses regarding test completion and achievement. Findings indicated that perhaps there is a tendency for better students to finish early on tests but that the correlation between test score and…
With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
Kim, Sooyeon; Moses, Tim
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
No one can dispute that tests should measure important content, and for many (but not all) purposes, tests should be aligned with curricular goals. Thus in many cases, alignment is clearly better than the alternative, and nothing that follows here argues otherwise. Unfortunately, however, this does not imply that alignment is sufficient protection…
Jancarík, Antonín; Kostelecká, Yvona
Electronic testing has become a regular part of online courses. Most learning management systems offer a wide range of tools that can be used in electronic tests. With respect to time demands, the most efficient tools are those that allow automatic assessment. The presented paper focuses on one of these tools: matching questions in which one…
This study investigated the effect of targeted test preparation, or coaching, on oral English as a second language test scores. The tests in question were the Basic English Skills Test Plus (BEST Plus), a scripted oral interview published by the Center for Applied Linguistics, and the Versant English Test (VET), a computer-administered and…
... minutes a night linked to lower performance in math, science ... about their homework habits, and their performance in math and science was assessed using a standardized test. ...
Creighton, Susan Dabney
There is no consensus regarding the most reliable and valid scoring methods for the assessment of higher order thinking skills. Most of the research on alternative formats has focused on the scoring of writing ability. This study examined the value of different types of performance assessment scoring guides on state mandated science and social studies tests. A proportional stratified sample of raters was randomly assigned to one of four scoring groups: checklist, analytic rubric, holistic rubric, and generic rubrics. A fifth method, the weighted analytic rubric, was included by applying an algorithmic formula to the scores assigned by raters using the analytic rubric. A comparison of the mean scores for the five scoring groups suggests that there may be a difference in the way raters applied the rubric for each group. Although the literature suggests that it is possible to achieve high levels of inter-rater reliability across forms of scoring, phi coefficients of moderate strength were obtained for three of the four constructed-response items. Results for each scoring group were compared, indicating that item complexity may affect the level of inter-rater reliability and the selection of the most reliable rubric for each discipline. Analytic rubrics appear to achieve more reliable results with less complex items. A multitrait-multimethod approach was utilized to investigate the external validity of the social studies and science tasks. As expected, there tended to be a stronger association between the PACT science constructed-response scores and the science multiple-choice scores than between the science constructed-response scores and the writing ability subtest scores. A similar pattern was seen with social studies items. These results provide some evidence for the validity of the performance assessments.
A post study survey completed by raters provided qualitative information regarding their thought processes and their primary focus during the scoring process. An analysis of this data suggests that raters using alternative rubrics may have employed different strategies to score student responses.
Looney, Marilyn A.; Gilbert, Jennie
The purpose of the study was to determine if currently used FITNESSGRAM[R] cut-off scores for the Back Saver Sit and Reach Test had the best criterion-referenced validity evidence for 6-12 year old children. Secondary analyses of an existing data set focused on the passive straight leg raise and Back Saver Sit and Reach Test flexibility scores of…
Hetzler, Ronald K.; Stickley, Christopher D.; Kimura, Iris F.
In this study, we developed allometric exponents for scaling Wingate anaerobic test (WAnT) power data that are reflective in controlling for body mass (BM) and lean body mass (LBM) and established a normative WAnT data set for college-age women. One hundred women completed a standard WAnT. Allometric exponents and percentile ranks for peak (PP)…
Watson, Charles G.; Klett, William G.
In a search for an adequate but efficient substitute, the authors have instituted three evaluations of the relationships between potential WAIS-substitutes and the WAIS itself. The present report describes the first of these researches-- a study of the relationships between the four group ability tests and the WAIS in a mental hospital setting.…
Bartosh, Oksana; Tudor, Margaret; Ferguson, Lynne; Taylor, Catherine
The present research investigated the impact of environmental education (EE) programs on student achievement in math, reading, and writing by comparing student performances on two standardized tests for environmental education schools and schools with traditional curriculum. Quantitative analysis was used to evaluate the impact of the EE programs.…
Armor, David J.; Duck, Stephanie
Recent studies have used increasingly complex methodologies to estimate the effect of peer characteristics--race, poverty, and ability--on student achievement. A paper by Hanushek, Kain, and Rivkin using Texas state testing data has received particularly wide attention because it found a large negative effect of school percent black on black math…
Poplin, Beth D.
This study examined whether students who graded and corrected their own test papers improved their learning and standardized test scores on the North Carolina end-of-course test in United States History. Four preexisting, intact classrooms of 11th grade United States History students in two different high schools formed the basis of this…
East, Pam C.
Many teachers look at standardized tests as something to be dreaded. This author and teacher looks at standardized-test scores and sees a tool to bring student learning to new heights. This is a way for teachers to target instruction exactly where it's needed. A way to get students looking forward to end-of-the-year tests (really!) as a way to…
Sireci, Stephen G.; Han, Kyung T.; Wells, Craig S.
In the United States, when English language learners (ELLs) are tested, they are usually tested in English and their limited English proficiency is a potential cause of construct-irrelevant variance. When such irrelevancies affect test scores, inaccurate interpretations of ELLs' knowledge, skills, and abilities may occur. In this article, we…
Westbrook, Bert W.; And Others
Attempted to replicate study determining relationship between appropriateness of career choices and career maturity test scores in rural ninth grade students (N=112) using Goal Selection scale of Career Maturity Inventory Competence Test and American College Testing Program Career Planning Program. Found two career maturity measures correlated…
MacCann, Robert G.
It is shown that the Angoff and bookmarking cut scores are examples of true score equating that in the real world must be applied to observed scores. In the context of defining minimal competency, the percentage "failed" by such methods is a function of the length of the measuring instrument. It is argued that this length is largely arbitrary,…
Mueller, Daniel J.; Wasser, Virginia
Eighteen studies of the effects of changing initial answers to objective test items are reviewed. While students throughout the total test score range tended to gain more points than they lost, higher scoring students gained more than did lower scoring students. Suggestions for further research are made. (Author/JKS)
Lalande, John F.; Schweckendiek, Jurgen
Investigates what correlations might exist between an individual's score on the Zertifikat Deutsch als Fremdsprache and on the Oral Proficiency Interview. The tests themselves are briefly described. Results indicate that the two tests appear to correlate well in their evaluation of speaking skills. (SED)
Loewen, David Allen
This exploratory correlational study seeks to answer the question of whether a relationship exists between student average test score gains on state exams and teachers' rating of values on the Schwartz Values Survey. Eighty-seven randomly selected Kansas teachers of math and/or reading, grades four through eight, participated. Student test…
Needham, Martha Elaine
This research compares differences between standardized test scores in problem-based learning (PBL) classrooms and a traditional classroom for 6th grade students using a mixed-method, quasi-experimental and qualitative design. The research shows that problem-based learning is as effective as traditional teaching methods on standardized tests. The…
Liu, Yang; Thissen, David
Local dependence (LD) refers to the violation of the local independence assumption of most item response models. Statistics that indicate LD between a pair of items on a test or questionnaire that is being fitted with an item response model can play a useful diagnostic role in applications of item response theory. In this article, a new score test…
Jerome V. D’Agostino; Sonya J. Powers
A meta-analysis was conducted to examine the degree to which teachers’ test scores and their performance in preparation programs as measured by their collegiate grade point average (GPA) predicted their teaching competence. Results from 123 studies that yielded 715 effect sizes were analyzed, and the mediating effects of test and GPA type, criterion type, teaching level, service level, and decade
The purpose of this study was to investigate the impact of local item dependence (LID) in passage-based testlets on the test score reliability of an English as a Foreign Language (EFL) reading comprehension test from the perspective of generalizability (G) theory. Definitions and causes of LID in passage-based testlets are reviewed within the…
DEVELOPMENT OF A WEB-BASED BLIND TEST TO SCORE AND RANK HYPERSPECTRAL CLASSIFICATION ALGORITHMS K by supplying the user with additional spectral data as compared to high-resolution color imagery. The web their classification algorithms and upload their results back to the web application. The blind test site automatically
Journal of Blacks in Higher Education, 2003
Academically accomplished applicants to the nation's top colleges usually take SAT II Achievement Tests. While scoring gaps between college-bound Blacks and Whites on these tests tend to be smaller than gaps on the basic SAT, a racial scoring gap persists. However, black students appear to be making progress in closing the racial scoring gap on…
de Gobbi Porto, Fábio Henrique; Spíndola, Lívia; de Oliveira, Maira Okada; Figuerêdo do Vale, Patrícia Helena; Orsini, Marco; Nitrini, Ricardo; Dozzi Brucki, Sonia Maria
It is not easy to differentiate patients with mild cognitive impairment (MCI) from subjective memory complainers (SMC). Assessments with screening cognitive tools are essential, particularly in primary care where most patients are seen. The objective of this study was to evaluate the diagnostic accuracy of screening cognitive tests and to propose a score derived from screening tests. Elderly subjects with memory complaints were evaluated using the Mini Mental State Examination (MMSE) and the Brief Cognitive Battery (BCB). We added two delayed recalls in the MMSE (a delayed recall and a late-delayed recall, LDR), and also a phonemic fluency test of letter P fluency (LPF). A score was created based on these tests. The diagnoses were made on the basis of clinical consensus and neuropsychological testing. Receiver operating characteristic curve analyses were used to determine area under the curve (AUC), the sensitivity and specificity for each test separately and for the final proposed score. MMSE, LDR, LPF and delayed recall of BCB scores reached statistically significant differences between groups (P=0.000, 0.03, 0.001 and 0.01, respectively). Sensitivity, specificity and AUC were MMSE: 64%, 79% and 0.75 (cut off <29); LDR: 56%, 62% and 0.62 (cut off <3); LPF: 71%, 71% and 0.71 (cut off <14); delayed recall of BCB: 56%, 82% and 0.68 (cut off <9). The proposed score reached a sensitivity of 88% and 76% and specificity of 62% and 75% for cut off over 1 and over 2, respectively. AUC was 0.81. In conclusion, a score created from screening tests is capable of discriminating MCI from SMC with moderate to good accuracy. PMID:24147213
Meadows, Sara; Herrick, David; Feiler, Anthony
The aim of the UK National Literacy Strategy is to raise standards in literacy. Strong evidence for its success has, however, been lacking: most of the available data comes from performance on tests administered in schools or from Office for Standards in Education reports and is vulnerable to suggestions of bias. An opportunistic analysis of data from a population cohort study extending over three school years compares school-based scores at school entry and at age 7-8 with independently administered scores on similar tests. The results show a small but statistically significant rise between 1998 and 1999 and between 1998 and 2000 in scores on both Key Stage 1 Reading Standard Assessment Tasks taken in schools and the reading component of the WORD test taken independently. This is clear evidence for a real rise in reading attainment over this period, which may be attributable to the children's experience of the National Literacy Strategy. PMID:18273398
Chamberlain, Gary E
I studied predictive effects of teachers and schools on test scores in fourth through eighth grade and outcomes later in life such as college attendance and earnings. For example, predict the fraction of a classroom attending college at age 20 given the test score for a different classroom in the same school with the same teacher and given the test score for a classroom in the same school with a different teacher. I would like to have predictive effects that condition on averages over many classrooms, with and without the same teacher. I set up a factor model that, under certain assumptions, makes this feasible. Administrative school district data in combination with tax data were used to calculate estimates and do inference. PMID:24101492
Homard, Catherine M
The purpose of this ex post facto correlational study was to compare exit examination scores and NCLEX-RN® pass rates of baccalaureate nursing students who differed in level of participation in a standardized test package. Three cohort groups emerged as a standardized test package was introduced: (a) students who did not participate in a standardized test package; (b) students with two semesters of a standardized test package; and (c) students with four semesters of a standardized test package. Benner's novice-to-expert theory framed the study in the belief that students best acquire knowledge and skills through practice and reflection. Students participating in four semesters of a standardized test package demonstrated higher exit examination scores and NCLEX-RN pass rates compared with students who did not participate in this package. This study's results could inform nurse educators about strategies to facilitate nursing student success on exit examinations and the NCLEX-RN. PMID:23413805
Ockey, Gary J.
The second language group oral is a test of second language speaking proficiency, in which a group of three or more English language learners discuss an assigned topic without interaction with interlocutors. Concerns expressed about the extent to which test takers' personal characteristics affect the scores of others in the group have limited its…
Educational Testing Service, Princeton, NJ.
The Nairn report, The Reign of ETS, asserts that Educational Testing Service (ETS) has attempted to suppress information on the relationship of test scores to students' family income, that the relationship of Scholastic Aptitude Test (SAT) scores to income is inordinately high, and that the tests preserve the social status quo by denying…
Lee, Seon-Young; Olszewski-Kubilius, Paula
This study examined differences between students who qualified for talent search testing via scores on standardized tests and via parent nomination in their performances on the SAT or ACT and some demographic characteristics. Overall, the standardized testing group earned higher scores on the off-level tests than the parent nominated group. Asian…
Kay, Rachel E.
Over the past few decades, and especially in the past ten years, computer use in schools has increased dramatically; however there has been little research examining the effects of technology use on student achievement, specifically defined by standardized test scores. There is also concern as to how technology use differs by gender and if that…
In 2004, the National Endowment for the Arts (NEA) concluded that "literature reading is fading as a meaningful activity, especially among younger people." How can educators continue to teach students about the power of literary response when the priority is for them to achieve proficiency on standardized tests, whose scores can only be narrowly…
Herriott, Tavita S.
The purpose of this study was to determine if there was a difference in students' standardized test scores based on the instructional model their teachers used. One group of students was served under a pullout instructional model. The other was served under an inclusive model. It is not known whether or not the pullout instructional model or the…
Almond, Russell G.
Assessments consisting of only a few extended constructed response items (essays) are not typically equated using anchor test designs as there are typically too few essay prompts in each form to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common…
This paper considers the problem of validating placement procedures or, more precisely, of determining their educational appropriateness. At issue is determining whether a test score serves the particular educational function it was designed to serve (for example, course placement), and whether it does so in an economical way. These determinations…
Ricketts, Christine R.
This study examined the extent to which end-of-course grades are predictive of Virginia Standards of Learning test scores in nine high school content areas. It also analyzed the impact of the variables school cluster attended, gender, ethnicity, disability status, Limited English Proficiency status, and socioeconomic status on the relationship…
Among the 50 states, Florida's gains on the National Assessment of Educational Progress (NAEP) between 1992 and 2011 ranked second only to Maryland's. Florida's progress has been particularly impressive in the early grades. In 1998, Florida scored about one grade level below the national average on the 4th-grade NAEP reading test, but it was…
McEnroe, James D.
The study examined the effects of the federally funded Comprehensive School Reform (CSR) program on student performance on mandated standardized tests. The study focused on the mathematics and reading scores of Illinois public elementary and middle and junior high school students. The federal CSR program provided Illinois schools with an annual…
Rothstein, Jesse; Wozny, Nathan
Analysts often examine the black-white test score gap conditional on family income. Typically only a current income measure is available. We argue that the gap conditional on permanent income is of greater interest, and we describe a method for identifying this gap using an auxiliary data set to estimate the relationship between current and…
David J. Hebert; Alan F. Holmes
A literature review was carried out to determine existing knowledge regarding the relationship of Graduate Record Examinations Aptitude Test (GRE) scores and graduate grade point average (GGPA). Building upon this information, a study was undertaken in which data were gathered for 67 M.Ed. candidates admitted into and graduating from the University of New Hampshire Department of Education during a given
In "Beyond Test Scores: Leading Indicators for Education," Foley and colleagues (2008) define leading indicators as those that "provide early signals of progress toward academic achievement" (p. 1) and stress that educators "need leading indicators to help them see the direction their efforts are going in and to take corrective action as soon as…
Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance." His…
Diamond, Sandra M.
The Problem: The purpose of this study was to investigate whether or not there were any statistically significant differences in the Mathematics California Standard Test scores and attendance rates for African American and Latina high school girls who participated in an afterschool program. Method: A quasi-experimental design was conducted with…
Stiefel, Leanna; Schwartz, Amy Ellen; Ellen, Ingrid Gould
We examine the size and distribution of the gap in test scores across races within New York City public schools and the factors that explain these gaps. While gaps are partially explained by differences in student characteristics, such as poverty, differences in schools attended are also important. At the same time, substantial within-school gaps…
Thomas, P. Ann
The focus of the investigation is on a sixth grade population not performing reading on grade level and not achieving high-stakes test score proficiency causing the school to fail adequate yearly progress (AYP). The lack of reading skills causes the students to repeat grades in middle school and high school. Reading technology instruction is the…
Hoffman, John L.; Lowitzki, Katie E.
Using a sample of 522 students at a Lutheran university in the Southwestern United States, researchers examined differences in the predictive strength of high school grades and standardized test scores for student involvement, academic achievement, retention, and satisfaction. Findings indicate that high school grades are stronger predictors of…
van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas
The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmarks, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…
Land, Warren A.; Land, Elizabeth R.
This study was directed toward determining the effect on National Teacher Education (NTE) Test Scores of changing a professional undergraduate educational program, and exploring whether there is a specific significant difference between the use of a traditional undergraduate professional educational program and a modified undergraduate…
Worrell, Frank C.; Watson, Stevie
In this study, the authors tested the viability of the expanded nigrescence (NT-E) model as operationalized by Cross Racial Identity Scale (CRIS) scores using confirmatory factor analyses. Participants were 594 Black college students from the Southeastern United States. Results indicated a good fit for NT-E's proposed six-factor structure.…
Dickson, Teresa Kay
This study analyzed student test scores to determine if teacher participation in an inquiry-based professional development was able to make a statistically significant difference in student achievement levels. Test scores for objectives that assessed the critical thinking skills and problem-solving strategies modeled in a science inquiry institute were studied. Inquiry-based experiences are the cornerstones for meeting the science standards for scientific literacy. State mandated assessment tests measure the levels of student achievement and are reported as meeting minimum expectations or showing mastery for specific learning objectives. Students' test scores from the Texas Assessment of Academic Skills Test (TAAS) for 8th grade science and the biology End Of Course (EOC) exams were analyzed using ANCOVA, chi square, and logistic regression, with the Iowa Test of Basic Skills (ITBS) 7th Grade Science Subtest as covariate. It was hypothesized that the students of Inquiry Institute teachers would have higher scale scores and better rates of mastery on the critical thinking objectives than the students of non-Institute teachers. It was also hypothesized that it would be possible to predict student mastery on the objectives that assessed critical thinking and problem solving based on Institute participation. This quasi-experimental study did not show a statistically significant difference between the two groups. The effects of inquiry-based professional development may not be determined by analyzing the results of the standardized tests currently being used in Texas. Inquiry training may make a difference, but because of factors such as the ceiling effect, insufficient time to implement the program, and test items that are intended to but do not address critical thinking skills, the TAAS and EOC tests may not accurately assess effects of the Inquiry Institute.
The results of this study did indicate the best predictor of student mastery for the 8th grade science TAAS and Biology EOC may possibly be prior knowledge acquired in elementary school and as demonstrated on the 7th grade ITBS science subtest.
M. Laiacona; M. G. Inzaghi; A. De Tanti; E. Capitani
The Wisconsin card sorting test and the Weigl test are two neuropsychological tools widely used in clinical practice to assess frontal lobe functions. In this study we present norms useful for Italian subjects aged from 15 to 85 years, within 5–17 years of education. Concerning the Wisconsin card sorting test, a new measure of global efficiency (global score) is proposed
Richard A. Charter
The author provides statistical approaches to aid investigators in assuring that sufficiently high test score reliabilities are achieved for specific research purposes. The statistical approaches use tests of statistical significance between the obtained reliability and lowest population reliability that an investigator will tolerate. The statistical approaches work for coefficient alpha and related coefficients and for alternate-forms, split-half (2-part alpha), and
George-Ezzelle, Carol E.; Skaggs, Gary
Current testing standards call for test developers to provide evidence that testing procedures and test scores, and the inferences made based on the test scores, show evidence of validity and are comparable across subpopulations (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on…
Legg, Sue M.; Buhr, Dianne C.
Possible causes of a 16-point mean score increase for the computer adaptive form of the College Level Academic Skills Test (CLAST) in reading over the paper-and-pencil test (PPT) in reading are examined. The adaptive form of the CLAST was used in a state-wide field test in which reading, writing, and computation scores for approximately 1,000…
In recent years there has been growing theoretical interest in exploring the relationship between the interpretation and use of high-stakes proficiency test scores. In these discussions, the role of institutional test users (or test score consumers) has received only limited attention. This may be due, at least in part, to the lack of consensus in…
Manjunath, N K; Telles, Shirley
The performance scores of children (aged 11 to 16 years) in verbal and spatial memory tests were compared for two groups (n = 30, each), one attending a yoga camp and the other a fine arts camp. Both groups were assessed on the memory tasks initially and after ten days of their respective interventions. A control group (n = 30) was similarly studied to assess the test-retest effect. At the final assessment the yoga group showed a significant increase of 43% in spatial memory scores (Multivariate analysis, Tukey test), while the fine arts and control groups showed no change. The results suggest that yoga practice, including physical postures, yoga breathing, meditation and guided relaxation improved delayed recall of spatial information. PMID:15648409
Guzeller, Cem Oktay
This research examined the relationship between written exam scores of science and technology class of 6th, 7th, and 8th grades, project, participation in class activities and performance work, year-end academic success point averages, and sub-test raw scores of LDT science of 6th, 7th, and 8th grades. Academic success point averages were used as…
Zimmerman, Donald W.
In order to circumvent the influence of correlation in paired-samples and repeated measures experimental designs, researchers typically perform a one-sample Student "t" test on difference scores. That procedure entails some loss of power, because it employs N - 1 degrees of freedom instead of the 2N - 2 degrees of freedom of the…
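The procedure this abstract describes, a one-sample Student t test on difference scores, can be sketched as follows. This is a minimal illustration in Python, not the author's code; the function names `paired_t` and `independent_samples_df` are hypothetical, and the example data are invented purely to show the arithmetic.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """One-sample t statistic on difference scores.

    Returns (t, df), where df = N - 1 for N pairs.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    standard_error = sd / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


def independent_samples_df(n):
    """Degrees of freedom of a two-sample t test with N scores per group."""
    return 2 * n - 2


# Invented example: five subjects measured twice.
t, df = paired_t([10, 12, 9, 14, 11], [12, 14, 10, 17, 12])
```

With N = 5 pairs the difference-score test has 4 degrees of freedom, whereas `independent_samples_df(5)` gives 8, which is the N - 1 versus 2N - 2 contrast the abstract refers to.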
There is significant potential for error in long production processes that consist of sequential stages, each of which is heavily dependent on the previous stage, such as the SER (Scoring, Equating, and Reporting) process. Quality control procedures are required in order to monitor this process and to reduce the number of mistakes to a minimum. In…
Powell, P. E.
Educators have recently come to consider inquiry-based instruction as a more effective method of instruction than didactic instruction. Experience-based learning theory suggests that student performance is linked to teaching method. However, research is limited on inquiry teaching and its effectiveness in preparing students to perform well on standardized tests. The purpose of the study was to investigate whether one of these two teaching methodologies was more effective in increasing student performance on standardized science tests. The quasi-experimental quantitative study comprised two stages. Stage 1 used a survey to identify teaching methods of a convenience sample of 57 teacher participants and determined level of inquiry used in instruction to place participants into instructional groups (the independent variable). Stage 2 used analysis of covariance (ANCOVA) to compare posttest scores on a standardized exam by teaching method. Additional analyses were conducted to examine the differences in science achievement by ethnicity, gender, and socioeconomic status by teaching methodology. Results demonstrated a statistically significant gain in test scores when taught using inquiry-based instruction. Subpopulation analyses indicated all groups showed improved mean standardized test scores except African American students. The findings benefit teachers and students by presenting data supporting a method of content delivery that increases teacher efficacy and produces students with a greater cognition of science content that meets the school's mission and goals.
Mbandi, Stanley Kimbung; Hesse, Uljana; Rees, D. Jasper G.; Christoffels, Alan
Downstream analyses of short-reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on their associated base quality scores to retain the read or a portion of it when the score is above a predefined threshold. It is difficult to differentiate sequencing error from biological variation without a reference using quality scores. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced from Illumina, we teased out the effects of quality score based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the fact that de novo assembling of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms. PMID:24575122
Gitomer, Drew H.; Qi, Yi
This study concerns the "highly qualified teacher" provisions of the "Elementary and Secondary Education Act" ("ESEA," 2002), as reauthorized, and other policies at the federal, state and local levels, which have aimed to elevate the content knowledge of teachers. This examination of "Praxis II" score trends was not meant to serve as an evaluation…
Economics Understanding of Albanian High School Students: Factors Related to Achievement as Measured by Test Scores on the Test of Economic Literacy. By Dolore Bushati. ©2010. Submitted to the Department of Curriculum...
Lam, Teresa; Burns, Kharis; Dennis, Mark; Cheung, N Wah; Gunton, Jenny E
Cardiovascular disease (CVD) is the leading cause of morbidity and mortality among patients with diabetes mellitus, who have a risk of cardiovascular mortality two to four times that of people without diabetes. An individualised approach to cardiovascular risk estimation and management is needed. Over the past decades, many risk scores have been developed to predict CVD. However, few have been externally validated in a diabetic population and limited studies have examined the impact of applying a prediction model in clinical practice. Currently, guidelines are focused on testing for CVD in symptomatic patients. Atypical symptoms or silent ischemia are more common in the diabetic population, and with additional markers of vascular disease such as erectile dysfunction and autonomic neuropathy, these guidelines can be difficult to interpret. We propose an algorithm incorporating cardiovascular risk scores in combination with typical and atypical signs and symptoms to alert clinicians to consider further investigation with provocative testing. The modalities for investigation of CVD are discussed. PMID:25987961
Oden, Neal; VanVeldhuisen, Paul C.; Scott, Ingrid U.; Ip, Michael S.
We compare five closed tests for strong control of family-wide type I error (FWE) while making all pair-wise comparisons of means in clinical trials with multiple arms such as the SCORE Study. We simulated outcomes of the SCORE Study under its design hypotheses, and used p-values from chi-squared tests to compare performance of a “pairwise” closed test described below to Bonferroni and Hochberg adjusted p-values. “Pairwise” closed testing was more powerful than Hochberg’s method by several definitions of multiple-test power. Simulations over a wider parameter space, and considering other closed methods, confirmed this superiority for p-values based on normal, logistic, and Poisson distributions. The power benefit of “pair-wise” closed testing begins to disappear with 5 or more arms, and with unbalanced designs. For trials with 4 or fewer arms and balanced designs, investigators should consider using “pair-wise” closed testing in preference to Shaffer’s, Hommel’s, and Hochberg’s approaches when making all pairwise comparisons of means. If not all p-values from the closed family are available, Shaffer’s method is a good choice. PMID:21660119
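The Bonferroni and Hochberg adjusted p-values that this abstract uses as comparators can be sketched as follows. This is a minimal illustration in Python under the standard textbook definitions, not the authors' simulation code, and it does not implement the "pairwise" closed test itself. The function names and the example p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def hochberg(pvalues):
    """Hochberg step-up adjustment.

    Walk from the largest p-value to the smallest. The p-value with
    descending rank r (r = 1 for the largest) is multiplied by r, and a
    running minimum keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order, start=1):
        running_min = min(running_min, rank * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(hochberg(raw))
```

For any set of p-values the Hochberg adjusted values are no larger than the Bonferroni ones, which is why Hochberg's method is the more powerful of these two comparators.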
Stevens, Charlotte Bethany Rains
Nationwide, the goal of providing a productive science and math education to our youth in today's educational institutions is centering itself around the technology being utilized in these classrooms. In this age of digital technology, educational software and calculator-based laboratories (CBL) have become significant devices in the teaching of science and math for many states across the United States. Among the technology, the Texas Instruments graphing calculator and Vernier Labpro interface, are among some of the calculator-based laboratories becoming increasingly popular among middle and high school science and math teachers in many school districts across this country. In Tennessee, however, it is reported that this type of technology is not regularly utilized at the student level in most high school science classrooms, especially in the area of Physical Science (Vernier, 2006). This research explored the effect of calculator based laboratory instruction on standardized test scores. The purpose of this study was to determine the effect of traditional teaching methods versus graphing calculator teaching methods on the state mandated End-of-Course (EOC) Physical Science exam based on ability, gender, and ethnicity. The sample included 187 total tenth and eleventh grade physical science students, 101 of which belonged to a control group and 87 of which belonged to the experimental group. Physical Science End-of-Course scores obtained from the Tennessee Department of Education during the spring of 2005 and the spring of 2006 were used to examine the hypotheses. The findings of this research study suggested the type of teaching method, traditional or calculator based, did not have an effect on standardized test scores. However, the students' ability level, as demonstrated on the End-of-Course test, had a significant effect on End-of-Course test scores. 
This study focused on a limited population of high school physical science students in the middle Tennessee Putnam County area. The study should be reproduced in various school districts in the state of Tennessee to compare the findings.
G. Stennis Watson; Kenneth J Sufka; Terence J Coderre
The formalin test is a well-established model for assessing inflammatory nociceptive processes and analgesic drug effects. Previous research established the validity of an ordinal relationship among three well-defined pain behavior categories used to compute a composite pain score (CPS). However, optimal weights had not been validated. The present research used data from Coderre et al. (1993) and from Sufka and Roach
Reynolds, Matthew R.
The linear loadings of intelligence test composite scores on a general factor ("g") have been investigated recently in factor analytic studies. Spearman's law of diminishing returns (SLODR), however, implies that the "g" loadings of test scores likely decrease in magnitude as g increases, or they are nonlinear. The purpose of this study was to (a)…
Butler, Oliver T.; And Others
This study tested for cultural bias in the Bender Visual Motor Gestalt Test. Subjects were 72 black and white patients diagnosed as either brain damaged or psychiatric. Bender protocols were scored by Pascal-Suttell and Hain systems. No race effect appeared except for the Pascal-Suttell system for which blacks scored significantly better. (Author)
Kobrosly, Roni W; Seplaki, Christopher L; Jones, Courtney M; van Wijngaarden, Edwin
Objective: To investigate the relationship between a measure of cumulative physiologic dysfunction and specific domains of cognitive function. Methods: We examined a summary score measuring physiological dysfunction, a multisystem measure of the body’s ability to effectively adapt to physical and psychological demands, in relation to cognitive function deficits in a population of 4511 adults aged 20 to 59 who participated in the third National Health and Nutrition Examination Survey (1988–1994). Measures of cognitive function comprised three domains: working memory, visuomotor speed, and perceptual-motor speed. ‘Physiologic dysfunction’ scores summarizing measures of cardiovascular, immunologic, kidney, and liver function were explored. We used multiple linear regression models to estimate associations between cognitive function measures and physiological dysfunction scores, adjusting for socioeconomic factors, test conditions, and self-reported health factors. Results: We noted a dose-response relationship between physiologic dysfunction and working memory (coefficient = 0.207, 95% CI = (0.066, 0.348), p < 0.0001) that persisted after adjustment for all covariates (p = 0.03). We did not observe any significant relationships between dysfunction scores and visuomotor (p = 0.37) or perceptual-motor ability (p = 0.33). Conclusions: Our findings suggest that multisystem physiologic dysfunction is associated with working memory. Future longitudinal studies are needed to clarify the underlying mechanisms and explore the persistency of this association into later life. We suggest that such studies should incorporate physiologic data, neuroendocrine parameters, and a wide range of specific cognitive domains. PMID:22155941
Lee, Guemin; Park, In-Yong
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
Deckersbach, T; Savage, C R; Henin, A; Mataix-Cols, D; Otto, M W; Wilhelm, S; Rauch, S L; Baer, L; Jenike, M A
The Rey-Osterrieth Complex Figure Test (RCFT) is a widely-used measure of visuospatial construction and nonverbal memory. One of the critical aspects of this test is that organizing the figure into meaningful perceptual units during copy enhances its subsequent free recall from memory. This study examined the psychometric properties of a new system for quantifying the organizational approach to the RCFT figure and compared it to another compatible scoring system. We investigated interrater reliability of both systems and explored the influences of copy organization and copy accuracy on immediate recall. Seventy-one participants meeting DSM-IV criteria for obsessive-compulsive disorder and 55 healthy control participants completed the copy and immediate free recall condition of the RCFT. Interrater reliability was evaluated by Kappa coefficients and Pearson correlations. The effects of copy organization and copy accuracy on immediate recall were evaluated using multiple regression analyses. Results indicated that the organizational approach could be assessed with high reliability using both scoring systems. Organization during copy was a strong predictor for subsequent free recall from memory using both approaches. Multiple regression analysis indicated that all organizational elements were not equally predictive of memory performance. This new system represents a very simple and reliable approach to scoring organization on the RCFT, since it requires the identification of only 5 figure components. These characteristics should contribute to its clinical utility. PMID:11094399
Edwards, Jerri D; Vance, David E; Wadley, Virginia G; Cissell, Gayla M; Roenker, Daniel L; Ball, Karlene K
The Useful Field of View test (UFOV(1)) is a measure of processing speed that predicts driving performance and other functional abilities in older adults. In comparison to a number of other visual and cognitive measures, the UFOV measure has consistently been found to be the strongest predictor of motor vehicle crashes of older adults. This measure has valuable applications in that computerized, performance-based measures that are predictive of crashes in the elderly population can provide an objective criterion for determining the need for driver restriction or rehabilitation. Administration of the UFOV test has evolved from the standard version (administered via touch-screen with the Visual Attention Analyzer) to two briefer versions, which are administered on a personal desktop computer (PC) using either a touch screen or mouse response option. These new versions of the test are briefer and require less specialized equipment, making the test more portable and practical for use in clinical settings. This study examined the reliability and validity of the scores from these two new versions. Results indicate that test-retest reliabilities of the scores from the UFOV PC versions are high (r's = 0.884 for mouse and 0.735 for touch), and performance on both PC versions correlates well with performance on the standard version (r's = 0.658 for mouse and 0.746 for touch). Furthermore, scores were highly correlated (r = 0.916) when participants used either a touch screen or a mouse to input responses. In conclusion, the reliability and validity coefficients are of sufficient magnitude to make the touch and mouse PC versions of the UFOV practical for use in clinical evaluations. PMID:16019630
Sullivan, Jeremy R.; Winter, Suzanne M.; Sass, Daniel A.; Svenkerud, Nicole
Many tests provide users with several different types of scores to facilitate interpretation and description of students' performance. Common examples include raw scores, age- and grade-equivalent scores, and standard scores. However, when used within the context of assessing growth among young children, these scores should not be…
Stewart, David W.; Hagemeier, Nicholas E.; Thigpen, Jim C.; Brooks, Lauren
Objective: To determine if the frequency of self-testing of course material prior to actual examination improves examination scores, regardless of the actual scores on the self-testing. Methods: Practice quizzes were randomly generated from a total of 1342 multiple-choice questions in pathophysiology and made available online for student self-testing. Intercorrelations, 2-way repeated measures ANOVA with post hoc tests, and 2-group comparisons following rank ordering, were conducted. Results: During each of 4 testing blocks, more than 85% of students took advantage of the self-testing process for a total of 7042 attempts. A consistent significant correlation (p ≤ 0.05) existed between the number of practice quiz attempts and the subsequent examination scores. No difference in the number of quiz attempts was demonstrated compared to the first testing block. Exam scores for the first and second testing blocks were both higher than those for third and fourth blocks. Conclusion: Although self-testing strategies increase retrieval and retention, they are uncommon in pharmacy education. The results suggested that the number of self-testing attempts alone improved subsequent examination scores, regardless of the score for self-tests.
He, Hua; McDermott, Michael P.
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified. PMID:21856650
Prato, Ermelinda; Biandolino, Francesca; Libralato, Giovanni
This study developed a tool able to evaluate the potential contamination of marine sediments detecting the presence or absence of toxicity supporting environmental decision-making processes. When the sample is toxic, it is important to classify its level of toxicity to understand its subsequent effects and management practices. Corophium insidiosum is a widespread and frequently recorded species along the Mediterranean Sea, North Sea and western Baltic Sea with records also in the Atlantic Ocean and Pacific Ocean. This amphipod is found in high abundance in shallow brackish inshore areas and estuaries also with high turbidity. At Italian level, C. insidiosum is more frequently collectable than Corophium orientale, making routine toxicity tests easier to be performed. Moreover, according to the international scientific literature, C. insidiosum is more sensitive than C. orientale. Whole sediment toxicity data (10 days) with C. insidiosum were organised in a species-specific toxicity score on the basis of the minimum significance difference (MSD) approach. Thresholds to rank samples as non-toxic and toxic were based on sediment samples (n=84) from the Gulf of Taranto (Italy). A five-class toxicity score (absent, low, medium, high and very high toxicity) was developed, considering the distribution of the 90th percentile of the MSD normalised to the effects on the negative controls (samples from reference sites). This toxicity score could be useful for interpreting sediment potential impacts and providing quick responsive management information. PMID:25773894
Makransky, Guido; Mortensen, Erik Lykke; Glas, Cees A. W.
Narrowly defined personality facet scores are commonly reported and used for making decisions in clinical and organizational settings. Although these facets are typically related, scoring is usually carried out for a single facet at a time. This method can be ineffective and time consuming when personality tests contain many highly correlated…
Stevenson, Rosnisha D.; Kritsonis, William Allan
This article will seek to utilize Dr. William Allan Kritsonis' book "Ways of Knowing Through the Realms of Meaning" (2007) as a framework to improve a campus's standardized test scores, more specifically, their TAKS (Texas Assessment of Knowledge and Skills) scores. Many campuses have an improvement plan, also known as a Campus Improvement Plan,…
Oshima, T. C.; And Others
A procedure to detect differential item functioning (DIF) is introduced that is suitable for tests with a cutoff score. DIF is assessed on a limited closed interval of thetas in which a cutoff score falls. How this approach affects the identification of DIF items is demonstrated with real data sets. (SLD)
Math 199 Contract of Understanding You are enrolled in Math 199 because your Calculus Placement Test score was below the cutoff for Math 231. Statistical data from previous semesters shows that students with placement scores of 22 and below (out of 45) have a much higher W/D/F rate in Math 231
Admission Test Preparation Admission test scores help professional and graduate programs determine-prepared for these tests. Some are tests of aptitude in quantitative skills, verbal and analytical reasoning and/or writing ability (e.g., GRE, LSAT, GMAT), while others are tests of content knowledge (e.g., GRE Subject Tests
Wilcox, Rand R.
This paper describes and compares procedures for estimating the reliability of proficiency tests that are scored with latent structure models. Results suggest that the predictive estimate is the most accurate of the procedures. (Author/BW)
Marangi, Giuseppe; Ricciardi, Stefania; Orteschi, Daniela; Tenconi, Romano; Monica, Matteo Della; Scarano, Gioacchino; Battaglia, Domenica; Lettori, Donatella; Vasco, Gessica; Zollino, Marcella
Pitt-Hopkins syndrome (PTHS) is an emerging condition characterized by severe intellectual disability (ID), typical facial gestalt, and additional features, such as breathing abnormalities. Because of the overlapping phenotype of severe ID with absent speech, epilepsy, microcephaly, large mouth, and constipation, differential diagnosis of PTHS with respect to Angelman, Rett, and Mowat-Wilson syndromes represents a relevant clinical issue, and many patients are currently undergoing genetic tests for different conditions that are assumed to fall within the PTHS clinical spectrum. During a search for TCF4 mutations in 78 patients with a suspected PTHS, haploinsufficiency of TCF4 was identified in 18. By evaluating clinical features of patients with a proven TCF4 mutation with those of patients without, we noticed that, in addition to the typical facial gestalt, the PTHS phenotype results from the various combination of the following characteristics: ID with severe speech impairment, normal growth parameters at birth, postnatal microcephaly, breathing abnormalities, motor incoordination, ocular anomalies, constipation, seizures, typical behavior, and subtle brain abnormalities. On the basis of these observations, here we propose a clinically based score system as useful tool for driving a first choice molecular test for PTHS. This scoring system is also proposed for a clinically based diagnosis of PTHS in absence of a proven TCF4 mutation. PMID:22678594
Walker, J.D. [Environmental Protection Agency, Washington, DC (United States)
This paper describes the TSCA Interagency Testing Committee's (ITC) approaches to screening and scoring chemicals and chemical groups between 1977 and 1983. During this time the ITC conducted five scoring exercises to select chemicals and chemical groups for detailed review and to determine which of these chemicals and chemical groups should be added to the TSCA Section 4(e) Priority Testing List. 29 refs., 1 fig., 2 tabs.
Should We Stop Looking for a Better Scoring Algorithm for Handling Implicit Association Test Data? Test of the Role of Errors, Extreme Latencies Treatment, Scoring Formula, and Practice Trials on Reliability and Validity
Perugini, Marco; Schönbrodt, Felix
Since the development of D scores for the Implicit Association Test, few studies have examined whether there is a better scoring method. In this contribution, we tested the effect of four relevant parameters for IAT data that are the treatment of extreme latencies, the error treatment, the method for computing the IAT difference, and the distinction between practice and test critical trials. For some options of these different parameters, we included robust statistic methods that can provide viable alternative metrics to existing scoring algorithms, especially given the specificity of reaction time data. We thus elaborated 420 algorithms that result from the combination of all the different options and test the main effect of the four parameters with robust statistical analyses as well as their interaction with the type of IAT (i.e., with or without built-in penalty included in the IAT procedure). From the results, we can elaborate some recommendations. A treatment of extreme latencies is preferable but only if it consists in replacing rather than eliminating them. Errors contain important information and should not be discarded. The D score seems to be still a good way to compute the difference although the G score could be a good alternative, and finally it seems better to not compute the IAT difference separately for practice and test critical trials. From this recommendation, we propose to improve the traditional D scores with small yet effective modifications. PMID:26107176
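For readers unfamiliar with the metric, the core of a D-type score can be sketched as follows. This is a deliberately simplified illustration in Python: it assumes the commonly described form (the difference between mean latencies in the two critical blocks divided by the standard deviation of all latencies pooled across both blocks) and omits the error penalties, extreme-latency treatment, and practice/test distinction that the abstract evaluates. The function name and the latencies are hypothetical.

```python
from statistics import mean, stdev


def simple_d_score(compatible_ms, incompatible_ms):
    """Simplified D-type score for one participant.

    Assumption: mean latency difference between blocks divided by the
    standard deviation of all latencies pooled across both blocks.
    No error penalty or latency trimming is applied here.
    """
    pooled = list(compatible_ms) + list(incompatible_ms)
    pooled_sd = stdev(pooled)
    if pooled_sd == 0:
        raise ValueError("latencies have zero variance; score is undefined")
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd


# Invented reaction times in milliseconds.
d = simple_d_score([600, 700, 800], [800, 900, 1000])
```

Dividing by the participant's own pooled standard deviation is what makes the score an individually standardized difference rather than a raw latency gap.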
Uno, Yota; Mizukami, Hitomi; Ando, Masahiko; Yukihiro, Ryoji; Iwasaki, Yoko; Ozaki, Norio
Objective: The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered to groups within a short period of time. Methods: The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2±0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach’s alpha coefficient, and concurrent validity was assessed using the one-way analysis of variance intraclass correlation coefficient. Moreover, receiver operating characteristic analysis for screening for individuals who have a deficit in intellectual function (an FIQ<70) was performed. In addition, stratum-specific likelihood ratios for detection of intellectual disability were calculated. Results: The Cronbach’s alpha for the new Tanaka B Intelligence Scale IQ (BIQ) was 0.86, and the intraclass correlation coefficient with FIQ was 0.83. Receiver operating characteristic analysis demonstrated an area under the curve of 0.89 (95% CI: 0.85–0.96). In addition, the stratum-specific likelihood ratio for the BIQ≤65 stratum was 13.8 (95% CI: 3.9–48.9), and the stratum-specific likelihood ratio for the BIQ≥76 stratum was 0.1 (95% CI: 0.03–0.4). Thus, intellectual disability could be ruled out or determined. Conclusion: The present results demonstrated that the new Tanaka B Intelligence Scale score had high reliability and concurrent validity with the Wechsler Intelligence Scale for Children-Third Edition score. Moreover, the post-test probability for the BIQ could be calculated when screening for individuals who have a deficit in intellectual function. The new Tanaka B Intelligence Test is convenient and can be administered within a variety of settings. This enables evaluation of intellectual development even in settings where performing intelligence tests has previously been difficult. PMID:24940880
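Cronbach's alpha, the reliability coefficient reported in this abstract, can be computed from item-level scores as sketched below. This is a minimal illustration in Python of the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores); it is not the authors' analysis, and the function name and the small data set are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from per-item score lists.

    items: list of k lists, one per item, each holding one score per
    respondent in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    if len({len(scores) for scores in items}) != 1:
        raise ValueError("every item needs one score per respondent")
    totals = [sum(row) for row in zip(*items)]  # total score per respondent
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Invented data: three items answered by four respondents.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
```

Sample variances (n − 1 denominator) are used for both the items and the totals, so the ratio is unaffected by that choice as long as it is applied consistently.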
Moreira Leite, Katia Ramos [Laboratory of Medical Investigation - LIM 55, Urology Department, Medical School, Universidade de Sao Paulo (Brazil); Laboratory of Surgical and Molecular Pathology - Hospital Sirio Libanes, Sao Paulo (Brazil)]; Camara-Lopes, Luiz H.A. [Laboratory of Surgical and Molecular Pathology - Hospital Sirio Libanes, Sao Paulo (Brazil)]; Dall'Oglio, Marcos F.; Cury, Jose; Antunes, Alberto A.; Sanudo, Adriana; Srougi, Miguel [Laboratory of Medical Investigation - LIM 55, Urology Department, Medical School, Universidade de Sao Paulo (Brazil)]
Purpose: To determine the incidence of overestimation of Gleason score (GS) in extended prostate biopsy, and consequently to help avoid unnecessary aggressive treatment. Methods and Materials: This is a retrospective study of 464 patients who underwent prostate biopsy and radical prostatectomy between January 2001 and November 2007. The GS from biopsy and radical prostatectomy were compared. The incidence of overestimation of GS in biopsies and tumor volume were studied. Multivariate analysis was applied to find parameters that predict overestimation of the GS in prostate biopsy. Results: Exact agreement of GS between prostate biopsy and radical prostatectomy occurred in 56.9% of cases. The GS was underestimated in 29.1% of cases and overestimated in 14%. One hundred and six (22.8%) patients received a diagnosis of high GS (8, 9, or 10) in a prostate biopsy. In 29.2% of these cases, the definitive Gleason score was 7 or lower. In cases in which GS was overestimated in the biopsy, tumors were significantly smaller. In multivariate analysis, the total percentage of tumor was the only independent factor in overestimation of GS. Tumors occupying less than 33% of cores had a 5.6-fold greater chance of being overestimated. Conclusion: In the extended biopsy era and after the International Society of Urological Pathology consensus on GS, almost one third of tumors considered to have high GS at biopsy may be intermediate-risk cancers. In that condition, tumors are smaller in biopsy. Professionals involved with prostate cancer should keep this in mind to avoid overtreatment and undesirable side effects.
Alster, E H
The purpose of this study was to assess the effects of extended time on the algebra test performance of community college students with and without learning disabilities. Forty-four students with learning disabilities and 44 students without learning disabilities attending five California community colleges participated in the study. The students each took an algebra test under timed conditions and a comparable test under extended-time conditions. The main results were that the students with learning disabilities scored significantly lower than the students without learning disabilities under timed conditions, the scores of the students with learning disabilities increased significantly with extended time, and the scores of the students with learning disabilities under extended-time conditions did not differ significantly from the timed or extended-time scores of the students without learning disabilities. PMID:9066283
must still prepare students for academic success. This study determined how the use of more rigorous Lexile standards found in other states and associated with the Common Core Curriculum Standards would affect passing scores on Texas reading assessments...
The purpose of this study was to compare the achievement of general education students within regular education classes to the achievement of general education students in inclusion/co-teach classes to determine whether there was a significant difference in achievement between the two groups. The school district's inclusion/co-teach model included ongoing professional development support for teachers and administrators. General education teachers, special education teachers, and teacher assistants collaborated to develop instructional strategies to provide additional remediation to help students acquire the skills needed to master course content. This quantitative study reviewed the end-of-course test (EoCT) scores of Grade 10 physical science and math students within an urban school district. It is not known whether general education students in an inclusive/co-teach science or math course will demonstrate higher achievement on the EoCT in math or science than students not in an inclusive/co-teach classroom setting. In addition, this study sought to determine if students classified as low socioeconomic status benefited from participating in co-teaching classrooms as evidenced by standardized tests. Inferential statistics were used to determine whether there was a significant difference between the achievement of the treatment group (inclusion/co-teach) and the control group (non-inclusion/co-teach). The findings can be used to provide school districts with optional instructional strategies to implement in diverse modern classrooms to increase academic performance on state standardized tests.
Rayder, Nicolas; And Others
Four Wechsler subscales were administered in a longitudinal design to children from the Responsive Model Follow Through Program. On the first testing, subjects' average intelligence scores were significantly lower, but on subsequent tests equivalent to or higher than national norms, calling into question Deutsch's cumulative-deficit hypothesis.…
Talento-Miller, Eileen; Rudner, Lawrence M.
The validity of Graduate Management Admission Test (GMAT) scores is examined by summarizing 273 studies conducted between 1997 and 2004. Each of the studies was conducted through the Validity Study Service of the test sponsor and contained identical variables and statistical methods. Validity coefficients from each of the studies were corrected…
Robert Saltstone; Colin Skinner; Paul Tremblay
This study is a preliminary examination of the fit of three classical test theory models of standard error of measurement to selected personality scale (MMPI) score retest data. The three models compared are the conventional standard error of measurement formula, Lord’s (1955: Lord, F. M. (1955). Estimating test reliability. Educational and Psychological Measurement, 15, 325–336) conditional standard error of measurement
Bokossa, Maxime C.; Huang, Gary G.
This report describes the imputation procedures used to deal with missing data in the National Education Longitudinal Study of 1988 (NELS:88), the only current National Center for Education Statistics (NCES) dataset that contains scores from cognitive tests given the same set of students at multiple time points. As is inevitable, cognitive test…
Hodges, James Gregory
This study examined the impact that the teaching technique known as cooperative learning had on the changes between pre- and post-test scores on all sub-categories ("induction, deduction, analysis, evaluation, inference", and "total composite") associated with the "California Critical Thinking Skills Test" (CCTST) for…
Rooney, Charles; Schaeffer, Bob
More than 275 colleges across the United States now admit some or all of their applicants without regard to Scholastic Assessment Test (SAT) or American College Testing Program (ACT) scores, and many say that the policy has increased both the diversity and the academic quality of their entering classes. Many lessons have been learned at schools…
Wilcox, Rand R.
When determining criterion-referenced test length, problems of guessing are shown to be more serious than expected. A new method of scoring is presented that corrects for guessing without assuming that guessing is random. Empirical investigations of the procedure are examined. Test length can be substantially reduced. (Author/CM)
Kachewar, Smita Sushil; Dongre, Suryakant Dattatraya
Introduction: Fine-needle aspiration cytology (FNAC) is a safe, reliable and time-saving outpatient procedure for detecting carcinoma of the breast that causes little discomfort to the patient. Its efficacy can be further enhanced when physical breast examination, mammography and FNAC (the triple test [TT]) are jointly taken into consideration. Aims and Objectives: The aim was to evaluate the role of the TT score (TTS) in palpable breast masses. Materials and Methods: This prospective study was carried out from May 2010 to April 2012. In the subjects referred to the Department of Pathology for FNAC of the breast mass, the TTS was calculated, and histopathological findings were noted. Results: In the study period the TTS was calculated in 200 cases out of 225 FNACs of the breast. Of 124 benign cases on cytology, only three showed discordant TTS. Out of 62 malignant cases, 61 showed concordant TTS and one case of mastitis on histopathology showed a TTS of five. Out of all the benign lesions, two cases of fibrocystic disease and a single case of phyllodes tumor gave a TTS ≥6. These cases were diagnosed as infiltrating ductal carcinoma and angiosarcoma respectively on histopathology. Histopathological correlation was possible in only 70 patients. Of these 70, 28 were from the benign category and 42 were from the malignant category. A TTS of ≥6 has a sensitivity of 97.44% and a specificity of 100%. FNAC has a sensitivity of 88.37% and a specificity of 96.42%. Conclusions: The TT reliably guides evaluation of palpable breast masses. Histological correlation indicated the TTS to be a better diagnostic tool than FNAC alone.
Albert M. Gallo Jr
Exhaustive testing of computer software is intractable, but empirical studies of software failures suggest that testing can in some cases be effectively exhaustive. Data reported in this study and others show that software failures in a variety of domains were caused by combinations of relatively few conditions. These results have important implications for testing. If all faults in a system
Donlon, Thomas F.
To evaluate test speededness and to derive implications for test program activity, this study reviewed the literature on speed and power, identifying four major approaches to the assessment of speed: the Gulliksen approach, the Cronbach and Warrington approach, the Stafford approach, and the approach of the Educational Testing Service (ETS) as…
Bielinski, John; Thurlow, Martha; Minnema, Jane; Scott, Jim
This report is a review and analysis of the psychometric literature on the topic of out-of-level testing. Out-of-level testing refers to the practice of using a level of the test other than the test taken by most of the students in a student's current grade level. Much of the research on out-of-level testing was conducted in the 1970s and 1980s,…
Duckworth, Angela L; Quinn, Patrick D; Tsukayama, Eli
The increasing prominence of standardized testing to assess student learning motivated the current investigation. We propose that standardized achievement test scores assess competencies determined more by intelligence than by self-control, whereas report card grades assess competencies determined more by self-control than by intelligence. In particular, we suggest that intelligence helps students learn and solve problems independent of formal instruction, whereas self-control helps students study, complete homework, and behave positively in the classroom. Two longitudinal, prospective studies of middle school students support predictions from this model. In both samples, IQ predicted changes in standardized achievement test scores over time better than did self-control, whereas self-control predicted changes in report card grades over time better than did IQ. As expected, the effect of self-control on changes in report card grades was mediated in Study 2 by teacher ratings of homework completion and classroom conduct. In a third study, ratings of middle school teachers about the content and purpose of standardized achievement tests and report card grades were consistent with the proposed model. Implications for pedagogy and public policy are discussed. PMID:24072936
Center on Education Policy, 2011
This paper profiles Idaho's test score trends through 2008-09. In 2007, the mean scale score on the state 4th grade reading test was 209 for non-Title I students and 205 for Title I students. In 2007, the mean scale score in 4th grade reading was 211 for non-Title I students and 208 for Title I students. Between 2007 and 2009, the mean scale score…
Center on Education Policy, 2011
This paper profiles Utah's test score trends through 2008-09. In 2004, the mean scale score on the state 4th grade reading test was 167 for non-Title I students and 164 for Title I students. In 2009 the mean scale score in 4th grade reading was 168 for non-Title I students and 164 for Title I students. Between 2004 and 2009, the mean scale score…
Center on Education Policy, 2011
This paper profiles Kansas' test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 80 for non-Title I students and 73 for Title I students. In 2009, the mean scale score in 4th grade reading was 84 for non-Title I students and 78 for Title I students. Between 2006 and 2009, the mean scale score…
Center on Education Policy, 2011
This paper profiles Maine's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 445 for non-Title I students and 438 for Title I students. In 2009, the mean scale score in 4th grade reading was 477 for non-Title I students and 441 for Title I students. Between 2006 and 2009, the mean scale score…
Ruiz-Blanco, Yasser B.; Marrero-Ponce, Yovani; García, Yamila; Puris, Amilkar; Bello, Rafael; Green, James; Sotomayor-Torres, Clivia M.
Most successful structure prediction strategies use knowledge-based functions for global optimization, in spite of their intrinsically limited potential to create new folds, while physics-based approaches are often employed only during structure refinement steps. We here propose a physics-based scoring potential intended to perform global searches of the conformational space. We introduce a dynamic test to evaluate the discrimination power of our function, and compare it with predictions of targets from the CASP-ROLL competition. Results demonstrate that this dynamic test is able to generate 3D models which outrank 59% (according to the GDT_TS score) of models generated with ab initio structure prediction servers.
Marshall, Garland Ross
An Analysis of "Missouri County Agent Inventory" Test Scores and the Performance Ratings of Texas Agricultural Extension Agents. A thesis by Garland Ross Marshall, submitted to the Graduate College of the Texas A&M University in partial fulfillment... of the requirements for the degree of Master of Science, August 1964. Major subject: Sociology...
Mallett, Susan; Halligan, Steve; Collins, Gary S.; Altman, Doug G.
Background Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. Methods In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Results Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. Conclusions The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests. PMID:25353643
Bohnker, Bruce K; Sack, David M; Wedierhold, Lynn; Malakooti, Mark
Physical performance and risk factors from the U.S. Navy physical readiness test (PRT) were analyzed in a retrospective, cross-sectional, population-based study using data from the Spring 2002 cycle. PRT scores were available for 22,314 active duty women and 131,287 men, and risk factor information was available for 4,254 women and 31,503 men. For risk factors, self-reported smoking rates were higher for men than women, and decreased with increasing age. Self-reported rates for elevated cholesterol and joint problems increased with increasing age. Linear regression showed that body mass index increased with age for men (constant = 25.6, increasing 0.0765 per year of age over 18 years, p = 0.000) and increased at a lower rate for women (constant = 24.5, increasing 0.0159 per year of age over 18 years, p = 0.000). Increasing body mass index was associated with decreasing PRT performance. This analysis provides population-based information on the PRT risk factors, body mass index, and physical fitness for Navy personnel. PMID:16435757
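The linear regression reported above can be written out as a small worked example. This is a sketch only: it takes the slopes as 0.0765 and 0.0159 body mass index units per year of age over 18 (an assumed reading of the printed coefficients), and the function and dictionary names are hypothetical.

```python
# (constant, slope per year of age over 18) as read from the abstract.
# The slope values are an assumed reading of the printed figures.
BMI_MODELS = {
    "men": (25.6, 0.0765),
    "women": (24.5, 0.0159),
}


def predicted_bmi(sex: str, age_years: float) -> float:
    """Return the fitted body mass index for a given sex and age."""
    if sex not in BMI_MODELS:
        raise ValueError(f"unknown sex category: {sex!r}")
    if age_years < 18:
        raise ValueError("the fitted line is described only for ages 18 and over")
    constant, slope = BMI_MODELS[sex]
    return constant + slope * (age_years - 18)


if __name__ == "__main__":
    for sex in ("men", "women"):
        for age in (18, 28, 38):
            print(f"{sex}, age {age}: fitted BMI {predicted_bmi(sex, age):.2f}")
```

Read this way, the fitted value for men rises by about 1.5 units between ages 18 and 38, against roughly 0.3 units for women, which matches the abstract's statement that the increase was slower for women.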
George Washington Univ., Washington, DC. Inst. for Educational Leadership.
"Options in Education" is a radio news program which focuses on issues and developments in education. This transcript contains discussions of volunteer parent tutors in a junior high school, the feminization of the teaching profession, the test score controversy, busing as an issue in the political primaries, and busing and the role of the social…
Stacy, Brian; Lockwood, J. R.; McCaffrey, Daniel
Researchers and policymakers are interested in the causal effects of educational inputs on student achievement. Unfortunately, it is not possible to directly observe student learning, so test score data is often used as an approximate measure. To measure their achievement at a given point in time (e.g., in the spring of the school year) students…
The topic of arts integration creates continuing dialog among educators and arts advocates. This study examined the degree to which student achievement was affected when arts education is limited or eliminated from schools to meet the mandates of NCLB (2001) legislation. Standardized test scores from 12 schools in Central Mississippi were used to…
Biuk-Aghai, Robert P.
Abstract--University or college admission is a complex decision process that goes beyond simply matching test scores and admission requirements. ... domestic vs. overseas student proportion, and others. Choosing the most suitable among the many thousands ... Past research has suggested that students' backgrounds
Miroslava Korenova; Norbert Zilka; Zuzana Stozicka; Ondrej Bugos; Ivo Vanicky; Michal Novak
We have previously shown that transgenic rats expressing misfolded tau protein developed neurofibrillary tangles and axonal degeneration in the brain and spinal cord, which led to impairment of sensorimotor and neuromuscular functions. To quantify neurobehavioral phenotype of the transgenic rats we have designed a testing protocol and a novel scoring system – NeuroScale – that reliably reflects progression of functional
Jencks, Christopher; And Others
This volume contains eleven appendixes, varying from 5 to 165 pages, which describe the sample used in the analysis of ten surveys of American men aged 25-64 to determine the effects of family background, adolescent personality traits, cognitive test scores, and earnings in maturity. The appendixes are (1) 1970 Census 1/1000 Sample; (2) 1962…
Tannenbaum, Richard J.; Cho, Yeonsuk
In this article, we consolidate and present in one place what is known about quality indicators for setting standards so that stakeholders may be able to recognize the signs of standard-setting quality. We use the context of setting standards to associate English language test scores with language proficiency descriptions such as those presented…
Bolinger, Rex W.
Scholastic Aptitude Test (SAT) scores of Asian, Hispanic, Black, and White students with similar socioeconomic backgrounds and access to similar instruction in the same large midwestern school district were compared. Income levels were determined by using federal guidelines for free and reduced school lunches. The population of the study consisted…
Williams, Thomas O., Jr.; Fall, Anna-Maria; Eaves, Ronald C.; Woods-Groves, Suzanne
The reliability of scores for the "Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults" is examined with a sample of 110 college students from two universities in the southeast. The alpha coefficient for the total sample and the interscorer and intrascorer reliability for a subset of 31 students are analyzed. The alpha…
Koretz, Daniel; Kim, Young-Suk
In a pair of recent studies, Fryer and Levitt (2004a, 2004b) analyzed the Early Childhood Longitudinal Study--Kindergarten Cohort (ECLS-K) to explore the characteristics of the Black-White test score gap in young children. They found that the gap grew markedly between kindergarten and the third grade and that they could predict the gap from…
Turner, Sherry L.
Thirteen percent of the 2008-2009 senior class in one southeastern state did not pass the science portion of the state's high school graduation test. Another 5% failed to pass the math portion of the graduation test, leaving these students unable to obtain a high school diploma. The purpose of this nonexperimental quantitative research study was…
Haberman, Shelby J.
Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Silberglitt, Benjamin; Hintze, John
This study outlines a formative assessment system using a consistent set of cut scores on Curriculum-Based Measurement-Reading (CBM-R) probes and investigates four statistical methods for establishing cut scores. Cut scores were established using the Minnesota statewide achievement test in reading at grade 3 as the criterion for a successful…
Fontanive, Paolo; Miccoli, Mario; Simioniuc, Anca; Angelillis, Marco; Di Bello, Vitantonio; Baggiani, Angelo; Bongiorni, Maria Grazia; Marzilli, Mario; Dini, Frank Lloyd
Although echo Doppler and biomarkers are the most common examinations performed worldwide in heart failure (HF), they are rarely considered in risk scores. In outpatients with chronic HF and left ventricular ejection fraction (LVEF) ≤45%, data on clinical status, echo Doppler variables, aminoterminal pro-type B natriuretic peptide (NT-proBNP), estimated glomerular filtration rate (eGFR), and drug therapies were combined to build up a multiparametric score. We randomly selected 250 patients to produce a derivation cohort and 388 patients were used as a testing cohort. Follow-up lasted 29 ± 23 months. The univariable predictors that entered into the multivariable Cox model were as follows: furosemide daily dose >25 mg, inability to tolerate angiotensin converting enzyme (ACE) inhibitors, inability to tolerate β-blockers, age >75 years, New York Heart Association (NYHA) class >2, eGFR <60 mL/min, NT-proBNP plasma levels above the median, tricuspid plane systolic excursion (TAPSE) ≤14 mm, LV end-diastolic volume index (LVEDVi) >96 mL/m², moderate-to-severe mitral regurgitation (MR) and LVEF <30%. The points for each prognostic factor were obtained by dividing its odds ratio by the lowest odds ratio: 4 points for furosemide dose; 3 points each for age, NT-proBNP, LVEDVi and TAPSE; 2 points each for inability to tolerate β-blockers, inability to tolerate ACE inhibitors, NYHA class, eGFR <60 mL/min and moderate-to-severe MR; and 1 point for LVEF. The multiparametric score predicted all-cause mortality both in the derivation cohort (68.4% sensitivity, 79.5% specificity, area under the curve [AUC] 78.7%) and in the testing cohort (73.7% sensitivity, 71.3% specificity, AUC 77.2%). All-cause mortality significantly increased with increasing score both in the derivation and in the testing cohort (P < 0.0001). In conclusion, this multiparametric score is able to predict mortality in chronic systolic HF. PMID:23742144
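The point assignment described in the abstract above can be sketched as a simple lookup and sum. The point values are those listed in the abstract; the factor keys, the function name, and the idea of passing the factors present as a list are illustrative assumptions, and no risk cut-off is implied because the abstract does not report one.

```python
from typing import Iterable

# Points per prognostic factor, as listed in the abstract.
# The dictionary keys are hypothetical labels chosen for this sketch.
FACTOR_POINTS = {
    "furosemide_gt_25mg": 4,
    "age_gt_75": 3,
    "nt_probnp_above_median": 3,
    "high_lvedvi": 3,
    "low_tapse": 3,
    "beta_blocker_intolerance": 2,
    "ace_inhibitor_intolerance": 2,
    "nyha_gt_2": 2,
    "egfr_lt_60": 2,
    "moderate_to_severe_mr": 2,
    "lvef_lt_30": 1,
}


def multiparametric_score(factors_present: Iterable[str]) -> int:
    """Sum the points of the prognostic factors present in one patient."""
    unique_factors = set(factors_present)
    unknown = unique_factors - FACTOR_POINTS.keys()
    if unknown:
        raise ValueError(f"unknown factor(s): {sorted(unknown)}")
    return sum(FACTOR_POINTS[name] for name in unique_factors)


if __name__ == "__main__":
    # Hypothetical patient, not taken from the study data.
    example_patient = ["furosemide_gt_25mg", "age_gt_75", "lvef_lt_30"]
    print("score:", multiparametric_score(example_patient))
    print("maximum possible score:", sum(FACTOR_POINTS.values()))
```

Converting the input to a set means a factor listed twice is counted once, which seems the safer choice for a checklist-style score.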
Caryn Lerman; Robert T. Croyle; Kenneth P. Tercyak; Heidi Hamann
The number of inherited disorders and risk factors that can be detected through genetic testing is increasing rapidly, and genetic testing may soon become a common component of routine medical care. Is behavioral medicine ready? For the first time, a sophisticated understanding of gene-environment interactions as manifested in the interactions among an individual's genetic predispositions, behavior, and
Walton, Gregory M; Spencer, Steven J
Past research has assumed that group differences in academic performance entirely reflect genuine differences in ability. In contrast, extending research on stereotype threat, we suggest that standard measures of academic performance are biased against non-Asian ethnic minorities and against women in quantitative fields. This bias results not from the content of performance measures, but from the context in which they are assessed: from psychological threats in common academic environments, which depress the performances of people targeted by negative intellectual stereotypes. Like the time of a track star running into a stiff headwind, such performances underestimate the true ability of stereotyped students. Two meta-analyses, combining data from 18,976 students in five countries, tested this latent-ability hypothesis. Both meta-analyses found that, under conditions that reduce psychological threat, stereotyped students performed better than nonstereotyped students at the same level of past performance. We discuss implications for the interpretation of and remedies for achievement gaps. PMID:19656335
Wilde, Elizabeth Ty; Hollister, Robinson
This study tested the performance of nonexperimental estimators of impacts applied to a class size reduction intervention with achievement test scores as the outcome. Nonexperimental estimates of impacts were compared to "true impact" estimates provided by a random-assignment design that assessed intervention effects. Data came from Project STAR,…
Wilde, Elizabeth Ty; Hollister, Robinson
In this study we test the performance of some nonexperimental estimators of impacts applied to an educational intervention--reduction in class size--where achievement test scores were the outcome. We compare the nonexperimental estimates of the impacts to "true impact" estimates provided by a random-assignment design used to assess the…
Elisha Ofiram; Timothy A. Garvey; James D. Schwender; Francis Denis; Joseph H. Perra; Ensor E. Transfeldt; Robert B. Winter; Jill M. Wroblewski
Background The lack of a widely available scoring system for cervical degenerative spondylosis encouraged the authors to establish and validate a systematic quantitative radiographic index. Materials and methods This study included intraobserver and interobserver reliability testing among three reviewers with different years of experience. Each observer independently scored four cervical radiographs of 48 patients at separate intervals, and statistical analysis of the
Minov, Jordan; Karadzinska-Bislimovska, Jovanka; Vasilevska, Kristin; Stoleski, Saso; Mijakoski, Dragan
Introduction: The COPD Assessment Test (CAT) is an 8-item questionnaire for assessment of health status in patients with chronic obstructive pulmonary disease (COPD). Objective: To evaluate the course of CAT scores during bacterial exacerbations of COPD treated in an outpatient setting. Methods: We performed an observational, prospective study including 81 outpatients (57 males and 24 females, aged 43 to 74 years) with bacterial exacerbation of COPD. All participants completed the CAT at the initial visit (i.e. at the time of diagnosis of exacerbation and beginning of its treatment), and 10 and 30 days after the initial visit. Mean scores of each item, as well as the overall mean score, at these time points were compared. Results: The mean scores for each CAT question at the initial visit varied from 2.6 to 3.5, whereas the mean scores for each CAT question 10 days after the initial visit varied from 1.7 to 2.6. We registered a significant reduction of 6.9 ± 2.7 points in the mean overall CAT score 10 days after the initial visit as compared to its value at the initial visit (16.8 vs 23.7; P < 0.001). The mean scores for each CAT question 30 days after the initial visit varied from 1.3 to 2.4. We registered a reduction of 2.9 ± 1.2 points in the mean overall CAT score 30 days after the initial visit as compared to its value 10 days after the initial visit (13.9 vs 16.8; P < 0.005). The mean overall CAT score 30 days after the initial visit was reduced by 9.8 ± 4.5 points as compared to its value at the initial visit (13.9 vs 23.7; P < 0.001). Conclusion: We found significant improvement in the patients' health status during recovery from exacerbation as compared to their health status at the time of exacerbation, confirming the CAT as an effective tool to measure health status in patients with COPD. PMID:25893024
Davison, Mark L.; Jew, Gilbert B.; Davenport, Ernest C., Jr.
Using Baccalaureate and Beyond 2001 data, we found that STEM major was associated with an SAT pattern less common among females than males, in which the student's quantitative score exceeded the verbal score. Verbal ability was negatively associated with STEM major. Implications for career theory and test interpretation are discussed.
The purpose of this study was to examine the longitudinal impacts of the Science Writing Heuristic (SWH) approach on student science achievement measured by the Iowa Test of Basic Skills (ITBS). A number of studies have reported a positive impact of inquiry-based instruction on student achievement, critical thinking skills, reasoning skills, attitude toward science, etc. So far, studies have focused on exploring how an intervention affects student achievement using teacher/researcher-generated measurement. Only a few studies have attempted to explore the long-term impacts of an intervention on student science achievement measured by standardized tests. The students' science and reading ITBS data were collected from 2000 to 2011 from a school district which had adopted the SWH approach as the main approach in science classrooms since 2002. The data consisted of 12,350 data points from 3,039 students. The multilevel model for change with discontinuity in elevation and slope technique was used to analyze changes in student science achievement growth trajectories before and after adoption of the SWH approach. The results showed that the SWH approach positively impacted students by initially raising science achievement scores. The initial impact was maintained and gradually increased when students were continuously exposed to the SWH approach. Disadvantaged students who were at risk of having low science achievement benefited more from experience with the SWH approach. As a result, existing problematic achievement gaps were narrowed. Moreover, students who started experience with the SWH approach as early as elementary school seemed to have better science achievement growth compared to students who first experienced the SWH approach only in high school.
The results found in this study not only confirmed the positive impacts of the SWH approach on student achievement, but also demonstrated additive impacts found when students had longitudinal experiences with the approach. By engaging students in argument-based classrooms where teachers value students' prior knowledge, encourage students to take control of their learning, and provide a non-threatening environment for students to develop big ideas through negotiation, students' achievement can be enhanced. The results also started to shed some light on the sustainability of the SWH approach within the school district.
Feliz-Rodriguez, Darwin; Zudaire, Santiago; Carpio, Carlos; Martínez, Elizabet; Gómez-Mendieta, Antonia; Santiago, Ana; Alvarez-Sala, Rodolfo; García-Río, Francisco
BACKGROUND: An adequate evaluation of exacerbations is a primary objective in managing patients with chronic obstructive pulmonary disease (COPD). OBJECTIVES: To define the profile of health status recovery during severe exacerbations of COPD using the COPD Assessment Test (CAT) questionnaire and to evaluate its prognostic value. METHODS: Forty-five patients with previous COPD diagnoses who were hospitalized due to severe exacerbation(s) were included in the study. These patients were treated by their respective physicians following current recommendations; health status was assessed daily using the CAT questionnaire. The CAT score, spirometry and recurrent hospitalizations were recorded one and three months after hospital discharge. RESULTS: Global initiative for chronic Obstructive Lung Disease (GOLD) stage was an independent determinant for increased CAT score during the first days of exacerbation with respect to postexacerbation values. From hospitalization day 5, the CAT score was similar to that obtained in the stable phase. Body mass index, GOLD stage and education level were related to health status recovery pattern. CAT score increase and the area under the curve of CAT recovery were inversely related to the forced expiratory volume in 1 s achieved three months after discharge (r = −0.606; P<0.001 and r = −0.532; P<0.001, respectively). Patients with recurrent hospitalizations showed higher CAT score increases and slower recovery. CONCLUSIONS: The CAT detects early health status improvement during severe COPD exacerbations. Its initial worsening and recovery pattern are related to lung function and recurrent hospitalizations. PMID:24093119
Martinez, Josue G; Carroll, Raymond J; Muller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan
We consider the problem of score testing for certain low dimensional parameters of interest in a model that could include finite but high dimensional secondary covariates and associated nuisance parameters. We investigate the possibility of the potential gain in power by reducing the dimensionality of the secondary variables via oracle estimators such as the Adaptive Lasso. As an application, we use a recently developed framework for score tests of association of a disease outcome with an exposure of interest in the presence of a possible interaction of the exposure with other co-factors of the model. We derive the local power of such tests and show that if the primary and secondary predictors are independent, then having an oracle estimator does not improve the local power of the score test. Conversely, if they are dependent, there is the potential for power gain. Simulations are used to validate the theoretical results and explore the extent of correlation needed between the primary and secondary covariates to observe an improvement of the power of the test by using the oracle estimator. Our conclusions are likely to hold more generally beyond the model of interactions considered here. PMID:20405045
Jones, Tracy Anne
Researchers are increasingly aware of the role of spatial skills in preparing children for future mathematics achievement (National Mathematics Advisory Panel, 2008). In addition, sex differences have been consistently documented showing boys score higher than girls in assessments of spatial ability, particularly mental rotation (Linn & Peterson,…
Nagle, Barry T.
Out-of-School Time programs and their impact on standardized college entrance exam scores for black or African-American children of single parents who have applied for a competitive college scholarship program is the study focus. Study importance is supported by the large percentage of black children raised by single parents, the large percentage…
Dietrich, Cecile C.; Lichtenberger, Eric J.
Research studies have been ambivalent about whether enrolling in community college makes completing a bachelor's degree less likely than directly enrolling in a four-year institution. This study uses propensity score matching with a posttreatment adjustment to determine the treatment effect associated with taking the community college to…
Lefgren, Lars; Sims, David
This article develops a simple model of teacher value-added to show how efficient use of information across subjects can improve the predictive ability of value-added models. Using matched student-teacher data from North Carolina, we show that the optimal use of math and reading scores improves the fit of prediction models of overall future…
Waldfogel, Jane; Zhai, Fuhua
This study examines the effects of public preschool expenditures on the math and science scores of 4th graders, holding constant child, family, and school characteristics, other relevant social expenditures, and country and year effects, in 7 Organisation for Economic Co-operation and Development (OECD) countries--Australia, Japan, the…
Center on Education Policy, 2011
This paper profiles Rhode Island's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 445 for non-Title I students and 435 for Title I students. In 2009, the mean scale score in 4th grade reading was 448 for non-Title I students and 440 for Title I students. Between 2006 and 2009, the mean…
Center on Education Policy, 2011
This paper profiles North Carolina's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade math test was 351 for non-Title I students and 347 for Title I students. In 2009, the mean scale score in 4th grade math was 354 for non-Title I students and 350 for Title I students. Between 2006 and 2009, the mean scale…
Center on Education Policy, 2011
This paper profiles Tennessee's test score trends through 2008-09. In 2004, the mean scale score on the state 4th grade reading test was 501 for non-Title I students and 486 for Title I students. In 2009, the mean scale score in 4th grade reading was 512 for non-Title I students and 495 for Title I students. Between 2004 and 2009, the mean scale…
Center on Education Policy, 2011
This paper profiles Missouri's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 661 for non-Title I students and 642 for Title I students. In 2009, the mean scale score in 4th grade reading was 661 for non-Title I students and 648 for Title I students. Between 2006 and 2009, there was no…
Center on Education Policy, 2011
This paper profiles Kentucky's test score trends through 2008-09. In 2007, the mean scale score on the state 4th grade reading test was 455 for non-Title I students and 451 for Title I students. In 2009, the mean scale score in 4th grade reading was 455 for non-Title I students and 451 for Title I students. Between 2007 and 2009, the mean scale…
Center on Education Policy, 2011
This paper profiles Colorado's test score trends through 2008-09. In 2003, the mean scale score on the state 4th grade reading test was 598 for non-Title I students and 558 for Title I students. In 2009, the mean scale score in 4th grade reading was 599 for non-Title I students and 556 for Title I students. Between 2003 and 2009, the mean scale…
Center on Education Policy, 2011
This paper profiles New Hampshire's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 445 for non-Title I students and 438 for Title I students. In 2009, the mean scale score in 4th grade reading was 448 for non-Title I students and 441 for Title I students. Between 2006 and 2009, the mean…
Center on Education Policy, 2011
This paper profiles Texas's test score trends through 2008-09. In 2005, the mean scale score on the state 4th grade reading test was 2297 for non-Title I students and 2207 for Title I students. In 2009, the mean scale score in 4th grade reading was 2334 for non-Title I students and 2235 for Title I students. Between 2005 and 2009, the mean scale…
Center on Education Policy, 2011
This paper profiles Pennsylvania's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 1390 for non-Title I students and 1220 for Title I students. In 2009, the mean scale score in 4th grade reading was 1420 for non-Title I students and 1270 for Title I students. Between 2006 and 2009, the mean…
Center on Education Policy, 2011
This paper profiles Delaware's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 474 for non-Title I students and 464 for Title I students. In 2009, the mean scale score in 4th grade reading was 478 for non-Title I students and 467 for Title I students. Between 2006 and 2009, the mean scale…
Center on Education Policy, 2011
This paper profiles Maryland's test score trends through 2008-09. In 2004, 82% of non-Title I 4th graders and 61% of Title I 4th graders scored at the proficient level on the state reading test. In 2009, 90% of non-Title I 4th graders and 78% of Title I 4th graders scored at the proficient level in reading. Between 2004 and 2009, the percentage…
Center on Education Policy, 2011
This paper profiles Massachusetts's test score trends through 2008-09. In 2006, 59% of non-Title I 4th graders and 29% of Title I 4th graders scored at the proficient level on the state reading test. In 2009, 64% of non-Title I 4th graders and 31% of Title I 4th graders scored at the proficient level in reading. Between 2006 and 2009, the…
Schmidt, Frank L.; Hunter, John E.
Two competing definitions of test fairness and their differing implications are explained and illustrated, and data from a number of published studies on test bias are reanalyzed; the focus is on unfair bias that may exist in tests that are approximately equally valid for both majority and minority groups. (Author/JM)
Bodden, Jamie G; Needham, Robert A; Chockalingam, Nachiappan
This study assessed the basic fundamental movements of mixed martial arts (MMA) athletes using the functional movement screen (FMS) assessment and determined if an intervention program was successful at improving results. Participants were placed into 1 of the 2 groups: intervention and control groups. The intervention group was required to complete a corrective exercise program 4 times per week, and all participants were asked to continue their usual MMA training routine. A mid-intervention FMS test was included to examine if successful results were noticed sooner than the 8-week period. Results highlighted differences in FMS test scores between the control group and intervention group (p = 0.006). Post hoc testing revealed a significant increase in the FMS score of the intervention group between weeks 0 and 8 (p = 0.00) and weeks 0 and 4 (p = 0.00) and no significant increase between weeks 4 and 8 (p = 1.00). A χ² analysis revealed that the intervention group participants were more likely to have an FMS score >14 than participants in the control group at week 4 (χ² = 7.29, p < 0.01) and week 8 (χ² = 5.2, p ≤ 0.05). Finally, a greater number of participants in the intervention group were free from asymmetry at week 4 and week 8 compared with the initial test period. The results of the study suggested that a 4-week intervention program was sufficient at improving FMS scores. Most, if not all, of the movements covered on the FMS relate to many aspects of MMA training. The knowledge that the FMS can identify movement dysfunctions and, furthermore, the fact that the issues can be improved through a standardized intervention program could be advantageous to MMA coaches, thus providing the opportunity to adapt and implement new additions to training programs. PMID:23860293
Martinez, Edwin E.
This study examines the impact of instrumental music study and group chess lessons on the standardized test scores of suburban elementary public school students (grades three through five) in Levittown, New York. The study divides the students into the following groups and compares the standardized test scores of each: a) instrumental music…
Wilcox, Rand R.
The problem of determining an optimal passing score for a mastery test is discussed, when the purpose of the test is to predict success on an external criterion. For the case of constant losses for the two possible error types, a method for determining passing scores is derived. (Author/JKS)
Childs, Ruth A.; Dunn, Jennifer L.; van Barneveld, Christina; Jaciw, Andrew P.
This study compares five scoring approaches for a test of clinical reasoning skills. All of the approaches incorporate information about the correct item responses selected and the errors, such as selecting too many responses or selecting a response that is inappropriate and/or harmful to the patient. The approaches are combinations of theoretical…
Shariff, Zalilah Mohd; Yasin, Zaidah Mohamed
A total of 107 Malay primary school girls (8-9 yr. old) completed a set of measurements on eating behavior (ChEAT, food neophobia scales, and dieting experience), the Rosenberg Self-Esteem Scale, body shape satisfaction, dietary intake, weight, and height. About 38% of the girls scored 20 and more on the ChEAT, and 46% of them reported dieting by reducing sugar and sweets (73%), skipping meals (67%), reducing fat foods (60%) and snacks (53%) as the most frequent methods practiced. In general, those girls with higher ChEAT scores tended to have lower self-esteem (r=.39), indicating they were more unwilling to try new foods (food neophobic) (r=.29), chose a smaller figure for desired body size (r=-.25), and were more dissatisfied with their body size (r=.31). PMID:15974357
M. Skodak; O. L. Crissey
One-fourth of the stated vocational choices of 297 girl senior students from the pre-college, commercial, and general and home economics groups of two Flint, Michigan, high schools was in office work. The concentration of highest Strong scores was in stenography, office work, home-making, and nursing––4 occupations between which the Strong Blank does not discriminate adequately. Therefore the Strong Blank is
Vijaya L. Tirunahari; Syed A. Zaidi; Rakesh Sharma; Joan Skurnick; Hormoz Ashtyani
Study objective: To compare multiple sleep latency test (MSLT) and scoring of microsleep (presence of sleep electroencephalograph between 3 and 15 s in an epoch) as a diagnostic test for excessive daytime sleepiness (EDS). Design: A retrospective study. Setting: Sleep center at a tertiary care teaching hospital. Subjects: Patients referred to a sleep center who had an MSLT and one or more of the
Lordo, R A; Feder, P I; Gettings, S D
The Cosmetic, Toiletry, and Fragrance Association (CTFA) Evaluation of Alternatives Program comprised a multi-phased study of the relationship between Draize eye irritation test data and comparable data from a selection of promising alternative (in vitro) tests. The CTFA Program was designed to determine the effectiveness and limitations of several in vitro tests over a range of different cosmetic and personal-care product types. Test materials constituted experimental formulations representative of three distinct product types. Each material was tested in vivo (according to a modified Draize eye irritation test protocol) and in vitro (according to one of up to forty different protocols). A statistical ranking and selection procedure ("concordance analysis") was used to identify those in vitro tests where the relationship between in vitro and in vivo scores was sufficiently well defined to warrant further statistical analysis. In vitro test performance was then evaluated by regression modelling of these relationships. Maximum average Draize score (MAS) was utilized as the primary quantitative measure of eye irritation potential in vivo. The goodness-of-fit of the observed data to the regression model and comparison of the magnitude of upper and lower prediction-bounds on the range of probable MAS values associated with the regression model fit (prediction intervals) provide a means by which the performance of each in vitro test may be measured relative to Draize test outcome. The narrower the prediction interval (i.e. the more precise the fit), the more predictive of in vivo score (MAS) is the in vitro test result. The prediction interval thus represents uncertainty associated with Draize test prediction. Such uncertainty depends heavily on the degree of irritancy. In Phases I and II, the widths of the prediction intervals were narrowest in the region corresponding to low irritation potential; increasing widths were observed as irritation potential increased.
In Phase III, relatively narrow prediction interval widths were observed at both the low and high end of the observed range of irritation potential; wider intervals were observed in the middle of the observed range. In general, the selected endpoints in each phase had similar average prediction interval widths and thereby differed only slightly in their ability to predict MAS to a given level of precision; any differences between endpoints tended to occur at the low and/or high ends of the observed range of irritation potential. The primary contributor to total variability associated with prediction of MAS is the deviation between the Draize score as observed in the laboratory and what is predicted by the model for a given formulation. Consistently, this component is responsible for 70% to 95% of the total variability. The other components (i.e. variability among replicate MAS and in vitro scores) could be reduced simply by increasing the number of replicate tests performed on each test formulation. However, this would have relatively little impact on the overall precision of prediction. PMID:20654467
Objective To determine the effect of clinical scores that predict streptococcal infection or rapid streptococcal antigen detection tests compared with delayed antibiotic prescribing. Design Open adaptive pragmatic parallel group randomised controlled trial. Setting Primary care in United Kingdom. Patients Patients aged ≥3 with acute sore throat. Intervention An internet programme randomised patients to targeted antibiotic use according to: delayed antibiotics (the comparator group for analyses), clinical score, or antigen test used according to clinical score. During the trial a preliminary streptococcal score (score 1, n=1129) was replaced by a more consistent score (score 2, n=631; features: fever during previous 24 hours; purulence; attends rapidly (within three days after onset of symptoms); inflamed tonsils; no cough/coryza (acronym FeverPAIN)). Outcomes Symptom severity reported by patients on a 7 point Likert scale (mean severity of sore throat/difficulty swallowing for days two to four after the consultation (primary outcome)), duration of symptoms, use of antibiotics. Results For score 1 there were no significant differences between groups. For score 2, symptom severity was documented in 80% (168/207 (81%) in delayed antibiotics group; 168/211 (80%) in clinical score group; 166/213 (78%) in antigen test group). Reported severity of symptoms was lower in the clinical score group (−0.33, 95% confidence interval −0.64 to −0.02; P=0.04), equivalent to one in three rating sore throat a slight versus moderate problem, with a similar reduction for the antigen test group (−0.30, −0.61 to −0.00; P=0.05). Symptoms rated moderately bad or worse resolved significantly faster in the clinical score group (hazard ratio 1.30, 95% confidence interval 1.03 to 1.63) but not the antigen test group (1.11, 0.88 to 1.40). In the delayed antibiotics group, 75/164 (46%) used antibiotics.
Use of antibiotics in the clinical score group (60/161) was 29% lower (adjusted risk ratio 0.71, 95% confidence interval 0.50 to 0.95; P=0.02) and in the antigen test group (58/164) was 27% lower (0.73, 0.52 to 0.98; P=0.03). There were no significant differences in complications or reconsultations. Conclusion Targeted use of antibiotics for acute sore throat with a clinical score improves reported symptoms and reduces antibiotic use. Antigen tests used according to a clinical score provide similar benefits but with no clear advantages over a clinical score alone. Trial registration ISRCTN32027234 PMID:24114306
Şahin, Hilal; Pekçevik, Yeliz
PURPOSE Computed tomography (CT) angiography emerges as a viable alternative technique for confirmation of brain death. However, evaluation criteria are not well established for demonstration of cerebral circulatory arrest. This retrospective study aimed to evaluate CT angiography scoring systems in diagnosis of brain death, review the literature, and compare interobserver agreement between different scales for the diagnosis of brain death. METHODS CT angiography examinations of 25 patients with a clinical diagnosis of brain death were reevaluated according to 10-, 7-, and 4-point scales. Exams were performed with a 64-slice CT scanner including unenhanced, arterial (20 s) and venous phase (60 s) scans. Subtraction images of both phases were obtained. Interobserver agreement was evaluated for the assessment of vessel opacification and diagnosis of brain death. RESULTS According to 10-, 7-, and 4-point scales, 13, 16, and 22 of 25 patients had full score, respectively. Using the clinical exam as the reference standard, sensitivities obtained for 10-, 7-, and 4-point scales were 52%, 64%, and 88%, respectively. Percent agreement between readers was 100% for 10- and 7-point scales and 88% for 4-point scale. Percent agreement for opacification of scale vessels was equally high for all three scales (93.6%, 93.7%, 91% for 10-, 7-, and 4-point scales, respectively). CONCLUSION The 4-point scale appears to be more sensitive than the 10- and 7-point scales in CT angiography evaluation for brain death. Interobserver agreement is high for all three scales when subtraction images are used. PMID:25698093
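The sensitivities reported in this abstract follow directly from the full-score counts. A minimal Python sketch of that arithmetic, assuming (as the abstract states) that all 25 patients are positive by the clinical reference standard; the function and variable names are illustrative and not taken from the study:

```python
def sensitivity(true_positives: int, reference_positives: int) -> float:
    """Fraction of reference-standard positives that the test also flags."""
    if reference_positives <= 0:
        raise ValueError("reference_positives must be positive")
    return true_positives / reference_positives


# Patients with a full score on each scale, as reported in the abstract.
# All 25 patients had a clinical diagnosis of brain death.
full_scores = {"10-point": 13, "7-point": 16, "4-point": 22}
sensitivities = {scale: sensitivity(n, 25) for scale, n in full_scores.items()}

for scale, value in sensitivities.items():
    print(f"{scale}: {value:.0%}")
```

Because every patient in the sample is reference-positive, only sensitivity can be estimated from these counts; specificity would require patients without a clinical diagnosis of brain death.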
Anderson, Paul S.
The Multi-Digit Technologies (MDT) testing technique is discussed as the first major advance in computer assisted testing in several decades. The MDT testing method uses fill-in-the-blank or completion-type questions, with an alphabetized long list of possible responses. An MDT answer sheet is used to record the code number of the answer. For…
Christian, Veronica Faye
The No Child Left Behind Act emphasized the responsibility of states to improve student academic performance. In one state, students are required to take subject-area tests and master each test to graduate; however, in some schools, many students are failing the English II test administered during students' sophomore year. Two districts have…
Versant tests are automated spoken language tests that are taken on the telephone or computer. If you would like to listen to a sample test, purchase a practice test, or view the test score after taking the test (if applicable), please visit www.VersantTest.com PART INSTRUCTIONS · Carefully read
Rodrigues, Clarissa Guimaraes; Rios-Neto, Eduardo Luiz Goncalves; de Xavier Pinto, Cristine Campos
In Brazil, the mean of math test scores for students of the fourth grade declined by approximately 0.2 standard deviation in the late 1990s. However, the potential changes in the distribution of scores have never been addressed. It is unclear if the decline was caused by deterioration in student performance levels at the upper and/or lower tails…
P. Medina-Pastor; M. Mezcua; C. Rodríguez-Torreblanca; A. R. Fernández-Alba
The obligation for accredited laboratories to participate in proficiency tests under ISO 17025, performing multiresidue methods (MRMs) for pesticide residues, involves the reporting of a large number of individual z scores making the evaluation of the overall performance of the laboratories difficult. It entails, time and again, the need for ways to summarise the laboratory’s overall assessment into a unique
McLaughlin, J. Patrick; White, Jason T.
Outcomes measurements have always been an important part of proving to outside constituencies how you "measure up" to other schools with your business programs. A common nationally-normed exam that is used is the Major Field Achievement Test in Business from Educational Testing Service. Our paper discusses some guidelines that we are "pilot…
Munoz, Carolyn Sue
The purpose of this study was to identify the impact intensive reading instruction had for 28 students with learning disabilities at the middle school level on standardized tests. National Assessment of Educational Progress testing indicates that across the United States, learning disabled students' literacy skills are decreasing annually, and these…
Bornheimer, Deane G.
Performance of limited-English speaking graduate school applicants on the Prueba de Admision para Estudios Graduados aptitude test is compared with Graduate Record Examination results, and the validity of the two tests as predictors of academic success for bilingual doctoral students in the New York University Puerto Rico program is examined. (MSE)
Bishop, Dorothy V. M.; McDonald, David
Background: Children who meet language test criteria for specific language impairment (SLI) are not necessarily the same as those who are referred to a speech and language therapist. Aims: To consider how far this discrepancy reflects insensitivity of traditional language tests to clinically important features of language impairment. Methods &…
Kelly, Colleen; And Others
The SPINE test (SPeech INtelligibility Evaluation), designed to measure speech intelligibility of severely to profoundly hearing-impaired children was administered to 30 hearing-impaired children (12-16 years old) to examine its validity. Results suggested that the SPINE test is a valid measure of speech intelligibility with hearing-impaired…
Williams, W. Larry; Weil, Timothy M.; Porter, James C. K.
Guided notes were employed in two undergraduate Psychology courses involving 71 students. The study design utilized an alternating treatments format to compare Traditional Lectures with Guided Notes lectures. In one of the two courses, tests were administered after each class lecture, whereas the same type of test was administered at the beginning…
Schulz, E. Matthew; Wang, Lin
In this study, items were drawn from a full-length test of 30 items in order to construct shorter tests for the purpose of making accurate pass/fail classifications with regard to a specific criterion point on the latent ability metric. A three-item parameter Item Response Theory (IRT) framework was used. The criterion point on the latent ability…
Samiran Sinha; Bhramar Mukherjee
Summary The paper considers the problem of determining the number of matched sets in 1 : M matched case-control studies with a categorical exposure having k + 1 categories, k ≥ 1. The basic interest lies in constructing a test statistic to test whether the exposure is associated with the disease. Estimates of the k odds ratios for 1
Hahn, Jinsoo; Jang, Kyungho
International comparisons of economic understanding generally require a translation of a standardized test written in English into another language. Test results can differ based on how researchers translate the English written exam into one in their own language. To confirm this hypothesis, two differently translated versions of the "Basic…
Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.
The family of Weibull distributions was investigated as a model for the distributions of response times for items in computer-based criterion-referenced tests. The fit of these distributions was, with a few exceptions, good to excellent according to the Kolmogorov-Smirnov test. For a few relatively simple items, the two-parameter gamma…
Homack, Susan Rae
List of tables: (1) Demographics Among Groups of Children with ADHD and No Diagnosis; (2) Conners' Continuous Performance Test-II Variable Means for Children with ADHD and No Diagnosis; (3) Gordon Diagnostic System...
The current study seeks to replicate and extend the research on the effects of implicit theories of intelligence. Study 1 tested the hypothesis that an incremental theory (a belief that intelligence is malleable) would result in mastery...
Homack, Susan Rae
Today, there are numerous versions of the continuous performance test (CPT) used in clinical and research settings. Although CPTs may constitute a similar group of tasks with a common paradigm, they are very different in the parameters they measure...
Unger, Nathan R; Gauthier, Timothy P; Cheung, Linda W
As the progression of multidrug-resistant organisms and lack of novel antibiotics move us closer toward a potential postantibiotic era, it is paramount to preserve the longevity of current therapeutic agents. Moreover, novel interventions for antimicrobial stewardship programs are integral to combating antimicrobial resistance worldwide. One unique method that may decrease the use of second-line antibiotics (e.g., fluoroquinolones, vancomycin) while facilitating access to a preferred β-lactam regimen in numerous health care settings is a penicillin skin test. Provided that up to 10% of patients have a reported penicillin allergy, of whom ~10% have true IgE-mediated hypersensitivity, significant potential exists to utilize a penicillin skin test to safely identify those who may receive penicillin or a β-lactam antibiotic. In this article, we provide information on the background, associated costs, currently available literature, pharmacists' role, antimicrobial stewardship implications, potential barriers, and misconceptions, as well as future directions associated with the penicillin skin test. PMID:23712569
Geluk, Christiane A; Dikkers, Riksta; Kors, Jan A; Tio, René A; Slart, Riemer HJA; Vliegenthart, Rozemarijn; Hillege, Hans L; Willems, Tineke P; de Jong, Paul E; van Gilst, Wiek H; Oudkerk, Matthijs; Zijlstra, Felix
Background Asymptomatic subjects at intermediate coronary risk may need diagnostic testing for risk stratification. Both measurement of coronary calcium scores and exercise testing are well established tests for this purpose. However, it is not clear which test should be preferred as initial diagnostic test. We evaluated the prevalence of documented coronary artery disease (CAD) according to calcium scores and exercise test results. Methods Asymptomatic subjects with ST-T changes on a rest ECG were selected from the population based PREVEND cohort study and underwent measurement of calcium scores by electron beam tomography and exercise testing. With calcium scores ≥10 or a positive exercise test, myocardial perfusion imaging (MPS) or coronary angiography (CAG) was recommended. The primary endpoint was documented obstructive CAD (≥50% stenosis). Results Of 153 subjects included, 149 subjects completed the study protocol. Calcium scores ≥400, 100–399, 10–99 and <10 were found in 16, 29, 18 and 86 subjects and the primary endpoint was present in 11 (69%), 12 (41%), 0 (0%) and 1 (1%) subjects, respectively. A positive, nondiagnostic and negative exercise test was present in 33, 27 and 89 subjects and the primary endpoint was present in 13 (39%), 5 (19%) and 6 (7%) subjects, respectively. Receiver operator characteristics analysis showed that the area under the curve, as measure of diagnostic yield, of 0.91 (95% CI 0.84–0.97) for calcium scores was superior to 0.74 (95% CI 0.64–0.83) for exercise testing (p = 0.004). Conclusion Measurement of coronary calcium scores is an appropriate initial non-invasive test in asymptomatic subjects at increased coronary risk. PMID:17629903
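The per-stratum percentages in this abstract can be reproduced from the reported counts. The following Python sketch is an illustration only: the dictionary keys and the function name are assumptions introduced here, and the counts are copied from the abstract.

```python
def prevalence_by_stratum(strata):
    """Map each stratum's (cases, subjects) pair to a whole-number percentage."""
    return {
        name: round(100 * cases / subjects)
        for name, (cases, subjects) in strata.items()
    }


# (subjects with obstructive CAD, subjects in stratum), from the abstract.
calcium = {">=400": (11, 16), "100-399": (12, 29), "10-99": (0, 18), "<10": (1, 86)}
exercise = {"positive": (13, 33), "nondiagnostic": (5, 27), "negative": (6, 89)}

calcium_pct = prevalence_by_stratum(calcium)
exercise_pct = prevalence_by_stratum(exercise)

# Both tests were applied to the same 149 subjects who completed the protocol.
calcium_total = sum(subjects for _, subjects in calcium.values())
exercise_total = sum(subjects for _, subjects in exercise.values())

print(calcium_pct, exercise_pct, calcium_total, exercise_total)
```

Both stratifications sum to the same 149 subjects and the same 24 endpoint cases, which is a quick consistency check on the reported figures.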
Gambichler, Thilo; Moussa, Georg; Sand, Michael; Sand, Daniel; Orlikov, Alexei; Altmeyer, Peter; Hoffmann, Klaus
Noninvasive imaging techniques might be of particular diagnostic value for studying and monitoring cutaneous inflammatory conditions such as contact dermatitis. We evaluate acute allergic contact dermatitis (AACD) by means of optical coherence tomography (OCT) and correlate the clinical grading of patch test reactions with the findings obtained from OCT. Twenty positive patch test reactions (+, n = 6; ++, n = 7; +++, n = 7) are investigated using a conventional OCT scanner. In comparison to the control sites, OCT of AACD showed pronounced skin folds, thickened and/or disrupted entrance signals, and a significant increase in epidermal thickness. Moreover, clearly demarcated signal-free cavities within the epidermis and considerable reduction of dermal reflectivity are demonstrated by OCT. Notably, the latter findings strongly correlate with the clinical patch test grading. OCT may be a useful tool for visualization of micromorphological features of AACD. However, before OCT can be employed as an objective parameter in grading severity of patch test reactions, larger studies are required that correlate clinical patch test readings and OCT findings with histopathology. PMID:16409095
In a 1977 review of the literature on test answer changing, Mueller and Wasser (EJ 163 236) cited 17 studies and concluded that students changing answers on objective tests gain more points than they lose by so doing. Higher scoring students tend to gain more than do the lower scoring students. Six additional studies not reported in the Mueller…
Ho, Andrew D.; Yu, Carol C.
Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micerri similarly showed that the normality assumption is met rarely in educational and psychological…
In his thoughtful focus article, Haertel (this issue) pushes testing experts to broaden the scope of their validation efforts and to invite scholars from other disciplines to join them. He credits existing validation frameworks for helping the measurement community to identify incomplete or nonexistent validity arguments. However, he notes his…
...unnecessary further testing and surgery due to false positive...that alerts users to the risk associated with off-label...or not to proceed with surgery. In the Federal Register...warning to address the risk of off-label use...Federal Food, Drug, and Cosmetic Act (FD&C...
...unnecessary further testing and surgery due to false positive...that alerts users to the risk associated with off-label...or not to proceed with surgery. While FDA is establishing...warning to address the risk of off-label use...Federal Food, Drug, and Cosmetic Act (FD&C...
Allen, Denise A.
Little empirical evidence suggested that independent reading abilities of students enrolled in biology predicted their performance on the Biology I Graduation End-of-Course Assessment (ECA). An archival study was conducted at one Indiana urban public high school in Indianapolis, Indiana, by examining existing educational assessment data to test…
Council, Forrest M.; And Others
The study compared the driver licensing test performance of two groups of driver education students: those involved in North Carolina's multi-vehicle range program and students in the "30 and 6" program (30 hours of class instruction and six hours of "behind the wheel" instruction). It evaluated the performance of 3,049 applicants (all aged 16 and…
Sederberg, Per B.
of training-induced gains on fluid intelligence tests have fueled an explosion of interest in cognitive intelligence gain is not the only possible explanation for the observed control-adjusted far transfer across of eye movement data from 35 participants solving Raven's Advanced Progressive Matrices on two separate
Zajac, David J.
Purpose: To determine if children with repaired cleft palate and normal velopharyngeal (VP) closure as determined by aerodynamic testing exhibit greater acoustic nasalance than control children without cleft palate. Method: Pressure-flow procedures were used to identify 2 groups of children based on VP closure during the production of /p/ in the…
Psychometric Theories of Intelligence; Cattell-Horn Fluid and Crystallized Intelligence; Carroll's Three-Stratum Hierarchy... ...-based intelligence tests have been preferred by the educators for more than a century, due to the aforementioned advantages. Psychometric Theories of Intelligence: McGrew and Flanagan (1998) noted three different research traditions in the structural analysis...
Welsh, Megan E.; D'Agostino, Jerome V.; Kaniskan, Burcu
Standards-based progress reports (SBPRs) require teachers to grade students using the performance levels reported by state tests and are an increasingly popular report card format. They may help to increase teacher familiarity with state standards, encourage teachers to exclude nonacademic factors from grades, and/or improve communication with…
Educational stakeholders are aware that school administration has become an incredibly intricate dynamic that is too complex for principals to handle alone. Test-driven accountability has made the already daunting task of school administration even more challenging. Distributed leadership presents an opportunity to explore increased leadership…
Thomas, Antoinette D.
Empirical findings have supported an inverse relationship between closeness to extended family and friends versus spouse. The three foregoing interpersonal relationships in terms of affective quality, direction and dominance were investigated, using an objective test as well as the TAT. Bellak (1986) considered that the strength of the TAT lies in…
Hofer, Manfred; Kuhnle, Claudia; Kilian, Britta; Fries, Stefan
The predictive power of cognitive ability and self-control strength for self-reported grades and an achievement test were studied. It was expected that the variables use of time structure, academic procrastination, and motivational interference during learning further aid in predicting students' achievement because they are operative in situations…
Daniel N. Allen; Joshua E. Caron; Lisa A. Duke; Gerald Goldstein
Recent factor-analytic studies of the Halstead Category Test (HCT) indicate that its seven subtests form three factors including a Counting factor (subtests I and II), a Spatial Positional Reasoning factor (subtests III, IV, and VII), and a Proportional Reasoning factor (subtests V, VI, and VII). The sensitivity and specificity of these factors to heterogeneous forms of brain damage was examined
Tuteja, Sony; Haynes, Kevin; Zayac, Cara; Sprague, Jon E; Bernhardt, Barbara; Pyeritz, Reed
Aim: To examine community pharmacists’ attitudes towards pharmacogenetic (PGx) testing, including their views of the clinical utility of PGx and the ethical, social, legal and practical implications of PGx testing. Methods: A web-based survey was administered to 5600 licensed community pharmacists in the states of Ohio and Pennsylvania (USA). Results: Of 580 respondents, 78% had a Bachelor of Science degree in pharmacy and 58% worked in a chain drug store. Doctors of pharmacy-trained pharmacists had a significantly higher knowledge score than those with a Bachelor of Science in pharmacy (3.2 ± 0.9 vs 2.6 ± 0.6; p < 0.0001). All pharmacists had positive attitudes towards PGx and most (87%) felt it would decrease the number of adverse events, and optimize drug dosing. More than half (57%) of pharmacists felt that it was their role to counsel patients regarding PGx information. Many (65%) were concerned that PGx test results may be used to deny health insurance. Conclusion: Regardless of the type of education, all pharmacists had positive attitudes towards PGx. There is still a concern among pharmacists that PGx test results may be used to deny health insurance and, thus, there is a need to educate pharmacists about legal protections prohibiting certain forms of unfair discrimination based on genotype. PMID:24409195
An examination of mathematics achievement as measured by standardized test scores and grade distribution among urban high schools to determine the relationship between student outcomes in key courses and standardized tests
Irene Gail Norde
The study examined the relationship between mathematics achievement as measured by standardized test scores and grade distribution among high schools in a large urban school district to determine if MEAP and MAT 7 scores reflect student outcomes in key courses. Statistical analysis was used to determine the relationship between student outcomes in key courses and standardized tests. An ex…
Development and administration of institutional ESL placement tests require a great deal of financial and human resources. Due to a steady increase in the number of international students studying in the United States, some US universities have started to consider using standardized test scores for ESL placement. The English Placement Test (EPT)…
Euliss, Ned H.; Mushet, David M.
In the prairie pothole region of North America, development of Indices of Biotic Integrity (IBIs) to detect anthropogenic impacts on wetlands has been hampered by naturally dynamic inter-annual climate fluctuations. Of multiple efforts to develop IBIs for prairie pothole wetlands, only one, the Index of Plant Community Integrity (IPCI), has reported success. We evaluated the IPCI and its ability to distinguish between natural and anthropogenic variation using plant community data collected from 16 wetlands over a 4-year-period. We found that under constant anthropogenic influence, IPCI metric scores and condition ratings varied annually in response to environmental variation driven primarily by natural climate variation. Artificially forcing wetlands that occur along continuous hydrologic gradients into a limited number of discrete classes (e.g., temporary, seasonal, and semi-permanent) further confounded the utility of IPCI metrics. Because IPCI scores vary significantly due to natural climate dynamics as well as human impacts, methodology must be developed that adequately partitions natural and anthropogenically induced variation along continuous hydrologic gradients. Until such methodology is developed, the use of the IPCI to evaluate prairie pothole wetlands creates potential for misdirected corrective or regulatory actions, impairment of natural wetland functional processes, and erosion of public confidence in the wetland sciences.
Bagust, Jeff; Docherty, Sharon; Haynes, Wayne; Telford, Richard; Isableu, Brice
The Rod and Frame Test has been used to assess the degree to which subjects rely on the visual frame of reference to perceive vertical (visual field dependence-independence perceptual style). Early investigations found children exhibited a wide range of alignment errors, which reduced as they matured. These studies used a mechanical Rod and Frame system, and presented only mean values of grouped data. The current study also considered changes in individual performance. Changes in rod alignment accuracy in 419 school children were measured using a computer-based Rod and Frame test. Each child was tested at school Grade 2 and retested in Grades 4 and 6. The results confirmed that children displayed a wide range of alignment errors, which decreased with age but did not reach the expected adult values. Although most children showed a decrease in frame dependency over the 4 years of the study, almost 20% had increased alignment errors suggesting that they were becoming more frame-dependent. Plots of individual variation (SD) against mean error allowed the sample to be divided into 4 groups; the majority with small errors and SDs; a group with small SDs, but alignments clustering around the frame angle of 18°; a group showing large errors in the opposite direction to the frame tilt; and a small number with large SDs whose alignment appeared to be random. The errors in the last 3 groups could largely be explained by alignment of the rod to different aspects of the frame. At corresponding ages females exhibited larger alignment errors than males although this did not reach statistical significance. This study confirms that children rely more heavily on the visual frame of reference for processing spatial orientation cues. Most become less frame-dependent as they mature, but there are considerable individual differences. PMID:23724139
Vadivelu, Nalini; Chen, Isabel L; Kodumudi, Vijay; Ortigosa, Esperanza; Gudin, Maria Teresa
In pain management, physicians employ a variety of drugs, ranging from low-impact to highly potent, and to maximize patient health, urine toxicology analyses can significantly improve the delivery of pain treatment. Drugs such as opioids that are used for pain management are peculiar in that they provide effective pain relief and have a high risk of addiction. The use of illicit drugs in the general population has been on the rise; however, self-reporting and close monitoring of patient behavior are insufficient means to detect drug abuse and confirm compliance. Therefore, in order to create more effective drug treatment plans, physicians must understand and account for the implications of patient drug use history. Urine toxicology analysis is an important tool for pain physicians because it is more sensitive than most alternative blood tests, more efficient and cost-effective. In addition to improving patient pain management, urine testing also has forensic and legal implications. There are, however, limitations to urine toxicology methods, as they can produce false-positive and false-negative results and are prone to human error and sample contamination. There is also a need for more specific and rapid urine drug testing. Healthcare professionals should therefore be familiar with the limitations of various urine drug testing methods, and possess skills necessary to properly interpret these results. This review suggests that the overall benefits incurred by both the patient's short-term and long-term health support the integration of urine toxicology analysis in routine clinical care. In addition to improving health care and patient health, it has a strong potential to improve patient-physician relationships and protects the interest of involved healthcare practitioners. PMID:20394568
Taylor, Ann T S; Rogers, Jill Cellars
The development of classroom experiments where students examine their own DNA is frequently described as an innovative teaching practice. Often these experiences involve students analyzing their genes for various polymorphisms associated with disease states, like an increased risk for developing cancer. Such experiments can muddy the distinction between classroom investigation and medical testing. Although the goals and issues surrounding classroom genotyping do not directly align with those of clinical testing, instructors can use the guidelines and standards established by the medical genetics community when evaluating the ethics of human genotyping. We developed a laboratory investigation and discussion which allowed undergraduate science students to explore current DNA manipulation techniques to isolate their p53 gene, followed by a dialogue probing the ethical implications of examining their sample for various polymorphisms. Students never conducted genotyping on their samples because of the ethical concerns presented in this paper, so the discussion replaced the actual genetic testing in the class. A science faculty member led the laboratory portion, while a genetic counselor facilitated the discussion of the ethical concepts underlying genetic counseling: autonomy, beneficence, confidentiality, and justice. In their final papers, students demonstrated an understanding of the practice guidelines established by the genetics community and acknowledged the ethical considerations inherent in p53 genotyping. Given the burgeoning market for personalized medicine, teaching undergraduates about the psychosocial and ethical dimensions of human genetic testing is important and timely. Moreover, incorporating a genetic counselor in the classroom discussion provided a rich and dynamic discussion of human genetic testing. PMID:21774053
Ohno, Y; Kaneko, T; Inoue, T; Morikawa, Y; Yoshida, T; Fujii, A; Masuda, M; Ohno, T; Hayashi, M; Momma, J; Uchiyama, T; Chiba, K; Ikeda, N; Imanishi, Y; Itakagaki, H; Kakishima, H; Kasai, Y; Kurishita, A; Kojima, H; Matsukawa, K; Nakamura, T; Ohkoshi, K; Okumura, H; Saijo, K; Sakamoto, K; Suzuki, T; Takano, K; Tatsumi, H; Tani, N; Usami, M; Watanabe, R
A three-step interlaboratory validation of alternative methods to the Draize eye irritation test (Draize test) was conducted by the co-operation of 27 organizations including national research institutes, universities, cosmetic industries, kit suppliers and others. Twelve alternative methods were evaluated using 38 cosmetic ingredients and isotonic sodium chloride solution. Draize tests were conducted according to the OECD guidelines using the same lot of test substances as was evaluated in the alternative tests. Results were as follows. (1) Variation in Draize scores was large near the critical range (maximal average Draize total scores (MAS)=15-50) for the evaluation of cosmetic ingredients. (2) Interlaboratory variation was relatively small for the alternative tests. The mean coefficients of variation (CV%) were less than 50 for all assays except for the hen's egg-chorioallantoic membrane test (HET-CAM), chorioallantoic membrane-trypan blue staining test (CAM-TB) and haemoglobin denaturation test (HD). The CV% of these three methods came into the same range as the other tests when non-irritants were excluded from the data analysis. (3) Results for acids (pH of 10% solution <2.5), alkalis (pH of 10% solution >11.5) and alcohols (lower mono-ol) in cytotoxicity tests clearly deviated from the other samples in the comparison of cytotoxicity with Draize results. (4) Pearson's correlation coefficients (r) between results from cytotoxicity tests using serum and MAS were -0.86 to -0.92 for samples excluding acids, alkalis and alcohols. (5) When the samples were divided into liquids and powders, r of CAM-TB increased from 0.71 for all samples to 0.80 and 0.92, respectively. (6) Spearman's rank correlation coefficients between the results of alternative methods and MAS were relatively high (r>0.8) in the case of HET-CAM and CAM-TB. 
Those for cytotoxicity tests were high if the data for acids, alkalis and alcohols were excluded (SIRC-CVS: r=0.945, SIRC-NRU: r=0.931, HeLa-MTT: r=0.926, CHL-CVS: r=0.880). Exclusion of data for powdered samples also increased the coefficient of HET-CAM and CAM-TB to 0.831 and 0.863, respectively. These results suggest that no single method can constitute an evaluation system applicable to all types of test substances by itself. However, several methods will be useful for the prediction of eye irritation potential of cosmetic ingredients if they are used with clear understanding of the characteristics of those methods. PMID:20654468
Zou, Xiaoling; Zhang, Xuning
A new score report based on a mechanism of formative assessment and feedback is developed to offer individual testees not only their final scores but also their sub-scale scores, their percentile position, as well as corresponding feedback on self-regulation strategies. Structural equation modeling is adopted in the confirmatory factor analysis to…
Hulett, Judie L; Weiss, Robert E; Bwibo, Nimrod O; Galal, Osman M; Drorbaugh, Natalie; Neumann, Charlotte G
Micronutrient deficiencies and suboptimal energy intake are widespread in rural Kenya, with detrimental effects on child growth and development. Sporadic school feeding programmes rarely include animal source foods (ASF). In the present study, a cluster-randomised feeding trial was undertaken to determine the impact of snacks containing ASF on district-wide, end-term standardised school test scores and nutrient intake. A total of twelve primary schools were randomly assigned to one of three isoenergetic feeding groups (a local plant-based stew (githeri) with meat, githeri plus whole milk or githeri with added oil) or a control group receiving no intervention feeding. After the initial term that served as baseline, children were fed at school for five consecutive terms over two school years from 1999 to 2001. Longitudinal analysis was used controlling for average energy intake, school attendance, and baseline socio-economic status, age, sex and maternal literacy. Children in the Meat group showed significantly greater improvements in test scores than those in all the other groups, and the Milk group showed significantly greater improvements in test scores than the Plain Githeri (githeri+oil) and Control groups. Compared with the Control group, the Meat group showed significant improvements in test scores in Arithmetic, English, Kiembu, Kiswahili and Geography. The Milk group showed significant improvements compared with the Control group in test scores in English, Kiswahili, Geography and Science. Folate, Fe, available Fe, energy per body weight, vitamin B12, Zn and riboflavin intake were significant contributors to the change in test scores. The greater improvements in test scores of children receiving ASF indicate improved academic performance, which can result in greater academic achievement. PMID:24168874
Fehr, Charles Norman
Learning to read requires knowledge of word meanings for those words most commonly encountered in basic reading materials. Many young students lack the basic vocabulary knowledge needed to facilitate learning to read. Two randomized studies were conducted to test the effects of an online, computer-adaptive vocabulary instruction program designed…
Bond, Timothy N.; Lang, Kevin
Although both economists and psychometricians typically treat them as interval scales, test scores are reported using ordinal scales. Using the Early Childhood Longitudinal Study and the Children of the National Longitudinal Survey, we examine the effect of order-preserving scale transformations on the evolution of the black-white reading test…
Evans, Richard M.; Surkan, Alvin J.
The recent arrival of portable computer systems with high-level language interpreters now makes it practical to rapidly develop complex testing and scoring programs. These programs permit undergraduates access, at arbitrary times, to testing as an integral part of a mastery learning strategy. Effects of introducing the computer were studied by…
Angela L. Duckworth; Patrick D. Quinn; Eli Tsukayama
The increasing prominence of standardized testing to assess student learning motivated the current investigation. We propose that standardized achievement test scores assess competencies determined more by intelligence than by self-control, whereas report card grades assess competencies determined more by self-control than by intelligence. In particular, we suggest that intelligence helps students learn and solve problems independent of formal instruction, whereas
Papay, John P.; Murnane, Richard J.; Willett, John B.
Students receive abundant information about their educational performance, but how this information affects future educational-investment decisions is not well understood. Increasingly common sources of information are state-mandated standardized tests. On these tests, students receive a score and a label that summarizes their performance. Using a…
Sinharay, Sandip; Holland, Paul W.
The Non-Equivalent groups with Anchor Test (NEAT) design involves "missing data" that are "missing by design." Three nonlinear observed score equating methods used with a NEAT design are the "frequency estimation equipercentile equating" (FEEE), the "chain equipercentile equating" (CEE), and the "item-response-theory observed-score-equating" (IRT…
Konstantinov, K V; Beard, K T; Goddard, M E; van der Werf, J H J
A multi-trait (MT) random regression (RR) test day (TD) model has been developed for genetic evaluation of somatic cell scores for Australian dairy cattle, where first, second and third lactations were considered as three different but correlated traits. The model includes herd-test-day, year-season, age at calving, heterosis and lactation curves modelled with Legendre polynomials as fixed effects, and random genetic and permanent environmental effects modelled with Legendre polynomials. Residual variance varied across the lactation trajectory. The genetic parameters were estimated using asreml. The heritability estimates ranged from 0.05 to 0.16. The genetic correlations between lactations and between test days within lactations were consistent with most of the published results. Preconditioned conjugate gradient algorithm with iteration on data was implemented for solving the system of equations. For reliability approximation, the method of Tier and Meyer was used. The genetic evaluation system was validated with Interbull validation method III by comparing proofs from a complete evaluation with those from an evaluation based on a data set excluding the most recent 4 years. The genetic trend estimate was in the allowed range and correlations between the two sets of proofs were very high. Additionally, the RR model was compared to the previous test day model. The correlations of proofs between both models were high (0.97) for bulls with high reliabilities. The correlations of bulls decreased with increasing incompleteness of daughter performance information. The correlations between the breeding values from two consecutive runs were high ranging from 0.97 to 0.99. The MT RR TD model was able to make effective use of available information on young bulls and cows, and could offer an opportunity to breeders to utilize estimated breeding values for first and later lactations. PMID:19646149
Education Digest: Essential Readings Condensed for Quick Review, 2004
This article presents an adaptation of an article from School Board News, January 6, 2004 edition. The article describes the effort of de-tracking students of varying ability levels, made by officials of South Side High School, in Rockville Centre, New York, and Noble High School, in North Berwick, Maine. Officials from both schools say that the…
Comparison of three learning indicators (college grade-point average (GPA), student-reported growth, and Graduate Record Examination scores) found: (1) student-reported cognitive growth survey items have a modest relative validity; (2) the attenuation associated with use of residual gain scores does not invalidate their use; and (3) GPA and…
Casabianca, Jodi M.; Lockwood, J. R.; McCaffrey, Daniel F.
Observations and ratings of classroom teaching and interactions collected over time are susceptible to trends in both the quality of instruction and rater behavior. These trends have potential implications for inferences about teaching and for study design. We use scores on the Classroom Assessment Scoring System-Secondary (CLASS-S) protocol from…
Anjorin, Angela; Schmidt, Helga; Posselt, Hans-Georg; Smaczny, Christina; Ackermann, Hanns; Deimling, Michael; Vogl, Thomas J; Abolmaali, Nasreddin
The aim of this study was to investigate whether the parenchymal lung damage in patients suffering from cystic fibrosis (CF) can be equivalently quantified by the Chrispin-Norman (CN) scores determined with low-field magnetic resonance imaging (MRI) and conventional chest radiography (CXR). Both scores were correlated with pulmonary function tests (PFT) and the Shwachman-Kulczycki method (SKM). To evaluate the comparability of MRI and CXR for different states of the disease, all scores were applied to patients divided into three age groups. Seventy-three CF patients (mean SKM score: 62 +/- 8) with a median age (range) of 14 years (7-32) were included. The mean CN scores determined with both imaging methods were comparable (CXR: 12.1 +/- 4.7; MRI: 12.0 +/- 4.5) and showed high correlation (P < 0.05, R = 0.97). Only weak correlations were found between imaging, PFT, and SKM. Both imaging modalities revealed significantly more severe disease expression with age, while PFT and SKM failed to detect early signs of disease. We conclude that imaging of the lung in CF patients is capable of detecting subtle and early parenchymal destruction before lung function or clinical scoring is affected. Furthermore, low-field MRI revealed high consistency with chest radiography and may be used for a thorough follow-up while avoiding radiation exposure. PMID:18274754
Miller, Joshua D; Hyatt, Courtland S; Rausher, Steven; Maples, Jessica L; Zeichner, Amos
The Elemental Psychopathy Assessment (EPA) is a relatively new self-report measure of the basic traits associated with psychopathy. Using community participants (N = 104) oversampled for the presence of psychopathic traits, we examined the convergent and criterion validity of the EPA total and factor scores (i.e., Antagonism, Emotional Stability, Disinhibition, and Narcissism) in relation to self- and informant reports of psychopathy and the general personality dimensions of the HEXACO (Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness to Experience; Ashton & Lee, 2009), as well as self-reported scores on narcissism, Machiavellianism, and externalizing behaviors (EBs) such as antisocial behavior and aggression. The EPA total and factor scores manifested substantial positive correlations with self- and informant-reported psychopathy scores and dimensions from the HEXACO, narcissism, Machiavellianism, and EBs. The patterns of these relations became clearer and more differentiated when examined via regression analyses such that the EPA factors manifested differential relations with various aspects of psychopathy (e.g., EPA Antagonism was the only unique correlate of psychopathy traits related to callousness and manipulation). Overall, the EPA is a promising assessment tool given the breadth of its coverage, the flexibility with which it can be used (total score; 4-factor scores; 18 subscale scores), and its ties to a popular model of basic personality traits. PMID:24548152
John R. Hayes; Jonathan I. Groner
Background: Missing data and the retrospective, nonrandomized nature of trauma registries can decrease the quality of registry-based research. Therefore, we used multiple imputation and propensity scores to test the effect of car seats and seat belt usage on injury severity in children involved in motor vehicle crashes.
May, Judy Jackson; Sanders, Eugene T. W.
Districts throughout the nation are engaged in comprehensive transformation to "turn around" low performing schools. Standardized test scores are used to gauge student achievement; however, academic gains may lag behind leading indicators such as improved school climate and effective leadership. This study examines 16 underperforming…
Quinn, David M.
Black-white test score gaps form in early childhood and widen over elementary school. Sociologists have debated the roles that socioeconomic status (SES) and school quality play in explaining these patterns. In this study, I replicate and extend past research using new nationally representative data from the Early Childhood Longitudinal…
Bridging the Gap through Academic Intervention Programs: A Quantitative Study of the Efficacy of the Health Sciences and Technology Academy (HSTA) on Underrepresented Students' State Standardized Test Scores
Smith, Feon M.
The purpose of the quantitative research study was to determine if participation in the Health Sciences and Technology Academy (HSTA) led to significant differences in the math and reading/language arts scores on the West Virginia Educational Standards Test 2 (WESTEST 2), between students who participated in the program compared to students who…
Kramarz, Francis; Machin, Stephen; Ouazad, Amine
What makes a test score? There is a great deal of uncertainty surrounding the exact contribution of school quality, pupil background, and peers in educational achievement. If peers make most of the difference, then diversity and heterogeneous classrooms may narrow the gap between high- and low-performing students. If pupil background is the first…
Eighth-grade students in New Jersey take the Early Warning Test (EWT), which involves reading, writing, and mathematics. Students with EWT scores below the state level of competency take a remedial mathematics course that provides students with computer-assisted instruction (2 days per week) as well as regular classroom instruction (3 days per…
Wold, Donald C.; And Others
This study found that clinician-generated SPINE (Speech Intelligibility Evaluation) test scores were correlated with objective computer-generated measures of tongue deviancy during vowel production in 28 persons (ages 14-20) with severe/profound hearing loss. Data suggest that subjects were more deviant in their production of front vowels than…
Igoe, Deirdre; Peralta, Christopher; Jean, Lindsey; Vo, Sandra; Yep, Linda Ngan; Zabjek, Karl; Wright, F. Virginia
Preschool-aged children continually learn new skills and perfect existing ones. "Mastery motivation" is theorized to be a personality trait linked to skill learning. The Dimensions of Mastery Questionnaire (DMQ) quantifies mastery motivation. This pilot study evaluated DMQ test-retest score reliability (preschool-version) and included exploratory…
Tucker and chained linear equatings were evaluated in two testing scenarios. In Scenario 1, referred to as rater comparability scoring and equating, the anchor-to-total correlation is often very high for the new form but moderate for the reference form. This may adversely affect the results of Tucker equating, especially if the new and reference…
Carlo Cianchetti; Simona Corona; Maria Foscoliano; Daniela Contu; Giuseppina Sannio-Fancello
According to Nelson's (1976) criteria, the MCST (MWCST) is a simplification of the Wisconsin Card Sorting Test (WCST). As the MCST is particularly suitable for children, the aim of this study was to establish the normative data presently lacking for that group. The MCST was administered to 1126 normal children aged 4 to 13 years. Scoring was based on all
Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
In this technical report, we document the results of a cross-validation study designed to identify optimal cut-scores for the use of the easyCBM[R] mathematics test in the state of Washington. A large sample, randomly split into two groups of roughly equal size, was used for this study. Students' performance classification on the Washington state…
Moses, Tim; Liu, Jinghua
In equating research and practice, equating functions that are smooth are typically assumed to be more accurate than equating functions with irregularities. This assumption presumes that population test score distributions are relatively smooth. In this study, two examples were used to reconsider common beliefs about smoothing and equating. The…
Kerns, Claretta M.
The purpose of this study was to examine the effectiveness of high school transition strategies for ninth grade students in comparison to the traditional high school experience of first time ninth grade students. This study compared the English End-of-Course (EOC) test scores of first time ninth grade students in a traditional high school setting…
St Clair-Thompson, Helen; Sykes, Sarah
Measures of working memory (WM) are useful predictors of cognitive skills and educational attainment in children. A number of scoring methods can be used for WM tasks-for example, the sum of all correctly recalled stimuli in perfectly recalled lists (absolute score) or the proportion of items recalled in the correct serial position during the task (proportion correct). The present study explored whether proportion correct scoring had an advantage over absolute scoring of WM tasks for predicting children's educational attainment. The participants were 81 primary school children aged 7-8 years. Each participant was tested on five measures of WM. Schools supplied national curriculum attainment levels for each child in reading, writing, mathematics, and science. The results revealed that proportion correct scoring resulted in WM tasks' being better predictors of children's achievement. The results are discussed in terms of both psychological theory and implications for research methods. PMID:21139163
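The two scoring rules named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names, the list-of-lists data layout, and the choice to pool proportion correct over the whole task (rather than averaging per list) are hypothetical and are not taken from the study.

```python
def absolute_score(presented, recalled):
    """Sum of list lengths, counting only lists recalled perfectly
    (every item present and in the correct serial position)."""
    return sum(len(p) for p, r in zip(presented, recalled) if list(p) == list(r))


def proportion_correct(presented, recalled):
    """Share of all presented items recalled in their correct serial position,
    pooled over the whole task (an assumption of this sketch)."""
    total = sum(len(p) for p in presented)
    hits = sum(
        1
        for p, r in zip(presented, recalled)
        for i, item in enumerate(p)
        if i < len(r) and r[i] == item
    )
    return hits / total if total else 0.0


# Hypothetical trial data: three lists of increasing length.
presented = [[3, 7], [4, 1, 9], [2, 8, 5, 6]]
recalled = [[3, 7], [4, 9, 1], [2, 8, 5]]

abs_score = absolute_score(presented, recalled)      # only the first list is perfect
prop_score = proportion_correct(presented, recalled)  # partial credit for lists 2 and 3
```

In this toy case the absolute score credits only the first list, while proportion correct also credits the items recalled in position in the imperfect lists, which is the kind of extra information the abstract attributes to proportion correct scoring.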
This study investigated the relationship between the pattern of impairment in the test scores of neurologically impaired children and proximity to an inactive toxic waste disposal site. Subjects (N = 147) were students, ages 6-16, classified as neurologically impaired. Seventy-six who lived within six miles of the site served as the experimental group and 71 who did not live near a site comprised the control group. Research was based on existing data available through the Child Study Team evaluation process. Attention was given to the ACID cluster of the WISC-R, the Arithmetic and Reading subtests on the WRAT, and the Koppitz scores of the Bender Visual Motor Gestalt Test. No significant difference was found between the experimental and control groups. Sex differences within the experimental group were not significant. Time of exposure and patterning of scores in the experimental group were investigated. Time had a significant main effect on the WISC-R Arithmetic and Digit Span subtests, the ACID cluster, and the Bender Test for the total group. The main effect for sex was significant for the WISC-R Information subtest. An interaction effect was found to be significant on the WRAT Arithmetic subtest. The longer the girls lived within the site area, the lower they scored on the WISC-R Information subtest and the WRAT Arithmetic subtest. The variable exposure (interaction of distance and time) was related to lower scores on the WISC-R Arithmetic and Digit Span subtests. A two-way interaction was found on the WRAT Arithmetic subtest. The longer the females were exposed to the waste site area, the lower they scored on the WRAT Arithmetic subtest. Children who had lived in the site area from birth were compared with those who had lived in the area for three years prior to the evaluation. A significant main effect was found for the Bender Gestalt.
Brown, JP; Amaechi, BT; Bader, JD; Gilbert, GH; Makhija, SK; Lozano-Pineda, J; Leo, MC; Chuhe, C; Vollmer, WM
Objectives: To better understand the effectiveness of xylitol in caries prevention in adults, and to attempt improved clinical trial efficiency. Methods: As part of the Xylitol for Adult Caries Trial (X-ACT), non-cavitated and cavitated caries lesions were assessed in subjects who were experiencing the disease. The trial was a test of the effectiveness of 5 grams/day of xylitol, consumed by dissolving in the mouth five 1 gram lozenges spaced across each day, compared with a sucralose placebo. For this analysis, seeking trial efficiency, 538 subjects aged 21–80, with complete data for four dental examinations were selected from the 691 randomized into the three year trial, conducted at three sites. Acceptable inter and intra examiner reliability before and during the trial was quantified using the kappa statistic. Results: The mean annualized non-cavitated plus cavitated lesion transition scores in coronal and root surfaces, from sound to carious favoured xylitol over placebo, during the three cumulative periods of 12, 24, and 33 months, but these clinically and statistically non-significant differences declined in magnitude over time. Restricting the present assessment to those subjects with a higher baseline lifetime caries experience showed possible but inconsistent benefit. Conclusions: There was no clear and clinically relevant preventive effect of xylitol on caries in adults with adequate fluoride exposure when non-cavitated plus cavitated lesions were assessed. This conformed to the X-ACT trial result assessing cavitated lesions. Including non-cavitated lesion assessment in this full scale, placebo controlled, multi site, randomized, double blinded clinical trial in adults experiencing dental caries, did not achieve added trial efficiency or demonstrate practical benefit of xylitol. Trial Registration: ClinicalTrials.Gov NCT00393055 PMID:24205951
Wang, Jinhao; Brown, Michelle Stallone
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric [TM] and human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric [TM] and by human…
Effects of Absence and Cognitive Skills Index on Various Achievement Indicators. A Study of ISTEP Scores, Discrepancies, and School-Based Math and English Tests of 1997-1998 Seventh Grade Students at Sarah Scott Middle School, Terre Haute, Indiana.
Davis, Holly S.
This study examines the correlation between absence, cognitive skills index (CSI), and various achievement indicators such as the Indiana Statewide Testing for Educational Progress (ISTEP) test scores, discrepancies, and school-based English and mathematics tests for 64 seventh-grade students from one middle school. Scores for each of the subtests…
Sani, Claudia; Grilli, Leonardo
The performance of a school system can be evaluated through the learning levels of the pupils, usually summarized by school mean scores. The variability of the mean scores among schools is rarely studied in detail, though it is a crucial issue especially in primary schools: in fact, a high variability among schools raises doubts on the capacity of…
A Study of the Relationship Between Scores on the School and College Ability Test (SCAT Series II), the College English Placement Test (CEPT) and Academic Achievement in American History and Constitution (History 27).
Schaumburg, Gary F.
This paper reports the results of an investigation of the relationship between scores on the School and College Ability Test (SCAT), the College English Placement Test (CEPT), and grades earned in American History and Constitution (History 27 at Cerritos College, California) in order to ascertain if predictability of "successful" or "unsuccessful"…
Rains, Cherri Sloan
This study was an investigation of the effectiveness of mathematics instruction using the interactive whiteboard (IWB) for 1, 2, and 3 years. Guided by Gagne's conditions of learning theory, this program evaluation study investigated the impact of receiving 1, 2, or 3 years of mathematics instruction using the IWB on mathematics scores on the…
High, Clennis F.
A study was conducted to identify factors affecting student performance on the Texas Academic Skills Program (TASP), a state-mandated measure designed to assess students' basic skills and competencies. TASP and Assessment of Student Skills for Entry Transfer (ASSET) scores were analyzed for 328 academic track students from 6 community colleges in…
Robert F. Bornstein
The degree to which projection plays a role in Rorschach (Rorschach, 1921/1942) responding remains controversial, in part because extant data have yielded inconclusive results. In this investigation, I examined the impact of social projection on Rorschach Oral Dependency (ROD) scores using methods adapted from social cognition research. In Study 1, I prescreened 85 college students (40 women and 45 men)
Harder, Valerie S.; Stuart, Elizabeth A.; Anthony, James C.
There is considerable interest in using propensity score (PS) statistical techniques to address questions of causal inference in psychological research. Many PS techniques exist, yet few guidelines are available to aid applied researchers in their understanding, use, and evaluation. In this study, the authors give an overview of available…
, integrated theories of cognition and action. Such theories provide the necessary computational means. Keywords: rational adaptation, bounded optimality, cognitive architecture, theory comparison, response
Rational Adaptation Under Task and Processing Constraints: Implications for Testing Theories
Background: Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has seen a limited increase. This has prompted many strategies to build protein structures from their sequences, generating a considerable number of alternative models. The selection of the model closest to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials, and combinations of both. Results: Here, we present and demonstrate a theory to split knowledge-based potentials into biologically meaningful scoring terms and to combine them into new scores to predict near-native structures. Our strategy circumvents the problem of defining the reference state. In this approach we give the proof for a simple, linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. In addition, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors. Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-the-art methods and have successfully identified wrongly modeled regions of many near-native conformations. PMID:19917096
Jacobs, Paul I.
Although there are fundamental differences in the objectives of the two activities, the programing of instructional materials bears many similarities to the construction of tests. A systematic comparison of the problems and procedures reveals important implications for programing from the older field of testing. Theory and experience in test…
Wasylkiw, Louise; Tomes, Jennifer L.; Smith, Francine
In 3 studies, the authors examined the prevalence and effects of a testing strategy whereby they gave a set of items to participants in advance and subsequently tested them on a portion of those items (i.e., subset testing). In a survey of university instructors, Study 1 showed that subset testing is a commonly used testing strategy. In this…
Wilcox, Rand R.
A mastery test is frequently described as follows: an examinee responds to n dichotomously scored test items. Depending upon the examinee's observed (number correct) score, a mastery decision is made and the examinee is advanced to the next level of instruction. Otherwise, a nonmastery decision is made and the examinee is given remedial work. This…
Hausknecht, John P; Trevor, Charlie O; Farr, James L
This field study investigated the effect of retaking identical selection tests on subsequent test scores of 4,726 candidates for law enforcement positions. For both cognitive ability and oral communication ability selection tests, candidates produced significant score increases between the 1st and 2nd and the 2nd and 3rd test administrations. Furthermore, the repeat testing relationships with posthire training performance and turnover were examined in a sample of 1,515 candidates eventually selected into the organization. As predicted from persistence and continuance commitment rationales, the number of tests necessary to gain entry into the organization was positively associated with training performance and negatively associated with turnover probability. PMID:12002953
Soke, Gnakub Norbert; Philofsky, Amy; Diguiseppi, Carolyn; Lezotte, Dennis; Rogers, Sally; Hepburn, Susan
We prospectively examined mean changes in Autism Diagnostic Interview-Revised (ADI-R) Total and Domains scores and stability of the ADI-R diagnostic classification in 28 children with autism initially assessed at age 2-4 years and reassessed 2 years later. Mean Total, Social Interaction, and Communication scores decreased significantly from Time 1…
Livingston, Samuel A.
To many people, standardized testing means multiple-choice testing. However, some tests contain questions that require the test taker to produce the answer, rather than simply choosing it from a list. The required response can be as simple as the writing of a single word or as complex as the design of a laboratory experiment to test a scientific…
85 boys in the New Jersey State Colony for the Feeble-minded were given the Goodenough test 3 times with 18 and 25 days between tests. The median increase and median decrease from test to test did not exceed one year. The test-retest reliability of the complete Goodenough scale ranged from .68 to .80, and the reliability of an abbreviated scale
Introduction: COPD exacerbations have a negative impact on lung function, decrease quality of life (QoL) and increase the risk of death. The objective of this study was to assess the course of health status after an outpatient or inpatient exacerbation in patients with COPD. Methods: This is an epidemiological, prospective, multicentre study that was conducted in 79 hospitals and primary care centres in Spain. Four hundred seventy-six COPD patients completed COPD assessment test (CAT) and Clinical COPD Questionnaire (CCQ) questionnaires during the 24 hours after presenting at hospital or primary care centres with symptoms of an exacerbation, and also at weeks 4–6. The scores from the CAT and CCQ were evaluated and compared at baseline and after recovery from the exacerbation. Results: A total of 164 outpatients (33.7%) and 322 inpatients (66.3%) were included in the study. The majority were men (88.2%), the mean age was 69.4 years (SD = 9.5) and the mean FEV1 (%) was 47.7% (17.4%). During the exacerbation, patients presented high scores in the CAT [mean: 22.0 (SD = 7.0)] and the CCQ [mean: 4.4 (SD = 1.2)]. After recovery there was a significant reduction in the scores of both questionnaires [CAT: mean: -9.9 (SD = 5.1) and CCQ: mean: -3.1 (SD = 1.1)]. Both questionnaires showed a strong correlation during and after the exacerbation and the best predictor of the magnitude of improvement in the scores was the severity of each score at onset. Conclusions: Due to their good correlation, CAT and CCQ can be useful tools to measure health status during an exacerbation and to evaluate recovery. However, new studies are necessary in order to identify which factors are influencing the course of the recovery of health status after a COPD exacerbation. PMID:23987232
Looney, Marilyn A.
Given that equating/linking applications are now appearing in kinesiology literature, this article provides an overview of the different types of linked test scores: equated, concordant, and predicted. It also addresses the different types of evidence required to determine whether the scores from two different field tests (measuring the same…
Root Kustritz, Margaret V
Third-year veterinary students in a required theriogenology diagnostics course were allowed to self-select attendance at a lecture in either the evening or the next morning. One group was presented with PowerPoint slides in a traditional format (T group), and the other group was presented with PowerPoint slides in the assertion-evidence format (A-E group), which uses a single sentence and a highly relevant graphic on each slide to ensure attention is drawn to the most important points in the presentation. Students took a multiple-choice pre-test, attended lecture, and then completed a take-home assignment. All students then completed an online multiple-choice post-test and, one month later, a different online multiple-choice test to evaluate retention. Groups did not differ on pre-test, assignment, or post-test scores, and both groups showed significant gains from pre-test to post-test and from pre-test to retention test. However, the T group showed significant decline from post-test to retention test, while the A-E group did not. Short-term differences between slide designs were most likely unaffected due to required coursework immediately after lecture, but retention of material was superior with the assertion-evidence slide design. PMID:25000882
Abella, Rodolfo; Urrutia, Joanne; Shneyderman, Aleksandr
Approximately 1,700 English language learners (ELLs) and former ELL students, in Grades 4 and 10, were tested using both an English-language (Stanford Achievement Test, 9th ed.) and a Spanish-language (Aprenda, 2nd ed.) achievement test. Their performances on the two tests were contrasted. The results showed that ELL students, for the most part,…
Catherine Worthington; Randy Jackson; Judy Mill; Tracey Prentice; Ted Myers; Susan Sommerfeldt
The objective of this study was to explore HIV testing experiences and service views of Canadian Aboriginal youth in order to provide information for HIV testing services. An exploratory, mixed-method, community-based research design was used for this study. Findings reported here are from 210 survey participants who had experienced an HIV test. Youth were recruited through 11 Aboriginal organizations across
Duckworth, Angela Lee; Seligman, Martin E. P.
Throughout elementary, middle, and high school, girls earn higher grades than boys in all major subjects. Girls, however, do not outperform boys on achievement or IQ tests. To date, explanations for the underprediction of girls' GPAs by standardized tests have focused on gender differences favoring boys on such tests. The authors' investigation…
Blanton, Hart; Jaccard, James; Burrows, Christopher N
Psychometricians strive to eliminate random error from their psychological inventories. When random error affecting tests is diminished, tests more accurately characterize people on the psychological dimension of interest. We document an unusual property of the scoring algorithm for a measure used to assess a wide range of psychological states. The "D-score" algorithm for coding the Implicit Association Test (IAT) requires the presence of random noise in order to obtain variability. Without consequential degrees of random noise, all individuals receive extreme scores. We present results from an algebraic proof, a computer simulation, and an online survey of implicit racial attitudes to show how trial error can bias IAT assessments. We argue as a result that the D-score algorithm should not be used for formal assessment purposes, and we offer an alternative to this approach based on multiple regression. Our critique focuses primarily on the IAT designed to measure unconscious racial attitudes, but it applies to any IAT developed to provide psychological assessments within clinical, organizational, and developmental branches of psychology-and in any other field where the IAT might be used. PMID:25296761
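The dependence on noise described in this abstract can be illustrated with a simplified D-like score. This is a sketch, not the algorithm as analyzed by the authors: it assumes the score is the difference between block mean latencies divided by the standard deviation of all trials pooled across both blocks, and it omits the trial trimming and error handling of the full scoring procedure. The function name and latency values are hypothetical.

```python
from statistics import mean, stdev


def d_like_score(compatible_ms, incompatible_ms):
    """Difference in mean latency divided by the SD of all trials from both blocks.
    Simplified sketch; real scoring procedures include additional steps."""
    pooled_sd = stdev(list(compatible_ms) + list(incompatible_ms))
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd


# No trial-to-trial noise: a 10 ms effect and a 300 ms effect get the same extreme score,
# because the pooled SD is produced entirely by the block difference itself.
noiseless_small = d_like_score([600] * 20, [610] * 20)
noiseless_large = d_like_score([600] * 20, [900] * 20)

# Same 10 ms effect with large (deterministic, made-up) trial variability.
spread = [-100, 100] * 10
noisy_small = d_like_score([600 + s for s in spread], [610 + s for s in spread])
```

Under these assumptions the two noiseless cases both land just below 2 regardless of effect size, while the noisy case yields a score near 0.1, which mirrors the abstract's point that the score needs random noise to show variability.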
Harder, Valerie S.; Stuart, Elizabeth A.; Anthony, James C.
There is considerable interest in using propensity score (PS) statistical techniques to address questions of causal inference in psychological research. Many PS techniques exist, yet few guidelines are available to aid applied researchers in their understanding, use and evaluation. This study gives an overview of available techniques for PS estimation and PS application. It also provides a way to help compare PS techniques, using the resulting measured covariate balance as the criterion for selecting between techniques. The empirical example for this study involves the potential causal relationship linking early-onset cannabis problems and subsequent negative mental health outcomes, using data from a prospective cohort study. PS techniques are described and evaluated based on their ability to balance the distributions of measured potentially confounding covariates for individuals with and without early-onset cannabis problems. This paper identifies the PS techniques that yield good statistical balance of the chosen measured covariates within the context of this particular research question and cohort. PMID:20822250
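Covariate balance of the kind used here as a selection criterion is often summarized with a standardized mean difference. The sketch below assumes that diagnostic; the abstract does not state which balance measure the authors used, and the function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in covariate means divided by the average of the two group SDs
    (root of the mean of the two sample variances)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical covariate (age at baseline) for exposed and unexposed individuals.
treated = [14, 15, 15, 16]
control_all = [16, 17, 18, 18, 19, 15]   # full comparison group, before any PS step
control_matched = [14, 15, 16, 15]       # comparison group after a made-up matching step

smd_before = standardized_mean_difference(treated, control_all)
smd_after = standardized_mean_difference(treated, control_matched)
```

A PS technique would be preferred under this criterion when it brings the absolute standardized difference for each measured covariate closer to zero, as the matched group does in this toy example.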
Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E
The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. PMID:25724515
Burnstein, A.V.; Galambos, J.T.
The (14C)aminopyrine breath test (APBT) score, an estimate of hepatic mixed-oxidase function, was evaluated in 21 consecutive patients with active nonalcoholic chronic liver diseases. Ten had primary biliary cirrhosis (PBC) and 11 had chronic active hepatitis (CAH). The APBT score was normal or elevated in patients with PBC (P less than 0.001), and lower than normal in CAH patients (P less than 0.01); 10.5 +/- 1.6 and 3.5 +/- 1.86, respectively, vs control 7.65 +/- 1.15 (mean +/- SD). The 11 patients with CAH included two middle-aged women who displayed ambiguous severe intrahepatic cholestasis. There was no overlap between the APBT scores of the 10 PBC and 11 CAH patients. These initial data suggest that the APBT may be helpful in the differentiation of PBC and CAH, including misleading cholestatic forms of CAH.
Bastian, Mauresa; Eggett, Dennis L; Jefferies, Laura K
Question placement and usage of pre-evaluation instructions (PEI) in questionnaires for food sensory analysis may bias consumers' scores via carry-over effects. Data from consumer sensory panels previously conducted at a central location, spanning 11 years and covering a broad range of food product categories, were compiled. Overall acceptance (OA) question placement was studied with categories designated as first (the first evaluation question following demographic questions), after nongustation questions (immediately following questions that do not require panelists to taste the product), and later (following all other hedonic and just-about-right [JAR] questions, but occasionally before ranking, open-ended comments, and/or intent to purchase questions). Each panel was categorized as having or not having PEI in the questionnaire; PEI are instructions that appear immediately before the first evaluation question and show panelists all attributes they will evaluate before receiving test samples. Postpanel surveys were administered regarding the self-reported effect of PEI on panelists' evaluation experience. OA scores were analyzed and compared (1) between OA question placement categories and (2) between panels with and without PEI. For most product categories, OA scores tended to be lower when asked later in the questionnaire, suggesting evidence of a carry-over effect. Usage of PEI increased OA scores by 0.10 of a 9-point hedonic scale point, which is not practically significant. Postpanel survey data showed that presence of PEI typically improved the panelists' experience. Using PEI does not appear to introduce a meaningful carry-over effect. PMID:25604650
Cummings, Steven R.; Sanders, Jason L.; Caserotti, Paolo; Rosano, Caterina; Satterfield, Suzanne; Strotmeyer, Elsa S.; Harris, Tamara B.; Simonsick, Eleanor M.; Cawthon, Peggy M.
Background: Characterization of long-term health trajectory in older individuals is important for proactive health management. However, the relative prognostic value of information contained in clinical profiles of nonfrail older adults is often unclear. Methods: We screened 825 phenotypic and genetic measures evaluated during the Health, Aging, and Body Composition Study (Health ABC) baseline visit (3,067 men and women aged 70–79). Variables that best predicted mortality over 13 years of follow-up were identified using 10-fold cross-validation. Results: Mortality was most strongly associated with low Digit Symbol Substitution Test (DSST) score (DSST<25; 21.9% of cohort; hazard ratio [HR]=1.87±0.06) and elevated serum cystatin C (≥1.30 mg/mL; 12.1% of cohort; HR=2.25±0.07). These variables predicted mortality better than 823 other measures, including baseline age and a 45-variable health deficit index. Given elevated cystatin C (≥1.30 mg/mL), mortality risk was further increased by high serum creatinine, high abdominal visceral fat density, and smoking history (2.52≤HR≤3.73). Given a low DSST score (<25) combined with low-to-moderate cystatin C (<1.30 mg/mL), mortality risk was highest among those with elevated plasma resistin and smoking history (1.90≤HR≤2.02). Conclusions: DSST score and serum cystatin C warrant priority consideration for the evaluation of mortality risk in older individuals. Both variables, taken individually, predict mortality better than chronological age or a health deficit index in well-functioning older adults (ages 70–79). DSST score and serum cystatin C can thus provide evidence-based tools for geriatric assessment. PMID:22607624
Pope, Gregory A.; Wentzel, Carolyn; Cammaert, Ron
Analysis of all January and June 2000 test scores for Alberta high school seniors found weak relationships between gender and both diploma examination and school-awarded scores. The largest statistical effect for gender was that the difference between the two sets of scores was greater for girls than boys, with school-awarded scores being higher.…
de la Torre, Jimmy; Patz, Richard J.
This article proposes a practical method that capitalizes on the availability of information from multiple tests measuring correlated abilities given in a single test administration. By simultaneously estimating different abilities with the use of a hierarchical Bayesian framework, more precise estimates for each ability dimension are obtained.…
Blanding, Joseph Dwayne
Standardized tests continue to be used in the United States to evaluate applicants for admission to most colleges and universities, which often results in less access for students--specifically students of color--who may have been inadequately prepared in grades K-12 for standardized testing. The purpose of this phenomenological case study was to…
New Jersey State Dept. of Education, Trenton.
The New Jersey High School Proficiency Test for grade 11 (HSPT11) replaced a similar requirement for grade 9 and became a graduation requirement in October 1993. As in the past, the writing test consists of a writing sample, which assesses student abilities to write sustained discourse, and a multiple-choice portion that assesses how well students…
VanderLaan, Ski R.
This mixed methods study (Creswell, 2008) was designed to test the influence of collaborative testing on learning using a quasi-experimental approach. This study used a modified embedded mixed method design in which the qualitative and quantitative data, associated with the secondary questions, provided a supportive role in a study based primarily…
Zhong, Ming; Zhang, Yiwei; Lange, Kenneth; Fan, Ruzong
In this article, we developed a cross-population comparison test statistic to detect chromosome regions in which there is no significant excess homozygosity in one population but homozygosity remains high in the other. We treated an extended stretch of homozygosity as a surrogate indicator of a recent positive selection. Conditioned on existing linkage disequilibrium, we proposed to test the haplotype version of the Hardy–Weinberg equilibrium (HWE). For each population, we assumed that a random sample of unrelated individuals were typed on a large number of single nucleotide polymorphisms (SNPs). A pooled-test statistic was constructed by comparing the measurements of homozygosity of the two samples around a core SNP. In the chromosome regions where HWE is roughly true in one population and HWE is not true in the other, the pooled-test statistic led to significant results to detect the positive selection. We evaluated the performance of the test statistic by type I error comparison and power evaluation. We showed that the proposed test statistic was very conservative and it had good power when the selected allele remains polymorphic. Then, we applied the test to HapMap Phase II data to make a comparison with previous results and to search for new candidate regions.
Background Determining the day-to-day variation of circulating cathodic antigen (CCA) in urine and of egg counts in stool in Schistosoma mansoni (S. mansoni) infected individuals is vital for deciding whether to rely on a single-sample test for diagnosis of schistosomiasis. In this study, the magnitude of day-to-day variation in urine-CCA test scores and in faecal egg counts was evaluated in school children in Ethiopia. Methods A total of 620 school children (age 8 to 12 years) were examined for S. mansoni infection using double Kato-Katz and single urine-CCA cassette methods (batch 32727) on three consecutive days. Results The prevalence of S. mansoni infection was 81.1% based on triple urine-CCA-cassette test and 53.1% based on six Kato-Katz thick smears. Among the study participants, 26.3% showed fluctuation in urine CCA and 32.4% showed fluctuation in egg output. Mean egg count, the number of cases in each class of intensity, and the intensity of cassette band color varied over the three days of examination. Over 85% of the children whose S. mansoni infection status varied from day to day, from negative to positive or vice versa, by the Kato-Katz and the CCA methods had light intensity of infection. The fluctuation in both the CCA test scores and faecal egg count was not associated with age or sex. Conclusions The current study showed day-to-day variation in CCA and Kato-Katz test results of children infected with S. mansoni. This indicates the need to collect more than one urine or stool sample on different days for more reliable diagnosis of S. mansoni infection in low endemic areas. PMID:24742192
Farhat, M R; Mitnick, C D; Franke, M F; Kaur, D; Sloutsky, A; Murray, M; Jacobson, K R
Fluoroquinolone (FQ) drug susceptibility testing (DST) is an important step in the design of effective treatment regimens for multidrug-resistant tuberculosis. Here we compare ciprofloxacin, ofloxacin and moxifloxacin (MFX) resistance results from 226 multidrug-resistant samples. The low level of concordance observed suggests that DST should be performed for the specific FQ planned for clinical use. The results also support the new World Health Organization recommendation for testing MFX at a critical concentration of 2.0 µg/ml. PMID:25686144
This paper investigates the credit scoring accuracy of five neural network models: multilayer perceptron, mixture-of-experts, radial basis function, learning vector quantization, and fuzzy adaptive resonance. The neural network credit scoring models are tested using 10-fold crossvalidation with two real world data sets. Results are benchmarked against more traditional methods under consideration for commercial applications including linear discriminant analysis, logistic regression,
This article provides an introduction to the kind of computer software that is used to score student writing in some high stakes testing programs, and that is being promoted as a teaching and learning tool to schools. It sketches the state of play with machines for the scoring of writing, and describes how these machines work and what they do.…
This study uses analysis of co-variance in order to determine which cognitive/learning (working memory, knowledge integration, epistemic belief of learning) or social/personality factors (test anxiety, performance-avoidance goals) might account for gender differences in SAT-V, SAT-M, and overall SAT scores. The results revealed that none of the cognitive/learning factors accounted for gender differences in SAT performance. However, the social/personality factors of test anxiety and performance-avoidance goals each separately accounted for all of the significant gender differences in SAT-V, SAT-M, and overall SAT performance. Furthermore, when the influences of both of these factors were statistically removed simultaneously, all non-significant gender differences reduced further to become trivial by Cohen's (1988) standards. Taken as a whole, these results suggest that gender differences in SAT-V, SAT-M, and overall SAT performance are a consequence of social/learning factors. PMID:23997382
Shieh, Joseph T.C.
Noncompaction/hypertrabeculation is increasingly being recognized in children and adults, yet we understand little about the causes of disease. Genes associated with noncompaction/hypertrabeculation have been identified, but how can these assist in clinical management? Genomic technologies have also expanded tremendously, making testing more comprehensive, but they also present new questions given the tremendous diversity of phenotypes and variability of genomes. Here we present genetic evaluation strategies and assess clinical testing options for noncompaction/hypertrabeculation. We assess genes/gene panels offered by clinical laboratories and the potential for high-throughput sequencing to fuel further discovery. We discuss challenges in cardiovascular genetics, such as interpretation of genomic variants, prediction and disease penetrance. PMID:23843345
Bhat, Venkatraman; Wahab, Atiqa Abdul; Garg, Kailash C; Janahi, Ibrahim; Singh, Rajvir
Background and Aims: Pulmonary changes in patients with cystic fibrosis (CF) with CFTR I1234V mutation have not been extensively documented. Impact of geographic influence on phenotypical expression is largely unknown. This descriptive clinical study presents the high-resolution computed tomography (HRCT) pulmonary findings and computed tomography (CT) scoring with respect to pulmonary function tests (PFT) in a small subset of CF group. Materials and Methods: We examined 29 patients between 2 and 31 years of age with CFTR I1234V mutation. HRCT and PFT were performed within 2 weeks of each other. Imaging abnormalities on HRCT were documented and analyzed by utilizing the scoring systems described by Bhalla et al., Brody et al., Helbich et al., and Santamaria et al. Efficacy of the scoring systems with respect to PFT was compared. Statistical Analysis: Inter-observer reliability of the scoring systems was tested using intraclass correlation (ICC) between the two observers. Spearman correlation coefficients were calculated between the scoring systems and between the scoring systems and PFT results. Results: In our study, the right upper and middle lobes were the most frequently involved sites. Bronchiectasis and peribronchial thickening were the most frequent imaging findings. Scores with all four scoring systems were reproducible, with a good ICC of 0.69. There was good agreement between senior radiologists in all scoring systems. Conclusion: We noted pulmonary imaging abnormalities in a large majority (96%) of our CF patients. There was no significant difference in the CT scores observed from various systems. The CT evaluation system by Brody is detailed and time consuming, and is ideal for research and academic setup. On the other hand, the systems by Bhalla and Santamaria are easy to use, quick, and equally informative. We found the scoring system by Santamaria preferable over that of Bhalla by virtue of additional points of evaluation and ease of use, and therefore better suited for busy clinical practice. PMID:25709165
Eduardo S. Schwartz; Walter N. Torous
We test the implications of real option pricing models with competitive interactions for commercial real estate development. The competitive nature of a local commercial real estate market relies on a Herfindahl ratio derived from individual developers' shares of total office construction in their market. All else being equal, greater competition among local developers is associated with more building starts. Other
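As a minimal sketch of the competition measure mentioned in this abstract, the snippet below computes a Herfindahl ratio as the sum of squared market shares, which is the standard definition. The abstract does not give the authors' exact construction, so the function name, the input format, and the sample figures are hypothetical.

```python
def herfindahl_ratio(construction_by_developer):
    """Sum of squared shares of total construction (standard Herfindahl definition).

    construction_by_developer: amounts of office construction per developer in
    one local market (hypothetical input format, any consistent unit).
    """
    total = sum(construction_by_developer)
    if total <= 0:
        raise ValueError("total construction must be positive")
    return sum((amount / total) ** 2 for amount in construction_by_developer)


# One developer building everything gives 1.0 (no competition);
# four equal developers give 0.25 (more competition, lower ratio).
print(herfindahl_ratio([100]))
print(herfindahl_ratio([25, 25, 25, 25]))
```

A lower ratio indicates a less concentrated, more competitive market, which is the direction the abstract associates with more building starts.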
P. A. Vroon; A. Boxtel
In this paper some implications of the pulse generator model of time perception have been tested. In the case of serial (re)production of an interval a lengthening effect occurs. Generally, this phenomenon is explained by assuming that the time-keeper is driven by the state of general physiological activation which decreases in the course of the task. An experiment was carried
Carlson, Janet F.; Benson, Nicholas; Oakland, Thomas
Implications of the International Classification of Functioning, Disability and Health (ICF) on the development and use of tests in school settings are enumerated. We predict increased demand for behavioural assessments that consider a person's activities, participation and person-environment interactions, including measures that: (a) address…
Horan, Sean M.; Houser, Marian L.
The goal of the present study was to test predicted outcome value theory (POV) in the classroom in order to discover the implications of students' POV judgments. Specifically, we explored the relationships among students' initial POV judgments and students' communication. To that end, we conducted a two-phase study in which students completed…
Jue, Penny Y.
Beginning in fall 1991, Napa Valley College (NVC), in California, switched from essentially mandatory placement of incoming students to an advisory, self-selection system, where students receive course recommendations based on assessment test results. In order to evaluate the validity of NVC's assessment procedures, a study was conducted of…
Zhu, Daming; Thompson, Tony D.
This study attempted to control differences in achievement when examining omitting tendencies of examinees. Test data of randomly sampled examinees (7 samples of 2,000 examinees each) from one national administration of the ACT Assessment were used. The number of responses omitted per examinee was examined over all examinees and over only those…
Feldt, Leonard S.
This article presents a simple, computer-assisted method of determining the extent to which increases in reliability increase the power of the "F" test of equality of means. The method uses a derived formula that relates the changes in the reliability coefficient to changes in the noncentrality of the relevant "F" distribution. A readily available…
Heinrich Stumpf; Julian C. Stanley
For every 4-year college in the United States listed in the 1998 College Handbook of the College Board, the percentages of students graduating within 6 years of entering and of students having high school grade point averages (GPAs) of at least 3.00 were recorded. The authors also obtained the College Board Scholastic Assessment Test I (SAT I) Verbal and Math
Ebert, Kerry Danahy; Scott, Cheryl M.
Purpose: Both narrative language samples and norm-referenced language tests can be important components of language assessment for school-age children. The present study explored the relationship between these 2 tools within a group of children referred for language assessment. Method: The study is a retrospective analysis of clinical records from…
Valdiviezo, Laura A.
At Smith Street Elementary School, the globalizing education trends that English language learner (ELL) teachers face focus on measuring student achievement through testing and the English mainstreaming of non-dominant students as opposed to the cultivation of the students' linguistic and cultural diversity. The ELL teachers at Smith Street…
Pantzare, Anna Lind
In most large-scale assessment systems a set of rather expensive external quality controls are implemented in order to guarantee the quality of interrater reliability. This study empirically examines if teachers' ratings of national tests in mathematics can be reliable without using monitoring, training, or other methods of external quality…
Reeve, Charlie L.; Bonaccio, Silvia
Claims of changes in the validity coefficients associated with general mental ability (GMA) tests due to the passage of time (i.e., temporal validity degradation) have been the focus of an on-going debate in applied psychology. To evaluate whether and, if so, under what conditions this degradation may occur, we integrate evidence from multiple…
Reddy, Linda A.; Fabiano, Gregory A.; Dudek, Christopher M.; Hsu, Louis
The present study examined the validity of a teacher observation measure, the Classroom Strategies Scale-Observer Form (CSS), as a predictor of student performance on statewide tests of mathematics and English language arts. The CSS is a teacher practice observational measure that assesses evidence-based instructional and behavioral management…
Hill, Grant; Downing, Aaron
The purpose of this study was to determine the effects of frequent peer-monitored Fitnessgram testing, with student goal setting, on the PACER and push-up performance of middle school students. Subjects were 176 females and 189 males in 10 physical education classes at a middle school with an 83.7% Hispanic student population. Students were…
Grossman, Pam; Cohen, Julie; Ronfeldt, Matthew; Brown, Lindsay
In this study, we examined how the relationships between one observation protocol, the Protocol for Language Arts Teaching Observation (PLATO), and value-added measures shift when different tests are used to assess student achievement. Using data from the Measures of Effective Teaching Project, we found that PLATO was more strongly related to the…
This article discusses the scientific and ethical implications of random drug testing in the workplace. Random drug testing, particularly in safety-sensitive sectors, is a common practice, yet it has received little critical analysis. My conclusion is that there are important ethical challenges with these programs. Employers must ensure that every aspect of their policies are rooted in scientific evidence, linked rationally to the goal of workplace safety, and are ethically justifiable. PMID:26022100
John D. Martin; Elinor M. Martin
The purpose of the present project was to determine the degree of correlation between the PIL and (a) the Time Competency (TC) scale of the Personal Orientation Inventory (POI), (b) the Inner-directed (I) scale of the POI, (c) IQ scores of high school students who took the Otis-Lennon Mental Ability Test, Form L-M, and (d) grade point average (GPA) of
Park, Joong-Il; Shin, So-Young; Park, Sue K; Barrett-Connor, Elizabeth
To investigate the association between analyses of submaximal treadmill exercise test (TMT) and long-term myocardial ischemia (Mis) and silent Mis in community-dwelling older adults, 898 Rancho Bernardo Study participants (mean age 55 years) without coronary heart disease underwent TMT and were followed up to 27 years. The main outcome measures are incidence of Mis and silent Mis. During follow-up, 97 Mis and 103 silent Mis events occurred. We measured ST change, inability to achieve target heart rate, abnormal heart rate recovery (HRR), and chronotropic incompetence (ChI). Each parameter was a significant predictor for Mis and silent Mis. An integrated scoring model was based on these 4 parameters and defined as sum of numbers of abnormal parameters. After multiple adjustments, an integrated scoring model independently predicted Mis and silent Mis. The incidence rates of abnormalities of parameters are 36.5% for 1 abnormality, 9.1% for 2 abnormalities, and 2.0% for 3 or 4 abnormalities. Compared with those with normal results, participants with 1 or 2 abnormalities had significantly increased risk for Mis (hazard ratio [HR] 1.79 or 2.34, respectively) and silent Mis (HR 1.80 or 2.64, respectively). Participants with 3 or more positive findings showed an even greater risk for Mis (HR 7.96 [3.02 to 21.00]) and silent Mis (HR 3.22 [0.76 to 13.60]). In conclusion, ST change, ChI, abnormal HRR, inability to achieve target heart rate, and integrated scoring model of TMT were independent predictors of long-term Mis and silent Mis in an asymptomatic middle-aged population. Management of ChI or abnormal HRR in an asymptomatic population may prevent future ischemic heart disease and thus improve the quality of life. PMID:25728643
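The integrated scoring model described in this abstract is the count of abnormal treadmill parameters. A minimal sketch of that count follows; the function and argument names are hypothetical, and no hazard ratios are computed because the abstract does not give the underlying model.

```python
def integrated_tmt_score(st_change, target_hr_not_achieved,
                         abnormal_hrr, chronotropic_incompetence):
    """Count how many of the four treadmill parameters are abnormal (0 to 4)."""
    findings = (st_change, target_hr_not_achieved,
                abnormal_hrr, chronotropic_incompetence)
    return sum(1 for finding in findings if finding)


def abnormality_group(score):
    """Map a score to the groups reported in the abstract: 0, 1, 2, or 3-4."""
    if not 0 <= score <= 4:
        raise ValueError("score must be between 0 and 4")
    if score == 0:
        return "normal"
    if score >= 3:
        return "3 or 4 abnormalities"
    return "1 abnormality" if score == 1 else "2 abnormalities"


# Hypothetical participant with ST change and abnormal heart rate recovery.
score = integrated_tmt_score(True, False, True, False)
print(score, abnormality_group(score))
```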
Ong, Kimberly J.; MacCormack, Tyson J.; Clark, Rhett J.; Ede, James D.; Ortega, Van A.; Felix, Lindsey C.; Dang, Michael K. M.; Ma, Guibin; Fenniri, Hicham; Veinot, Jonathan G. C.; Goss, Greg G.
The evaluation of engineered nanomaterial safety has been hindered by conflicting reports demonstrating differential degrees of toxicity with the same nanoparticles. The unique properties of these materials increase the likelihood that they will interfere with analytical techniques, which may contribute to this phenomenon. We tested the potential for: 1) nanoparticle intrinsic fluorescence/absorbance, 2) interactions between nanoparticles and assay components, and 3) the effects of adding both nanoparticles and analytes to an assay, to interfere with the accurate assessment of toxicity. Silicon, cadmium selenide, titanium dioxide, and helical rosette nanotubes each affected at least one of the six assays tested, resulting in either substantial over- or under-estimations of toxicity. Simulation of realistic assay conditions revealed that interference could not be predicted solely by interactions between nanoparticles and assay components. Moreover, the nature and degree of interference cannot be predicted solely based on our current understanding of nanomaterial behaviour. A literature survey indicated that ca. 95% of papers from 2010 using biochemical techniques to assess nanotoxicity did not account for potential interference of nanoparticles, and this number had not substantially improved in 2012. We provide guidance on avoiding and/or controlling for such interference to improve the accuracy of nanotoxicity assessments. PMID:24618833
Ishmael, S. D.; Regenie, V. A.; Mackall, D. A.
Advanced fighter technologies are evolving into highly complex systems. Flight controls are being integrated with advanced avionics to achieve a total system. The advanced fighter technology integration (AFTI) F-16 aircraft is an example of a highly complex digital flight control system integrated with advanced avionics and cockpit. The architecture of these new systems involves several general issues. The use of dissimilar backup modes if the primary system fails requires the designer to trade off system simplicity and capability. This tradeoff is evident in the AFTI/F-16 aircraft with its limited stability and fly-by-wire digital flight control systems. In case of a generic software failure, the backup or normal mode must provide equivalent envelope protection during the transition to degraded flight control. The complexity of systems like the AFTI/F-16 system defines a second design issue, which can be divided into two segments: the effect on testing, and the pilot's ability to act correctly in the limited time available for cockpit decisions. The large matrix of states possible with the AFTI/F-16 flight control system illustrates the difficulty of both testing the system and choosing real-time pilot actions.
Ickovics, Jeannette R.; Carroll-Scott, Amy; Peters, Susan M.; Schwartz, Marlene; Gilstad-Hayden, Kathryn; McCaslin, Catherine
Background The Institute of Medicine (2012) concluded that we must “strengthen schools as the heart of health.” To intervene for better outcomes in both health and academic achievement, identifying factors that impact children is essential. Study objectives are to (1) document associations between health assets and academic achievement, and (2) examine cumulative effects of these assets on academic achievement. Methods Participants include 940 students (grades 5 and 6) from 12 schools randomly selected from an urban district. Data include physical assessments, fitness testing, surveys, and district records. Fourteen health indicators were gathered including physical health (eg, body mass index [BMI]), health behaviors (eg, meeting recommendations for fruit/vegetable consumption), family environment (eg, family meals), and psychological well-being (eg, sleep quality). Data were collected 3-6 months prior to standardized testing. Results On average, students reported 7.1 health assets out of 14. Those with more health assets were more likely to be at goal for standardized tests (reading/writing/mathematics), and students with the most health assets were 2.2 times more likely to achieve goal compared with students with the fewest health assets (both p < .001). Conclusions Schools that utilize nontraditional instructional strategies to improve student health may also improve academic achievement, closing equity gaps in both health and academic achievement. PMID:24320151
The Fit for Delivery study: rationale for the recommendations and test-retest reliability of a dietary score measuring adherence to 10 specific recommendations for prevention of excessive weight gain during pregnancy.
Øverby, Nina C; Hillesund, Elisabet R; Sagedal, Linda R; Vistad, Ingvild; Bere, Elling
Aiming at preventing excessive weight gain during pregnancy, 10 specific dietary recommendations are given to pregnant women in the intervention arm of the Norwegian Fit for Delivery (FFD) study. This paper presents the rationale and test-retest reliability of the food frequency questionnaire (FFQ) and a dietary score measuring adherence to the recommendations. The study is part of the ongoing FFD study, a randomised, controlled, intervention study in nulliparous pregnant women. A 43-item FFQ was developed for the FFD study. A dietary score was constructed from 10 subscales corresponding to the 10 dietary recommendations. Adding the subscales yielded a score from 0 to 10 with increasing score indicating healthier dietary behaviour. The score was divided into tertiles, grouping participants into low, medium and high adherence to the dietary recommendations. Pregnant women attending ultrasound screening at about week 19 of pregnancy were asked to complete the FFQ twice, 2 weeks apart. Of 154 pregnant women completing the first questionnaire, 106 (69%) completed the form on both occasions and were included in the study. The test-retest correlations of the score and subscales were r = 0.68 and r = 0.56-0.84, respectively (both P ≤ 0.001). There was 68% test-retest correct classification of the score and 70-87% of the subscales. In conclusion, acceptable test-retest reliability of the FFQ and the dietary score was found. The score will be used in the FFD study to measure adherence to the dietary recommendations throughout pregnancy and in the following year post-partum. PMID:23241065
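The scoring and test-retest steps in this abstract can be sketched roughly as follows. The sketch assumes each of the 10 subscales is coded 0 or 1 (the abstract states only that adding them yields a score from 0 to 10) and that the reported correlations are Pearson coefficients (the abstract does not say which coefficient was used). All names and data are hypothetical.

```python
from math import sqrt


def dietary_score(subscales):
    """Add the 10 subscale values (assumed 0/1 coding) into a 0-10 score."""
    if len(subscales) != 10:
        raise ValueError("expected exactly 10 subscales")
    return sum(subscales)


def pearson_r(first, second):
    """Pearson correlation between two administrations of the same score."""
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for six women who completed the FFQ twice, 2 weeks apart.
first_round = [4, 6, 7, 3, 8, 5]
second_round = [5, 6, 6, 3, 9, 5]
print(dietary_score([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]))
print(round(pearson_r(first_round, second_round), 2))
```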
Nyroos, Mikaela; Wiklund-Hornqvist, Carola
Introduction: The Swedish government has decided to introduce national tests in primary education. Swedish pupils in general have few tests and a recognised possible adverse effect of testing is test anxiety among pupils, which may have a negative impact on examination performance. However, there has been little research on effects of testing on…
Demir, Ozan M; Dobson, Peter; Papamichael, Nikolaos D; Byrne, Jonathan; Plein, Sven; Alfakih, Khaled
The European Society of Cardiology (ESC) and UK National Institute for Health and Care Excellence (NICE) have recently published guidelines for investigating patients with suspected coronary artery disease (CAD). Both provide a risk score (RS) to assess the pre-test probability for CAD to guide clinicians to undertake the most effective investigation. The aim of the study was to establish whether there is a difference between the two RS models. We retrospectively reviewed records of 479 patients who presented to a UK district general hospital with chest pain between August 2011 and April 2013. The RS was calculated using ESC and NICE guidelines and compared. From the 479 patients, 277 (58%) were male and the mean age was 60 years. The mean RS was greater using NICE guidelines compared with ESC (66.3 vs 47.9%, 18.4% difference; p<0.0001). The difference in mean RS was smaller in patients with typical chest pain (13.0%). When we divided the cohort based on NICE criteria into 'high'- and 'low'-risk groups, the difference in the mean RS was 24.3% in the 'high'-risk group (p<0.001) compared with 2.8% in the 'low'-risk group. The UK NICE risk score model overestimates risk compared with the ESC model. PMID:26031971
Y Ohno; T Kaneko; T Inoue; Y Morikawa; T Yoshida; A Fujii; M Masuda; T Ohno; M Hayashi; J Momma; T Uchiyama; K Chiba; N Ikeda; Y Imanishi; H Itakagaki; H Kakishima; Y Kasai; A Kurishita; H Kojima; K Matsukawa; T Nakamura; K Ohkoshi; H Okumura; K Saijo; K Sakamoto; T Suzuki; K Takano; H Tatsumi; N Tani; M Usami; R Watanabe
A three-step interlaboratory validation of alternative methods to the Draize eye irritation test (Draize test) was conducted by the co-operation of 27 organizations including national research institutes, universities, cosmetic industries, kit suppliers and others. Twelve alternative methods were evaluated using 38 cosmetic ingredients and isotonic sodium chloride solution. Draize tests were conducted according to the OECD guidelines using the same
Hammerschlag, Margaret R.; Guillén, Christina D.
Summary: Testing for sexually transmitted infections (STIs) in children presents a number of problems for the practitioner that are not usually faced when testing adults for the same infections. The identification of an STI in a child can have, in addition to medical implications, serious legal implications. The presence of an STI is often used to support the presence or allegations of sexual abuse, and conversely, the identification of an STI in a child will prompt an investigation of possible abuse. The purpose of this paper is to review the epidemiology of child sexual abuse, including the epidemiology of major STIs including Neisseria gonorrhoeae, Chlamydia trachomatis, syphilis, herpes simplex virus (HSV), Trichomonas vaginalis, and human papillomavirus, and the current recommendations for diagnostic testing in this population. PMID:20610820
Crawford, John R; Garthwaite, Paul H; Slick, Daniel J
Normative data for neuropsychological tests are often presented in the form of percentiles. One problem when using percentile norms stems from uncertainty over the definitional formula for a percentile. (There are three co-existing definitions and these can produce substantially different results.) A second uncertainty stems from the use of a normative sample to estimate the standing of a raw score in the normative population. This uncertainty is unavoidable but its extent can be captured using methods developed in the present paper. A set of reporting standards for the presentation of percentile norms in neuropsychology is proposed. An accompanying computer program (available to download) implements these standards and generates tables of point and interval estimates of percentile ranks for new or existing normative data. PMID:19322734
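To illustrate why the definitional formula matters, the sketch below computes the percentile rank of a raw score in three ways. The abstract does not name its three definitions; the ones used here (percentage below the score, percentage at or below the score, and percentage below plus half of those obtaining the score) are an assumption, and the function name and sample data are hypothetical. This is not the authors' downloadable program.

```python
def percentile_ranks(norm_scores, raw_score):
    """Percentile rank of raw_score under three assumed definitions."""
    n = len(norm_scores)
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return {
        "below": 100.0 * below / n,
        "at_or_below": 100.0 * (below + equal) / n,
        "below_plus_half_equal": 100.0 * (below + 0.5 * equal) / n,
    }


# Hypothetical normative sample of 10 scores; three people scored exactly 5.
sample = [3, 5, 5, 5, 7, 8, 8, 9, 10, 12]
print(percentile_ranks(sample, 5))
```

With tied scores in a small normative sample, the three definitions place the same raw score at the 10th, 40th, and 25th percentile respectively, which shows how large the discrepancy can be.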
Tsegaye, Mulugeta Tarekegne; De Bleser, Ria; Iribarren, Carolina
Most studies investigating the impact of literacy on oral language processing have shown that literacy provides phonological awareness skills in the processing of oral language. The implications of these results on aphasia tests could be significant and pose questions on the adequacy of such tools for testing non-literate individuals. Aiming at examining the impact of literacy on oral language processing and its implication on aphasia tests, this study tested 12 non-literate and 12 literate individuals with a modified Amharic version of the Bilingual Aphasia Test (Paradis and Amberber, 1991, Bilingual Aphasia Test. Amharic version. Hillsdale, NJ: Lawrence Erlbaum.). The problems of phonological awareness skills in oral language processing in non-literates are substantiated. In addition, compared with literate participants, non-literate individuals demonstrated difficulties in the word/sentence-picture matching tasks. This study has also revealed that the Amharic version of the Bilingual Aphasia Test may be viable for testing Amharic-speaking non-literate individuals with aphasia when modifications are incorporated. PMID:21631306
Sandman, Peter M.; Weinstein, Neil D.
Analysis of 4 New Jersey studies of 3,329 homeowners found that (1) thinking about radon testing is predicted by general radon knowledge; (2) decision to test is related to perceived likelihood of risk; and (3) actual testing is influenced by situational factors such as locating and choosing test kits. (SK)
van der Linden, Wim J.
Traditionally, error in equating observed scores on two versions of a test is defined as the difference between the transformations that equate the quantiles of their distributions in the sample and population of test takers. But it is argued that if the goal of equating is to adjust the scores of test takers on one version of the test to make…
Maza, Paul Sadiri
In recent years, technological advances such as computers have been employed in teaching gross anatomy at all levels of education, even in professional schools such as medical and veterinary medical colleges. Benefits of computer based instructional tools for gross anatomy include the convenience of not having to physically view or dissect a cadaver. Anatomy educators debate over the advantages versus the disadvantages of computer based resources for gross anatomy instruction. Many studies, case reports, and editorials argue for the increased use of computer based anatomy educational tools, while others discuss the necessity of dissection for various reasons important in learning anatomy, such as a three-dimensional physical view of the specimen, physical handling of tissues, interactions with fellow students during dissection, and differences between specific specimens. While many articles deal with gross anatomy education using computers, there seems to be a lack of studies investigating the use of computer based resources as an assessment tool for gross anatomy, specifically using the Apple application QuickTime Virtual Reality (QTVR). This study investigated the use of QTVR movie modules to assess if using computer based QTVR movie module assessments were equal in quality to actual physical specimen examinations. A gross anatomy course in the College of Veterinary Medicine at Cornell University was used as a source of anatomy students and gross anatomy examinations. Two groups were compared, one group taking gross anatomy examinations in a traditional manner, by viewing actual physical specimens and answering questions based on those specimens. The other group took the same examinations using the same specimens, but the specimens were viewed as simulated three-dimensional objects in a QTVR movie module. Sample group means for the assessments were compared. 
A survey was also administered asking about students' perceptions of the quality and user-friendliness of the QTVR movie modules. The comparison of the two sample group means of the examinations shows that there was no difference in results between using QTVR movie modules to test gross anatomy knowledge and using physical specimens. The results of this study are discussed to explain the benefits of using such computer based anatomy resources in gross anatomy assessments.
Ibitoye, Mobolaji; Frasca, Timothy; Giguere, Rebecca; Carballo-Diéguez, Alex
The recent approval in the United States of the first rapid home test to diagnose HIV raises questions about its potential use and impact. We reviewed the existing literature on the unassisted use of home tests involving self-collection and testing of biological samples by untrained users – including existing HIV self-testing studies – to shed some light on what can be expected from the availability of the HIV home test. The studies reviewed showed that most participants could properly perform home tests, obtain accurate results, and interpret them – yielding high correlations with laboratory and health-professional performed tests. Users often had trouble performing blood-based tests. Participants generally understood the need to confirm positive test results. Materials accompanying HIV home tests should emphasize symptoms of acute infection and the need for additional testing when recent infection is suspected. Different home-test-based screening modalities, personalized HIV-counseling resources and HIV home test impact evaluation methods should be studied. PMID:24281697
Aktar, Sirac; Akdeniz, Necmettin; Calka, Omer; Karadag, Ayse Serap
Introduction: Some previous studies reported autoimmunity as an etiologic factor in chronic urticaria (CU), but the results of some autoimmunity tests in these studies are conflicting. Aim: To concretize whether there was any relation of autologous serum skin test (ASST) and autologous plasma skin test (APST) results with sex, age and urticarial activity score (UAS) in patients with CU. Material and methods: Fifty patients with CU and twenty healthy subjects admitted to our dermatology clinic were included in the present study. The ASST and APST were applied to all individuals. Results: The positiveness rates of ASST and APST were significantly higher in the patient group than controls (p = 0.027, p = 0.001, respectively). Among patients, the APST positiveness rate (72%) was significantly (p < 0.05) higher than ASST (46%). It was seen that 48% of patients with negative ASST results had positive APST. However, no patient with negative APST results had positive ASST. There were significant (p < 0.05) relations of the tests’ positiveness rates with sex and old age but with UAS. The diameter of the erythematous papule was remarkably (p < 0.05) larger in APST than ASST and also significantly (p < 0.05) larger in females compared to males in both tests (p < 0.05). It was positively increased with old age (p < 0.05). Conclusions: We can suggest that APST is more sensitive than ASST in the assessment of autoimmunity in CU. A high positiveness rate of APST results may be attributed to high numbers of autoantibodies and coagulation factors present in plasma that might probably play a role in etiopathogenesis of CU. PMID:26161057
Wiberg, Marie; van der Linden, Wim J.
Two methods of local linear observed-score equating for use with anchor-test and single-group designs are introduced. In an empirical study, the two methods were compared with the current traditional linear methods for observed-score equating. As a criterion, the bias in the equated scores relative to true equating based on Lord's (1980)…
Roberts, Mary Roduta; Gierl, Mark J.
This paper presents a framework to provide a structured approach for developing score reports for cognitive diagnostic assessments ("CDAs"). Guidelines for reporting and presenting diagnostic scores are based on a review of current educational test score reporting practices and literature from the area of information design. A sample diagnostic…
Educational Research Service, Arlington, VA.
Set against the backdrop of a decade in which college admissions test scores have declined, this report reviews issues affecting college admissions testing and their implications, and focuses specifically on the debate between the makers and supporters of standardized tests and test critics. Overviews of the College Entrance Examination Board's…
K. D. Graves; B. N. Peshkin; G. Luta; W. Tuong; M. D. Schwartz
Background: Advances in genomics may eventually lead to ‘personalized genetic medicine,’ yet the clinical utility of predictive testing for modest changes in risk is unclear. We explored interest in genetic testing for genes related to modest changes in breast cancer risk in women at moderate to high risk for breast cancer. Methods: Women (n = 105) with a negative breast
This paper reviews foreign and domestic sexism research and practice in language testing and reveals that China lags behind in this sociolinguistics perspective in both theoretical study and practice. The paper indicates that sexism is represented in the listening comprehension section in National Matriculation English Test (NMET) after a case…
Winter, David G.
The Advisory Panel on the Scholastic Aptitude Test (SAT) Score Decline contends that many factors have contributed to the drop in scores on the SAT and many other tests. The research evidence and theory about three social motives that could be expected to play some role in test performance and academic functioning are examined: the motives for…
Prince, Joan Marie
Over the past years, progress in Black academic achievement, particularly in the area of science, has generally slowed or ceased. According to the 1994 NAEP assessment, twelfth-grade Black students are performing at the level of White eighth-grade students in the discipline of science (Department of Education, 1996). These students, in their last year of required schooling, are about to graduate, yet they lag at least four years behind their white counterparts in science achievement. Despite the establishment and implementation of numerous science intervention programs, Black students still suffer from a disparate gap in standardized test score achievement. The purpose of this research is to investigate teachers' perceptions of the effectiveness of an urban sciences intervention tool that was designed to assist in narrowing the Black-White science academic achievement gap. Specifically, what factors affect teachers' personal sense of instructional efficacy, and how does this translate into their outcome expectancy for student academic success? A multiple-case, replicative design, grounded in descriptive theory, was selected for the study. Multiple sources of evidence were queried to provide robust findings. These sources included a validated health sciences self-efficacy instrument, an interview protocol, a classroom observation, and a review of archival material that included case study participants' personnel files and meeting minutes. A cross-comparative analytic approach was selected for interpretation (Yin, 1994). Findings indicate that teachers attribute the success or failure of educational intervention tools in closing the Black-White test score gap to a variety of internal and external factors. These factors included a perceived lack of both monetary and personal support by the school leadership, as well as a perceived lack of parental involvement which impacted negatively on student achievement patterns. 
The case study participants displayed a depressed outcome expectancy effect for successful student achievement, which they directly attributed to the barriers stated above. If educational reforms are to be successful, the issues of teachers' perceptions of factors that inhibit their personal ability to instruct, and how that translates to student academic achievement must be addressed.
Lord, Thomas R; Clausen-May, Tandi
Questions from a national mathematics test taken by over 1,000 12- and 13-year-olds in the United Kingdom were perused for heavily loaded spatial and numerical items. Pupils' answers to the two types of item were examined, and those who answered well in one of the categories but poorly in the other were selected to form two groups, those high in spatial thinking but low in numerical thinking and those high in numerical thinking but low in spatial thinking, to assess whether the approaches each group used to solve the items were different. The two groups did indeed utilize different thinking strategies to solve the questions. For example, questions involving angle and volume, items thought to require a high spatial facility, were answered correctly by the predominantly numerical thinkers as often as by the predominantly spatial thinkers. This would indicate that one of the samples, i.e., numerical thinkers, used a different strategy than the other, i.e., spatial thinkers. This was verified by examination of students' work in the test booklets and by personal interviews with them. Also, the same proportion of boys in each group was recorded, but a higher percentage of girls was recorded in the spatial group than in the numerical. This reflected the large number of boys who scored high on the spatial measures also doing moderately well on the numerical items and so being moved out of the solely spatial group. Since the tests used by mathematics educators to assess learning are so heavily laden with linguistic/analytical rather than holistic/spatial types of questions, pupils high in spatial but low in numerical thinking face a severe handicap in schools. PMID:14604026
Steinheiser, Frederick H., Jr.; Hirshfeld, Stephen L.
The scientific implications and practical applications of the Stein estimator approach for estimating true scores from observed scores are of potentially great importance. The conceptual complexity is not much greater than that required for more conventional regression models. The empirical Bayesian aspect allows the examiner to incorporate…
Lee, Tayla T. C.; Graham, John R.; Sellbom, Martin; Gervais, Roger O.
Using a sample of individuals undergoing medico-legal evaluations (690 men, 519 women), the present study extended past research on potential gender biases for scores of the Symptom Validity (FBS) scale of the Minnesota Multiphasic Personality Inventory-2 by examining score- and item-level differences between men and women and determining the…
von Davier, Alina A.; Wilson, Christine
Dorans and Holland (2000) and von Davier, Holland, and Thayer (2003) introduced measures of the degree to which an observed-score equating function is sensitive to the population on which it is computed. This article extends the findings of Dorans and Holland and of von Davier et al. to item response theory (IRT) true-score equating methods that…
Anjeh, Divine; Caputo, Jennifer; Armani, Sossi
This study seeks to investigate the implications of high stakes state-mandated testing on the educational future of language minority learners. It sets off with a definition of high stakes state-mandated testing and proceeds with an in depth review of the incidence and ramification of high stakes testing and its impact on Less English Proficient…
Kim, Bora; Koo, Min-Seong; Jun, Jin-Yong; Park, Il Ho; Oh, Dong-Yul
Objective: The aim of this study was to evaluate the association between a variable number of tandem repeats polymorphism at the dopamine D4 receptor gene (DRD4) and the performance of children with attention deficit hyperactivity disorder (ADHD) in a continuous performance test (CPT). Methods: This study included 72 ADHD children (mean age=9.39±2.05 years) who were recruited from one child psychiatric clinic. The omission errors, commission errors, reaction time and reaction standardization in the CPT were computed. The number of 48-base pairs tandem repeats in the exon III of DRD4 was analyzed in a blind manner. Results: The homozygosity of the 4-repeat allele at DRD4 was significantly associated with fewer commission errors (t=2.364, df=28.685, p=0.025) and standard deviation of reaction time (t=2.351, df=24.648, p=0.027) even after adjusting for age. The results of analyses of CPT measured values among three groups showed that the group with higher frequency of the 4-repeat allele showed a lower mean score of commission errors (F=4.268, df=2, p=0.018). Conclusion: These results suggest a protective role of 4-repeat allele of the DRD4 polymorphisms on commission errors in the CPT in children with ADHD. PMID:20046398
Looney, Marilyn A
Given that equating/linking applications are now appearing in kinesiology literature, this article provides an overview of the different types of linked test scores: equated, concordant, and predicted. It also addresses the different types of evidence required to determine whether the scores from two different field tests (measuring the same construct) can be used interchangeably (equated). Thus, evidence issues are addressed to help the reader determine whether the creation of equipercentile equated or concordant scores is appropriate and useful. The article is organized according to the following issues: (a) the degree to which the two tests measure the same construct, (b) the equating/linking process, (c) the evaluation of equating/linking function, and (d) stability of equating/linking function across populations. PMID:23611008
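As a rough illustration of the equated scores this abstract refers to, the Python sketch below places a score from one test on the scale of another, first by linear equating (matching means and standard deviations) and then by a simple, unsmoothed equipercentile rule. The function names and sample data are hypothetical and are not taken from the article; operational equating would normally add smoothing and an explicit data-collection design.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map score x from test X to the scale of test Y by matching means and SDs."""
    mx, my = mean(scores_x), mean(scores_y)
    sx, sy = pstdev(scores_x), pstdev(scores_y)
    return my + (sy / sx) * (x - mx)


def percentile_rank(x, scores):
    """Percent of scores at or below x (a simple, unsmoothed definition)."""
    ordered = sorted(scores)
    return 100.0 * bisect_right(ordered, x) / len(ordered)


def equipercentile_equate(x, scores_x, scores_y):
    """Return the smallest Y score whose percentile rank reaches that of x on X."""
    target = percentile_rank(x, scores_x)
    for y in sorted(scores_y):
        if percentile_rank(y, scores_y) >= target:
            return y
    return max(scores_y)


# Hypothetical samples: test Y is scored on a scale twice as wide as test X.
scores_x = [10, 12, 14, 16, 18]
scores_y = [20, 24, 28, 32, 36]
linear_result = linear_equate(16, scores_x, scores_y)
equipercentile_result = equipercentile_equate(16, scores_x, scores_y)
```

With these made-up samples both methods send an X score of 16 to a Y score of 32; on real data the two can disagree when the score distributions differ in shape, which is one reason the evidence questions listed in the abstract matter.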
Oris, J.T. [Miami Univ., Oxford, OH (United States). Center for Environmental Toxicology and Statistics
The purpose of this study was to describe statistical procedures to test the equivalence of concentration-response relationships in acute toxicology studies and to illustrate the implications of nonequivalence on potency endpoints such as LC10, LC50, or LC90. A logistic regression model for binary response endpoints such as mortality that allowed for the examination of equivalence of slopes and intercepts of the responses between populations is described. Test statistics were derived from comparing nested regression models. This procedure was used to test the equivalence of concentration versus acute mortality response relationships between two nonpolar, narcotic chemicals in a single population of fish and between two populations of fish with different exposure histories to a polycyclic aromatic hydrocarbon. These case studies illustrate different outcomes in the comparison of concentration-response relationships and demonstrate the need to consider more than a single endpoint (e.g., LC50) in a risk assessment context when nonparallel concentration-responses are observed.
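The point about nonparallel concentration-response curves can be made concrete with a small Python sketch. It assumes a logistic model of the form logit(P(death)) = intercept + slope * log10(concentration); this parameterization, the function names, and the coefficient values are illustrative assumptions, not the authors' fitted models.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def lethal_concentration(p, intercept, slope):
    """Concentration at which a fraction p dies, assuming
    logit(P(death)) = intercept + slope * log10(concentration)."""
    return 10 ** ((logit(p) - intercept) / slope)


def potency_ratio(p, model_a, model_b):
    """Ratio LCp(A) / LCp(B) for two (intercept, slope) pairs."""
    return lethal_concentration(p, *model_a) / lethal_concentration(p, *model_b)


# Hypothetical coefficients: the two curves cross at 50% mortality
# but have different slopes (nonparallel responses).
model_a = (-2.0, 4.0)
model_b = (-1.0, 2.0)
ratio_at_lc50 = potency_ratio(0.5, model_a, model_b)
ratio_at_lc10 = potency_ratio(0.1, model_a, model_b)
```

In this made-up case the two populations share the same LC50, yet their LC10 values differ by more than a factor of three, so a comparison based on LC50 alone would miss the difference. With equal slopes the ratio would be the same at every endpoint.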
Kirkwood, Michael W; Yeates, Keith Owen; Randolph, Christopher; Kirk, John W
If an examinee exerts inadequate effort to perform well during a psychological or neuropsychological exam, the resulting data will inaccurately represent the individual's true abilities and difficulties. In adult populations, methodologies to identify noncredible effort have grown exponentially in the last 2 decades. Though a comparatively modest amount of work has focused on tools to identify noncredible effort in pediatric populations, recent research has demonstrated that children can consistently pass several stand-alone symptom validity tests (SVTs) using cutoffs established with adults. However, no identified studies have examined the implications of pediatric SVT failure for ability-based test performance. The current sample consisted of 276 children aged 8-16 years referred consecutively for outpatient clinical neuropsychological consultation following mild traumatic brain injury (TBI). An earlier subgroup of this same case series that also included 17-year-olds was presented in Kirkwood and Kirk (2010). Nineteen percent of the current sample performed below the actuarial cutoff on the Medical Symptom Validity Test (MSVT). No background or injury-related variable differentiated those who passed from those who failed the MSVT. Performance on the MSVT was correlated significantly with performance on all ability-based tests and explained 38% of the total ability-based test variance. Participants failing the MSVT performed significantly worse on nearly all neuropsychological tests, with large effect sizes apparent across most tests. The results provide compelling evidence that practitioners should add objective SVTs to the evaluation of school-aged youth, even when secondary gain issues might not be readily apparent and particularly following mild TBI. PMID:21767023
Motloch, Chester George; Belt, Jeffrey R; Hunt, Gary Lynn; Ashton, Clair Kirkendall; Murphy, Timothy Collins; Miller, Ted J.; Coates, Calvin; Tataria, H. S.; Lucas, Glenn E.; Duong, T.Q.; Barnes, J.A.; Sutula, Raymond
Nickel Metal-Hydride (NiMH) is an advanced high-power battery technology that is presently employed in Hybrid Electric Vehicles (HEVs) and is one of several technologies undergoing continuing research and development by FreedomCAR. Unlike some other HEV battery technologies, NiMH exhibits a strong hysteresis effect upon charge and discharge. This hysteresis has a profound impact on the ability to monitor state-of-charge and battery performance. Researchers at the Idaho National Engineering and Environmental Laboratory (INEEL) have been investigating the implications of NiMH hysteresis on HEV battery testing and performance. Experimental results, insights, and recommendations are presented.
Research conducted on the relationship between international students’ TOEFL scores and academic performance has produced contradictory results. To examine this relationship, a meta-analysis was conducted on extant studies centered on international...
STUART G ITZKOWITZ
The Student Development Task Inventory, second edition, was administered to 234 Northern Black freshmen and sophomores at five urban post secondary institutions. Comparisons were made between collected and reported (Southern) scores by race, sex, and class level. Southern scores used for the comparison were those in the test manual. Generally Northern scores were lower than Southern scores. No significant differences
Sinharay, Sandip; Holland, Paul W.
The nonequivalent groups with anchor test (NEAT) design involves missing data that are missing by design. Three popular equating methods that can be used with a NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item-response-theory observed-score-equating method. These three methods each…
Jamrozik, J; Schaeffer, L R
Test-day (TD) records of milk, fat-to-protein ratio (F:P) and somatic cell score (SCS) of first-lactation Canadian Holstein cows were analysed by a three-trait finite mixture random regression model, with the purpose of revealing hidden structures in the data owing to putative, sub-clinical mastitis. Different distributions of the data were allowed in 30 intervals of days in milk (DIM), covering the lactation from 5 to 305 days. Bayesian analysis with Gibbs sampling was used for model inferences. Estimated proportion of TD records originated from cows infected with mastitis was 0.66 in DIM from 5 to 15 and averaged 0.2 in the remaining part of lactation. Data from healthy and mastitic cows exhibited markedly different distributions, with respect to both average value and the variance, across all parts of lactation. Heterogeneity of distributions for infected cows was also apparent in different DIM intervals. Cows with mastitis were characterized by smaller milk yield (down to -5 kg) and larger F:P (up to 0.13) and SCS (up to 1.3) compared with healthy contemporaries. Differences in averages between healthy and infected cows for F:P were the most profound at the beginning of lactation, when a dairy cow suffers the strongest energy deficit and is therefore more prone to mammary infection. Residual variances for data from infected cows were substantially larger than for the other mixture components. Fat-to-protein ratio had a significant genetic component, with estimates of heritability that were larger or comparable with milk yield, and was not strongly correlated with milk and SCS on both genetic and environmental scales. Daily milk, F:P and SCS are easily available from milk-recording data for most breeding schemes in dairy cattle. Fat-to-protein ratio can potentially be a valuable addition to SCS and milk yield as an indicator trait for selection against mastitis. PMID:22225580
Rotou, Ourania; Elmore, Patricia B.; Headrick, Todd C.
This study investigated the number-correct scoring method based on different theories (classical true-score theory and multidimensional item response theory) when a standardized test requires more than one ability for an examinee to get a correct response. The number-correct scoring procedure that is widely used is the one that is defined in…
Letko, M D
Apgar scores are determined for every neonate born in a U.S. hospital. Despite the frequency with which the scores are calculated, they are not always accurate. In addition, some individuals attempt to use the scores to substantiate certain claims, such as birth asphyxia. This article discusses some of the common misunderstandings and limitations of the Apgar score and suggests measures for improvement. PMID:8708830
Jamrozik, J; Schaeffer, L R
Multiple-trait (MT) finite mixture random regression (MIX) model was applied using Bayesian methods to first lactation test-day (TD) milk yield and somatic cell score (SCS) of Canadian Holsteins, allowing for heterogeneity of distributions with respect to days in milk (DIM) in lactation. The assumption was that the associations between patterns of variation in these traits and mastitis would allow revealing the hidden structure in the data distribution because of unknown health status of cows. The MIX model assumed separate means and residual co-variance structures for two components in four intervals of lactation, in addition to fitting the fixed effect of herd-test-day, and fixed and random regressions with Legendre polynomials. Results indicated that the mixture model was superior to standard MT model, as supported by the Bayes factor. Approximately 20% of TD records were classified as originated from cows with a putative, sub-clinical form of mastitis. The proportion of records from mastitic cows was the largest at the beginning of lactation. The MIX model exhibited different distributions of data from healthy and infected cows in different parts of lactation. Records from sick cows were characterized by larger (smaller) means for SCS (milk) and larger variances. Residual, and daily genetic and environmental correlations between milk and SCS were smaller from the MIX model when compared with MT estimates. Heritabilities of both traits differed significantly among records from healthy, sick and MT model estimates. Both models fitted milk records from healthy cows relatively well. The ability of the MT model in handling SCS records, measured by model residuals, was low, but improved substantially, however, where the data were allowed to be separated into two components in the MIX parameterization. 
Correlations between estimated breeding values (EBV) for sires from both models were very high for cumulative milk yield (>0.99) and slightly lower (0.95 in the interval from 5 to 45 DIM) for daily SCS. EBV for SCS from MT and MIX models were weakly correlated with posterior probability of sub-clinical mastitis on the phenotypic scale. PMID:20831560
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…
Luo, Yong; Gou, Xin; Huang, Peng; Mou, Chan
The specificity of prostate-specific antigen (PSA) for early intervention in repeat biopsy is unsatisfactory. Prostate cancer antigen 3 (PCA3) may be more accurate in outcome prediction than other methods for the early detection of prostate cancer (PCa). However, the results were inconsistent in repeated biopsies. Therefore, we performed a systematic review and meta-analysis to evaluate the role of PCA3 in outcome prediction. A systematic bibliographic search was conducted for articles published before April 2013, using PubMed, Medline, Web of Science, Embase and other databases from health technology assessment agencies. The quality of the studies was assessed on the basis of QUADAS criteria. Eleven studies of diagnostic tests with moderate to high quality were selected. A meta-analysis was carried out to synthesize the results. The results of the meta-analyses were heterogeneous among studies. We performed a subgroup analysis (with or without inclusion of high-grade prostatic intraepithelial neoplasia (HGPIN) and atypical small acinar proliferation (ASAP)). Using a PCA3 cutoff of 20 or 35, in the two sub-groups, the global sensitivity values were 0.93 or 0.80 and 0.79 or 0.75, specificities were 0.65 or 0.44 and 0.78 or 0.70, positive likelihood ratios were 1.86 or 1.58 and 2.49 or 1.78, negative likelihood ratios were 0.81 or 0.43 and 0.91 or 0.82 and diagnostic odds ratios (ORs) were 5.73 or 3.45 and 7.13 or 4.11, respectively. The areas under the curve (AUCs) of the summary receiver operating characteristic curve were 0.85 or 0.72 and 0.81 or 0.69, respectively. PCA3 can be used for repeat biopsy of the prostate to improve accuracy of PCa detection. Unnecessary biopsies can be avoided by using a PCA3 cutoff score of 20. PMID:24713827
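For readers less familiar with the accuracy measures reported in this abstract, the Python sketch below computes them for a single 2x2 table. The counts are invented for illustration and do not come from any of the reviewed studies. Pooled estimates from a meta-analysis are produced by separate statistical models, so they need not satisfy these single-table identities exactly.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Accuracy measures for one 2x2 table of test result vs. biopsy outcome.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        # The diagnostic odds ratio equals LR+ / LR-, i.e. (tp * tn) / (fp * fn).
        "diagnostic_odds_ratio": lr_positive / lr_negative,
    }


# Hypothetical counts for one study at one cutoff.
summary = diagnostic_summary(tp=80, fp=30, fn=20, tn=70)
```

Lowering a cutoff typically moves cases from the false-negative to the true-positive cell and from the true-negative to the false-positive cell, which is the sensitivity and specificity trade-off visible between the cutoffs of 20 and 35 in the abstract.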
Null, Elizabeth Higgins
East Feliciana Parish (Louisiana) has raised achievement scores by involving students in hands-on projects related to community needs and resources. Project Connect, a hands-on science and math program begun by the Delta Rural Systemic Initiative, has expanded into a comprehensive place-based program. In response to new state standards, teams of…
Weaver, Gabriela C.
Examines the database for the National Educational Longitudinal Study (NELS:88) for connections between student use of computers in math and science classes and their academic success. Finds that computer use was significantly correlated with gender, socioeconomic status, parent's level of education, and Item Response Theory (IRT) scores.…
Waldron, Chad H.
The research study examined whether a difference existed between the reading achievement scores of an experimental group and a control group in standardized reading achievement. This difference measured the effect of systematic oral reading fluency instruction with repeated readings. Data from the 4Sight Pennsylvania Benchmark Reading Assessments…
Wilcox, Rand R.
A procedure for determining the reliability of an examinee knowing k out of n possible multiple choice items given his or her performance on those items is presented. Also, a scoring procedure for determining which items an examinee knows is presented. (Author/JKS)
Zheng, Zheng; Merz, Kenneth M.
A central problem in de novo drug design is determining the binding affinity of a ligand with a receptor. A new scoring algorithm is presented that estimates the binding affinity of a protein-ligand complex given a three-dimensional structure. The method, LISA (Ligand Identification Scoring Algorithm), uses an empirical scoring function to describe the binding free energy. Interaction terms have been designed to account for van der Waals (VDW) contacts, hydrogen bonding, desolvation effects and metal chelation to model the dissociation equilibrium constants using a linear model. Atom types have been introduced to differentiate the parameters for VDW, H-bonding interactions and metal chelation between different atom pairs. A training set of 492 protein-ligand complexes was selected for the fitting process. Different test sets have been examined to evaluate its ability to predict experimentally measured binding affinities. By comparing with other well known scoring functions, the results show that LISA has advantages over many existing scoring functions in simulating protein-ligand binding affinity, especially metalloprotein-ligand binding affinity. Artificial Neural Network (ANN) was also used in order to demonstrate that the energy terms in LISA are well designed and do not require extra cross terms. PMID:21561101
Porras, Carolina; Wentzensen, Nicolas; Rodríguez, Ana C; Morales, Jorge; Burk, Robert D; Alfaro, Mario; Hutchinson, Martha; Herrero, Rolando; Hildesheim, Allan; Sherman, Mark E; Wacholder, Sholom; Solomon, Diane; Schiffman, Mark
Human papillomavirus (HPV) testing is more sensitive than cytology; some cervical cancer prevention programs will switch from cytology to carcinogenic HPV test-based screening. The objective of our study is to evaluate the clinical implications of a switch to HPV test-based screening on performance and workload of colposcopy. Women in the population-based, 7-year Guanacaste cohort study were screened at enrollment using cytology. We also took another specimen for HPV DNA testing and collected magnified cervical photographic images (cervigrams). A final case diagnosis (≥cervical intraepithelial neoplasia [CIN] grade 3, CIN2,
Tsang-Long Pao; Wei-Chih Pan; Hsiu-Wen Cheng
In formal entrance examinations, after the exams have been graded, it still takes considerable effort to record the scores in a database. These processes are inefficient, time-consuming and laborious. Therefore, we propose an Automatic Score Recording System that uses image processing techniques to simplify the procedure and speed up the process. In the proposed system, we use a digital camera
A suburban Philadelphia district set aside $100,000 for merit-pay (bonuses) for individuals and groups of teachers. Although teachers are resistant, vowing to give to charity any bonuses linked to test scores, morale and scores have improved. Cincinnati and Castle Rock, Colorado, have workable plans. (MLH)
van der Linden, Wim J.
This article is a response to the commentaries on the position paper on observed-score equating by van der Linden (this issue). The response focuses on the more general issues in these commentaries, such as the nature of the observed scores that are equated, the importance of test-theory assumptions in equating, the necessity to use multiple…
Briggs, Derek C.; Dadey, Nathan
This study focuses on an instance in which the mean grade-to-grade scale scores on a vertical scale showed evidence of common test items that do not get easier from one grade to the next. The issue was examined as part of a 2-day workshop in which participants were asked to predict the growth on all linking items used in the construction of…
Evaluation of first generation vaccines against human leishmaniasis and the implication for leishmaniasis; 1.6.2 Leishmanin skin test (LST) and its application in vaccine clinical trials; 1.7 Vaccine (Leishmanization); 1.7.4 Whole parasite vaccines; Killed whole parasite (first generation) prophylactic
Julian B. Rotter
The purpose of this paper is to explicate some of the implications of a social learning theory of personality for the measurement of personality variables. The particular point of emphasis is the measurement of goal directed behavior conceptualized in social learning terms as need potential. Secondarily, the paper aims at illustrating the nature of the relationship between testing procedures and
Wolfgang Kunz; Dhiraj K. Pradhan
Motivated by the problem of test pattern generation in digital circuits, this paper presents a novel technique called recursive learning that is able to perform a logic analysis on digital circuits. By recursively calling certain learning functions, it is possible to extract all logic dependencies between signals in a circuit and to perform precise implications for a given set of