Sample records for test scores implications

  1. Does Test Preparation Work? Implications for Score Validity

    ERIC Educational Resources Information Center

    Xie, Qin

    2013-01-01

    This article reports an empirical study that examined the pattern of test preparation for College English Test Band 4 (CET4) and the differential effects of test preparation practices on its scores, thereby drawing implications for CET4 score validity. Data collection involved 1,003 test takers of CET4. A pretest was administered at the beginning…

  2. Prediction of true test scores from observed item scores and ancillary data.

    PubMed

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
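
    As a rough illustration of the "best linear predictor of a true score" idea mentioned in this abstract, the sketch below shows only the simplest classical test theory case: one observed score, a known group mean, and a known reliability (often called Kelley's formula). It is not the authors' method, which combines several predictors and estimates error variances and covariances from repeat test takers. The function name and the example numbers are hypothetical.

```python
def kelley_true_score(observed: float, group_mean: float, reliability: float) -> float:
    """Best linear predictor of a true score from a single observed score.

    Classical test theory (Kelley's formula): the observed score is pulled
    toward the group mean in proportion to (1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return group_mean + reliability * (observed - group_mean)


# Hypothetical example: observed essay score 4.0, group mean 3.0, reliability 0.5.
predicted = kelley_true_score(4.0, 3.0, 0.5)
print(predicted)
```

    With a reliability of 1 the prediction equals the observed score; with a reliability of 0 it collapses to the group mean.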

  3. A process dissociation approach to objective-projective test score interrelationships.

    PubMed

    Bornstein, Robert F

    2002-02-01

    Even when self-report and projective measures of a given trait or motive both predict theoretically related features of behavior, scores on the 2 tests correlate modestly with each other. This article describes a process dissociation framework for personality assessment, derived from research on implicit memory and learning, which can resolve these ostensibly conflicting results. Research on interpersonal dependency is used to illustrate 3 key steps in the process dissociation approach: (a) converging behavioral predictions, (b) modest test score intercorrelations, and (c) delineation of variables that differentially affect self-report and projective test scores. Implications of the process dissociation framework for personality assessment and test development are discussed.

  4. The Implications of Family Size and Birth Order for Test Scores and Behavioral Development

    ERIC Educational Resources Information Center

    Silles, Mary A.

    2010-01-01

    This article, using longitudinal data from the National Child Development Study, presents new evidence on the effects of family size and birth order on test scores and behavioral development at age 7, 11 and 16. Sibling size is shown to have an adverse causal effect on test scores and behavioral development. For any given family size, first-borns…

  5. Pedagogical Implications of Score Distribution Pattern and Learner Satisfaction in an Intensive TOEIC Course

    ERIC Educational Resources Information Center

    Kang, Che Chang

    2014-01-01

    The study aimed at investigating TOEIC score distribution patterns and learner satisfaction in an intensive TOEIC course and drew implications for pedagogical practice. A one-group pre-test post-test experiment and a survey on learner satisfaction were conducted on Taiwanese college EFL students (n = 50) in a case study. Results showed that the…

  6. The Mediating Effect of Listening Metacognitive Awareness between Test-Taking Motivation and Listening Test Score: An Expectancy-Value Theory Approach

    PubMed Central

    Xu, Jian

    2017-01-01

    The present study investigated test-taking motivation in L2 listening testing context by applying Expectancy-Value Theory as the framework. Specifically, this study was intended to examine the complex relationships among expectancy, importance, interest, listening anxiety, listening metacognitive awareness, and listening test score using data from a large-scale and high-stakes language test among Chinese first-year undergraduates. Structural equation modeling was used to examine the mediating effect of listening metacognitive awareness on the relationship between expectancy, importance, interest, listening anxiety, and listening test score. According to the results, test takers’ listening scores can be predicted by expectancy, interest, and listening anxiety significantly. The relationship between expectancy, interest, listening anxiety, and listening test score was mediated by listening metacognitive awareness. The findings have implications for test takers to improve their test taking motivation and listening metacognitive awareness, as well as for L2 teachers to intervene in L2 listening classrooms. PMID:29312063

  7. The Mediating Effect of Listening Metacognitive Awareness between Test-Taking Motivation and Listening Test Score: An Expectancy-Value Theory Approach.

    PubMed

    Xu, Jian

    2017-01-01

    The present study investigated test-taking motivation in L2 listening testing context by applying Expectancy-Value Theory as the framework. Specifically, this study was intended to examine the complex relationships among expectancy, importance, interest, listening anxiety, listening metacognitive awareness, and listening test score using data from a large-scale and high-stakes language test among Chinese first-year undergraduates. Structural equation modeling was used to examine the mediating effect of listening metacognitive awareness on the relationship between expectancy, importance, interest, listening anxiety, and listening test score. According to the results, test takers' listening scores can be predicted by expectancy, interest, and listening anxiety significantly. The relationship between expectancy, interest, listening anxiety, and listening test score was mediated by listening metacognitive awareness. The findings have implications for test takers to improve their test taking motivation and listening metacognitive awareness, as well as for L2 teachers to intervene in L2 listening classrooms.

  8. Do Test Scores Buy Happiness?

    ERIC Educational Resources Information Center

    McCluskey, Neal

    2017-01-01

    Since at least the enactment of No Child Left Behind in 2002, standardized test scores have served as the primary measures of public school effectiveness. Yet, such scores fail to measure the ultimate goal of education: maximizing happiness. This exploratory analysis assesses nation level associations between test scores and happiness, controlling…

  9. Predicting occupational personality test scores.

    PubMed

    Furnham, A; Drakeley, R

    2000-01-01

    The relationship between students' actual test scores and their self-estimated scores on the Hogan Personality Inventory (HPI; R. Hogan & J. Hogan, 1992), an omnibus personality questionnaire, was examined. Despite being given descriptive statistics and explanations of each of the dimensions measured, the students tended to overestimate their scores; yet all correlations between actual and estimated scores were positive and significant. Correlations between self-estimates and actual test scores were highest for sociability, ambition, and adjustment (r = .62 to r = .67). The results are discussed in terms of employers' use and abuse of personality assessment for job recruitment.

  10. Exploring a Source of Uneven Score Equity across the Test Score Range

    ERIC Educational Resources Information Center

    Huggins-Manley, Anne Corinne; Qiu, Yuxi; Penfield, Randall D.

    2018-01-01

    Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No studies to date have…

  11. Do Examinees Understand Score Reports for Alternate Methods of Scoring Computer Based Tests?

    ERIC Educational Resources Information Center

    Whittaker, Tiffany A.; Williams, Natasha J.; Dodd, Barbara G.

    2011-01-01

    This study assessed the interpretability of scaled scores based on either number correct (NC) scoring for a paper-and-pencil test or one of two methods of scoring computer-based tests: an item pattern (IP) scoring method and a method based on equated NC scoring. The equated NC scoring method for computer-based tests was proposed as an alternative…

  12. How Accurate Is a Test Score?

    ERIC Educational Resources Information Center

    Doppelt, Jerome E.

    1956-01-01

    The standard error of measurement as a means for estimating the margin of error that should be allowed for in test scores is discussed. The true score measures the performance that is characteristic of the person tested; the variations, plus and minus, around the true score describe a characteristic of the test. When the standard deviation is used…

  13. What Do Test Score Really Mean? A Latent Class Analysis of Danish Test Score Performance

    ERIC Educational Resources Information Center

    McIntosh, James; Munk, Martin D.

    2014-01-01

    Latent class Poisson count models are used to analyse a sample of Danish test score results from a cohort of individuals born in 1954-1955, tested in 1968, and followed until 2011. The procedure takes account of unobservable effects as well as excessive zeros in the data. We show that the test scores measure manifest or measured ability as it has…

  14. Relationship of Friends, Physical Education, and State Test Scores: Implications for School Counselors

    ERIC Educational Resources Information Center

    Hollingsworth, Mary Ann

    2010-01-01

    This study examined the relationship between dimensions of wellness and academic performance for 634 third through fifth grade students in Title One schools in rural Mississippi, using composites of the Five Factor Wellness Inventory for Elementary Children and Reading, Language, and Math Scores of the Mississippi Curriculum Test (a state level…

  15. Validating Test Score Meaning and Defending Test Score Use: Different Aims, Different Methods

    ERIC Educational Resources Information Center

    Cizek, Gregory J.

    2016-01-01

    Advances in validity theory and alacrity in validation practice have suffered because the term "validity" has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test…

  16. The Impact of the 2004 Hurricanes on Florida Comprehensive Assessment Test Scores: Implications for School Counselors

    ERIC Educational Resources Information Center

    Baggerly, Jennifer; Ferretti, Larissa K.

    2008-01-01

    What is the impact of natural disasters on students' statewide assessment scores? To answer this question, Florida Comprehensive Assessment Test (FCAT) scores of 55,881 students in grades 4 through 10 were analyzed to determine if there were significant decreases after the 2004 hurricanes. Results reveal that there was statistical but no practical…

  17. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP

    ERIC Educational Resources Information Center

    Chudowsky, Naomi; Chudowsky, Victor

    2010-01-01

    In recent years, scores on the annual state reading and mathematics tests used for accountability have gone up in most states. These trends in state test scores do not always coincide, however, with trends on the National Assessment of Educational Progress (NAEP), the federally sponsored assessment that is administered periodically to…

  18. Estimating Total-Test Scores from Partial Scores in a Matrix Sampling Design.

    ERIC Educational Resources Information Center

    Sachar, Jane; Suppes, Patrick

    1980-01-01

    The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)

  19. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Washington

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Washington's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) decreased in grade 4 reading. In grade 4 math, the percentage scoring proficient on the state test decreased…

  20. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Utah

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Utah's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 8 reading. In grade 4 reading, the percentage scoring proficient on the state test showed a…

  1. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Arkansas

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Arkansas's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) went up in math at grades 4 and 8. In reading, the percentages scoring proficient on the state test went up at…

  2. EDUCATION AND PSYCHOLOGICAL TEST SCORES

    PubMed Central

    Pershad, Dwarka; Verma, S. K.

    1980-01-01

    Education, a long-neglected variable affecting psychological test scores, is in search of reemphasis. Some evidence for this has accumulated on the psychological tests constructed and standardized here at the Department of Psychiatry, P.G.I., Chandigarh. Tentative norms prepared education-wise on WAIS-Verbal section, PGI-Memory Scale, Proverb and Similarity Tests, Psychoticism Questionnaire, and PGI MQN 2, for adults, in the age range of 16-50, are reported. The results showed marked differences in the mean scores of different educational categories and thus stressed the need for reporting norms separately for different educational levels. PMID:22064617

  3. Test/score/report: Simulation techniques for automating the test process

    NASA Technical Reports Server (NTRS)

    Hageman, Barbara H.; Sigman, Clayton B.; Koslosky, John T.

    1994-01-01

    A Test/Score/Report capability is currently being developed for the Transportable Payload Operations Control Center (TPOCC) Advanced Spacecraft Simulator (TASS) system which will automate testing of the Goddard Space Flight Center (GSFC) Payload Operations Control Center (POCC) and Mission Operations Center (MOC) software in three areas: telemetry decommutation, spacecraft command processing, and spacecraft memory load and dump processing. Automated computer control of the acceptance test process is one of the primary goals of a test team. With the proper simulation tools and user interface, the task of acceptance testing, regression testing, and repeatability of specific test procedures of a ground data system can be a simpler task. Ideally, the goal for complete automation would be to plug the operational deliverable into the simulator, press the start button, execute the test procedure, accumulate and analyze the data, score the results, and report the results to the test team along with a go/no recommendation to the test team. In practice, this may not be possible because of inadequate test tools, pressures of schedules, limited resources, etc. Most tests are accomplished using a certain degree of automation and test procedures that are labor intensive. This paper discusses some simulation techniques that can improve the automation of the test process. The TASS system tests the POCC/MOC software and provides a score based on the test results. The TASS system displays statistics on the success of the POCC/MOC system processing in each of the three areas as well as event messages pertaining to the Test/Score/Report processing. The TASS system also provides formatted reports documenting each step performed during the tests and the results of each step. A prototype of the Test/Score/Report capability is available and currently being used to test some POCC/MOC software deliveries. 
    When this capability is fully operational it should greatly reduce the time necessary…

  4. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP

    ERIC Educational Resources Information Center

    Chudowsky, Naomi; Chudowsky, Victor

    2010-01-01

    This report compares state math and reading proficiency scores in grades 4 and 8 to National Assessment of Educational Progress (NAEP) basic scores for the period of 2005 to 2009. The study found that scores on state tests and NAEP have increased in most states with sufficient data. Also included with the report are profiles for the 23 states that…

  5. Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.

    ERIC Educational Resources Information Center

    Sachar, Jane; Suppes, Patrick

    It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…

  6. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Ohio

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Ohio's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 4 reading and grade 8 math. In grade 8 reading, the percentage of students scoring proficient…

  7. Estimating the Reliability of a Test Battery Composite or a Test Score Based on Weighted Item Scoring

    ERIC Educational Resources Information Center

    Feldt, Leonard S.

    2004-01-01

    In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.

  8. Implications of Deployed and Nondeployed Fathers on Seventh Graders' California Achievement Test Scores during a Military Crisis.

    ERIC Educational Resources Information Center

    Pisano, Mark C.

    The differences in California Achievement Test (CAT) scores from 1990 to 1991 in seventh graders, currently enrolled in Albritton Junior High School in the Fort Bragg Schools, of deployed and nondeployed fathers were analyzed. CAT percentile scores from 1990 and 1991 (1991 being the year of "Desert Storm") were obtained in reading, math…

  9. ITC Guidelines on Quality Control in Scoring, Test Analysis, and Reporting of Test Scores

    ERIC Educational Resources Information Center

    Allalouf, Avi

    2014-01-01

    The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…

  10. Summary of Score Changes (in other Tests).

    ERIC Educational Resources Information Center

    Cleary, T. Anne; McCandless, Sam A.

    Scholastic Aptitude Test (SAT) scores have declined during the last 14 years. Similar score declines have been observed in many different testing programs, many groups, and tested areas. The declines, while not large in any given year, have been consistent over time, area, and group. The period around 1965 is critical for the interpretation of…

  11. Test Scores, Class Rank and College Performance: Lessons for Broadening Access and Promoting Success.

    PubMed

    Niu, Sunny X; Tienda, Marta

    2012-04-01

    Using administrative data for five Texas universities that differ in selectivity, this study evaluates the relative influence of two key indicators for college success: high school class rank and standardized tests. Empirical results show that class rank is the superior predictor of college performance and that test score advantages do not insulate lower ranked students from academic underperformance. Using the UT-Austin campus as a test case, we conduct a simulation to evaluate the consequences of capping students admitted automatically using both achievement metrics. We find that using class rank to cap the number of students eligible for automatic admission would have roughly uniform impacts across high schools, but imposing a minimum test score threshold on all students would have highly unequal consequences by greatly reducing the admission eligibility of the highest performing students who attend poor high schools while not jeopardizing admissibility of students who attend affluent high schools. We discuss the implications of the Texas admissions experiment for higher education in Europe.

  12. Testing Intelligently Includes Double-Checking Wechsler IQ Scores

    ERIC Educational Resources Information Center

    Kuentzel, Jeffrey G.; Hetterscheidt, Lesley A.; Barnett, Douglas

    2011-01-01

    The rigors of standardized testing make for numerous opportunities for examiner error, including simple computational mistakes in scoring. Although experts recommend that test scoring be double-checked, the extent to which independent double-checking would reduce scoring errors is not known. A double-checking procedure was established at a…

  13. Test Scores and Stereotypes.

    ERIC Educational Resources Information Center

    Gose, Ben

    1995-01-01

    A psychologist's research suggests that black and female students may have lower standardized test scores and academic achievement because they have accepted stereotypes concerning their ability. Critics feel the researcher, Claude M. Steele, may be overlooking other factors. Steele has developed a program at Stanford University (California) to…

  14. Using Patterns of Summed Scores in Paper-and-Pencil Tests and Computer-Adaptive Tests to Detect Misfitting Item Score Patterns

    ERIC Educational Resources Information Center

    Meijer, Rob R.

    2004-01-01

    Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…

  15. Facilitating the Interpretation of English Language Proficiency Scores: Combining Scale Anchoring and Test Score Mapping Methodologies

    ERIC Educational Resources Information Center

    Powers, Donald; Schedl, Mary; Papageorgiou, Spiros

    2017-01-01

    The aim of this study was to develop, for the benefit of both test takers and test score users, enhanced "TOEFL ITP"® test score reports that go beyond the simple numerical scores that are currently reported. To do so, we applied traditional scale anchoring (proficiency scaling) to item difficulty data in order to develop performance…

  16. The Truth about Scores Children Achieve on Tests.

    ERIC Educational Resources Information Center

    Brown, Jonathan R.

    1989-01-01

    The importance of using the standard error of measurement (SEm) in determining reliability in test scores is emphasized. The SEm is compared to the hypothetical true score for standardized tests, and procedures for calculation of the SEm are explained. (JDD)
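
    The abstract does not reproduce the calculation, so the following is only a minimal sketch of the standard classical test theory definition, SEm = SD × sqrt(1 − reliability), together with a band of ±z SEm around an observed score. The function names and the example values (SD 15, reliability 0.91) are hypothetical and are not taken from the article.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEm: SD * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.0):
    """Return (low, high): the observed score plus and minus z SEm."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical scale with SD = 15 and reliability = 0.91.
sem = standard_error_of_measurement(15.0, 0.91)
low, high = score_band(100.0, 15.0, 0.91)
print(f"SEm = {sem:.2f}, band = {low:.1f} to {high:.1f}")
```

    A less reliable test with the same standard deviation yields a wider band, which is why a single observed score is usually read as a range and not as a point.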

  17. The Probability of Obtaining Two Statistically Different Test Scores as a Test Index

    ERIC Educational Resources Information Center

    Muller, Jorg M.

    2006-01-01

    A new test index is defined as the probability of obtaining two randomly selected test scores (PDTS) as statistically different. After giving a concept definition of the test index, two simulation studies are presented. The first analyzes the influence of the distribution of test scores, test reliability, and sample size on PDTS within classical…

  18. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Nevada

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Nevada's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP increased in grade 8 reading and math. Average annual gains were larger on the state test than on NAEP in both subjects. Trends in average (mean) test scores…

  19. Do Gains in Test Scores Explain Labor Market Outcomes?

    ERIC Educational Resources Information Center

    Rose, Heather

    2006-01-01

    Using data from the National Education Longitudinal Study of 1988, this article investigates whether students who made relatively large test score gains during high school had larger earnings 7 years after high school compared to students whose scores improved little. In models that control for pre-high school test scores, family background, and…

  20. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Louisiana

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Louisiana's test score trends through 2008-09. Between 2005 and 2009, trends on state tests and NAEP (National Assessment of Educational Progress) sometimes differed. On the state test, the percentages of students reaching the proficient level increased at grades 4 and 8 in both reading and math. On NAEP, the percentage of…

  1. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Tennessee

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Tennessee's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 8 reading and math. At grade 4, trends on the state test and NAEP differed somewhat. In…

  2. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Maryland

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Maryland's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased at grades 4 and 8 in both reading and math. Average annual gains were larger on the state test than…

  3. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Pennsylvania

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Pennsylvania's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 8 reading and math. Average annual gains were larger on the state test than on NAEP in…

  4. Changing abilities vs. changing tasks: Examining validity degradation with test scores and college performance criteria both assessed longitudinally.

    PubMed

    Dahlke, Jeffrey A; Kostal, Jack W; Sackett, Paul R; Kuncel, Nathan R

    2018-05-03

    We explore potential explanations for validity degradation using a unique predictive validation data set containing up to four consecutive years of high school students' cognitive test scores and four complete years of those students' college grades. This data set permits analyses that disentangle the effects of predictor-score age and timing of criterion measurements on validity degradation. We investigate the extent to which validity degradation is explained by criterion dynamism versus the limited shelf-life of ability scores. We also explore whether validity degradation is attributable to fluctuations in criterion variability over time and/or GPA contamination from individual differences in course-taking patterns. Analyses of multiyear predictor data suggest that changes to the determinants of performance over time have much stronger effects on validity degradation than does the shelf-life of cognitive test scores. The age of predictor scores had only a modest relationship with criterion-related validity when the criterion measurement occasion was held constant. Practical implications and recommendations for future research are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  5. Speech-discrimination scores modeled as a binomial variable.

    PubMed

    Thornton, A R; Raffin, M J

    1978-09-01

    Many studies have reported variability data for tests of speech discrimination, and the disparate results of these studies have not been given a simple explanation. Arguments over the relative merits of 25- vs 50-word tests have ignored the basic mathematical properties inherent in the use of percentage scores. The present study models performance on clinical tests of speech discrimination as a binomial variable. A binomial model was developed, and some of its characteristics were tested against data from 4120 scores obtained on the CID Auditory Test W-22. A table for determining significant deviations between scores was generated and compared to observed differences in half-list scores for the W-22 tests. Good agreement was found between predicted and observed values. Implications of the binomial characteristics of speech-discrimination scores are discussed.
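
    The "basic mathematical properties" of percentage scores referred to in this abstract can be sketched with the textbook binomial standard deviation. The code below is not the authors' table of significant deviations; it only assumes that each word is an independent trial with the same probability of a correct response, and the function name is hypothetical.

```python
import math


def binomial_sd_percent(p_correct: float, n_words: int) -> float:
    """Standard deviation, in percentage points, of a percent-correct score
    when each of n_words items is an independent trial with success
    probability p_correct."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 100.0 * math.sqrt(p_correct * (1.0 - p_correct) / n_words)


# Compare 25-word and 50-word lists at a few true performance levels.
for p in (0.5, 0.8, 0.96):
    sd25 = binomial_sd_percent(p, 25)
    sd50 = binomial_sd_percent(p, 50)
    print(f"p = {p:.2f}: SD(25 words) = {sd25:.1f}, SD(50 words) = {sd50:.1f}")
```

    Under this assumption the variability depends on both list length and the score itself: it is largest near 50% correct, shrinks toward 0% and 100%, and halving the list from 50 to 25 words multiplies the standard deviation by sqrt(2).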

  6. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Nebraska

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Nebraska's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the percentages reaching the basic level on NAEP (National Assessment of Educational Progress) increased at grade 4 in both reading and math. At grade 8, however, the percentages…

  7. Equating Scores from Adaptive to Linear Tests

    ERIC Educational Resources Information Center

    van der Linden, Wim J.

    2006-01-01

    Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…

  8. Stability of scores for the Slosson Full-Range Intelligence Test.

    PubMed

    Williams, Thomas O; Eaves, Ronald C; Woods-Groves, Suzanne; Mariano, Gina

    2007-08-01

    The test-retest stability of the Slosson Full-Range Intelligence Test by Algozzine, Eaves, Mann, and Vance was investigated with test scores from a sample of 103 students. With a mean interval of 13.7 mo. and different examiners for each of the two test administrations, the test-retest reliability coefficients for the Full-Range IQ, Verbal Reasoning, Abstract Reasoning, Quantitative Reasoning, and Memory were .93, .85, .80, .80, and .83, respectively. Mean differences from the test-retest scores were not statistically significantly different for any of the scales. Results suggest that Slosson scores are stable over time even when different examiners administer the test.

  9. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Alaska

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Alaska's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grades 4 and 8 in math and grade 8 in reading. In grade 4 reading, the percentage reaching the…

  10. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Massachusetts

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Massachusetts' test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 4 reading and math and grade 8 math. Average annual gains were larger on the state test…

  11. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. California

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles California's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grades 4 and 8 in both reading and math. Average annual gains were larger on the state test…

  12. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Montana

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Montana's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 4 reading and math and grade 8 reading. In grade 8 math, however, the percentage proficient…

  13. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Colorado

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Colorado's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grades 4 and 8 in both reading and math. Average annual gains were generally larger on NAEP than…

  14. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Wisconsin

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Wisconsin's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in math at grades 4 and 8 and in reading at grade 8. In grade 4 reading, the percentage scoring…

  15. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Alabama

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Alabama's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grades 4 and 8 in both reading and math. Average annual gains were generally larger on the state…

  16. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Texas

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Texas' test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in reading at grades 4 and 8 and in math at grade 8. In grade 4 math, however, the percentage scoring…

  17. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Florida

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Florida's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grades 4 and 8 in both reading and math. Average annual gains were generally larger on the state…

  18. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Arizona

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Arizona's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grades 4 and 8 in both reading and math. Average annual gains were generally larger on the state…

  19. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Iowa

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles Iowa's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 4 reading and math and in grade 8 math. In grade 8 reading, the percentage of students reaching…

  20. Modified Balance Error Scoring System (M-BESS) test scores in athletes wearing protective equipment and cleats.

    PubMed

    Azad, Aftab Mohammad; Al Juma, Saad; Bhatti, Junaid Ahmad; Delaney, J Scott

    2016-01-01

    Balance testing is an important part of the initial concussion assessment. There is no research on the differences in Modified Balance Error Scoring System (M-BESS) scores when tested in real world as compared to control conditions. To assess the difference in M-BESS scores in athletes wearing their protective equipment and cleats on different surfaces as compared to control conditions. This cross-sectional study examined university North American football and soccer athletes. Three observers independently rated athletes performing the M-BESS test in three different conditions: (1) wearing shorts and T-shirt in bare feet on firm surface (control); (2) wearing athletic equipment with cleats on FieldTurf; and (3) wearing athletic equipment with cleats on firm surface. Mean M-BESS scores were compared between conditions. 60 participants were recruited: 39 from football (all males) and 21 from soccer (11 males and 10 females). Average age was 21.1 years (SD=1.8). Mean M-BESS scores were significantly lower (p<0.001) for cleats on FieldTurf (mean=26.3; SD=2.0) and for cleats on firm surface (mean=26.6; SD=2.1) as compared to the control condition (mean=28.4; SD=1.5). Females had lower scores than males for cleats on FieldTurf condition (24.9 (SD=1.9) vs 27.3 (SD=1.6), p=0.005). Players who had taping or bracing on their ankles/feet had lower scores when tested with cleats on firm surface condition (24.6 (SD=1.7) vs 26.9 (SD=2.0), p=0.002). Total M-BESS scores for athletes wearing protective equipment and cleats standing on FieldTurf or a firm surface are around two points lower than M-BESS scores performed on the same athletes under control conditions.

  1. Modified Balance Error Scoring System (M-BESS) test scores in athletes wearing protective equipment and cleats

    PubMed Central

    Azad, Aftab Mohammad; Al Juma, Saad; Bhatti, Junaid Ahmad; Delaney, J Scott

    2016-01-01

    Background: Balance testing is an important part of the initial concussion assessment. There is no research on the differences in Modified Balance Error Scoring System (M-BESS) scores when tested in real world as compared to control conditions. Objective: To assess the difference in M-BESS scores in athletes wearing their protective equipment and cleats on different surfaces as compared to control conditions. Methods: This cross-sectional study examined university North American football and soccer athletes. Three observers independently rated athletes performing the M-BESS test in three different conditions: (1) wearing shorts and T-shirt in bare feet on firm surface (control); (2) wearing athletic equipment with cleats on FieldTurf; and (3) wearing athletic equipment with cleats on firm surface. Mean M-BESS scores were compared between conditions. Results: 60 participants were recruited: 39 from football (all males) and 21 from soccer (11 males and 10 females). Average age was 21.1 years (SD=1.8). Mean M-BESS scores were significantly lower (p<0.001) for cleats on FieldTurf (mean=26.3; SD=2.0) and for cleats on firm surface (mean=26.6; SD=2.1) as compared to the control condition (mean=28.4; SD=1.5). Females had lower scores than males for cleats on FieldTurf condition (24.9 (SD=1.9) vs 27.3 (SD=1.6), p=0.005). Players who had taping or bracing on their ankles/feet had lower scores when tested with cleats on firm surface condition (24.6 (SD=1.7) vs 26.9 (SD=2.0), p=0.002). Conclusions: Total M-BESS scores for athletes wearing protective equipment and cleats standing on FieldTurf or a firm surface are around two points lower than M-BESS scores performed on the same athletes under control conditions. PMID: 27900181

  2. THE EFFECTS ON ACHIEVEMENT TEST RESULTS OF VARYING CONDITIONS OF EXPERIMENTAL ATMOSPHERE, NOTICE OF TEST, TEST ADMINISTRATION, AND TEST SCORING.

    ERIC Educational Resources Information Center

    GOODWIN, WILLIAM L.; AND OTHERS

    NULL HYPOTHESES WERE TESTED TO DETERMINE THE DIFFERENTIAL EFFECTS OF (1) EXPERIMENTAL ATMOSPHERE AND ABSENCE OF SAME, (2) NOTICE OF TEST (10 SCHOOL DAYS) AND NO NOTICE (1 SCHOOL DAY), (3) TEACHER ADMINISTRATION AND OUTSIDE ADMINISTRATION OF TESTS, AND (4) TEACHER SCORING AND OUTSIDE SCORING OF TESTS. SIXTH-GRADE CLASSES (N=64), EACH FROM A…

  3. Score Equating and Nominally Parallel Language Tests.

    ERIC Educational Resources Information Center

    Moy, Raymond

    Score equating requires that the forms to be equated are functionally parallel. That is, the two test forms should rank order examinees in a similar fashion. In language proficiency testing situations, this assumption is often put into doubt because of the numerous tests that have been proposed as measures of language proficiency and the…

  4. Reporting Diagnostic Scores in Educational Testing: Temptations, Pitfalls, and Some Solutions

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J.

    2010-01-01

    Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as are the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…

  5. The relationship between selected standardized test scores and performance in advanced placement math and science exams: Analyzing the differential effectiveness of scores for course identification and placement

    NASA Astrophysics Data System (ADS)

    Urbina, Josue N.

    There is a national need to increase the STEM-related workforce. Factors leading towards STEM careers include the number of advanced high school mathematics and science courses students complete. Florida's enrollment patterns in STEM-related Advanced Placement (AP) courses, however, reveal that only a small percentage of students enroll in these classes. Therefore, screening tools are needed to find more students for these courses who are academically ready yet have not been identified. The purpose of this study was to investigate the extent to which scores from a national standardized test, the Preliminary Scholastic Assessment Test/National Merit Scholarship Qualifying Test (PSAT/NMSQT), in conjunction with and compared to a state-mandated standardized test, the Florida Comprehensive Assessment Test (FCAT), are related to selected AP exam performance in Seminole County Public Schools. An ex post facto correlational study was conducted using 6,189 student records from the 2010-2012 academic years. Multiple regression analyses using simultaneous Full Model testing showed differential moderate to strong relationships between scores in eight of the nine AP courses (i.e., Biology, Environmental Science, Chemistry, Physics B, Physics C Electrical, Physics C Mechanical, Statistics, Calculus AB and BC) examined. For example, the significant unique contribution to overall variance in AP scores was a linear combination of PSAT Math (M), Critical Reading (CR) and FCAT Reading (R) for Biology and Environmental Science. Moderate relationships for Chemistry included a linear combination of PSAT M, W (Writing) and FCAT M; a combination of FCAT M and PSAT M was most significantly associated with Calculus AB performance. These findings have implications for both research and practice. FCAT scores, in conjunction with PSAT scores, can potentially be used for specific STEM-related AP courses, as part of a systematic approach towards AP course identification and placement. For courses with…

  6. Towards reporting standards for neuropsychological study results: A proposal to minimize communication errors with standardized qualitative descriptors for normalized test scores.

    PubMed

    Schoenberg, Mike R; Rum, Ruba S

    2017-11-01

    Rapid, clear and efficient communication of neuropsychological results is essential to benefit patient care. Errors in communication are a leading cause of medical errors; nevertheless, there remains a lack of consistency in how neuropsychological scores are communicated. A major limitation in the communication of neuropsychological results is the inconsistent use of qualitative descriptors for standardized test scores and the use of vague terminology. A PubMed search from 1 Jan 2007 to 1 Aug 2016 was conducted to identify guidelines or consensus statements for the description and reporting of qualitative terms to communicate neuropsychological test scores. The review found the use of confusing and overlapping terms to describe various ranges of percentile standardized test scores. In response, we propose a simplified set of qualitative descriptors for normalized test scores (Q-Simple) as a means to reduce errors in communicating test results. The Q-Simple qualitative terms are: 'very superior', 'superior', 'high average', 'average', 'low average', 'borderline' and 'abnormal/impaired'. A case example illustrates the proposed Q-Simple qualitative classification system to communicate neuropsychological results for neurosurgical planning. The Q-Simple qualitative descriptor system is intended as a means to improve and standardize communication of standardized neuropsychological test scores. Research is needed to further evaluate neuropsychological communication errors. Conveying the clinical implications of neuropsychological results in a manner that minimizes risk for communication errors is a quintessential component of evidence-based practice. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Neuropsychological test scores, academic performance, and developmental disorders in Spanish-speaking children.

    PubMed

    Rosselli, M; Ardila, A; Bateman, J R; Guzmán, M

    2001-01-01

    Limited information is currently available about performance of Spanish-speaking children on different neuropsychological tests. This study was designed to (a) analyze the effects of age and sex on different neuropsychological test scores of a randomly selected sample of Spanish-speaking children, (b) analyze the value of neuropsychological test scores for predicting school performance, and (c) describe the neuropsychological profile of Spanish-speaking children with learning disabilities (LD). Two hundred ninety (141 boys, 149 girls) 6- to 11-year-old children were selected from a school in Bogotá, Colombia. Three age groups were distinguished: 6- to 7-, 8- to 9-, and 10- to 11-year-olds. Performance was measured utilizing the following neuropsychological tests: Seashore Rhythm Test, Finger Tapping Test (FTT), Grooved Pegboard Test, Children's Category Test (CCT), California Verbal Learning Test-Children's Version (CVLT-C), Benton Visual Retention Test (BVRT), and Bateria Woodcock Psicoeducativa en Español (Woodcock, 1982). Normative scores were calculated. Age effect was significant for most of the test scores. A significant sex effect was observed for 3 test scores. Intercorrelations were performed between neuropsychological test scores and academic areas (science, mathematics, Spanish, social studies, and music). In a post hoc analysis, children presenting very low scores on the reading, writing, and arithmetic achievement scales of the Woodcock battery were identified in the sample, and their neuropsychological test scores were compared with a matched normal group. Finally, a comparison was made between Colombian and American norms.

  8. Score Reporting for the 1991 Medical College Admission Test.

    ERIC Educational Resources Information Center

    Mitchell, Karen J.; Haynes, Robert

    1990-01-01

    Data used in a major review of the system for reporting scores on the Medical College Admission Test (MCAT) are presented and discussed. The data demonstrated the value of the current score-reporting system and led to retention of the 15-point MCAT score scale in 1991. (Author/MSE)

  9. Teacher Greetings Increase College Students' Test Scores

    ERIC Educational Resources Information Center

    Weinstein, Lawrence; Laverghetta, Antonio; Alexander, Ralph; Stewart, Megan

    2009-01-01

    The current study is an extension of a previous investigation dealing with teacher greetings to students. The present investigation used teacher greetings with college students and academic performance (test scores). We report data using university students and in-class test performance. Students in introductory psychology who received teachers'…

  10. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. New Mexico

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles New Mexico's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 4 math and grade 8 reading and math. In grade 4 reading, the percentage basic on NAEP …

  11. State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. North Dakota

    ERIC Educational Resources Information Center

    Center on Education Policy, 2010

    2010-01-01

    This paper profiles North Dakota's test score trends through 2008-09. Between 2005 and 2009, the percentage of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grades 4 and 8 in both reading and math. Average annual gains were larger on the state test…

  12. A prognostic scoring system for arm exercise stress testing.

    PubMed

    Xie, Yan; Xian, Hong; Chandiramani, Pooja; Bainter, Emily; Wan, Leping; Martin, Wade H

    2016-01-01

    Arm exercise stress testing may be an equivalent or better predictor of mortality outcome than pharmacological stress imaging for the ≥50% of patients unable to perform leg exercise. Thus, our objective was to develop an arm exercise ECG stress test scoring system, analogous to the Duke Treadmill Score, for predicting outcome in these individuals. In this retrospective observational cohort study, arm exercise ECG stress tests were performed in 443 consecutive veterans aged 64.1 (11.1) years (mean (SD)) between 1997 and 2002. From multivariate Cox models, arm exercise scores were developed for prediction of 5-year and 12-year all-cause and cardiovascular mortality and 5-year cardiovascular mortality or myocardial infarction (MI). Arm exercise capacity in resting metabolic equivalents (METs), 1 min heart rate recovery (HRR) and ST segment depression ≥1 mm were the stress test variables independently associated with all-cause and cardiovascular mortality by step-wise Cox analysis (all p<0.01). A score based on the relation HRR (bpm) + 7.3 × METs - 10.5 × ST depression (0=no; 1=yes) prognosticated 5-year cardiovascular mortality with a C-statistic of 0.81 before and 0.88 after adjustment for significant demographic and clinical covariates. Arm exercise scores for the other outcome end points yielded C-statistic values of 0.77-0.79 before and 0.82-0.86 after adjustment for significant covariates versus 0.64-0.72 for best fit pharmacological myocardial perfusion imaging models in a cohort of 1730 veterans who were evaluated over the same time period. Arm exercise scores, analogous to the Duke Treadmill Score, have good power for prediction of mortality or MI in patients who cannot perform leg exercise.
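
    The relation quoted for 5-year cardiovascular mortality can be written out as a short function. This is only a sketch of the arithmetic as stated in the abstract: the function name, argument names, and example inputs are hypothetical, and the abstract gives no risk cut-offs, so none are assumed here.

```python
def arm_exercise_score(hrr_bpm: float, mets: float, st_depression: bool) -> float:
    """Score = HRR (bpm) + 7.3 * METs - 10.5 * ST depression (0 = no, 1 = yes).

    hrr_bpm       -- 1 min heart rate recovery in beats per minute
    mets          -- arm exercise capacity in resting metabolic equivalents
    st_depression -- True if ST segment depression >= 1 mm was observed
    """
    return hrr_bpm + 7.3 * mets - 10.5 * (1 if st_depression else 0)


# Hypothetical inputs, chosen only to show the arithmetic.
print(arm_exercise_score(20, 5.0, False))
print(arm_exercise_score(20, 5.0, True))
```

    With the coefficients as quoted, greater heart rate recovery and exercise capacity raise the score, while ST depression lowers it by a fixed 10.5 points.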

  13. Test Operations Procedure (TOP) 03-2-827 Test Procedures for Video Target Scoring Using Calibration Lights

    DTIC Science & Technology

    2016-04-04

    This Test Operations Procedure (TOP) describes typical equipment and procedures to set up and operate a Video Target Scoring System (VTSS) to...lights. Subject terms: Video Target Scoring System, VTSS, witness screens, camera, target screen, light pole.

  14. Semi-Quantitative Scoring of an Immunochromatographic Test for Circulating Filarial Antigen

    PubMed Central

    Chesnais, Cédric B.; Missamou, François; Pion, Sébastien D. S.; Bopda, Jean; Louya, Frédéric; Majewski, Andrew C.; Weil, Gary J.; Boussinesq, Michel

    2013-01-01

    The value of a semi-quantitative scoring of the filarial antigen test (Binax Now Filariasis card test, ICT) results was evaluated during a field survey in the Republic of Congo. One hundred and thirty-four (134) of 774 tests (17.3%) were clearly positive and were scored 1, 2, or 3; and 11 (1.4%) had questionable results. Wuchereria bancrofti microfilariae (mf) were detected in 41 of those 133 individuals with an ICT test score ≥ 1 who also had a night blood smear; none of the 11 individuals with questionable ICT results harbored night mf. Cuzick's test showed a significant trend for higher microfilarial densities in groups with higher ICT scores (P < 0.001). The ICT scores were also significantly correlated with blood mf counts. Because filarial antigen levels provide an indication of adult worm infection intensity, our results suggest that semi-quantitative reading of the ICT may be useful for grading the intensity of filarial infections in individuals and populations. PMID:24019435

  15. Automated Scoring of Short-Answer Reading Items: Implications for Constructs

    ERIC Educational Resources Information Center

    Carr, Nathan T.; Xi, Xiaoming

    2010-01-01

    This article examines how the use of automated scoring procedures for short-answer reading tasks can affect the constructs being assessed. In particular, it highlights ways in which the development of scoring algorithms intended to apply the criteria used by human raters can lead test developers to reexamine and even refine the constructs they…

  16. Critical Thinking: More than Test Scores

    ERIC Educational Resources Information Center

    Smith, Vernon G.; Szymanski, Antonia

    2013-01-01

    This article is for practicing or aspiring school administrators. The demand for excellence in public education has led to an emphasis on standardized test scores. This article explores the development of a professional enhancement program designed to prepare teachers to teach higher order thinking skills. Higher order thinking is the primary…

  17. The Black-White Test Score Gap.

    ERIC Educational Resources Information Center

    Jencks, Christopher, Ed.; Phillips, Meredith, Ed.

    The 15 chapters of this book address issues related to the continuing test score gap between black and white students. The editors argue against traditional explanations which emphasize differences in economic resources and demographic factors, and they urge that more emphasis be put on psychological and cultural factors. The book suggests studies…

  18. Test Takers and the Validity of Score Interpretations

    ERIC Educational Resources Information Center

    Kopriva, Rebecca J.; Thurlow, Martha L.; Perie, Marianne; Lazarus, Sheryl S.; Clark, Amy

    2016-01-01

    This article argues that test takers are as integral to determining validity of test scores as defining target content and conditioning inferences on test use. A principled sustained attention to how students interact with assessment opportunities is essential, as is a principled sustained evaluation of evidence confirming the validity or calling…

  19. 21 CFR 866.6050 - Ovarian adnexal mass assessment score test system.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    § 866.6050 Ovarian adnexal mass assessment score test system. (a) Identification. An ovarian/adnexal mass assessment test system is a device that measures one or more proteins in serum or...

  20. ANOVA Analysis of Student Daily Test Scores in Multi-Day Test Periods

    ERIC Educational Resources Information Center

    Mouritsen, Matthew L.; Davis, Jefferson T.; Jones, Steven C.

    2016-01-01

    Instructors are often concerned when giving multiple-day tests because students taking the test later in the exam period may have an advantage over students taking the test early in the exam period due to information leakage. However, exam scores seemed to decline as students took the same test later in a multi-day exam period (Mouritsen and…

  1. Scoring Yes-No Vocabulary Tests: Reaction Time vs. Nonword Approaches

    ERIC Educational Resources Information Center

    Pellicer-Sanchez, Ana; Schmitt, Norbert

    2012-01-01

    Despite a number of research studies investigating the Yes-No vocabulary test format, one main question remains unanswered: What is the best scoring procedure to adjust for testee overestimation of vocabulary knowledge? Different scoring methodologies have been proposed based on the inclusion and selection of nonwords in the test. However, there…

  2. Increased correlation coefficient between the written test score and tutors' performance test scores after training of tutors for assessment of medical students during problem-based learning course in Malaysia.

    PubMed

    Jaiprakash, Heethal; Min, Aung Ko Ko; Ghosh, Sarmishtha

    2016-03-01

    This paper aimed to determine whether there was a change in the correlation between the written test score and tutors' performance test scores in the assessment of medical students during a problem-based learning (PBL) course in Malaysia. This is a cross-sectional observational study, conducted among 264 medical students in two groups from November 2010 to November 2012. The first group's tutors did not receive tutor training, while the second group's tutors were trained in the PBL process. Each group was divided into high, middle and low achievers based on their end-of-semester exam scores. PBL scores, which included written test scores and tutors' performance test scores, were collected. The Pearson correlation coefficient was calculated between the two kinds of scores in each group. The correlation coefficient between the written scores and tutors' scores in group 1 was 0.099 (p<0.001) and for group 2 was 0.305 (p<0.001). The higher correlation coefficient in the group where tutors received the PBL training reinforces the importance of tutor training before their participation in the PBL course.
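
    For readers unfamiliar with the statistic, a minimal sketch of a Pearson correlation between two sets of scores follows. The helper name and the score lists are hypothetical and are not the study's data; the study's own computation is not described beyond naming the coefficient.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical written-test and tutor-assigned scores for five students.
written = [62.0, 70.0, 55.0, 81.0, 74.0]
tutor = [7.0, 8.0, 6.5, 8.5, 7.0]
print(round(pearson_r(written, tutor), 3))
```

    A value near 0 indicates little linear agreement between the two kinds of scores, and a value near 1 indicates close agreement, which is how the abstract's 0.099 and 0.305 are being compared.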

  3. The Effect of Pretest Exercise on Baseline Computerized Neurocognitive Test Scores.

    PubMed

    Pawlukiewicz, Alec; Yengo-Kahn, Aaron M; Solomon, Gary

    2017-10-01

    Baseline neurocognitive assessment plays a critical role in return-to-play decision making following sport-related concussions. Prior studies have assessed the effect of a variety of modifying factors on neurocognitive baseline test scores. However, relatively little investigation has been conducted regarding the effect of pretest exercise on baseline testing. The aim of our investigation was to determine the effect of pretest exercise on baseline Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) scores in adolescent and young adult athletes. We hypothesized that athletes undergoing self-reported strenuous exercise within 3 hours of baseline testing would perform more poorly on neurocognitive metrics and would report a greater number of symptoms than those who had not completed such exercise. Cross-sectional study; Level of evidence, 3. The ImPACT records of 18,245 adolescent and young adult athletes were retrospectively analyzed. After application of inclusion and exclusion criteria, participants were dichotomized into groups based on a positive (n = 664) or negative (n = 6609) self-reported history of strenuous exercise within 3 hours of the baseline test. Participants with a positive history of exercise were then randomly matched, based on age, sex, education level, concussion history, and hours of sleep prior to testing, on a 1:2 basis with individuals who had reported no pretest exercise. The baseline ImPACT composite scores of the 2 groups were then compared. Significant differences were observed for the ImPACT composite scores of verbal memory, visual memory, reaction time, and impulse control as well as for the total symptom score. No significant between-group difference was detected for the visual motor composite score. Furthermore, pretest exercise was associated with a significant increase in the overall frequency of invalid test results. Our results suggest a statistically significant difference in ImPACT composite scores between

  4. Observed-Score Equating as a Test Assembly Problem.

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Luecht, Richard M.

    1998-01-01

    Derives a set of linear conditions of item-response functions that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly. An example illustrates the use of the model for an item pool from the Law School Admissions Test (LSAT). (SLD)

  5. A Review of Scoring Algorithms for Ability and Aptitude Tests.

    ERIC Educational Resources Information Center

    Chevalier, Shirley A.

    In conventional practice, most educators and educational researchers score cognitive tests using a dichotomous right-wrong scoring system. Although simple and straightforward, this method does not take into consideration other factors, such as partial knowledge or guessing tendencies and abilities. This paper discusses alternative scoring models:…

  6. Distinctions between Item Format and Objectivity in Scoring.

    ERIC Educational Resources Information Center

    Terwilliger, James S.

    This paper clarifies important distinctions in item writing and item scoring and considers the implications of these distinctions for developing guidelines related to test construction for training teachers. The terminology used to describe and classify paper and pencil test questions frequently confuses two distinct features of questions:…

  7. Score tests for independence in semiparametric competing risks models.

    PubMed

    Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul

    2009-12-01

    A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.

  8. Interpreting the g loadings of intelligence test composite scores in light of Spearman's law of diminishing returns.

    PubMed

    Reynolds, Matthew R

    2013-03-01

    The linear loadings of intelligence test composite scores on a general factor (g) have been investigated recently in factor analytic studies. Spearman's law of diminishing returns (SLODR), however, implies that the g loadings of test scores likely decrease in magnitude as g increases, or they are nonlinear. The purpose of this study was to (a) investigate whether the g loadings of composite scores from the Differential Ability Scales (2nd ed.) (DAS-II, C. D. Elliott, 2007a, Differential Ability Scales (2nd ed.). San Antonio, TX: Pearson) were nonlinear and (b) if they were nonlinear, to compare them with linear g loadings to demonstrate how SLODR alters the interpretation of these loadings. Linear and nonlinear confirmatory factor analysis (CFA) models were used to model Nonverbal Reasoning, Verbal Ability, Visual Spatial Ability, Working Memory, and Processing Speed composite scores in four age groups (5-6, 7-8, 9-13, and 14-17) from the DAS-II norming sample. The nonlinear CFA models provided better fit to the data than did the linear models. In support of SLODR, estimates obtained from the nonlinear CFAs indicated that g loadings decreased as g level increased. The nonlinear portion for the nonverbal reasoning loading, however, was not statistically significant across the age groups. Knowledge of general ability level informs composite score interpretation because g is less likely to produce differences, or is measured less, in those scores at higher g levels. One implication is that it may be more important to examine the pattern of specific abilities at higher general ability levels. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  9. Relationships of Declining Test Scores and Grade Inflation.

    ERIC Educational Resources Information Center

    Bellott, Fred K.

    The relationship between declining scores on national standardized tests and grade inflation is explored. Grade inflation refers to the indicated measure of evaluation of student performance having higher placement than is usual based on the performances. Data for this study were taken from the American College Testing (ACT) Program Class Profile…

  10. D.C. Student Test Scores Show Uneven Progress. Data Snapshot

    ERIC Educational Resources Information Center

    DuPre, Mary

    2011-01-01

    Over the past five years, both DC Public Schools (DCPS) and public charter schools (PCS) have seen significant growth in secondary reading and math scores on the state test known as the District of Columbia Comprehensive Assessment System (DC CAS). However, scores have not improved as much at the elementary level. Reading and math scores for DCPS…

  11. Reliability of Total Test Scores When Considered as Ordinal Measurements

    ERIC Educational Resources Information Center

    Biswas, Ajoy Kumar

    2006-01-01

    This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
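
    As an illustration of the coefficient this record builds on, the sketch below computes the standard Kendall's tau-a between two score vectors: concordant pairs minus discordant pairs, divided by the total number of pairs. It is not the article's ordinal reliability measure, which the truncated abstract does not spell out; the function name and the sample scores are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2).

    Tied pairs count as neither concordant nor discordant, but they
    still appear in the denominator (this is what separates tau-a
    from tau-b).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    balance = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            balance += 1   # pair ordered the same way in both vectors
        elif product < 0:
            balance -= 1   # pair ordered in opposite ways
    return balance / (n * (n - 1) / 2)

# Hypothetical scores for five examinees on two parallel forms.
form_a = [12, 15, 9, 20, 17]
form_b = [11, 16, 10, 19, 15]
print(kendall_tau_a(form_a, form_b))
```

    Because every pair is compared, the cost grows quadratically with the number of examinees, which is acceptable for the small populations the abstract mentions.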

  12. Correlation of Simulation Examination to Written Test Scores for Advanced Cardiac Life Support Testing: Prospective Cohort Study.

    PubMed

    Strom, Suzanne L; Anderson, Craig L; Yang, Luanna; Canales, Cecilia; Amin, Alpesh; Lotfipour, Shahram; McCoy, C Eric; Osborn, Megan Boysen; Langdorf, Mark I

    2015-11-01

    Traditional Advanced Cardiac Life Support (ACLS) courses are evaluated using written multiple-choice tests. High-fidelity simulation is a widely used adjunct to didactic content, and has been used in many specialties as a training resource as well as an evaluative tool. There are no data to our knowledge that compare simulation examination scores with written test scores for ACLS courses. To compare and correlate a novel high-fidelity simulation-based evaluation with traditional written testing for senior medical students in an ACLS course. We performed a prospective cohort study to determine the correlation between simulation-based evaluation and traditional written testing in a medical school simulation center. Students were tested on a standard acute coronary syndrome/ventricular fibrillation cardiac arrest scenario. Our primary outcome measure was correlation of exam results for 19 volunteer fourth-year medical students after a 32-hour ACLS-based Resuscitation Boot Camp course. Our secondary outcome was comparison of simulation-based vs. written outcome scores. The composite average score on the written evaluation was substantially higher (93.6%) than the simulation performance score (81.3%, absolute difference 12.3%, 95% CI [10.6-14.0%], p<0.00005). We found a statistically significant moderate correlation between simulation scenario test performance and traditional written testing (Pearson r=0.48, p=0.04), validating the new evaluation method. Simulation-based ACLS evaluation methods correlate with traditional written testing and demonstrate resuscitation knowledge and skills. Simulation may be a more discriminating and challenging testing method, as students scored higher on written evaluation methods compared to simulation.

  13. Between-District Test Score Variation, 2009-2012

    ERIC Educational Resources Information Center

    Fahle, Erin; Reardon, Sean

    2016-01-01

    Describing the variation in test scores between and within school districts is critical for: (1) policy-related and descriptive work that investigates the sorting of students among districts and the differential effectiveness of those districts; and (2) methodological work planning future experiments or interventions. Intraclass…

  14. The Persisting Racial Scoring Gap on Graduate and Professional School Admission Tests.

    ERIC Educational Resources Information Center

    Journal of Blacks in Higher Education, 2003

    2003-01-01

    Discusses the racial scoring gap on tests for admission to medical, business, law, and other graduate programs, noting that in the highest-scoring brackets on the Medical College Admission Test (MCAT), the racial gap is even larger. Whites are five times, twelve times, and seven times more likely, respectively, to score higher on the MCAT, Law…

  15. Comparability of IQ Scores on Five Widely Used Intelligence Tests

    ERIC Educational Resources Information Center

    Hieronymus, A. N.; Stroud, James B.

    1969-01-01

    Attempts to fill research gap on testing by obtaining comparisons of deviation scores, at grade levels four, seven, and ten, from the California Test of Mental Maturity, Henmon-Nelson Tests, and Lorge-Thorndike Intelligence tests. Results tabulated. (CJ)

  16. Sex Differences in Cognitive Abilities Test Scores: A UK National Picture

    ERIC Educational Resources Information Center

    Strand, Steve; Deary, Ian J.; Smith, Pauline

    2006-01-01

    Background and aims: There is uncertainty about the extent or even existence of sex differences in the mean and variability of reasoning test scores (Jensen, 1998; Lynn, 1994; Mackintosh, 1996). This paper analyses the Cognitive Abilities Test (CAT) scores of a large and representative sample of UK pupils to determine the extent of any sex…

  17. Implications of Changing Answers on Objective Test Items

    ERIC Educational Resources Information Center

    Mueller, Daniel J.; Wasser, Virginia

    1977-01-01

    Eighteen studies of the effects of changing initial answers to objective test items are reviewed. While students throughout the total test score range tended to gain more points than they lost, higher scoring students gained more than did lower scoring students. Suggestions for further research are made. (Author/JKS)

  18. Do candidate reactions relate to job performance or affect criterion-related validity? A multistudy investigation of relations among reactions, selection test scores, and job performance.

    PubMed

    McCarthy, Julie M; Van Iddekinge, Chad H; Lievens, Filip; Kung, Mei-Chuan; Sinar, Evan F; Campion, Michael A

    2013-09-01

    Considerable evidence suggests that how candidates react to selection procedures can affect their test performance and their attitudes toward the hiring organization (e.g., recommending the firm to others). However, very few studies of candidate reactions have examined one of the outcomes organizations care most about: job performance. We attempt to address this gap by developing and testing a conceptual framework that delineates whether and how candidate reactions might influence job performance. We accomplish this objective using data from 4 studies (total N = 6,480), 6 selection procedures (personality tests, job knowledge tests, cognitive ability tests, work samples, situational judgment tests, and a selection inventory), 5 key candidate reactions (anxiety, motivation, belief in tests, self-efficacy, and procedural justice), 2 contexts (industry and education), 3 continents (North America, South America, and Europe), 2 study designs (predictive and concurrent), and 4 occupational areas (medical, sales, customer service, and technological). Consistent with previous research, candidate reactions were related to test scores, and test scores were related to job performance. Further, there was some evidence that reactions affected performance indirectly through their influence on test scores. Finally, in no cases did candidate reactions affect the prediction of job performance by increasing or decreasing the criterion-related validity of test scores. Implications of these findings and avenues for future research are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved

  19. Teacher Use of Achievement Test Score Data

    ERIC Educational Resources Information Center

    Miller, Steven C.

    2012-01-01

    The Wyoming Department of Education (WDE) has invested time and money developing standardized achievement test score reports designed to give teachers data about each of their students' levels of mastery of particular concepts in order to differentiate their instruction. The purpose of this study was to determine the extent to which eighth-grade…

  20. Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

    ERIC Educational Resources Information Center

    Kim, Seonghoon

    2013-01-01

    With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
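
    The recursion attributed to Lord and Wingersky in this record can be sketched as follows for the classic dichotomous (right/wrong) case: items are added one at a time, and each item either leaves the running score unchanged or raises it by one. This is a minimal sketch of the original recursion, not the generalized real-number algorithm the article presents. The two-parameter logistic response function and the item parameters are assumptions chosen for illustration.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lord_wingersky(probs):
    """Conditional distribution of the number-correct score.

    probs[i] is the probability of answering item i correctly at a fixed
    proficiency. Returns a list d where d[x] = P(score = x | proficiency).
    """
    dist = [1.0]  # with zero items, the score is 0 with certainty
    for p in probs:
        updated = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            updated[score] += mass * (1.0 - p)   # item answered incorrectly
            updated[score + 1] += mass * p       # item answered correctly
        dist = updated
    return dist

# Hypothetical (discrimination, difficulty) pairs for a four-item test.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]
theta = 0.0
probs = [irf_2pl(theta, a, b) for a, b in items]
for score, mass in enumerate(lord_wingersky(probs)):
    print(score, round(mass, 4))
```

    Each pass over an item touches every score reached so far, so the work grows with the square of the number of items rather than with the number of response patterns.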

  1. Misidentifying Factors Underlying Singapore's High Test Scores

    ERIC Educational Resources Information Center

    Usiskin, Zalman

    2012-01-01

    Singapore students have scored exceedingly well on international tests in mathematics. In response, there has been a desire in the United States--both at the policy level and at the school level--to emulate Singapore. Because what can be identified most easily about Singapore's school mathematics can be gleaned from curriculum documents from the…

  2. A weighted generalized score statistic for comparison of predictive values of diagnostic tests

    PubMed Central

    Kosinski, Andrzej S.

    2013-01-01

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343

  3. Using Raters from India to Score a Large-Scale Speaking Test

    ERIC Educational Resources Information Center

    Xi, Xiaoming; Mollaun, Pam

    2011-01-01

    We investigated the scoring of the Speaking section of the Test of English as a Foreign Language[TM] Internet-based (TOEFL iBT[R]) test by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…

  4. The impact of testing accommodations on MCAT scores: descriptive results.

    PubMed

    Julian, Ellen R; Ingersoll, Deborah J; Etienne, Patricia M; Hilger, Anthony E

    2004-04-01

    Medical College Admission Test (MCAT) examinees with disabilities who receive accommodations receive flagged scores indicating nonstandard administration. This report compares MCAT examinees who received accommodations and their performances with standard examinees. Aggregate history records of all 1994-2000 MCAT examinees were identified as flagged (2,401) or standard (297,880), then further sorted by race/ethnicity (broadly identified as underrepresented minority and non-URM, at the time of testing) and gender. Those with flagged scores were also classified by disability (LD = learning disability, ADHD = attention deficit hyperactivity disorder, LD/ADHD = learning disability and attention deficit hyperactivity disorder, and Other = other disability) and type of accommodation. Mean MCAT scores were calculated for all groups. A group of 866 examinees took the MCAT first as a standard administration and subsequently with accommodations. In a separate analysis, their two sets of scores were compared. Less than 1% of examinees (2,401) had accommodations; of these, 55% were LD, 17% ADHD, 5% LD/ADHD, and 23% Other. Extended time was the most frequently provided accommodation. Mean flagged scores slightly exceeded mean standard scores on all MCAT sections. Examinees who retook the MCAT with accommodations after a standard administration increased their scores by six points, quadrupling the average gain of a Standard-Standard retest cohort from another study. The slightly but statistically significantly higher flagged scores may reflect either appropriate compensation or overly generous accommodations. Extended time had a positive impact on the scores of those who retested with this accommodation. The validity of the flagged MCAT in predicting success in medical school is not known, and further investigation is underway.

  5. Leveraging Gender Differences to Boost Test Scores

    ERIC Educational Resources Information Center

    Costello, Bill

    2008-01-01

    According to the 2004 National Assessment of Educational Progress, males who have made it through 12 years of school have significantly poorer reading skills than their female peers. In every age group, boys have been scoring lower than girls annually for more than three decades on U.S. Department of Education reading tests. The longer boys are in…

  6. Test Score Stability and the Relationship of Adult Manifest Anxiety Scale-College Version Scores to External Variables among Graduate Students

    ERIC Educational Resources Information Center

    Lowe, Patricia A.; Peyton, Vicki; Reynolds, Cecil R.

    2007-01-01

    A sample of 79 individuals participated in the present study to evaluate the test score stability (8-week test-retest interval) and construct validity of the scores of the Adult Manifest Anxiety Scale-College Version, a new measure used to assess anxiety in college students, for application to graduate-level students. Results of the study…

  7. An Approach to Scoring and Equating Tests with Binary Items: Piloting With Large-Scale Assessments

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    2016-01-01

    This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…

  8. Developing Test Score Reports that Work: The Process and Best Practices for Effective Communication

    ERIC Educational Resources Information Center

    Zenisky, April L.; Hambleton, Ronald K.

    2012-01-01

    Test scores matter these days. Test-takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles by which most people get the bulk of this information. Historically, score reports have not always met the examinees' information or usability needs, but this is clearly changing…

  9. Testing Students with Special Educational Needs in Large-Scale Assessments – Psychometric Properties of Test Scores and Associations with Test Taking Behavior

    PubMed Central

    Pohl, Steffi; Südkamp, Anna; Hardt, Katinka; Carstensen, Claus H.; Weinert, Sabine

    2016-01-01

    Assessing competencies of students with special educational needs in learning (SEN-L) poses a challenge for large-scale assessments (LSAs). For students with SEN-L, the available competence tests may fail to yield test scores of high psychometric quality, which are—at the same time—measurement invariant to test scores of general education students. We investigated whether we can identify a subgroup of students with SEN-L, for which measurement invariant competence measures of adequate psychometric quality may be obtained with tests available in LSAs. We furthermore investigated whether differences in test-taking behavior may explain dissatisfying psychometric properties and measurement non-invariance of test scores within LSAs. We relied on person fit indices and mixture distribution models to identify students with SEN-L for whom test scores with satisfactory psychometric properties and measurement invariance may be obtained. We also captured differences in test-taking behavior related to guessing and missing responses. As a result we identified a subgroup of students with SEN-L for whom competence scores of adequate psychometric quality that are measurement invariant to those of general education students were obtained. Concerning test taking behavior, there was a small number of students who unsystematically picked response options. Removing these students from the sample slightly improved item fit. Furthermore, two different patterns of missing responses were identified that explain to some extent problems in the assessments of students with SEN-L. PMID:26941665

  10. Automated Essay Scoring: Psychometric Guidelines and Practices

    ERIC Educational Resources Information Center

    Ramineni, Chaitanya; Williamson, David M.

    2013-01-01

    In this paper, we provide an overview of psychometric procedures and guidelines Educational Testing Service (ETS) uses to evaluate automated essay scoring for operational use. We briefly describe the e-rater system, the procedures and criteria used to evaluate e-rater, implications for a range of potential uses of e-rater, and directions for…

  11. Flow and diffusion of high-stakes test scores.

    PubMed

    Marder, M; Bansal, D

    2009-10-13

    We apply visualization and modeling methods for convective and diffusive flows to public school mathematics test scores from Texas. We obtain plots that show the most likely future and past scores of students, the effects of random processes such as guessing, and the rate at which students appear in and disappear from schools. We show that student outcomes depend strongly upon economic class, and identify the grade levels where flows of different groups diverge most strongly. Changing the effectiveness of instruction in one grade naturally leads to strongly nonlinear effects on student outcomes in subsequent grades.

  12. Scoring systems for the Clock Drawing Test: A historical review

    PubMed Central

    Spenciere, Bárbara; Alves, Heloisa; Charchat-Fichman, Helenice

    2017-01-01

    The Clock Drawing Test (CDT) is a simple neuropsychological screening instrument that is well accepted by patients and has solid psychometric properties. Several different CDT scoring methods have been developed, but no consensus has been reached regarding which scoring method is the most accurate. This article reviews the literature on these scoring systems and the changes they have undergone over the years. Historically, different types of scoring systems emerged. Initially, the focus was on screening for dementia, and the methods were both quantitative and semi-quantitative. Later, the need for an early diagnosis called for a scoring system that can detect subtle errors, especially those related to executive function. Therefore, qualitative analyses began to be used for both differential and early diagnoses of dementia. A widely used qualitative method was proposed by Rouleau et al. (1992). Tracing the historical path of these scoring methods is important for developing additional scoring systems and furthering dementia prevention research. PMID:29213488

  13. Effects of Test Media on Different EFL Test-Takers in Writing Scores and in the Cognitive Writing Process

    ERIC Educational Resources Information Center

    Zou, Xiao-Ling; Chen, Yan-Min

    2016-01-01

    The effects of computer and paper test media on EFL test-takers with different computer familiarity in writing scores and in the cognitive writing process have been comprehensively explored from the learners' aspect as well as on the basis of related theories and practice. The results indicate significant differences in test scores among the…

  14. The Effect of Schooling and Ability on Achievement Test Scores. NBER Working Paper Series.

    ERIC Educational Resources Information Center

    Hansen, Karsten; Heckman, James J.; Mullen, Kathleen J.

    This study developed two methods for estimating the effect of schooling on achievement test scores that control for the endogeneity of schooling by postulating that both schooling and test scores are generated by a common unobserved latent ability. The methods were applied to data on schooling and test scores. Estimates from the two methods are in…

  15. Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects.

    PubMed

    Ho, Andrew D; Yu, Carol C

    2015-06-01

    Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is met rarely in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.
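
    The descriptive statistics named in this record's title can be computed with a few lines of code. The sketch below uses the ordinary moment-based definitions of skewness and excess kurtosis, plus two simple indicators for ceiling effects and discreteness. The exact estimators used by the authors are not given in the abstract, so these formulas, the function name, and the sample scores are assumptions.

```python
def describe_scores(scores, max_score):
    """Moment-based shape statistics for a list of test scores.

    Returns skewness (m3 / m2**1.5), excess kurtosis (m4 / m2**2 - 3),
    the share of examinees at the maximum score (a simple ceiling
    indicator), and the number of distinct score points (discreteness).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return {
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "share_at_ceiling": sum(1 for x in scores if x == max_score) / n,
        "distinct_scores": len(set(scores)),
    }

# Hypothetical raw scores on a 10-point test with many examinees at the top.
sample = [10, 10, 10, 9, 9, 8, 8, 7, 6, 4]
print(describe_scores(sample, max_score=10))
```

    A normal distribution has skewness 0 and excess kurtosis 0, so values far from zero, or a large share of examinees at the ceiling, suggest that normal-theory models deserve a second look.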

  16. Principles and Practices of Test Score Equating. Research Report. ETS RR-10-29

    ERIC Educational Resources Information Center

    Dorans, Neil J.; Moses, Tim P.; Eignor, Daniel R.

    2010-01-01

    Score equating is essential for any testing program that continually produces new editions of a test and for which the expectation is that scores from these editions have the same meaning over time. Particularly in testing programs that help make high-stakes decisions, it is extremely important that test equating be done carefully and accurately.…

  17. The Role of Test Scores in Explaining Race and Gender Differences in Wages

    ERIC Educational Resources Information Center

    Blackburn, McKinley L.

    2004-01-01

    Previous research has suggested that skills reflected in test-score performance on tests such as the Armed Forces Qualification Test (AFQT) can account for some of the racial differences in average wages. I use a more complete set of test scores available with the National Longitudinal Survey of Youth 1979 Cohort to reconsider this evidence, and…

  18. Does breastfeeding contribute to the racial gap in reading and math test scores?

    PubMed

    Peters, Kristen E; Huang, Jin; Vaughn, Michael G; Witko, Christopher

    2013-10-01

    The aim of this study was to examine the impact of divergent breastfeeding practices between Caucasian and African American mothers on the lingering achievement test gap between Caucasian and African American children. The Child Development Supplement of the Panel Study of Income Dynamics, beginning in 1997, followed a cohort of 3563 children aged 0-12 years. Reading and math test scores from 2002 for 1928 children were linked with breastfeeding history. Regression analysis was used to examine associations between ever having been breastfed and duration of breastfeeding and test scores, controlling for characteristics of child, mother, and household. African American students scored significantly lower than Caucasian children by 10.6 and 10.9 points on reading and math tests, respectively. After accounting for the impact of having been breastfed during infancy, the racial test gap decreased by 17% for reading scores and 9% for math scores. Study findings indicate that breastfeeding explains 17% and 9% of the observed gaps in reading and math scores, respectively, between African Americans and Caucasians, an effect larger than most recent educational policy interventions. Renewed efforts around policies and clinical practices that promote and remove barriers for African American mothers to breastfeed should be implemented. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. Discrepancies between modified Medical Research Council dyspnea score and COPD assessment test score in patients with COPD

    PubMed Central

    Rhee, Chin Kook; Kim, Jin Woo; Hwang, Yong Il; Lee, Jin Hwa; Jung, Ki-Suck; Lee, Myung Goo; Yoo, Kwang Ha; Lee, Sang Haak; Shin, Kyeong-Cheol; Yoon, Hyoung Kyu

    2015-01-01

    Background and objective: According to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines, either a modified Medical Research Council (mMRC) dyspnea score of ≥2 or a chronic obstructive pulmonary disease (COPD) assessment test (CAT) score of ≥10 is considered to represent COPD patients who are more symptomatic. We aimed to identify the ideal CAT score that exhibits minimal discrepancy with the mMRC score. Methods: A receiver operating characteristic curve of the CAT score was generated for mMRC scores of 1 and 2. A concordance analysis was applied to quantify the association between the frequencies of patients categorized into GOLD groups A–D using symptom cutoff points. A κ-coefficient was calculated. Results: For an mMRC score of 2, a CAT score of 15 showed the maximum value of Youden’s index with a sensitivity and specificity of 0.70 and 0.66, respectively (area under the receiver operating characteristic curve [AUC] 0.74; 95% confidence interval [CI], 0.70–0.77). For an mMRC score of 1, a CAT score of 10 showed the maximum value of Youden’s index with a sensitivity and specificity of 0.77 and 0.65, respectively (AUC 0.77; 95% CI, 0.72–0.83). The κ value for concordance was highest between an mMRC score of 1 and a CAT score of 10 (0.66), followed by an mMRC score of 2 and a CAT score of 15 (0.56), an mMRC score of 2 and a CAT score of 10 (0.47), and an mMRC score of 1 and a CAT score of 15 (0.43). Conclusion: A CAT score of 10 was most concordant with an mMRC score of 1 when classifying patients with COPD into GOLD groups A–D. However, a discrepancy remains between the CAT and mMRC scoring systems. PMID:26316736
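
    Youden's index, which this record uses to choose CAT cutoffs, is sensitivity plus specificity minus one. The sketch below applies that definition to the two sensitivity and specificity pairs reported in the abstract. The function name and the dictionary layout are illustrative only, and no data beyond the reported values are assumed.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1 (0 = chance, 1 = perfect)."""
    return sensitivity + specificity - 1.0

# (sensitivity, specificity) pairs as reported in the abstract above.
reported = {
    "mMRC 2 vs CAT 15": (0.70, 0.66),
    "mMRC 1 vs CAT 10": (0.77, 0.65),
}

for label, (sens, spec) in reported.items():
    print(label, round(youden_index(sens, spec), 2))
```

    On these reported values the index is 0.36 for the CAT 15 cutoff and 0.42 for the CAT 10 cutoff.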

  20. An Investigation into the Relationships Between Cloze Test Scores and Informal Reading Inventory Scores of Fifth Grade Pupils.

    ERIC Educational Resources Information Center

    Walter, Richard Barry

    This study investigated the relationship between instructional level scores as determined by a cloze test and instructional level scores as determined by an informal reading inventory (IRI). Fifty male and 50 female subjects were randomly selected from the total fifth grade population of five schools chosen from a total of 22 midwestern elementary…

  1. Accountancy, teaching methods, sex, and American College Test scores.

    PubMed

    Heritage, J; Harper, B S; Harper, J P

    1990-10-01

    This study examines the significance of sex, methodology, academic preparation, and age as related to development of judgmental and problem-solving skills. Sex, American College Test (ACT) Mathematics scores, Composite ACT scores, grades in course work, grade point average (GPA), and age were used in studying the effects of teaching method on 96 students' ability to analyze data in financial statements. Results reflect positively on accounting students compared to the general college population and the women students in particular.

  2. Reduce, Reuse, Recycle: The Longitudinal Value of Local Cut Scores Using State Test Data

    ERIC Educational Resources Information Center

    Nelson, Peter M.; Van Norman, Ethan R.; VanDerHeyden, Amanda

    2017-01-01

    We used existing reading (n = 1,498) and math (n = 2,260) data to evaluate state test scores for screening middle school students. In Phase 1, state test data were used to create a research-derived cut score that was optimal for predicting state test performance the following year. In Phase 2, those cut scores were applied with future cohorts.…

  3. Online pre-race education improves test scores for volunteers at a marathon.

    PubMed

    Maxwell, Shane; Renier, Colleen; Sikka, Robby; Widstrom, Luke; Paulson, William; Christensen, Trent; Olson, David; Nelson, Benjamin

    2017-09-01

    This study examined whether an online course would lead to increased knowledge about the medical issues volunteers encounter during a marathon. Health care professionals who volunteered to provide medical coverage for an annual marathon were eligible for the study. Demographic information about medical volunteers, including profession, specialty, education level, and number of marathons they had volunteered for, was collected. A 15-question test about the most commonly encountered medical issues was created by the authors, administered before and after the volunteers took the online educational course, and compared to a pilot study from the previous year. Seventy-four subjects completed the pre-test. Those who had participated in the previous year's pilot study (N = 15) had pre-test scores that were an average of 2.4 points higher than those who had not (mean ranks: pilot study = 51.6 vs. non-pilot = 33.9, p = 0.004). Of the 74 subjects who completed the pre-test, 54 also completed the post-test. The overall post-pre mean score difference was 3.8 ± 2.7 (t = 10.5, df = 53, p < 0.001). While subjects with all levels of volunteer experience demonstrated improvement, only the change among first-time marathon volunteers was significantly different from the others. Subjects reporting all degree/certification levels demonstrated improvement, but no difference in improvement was found between degree/certification levels. In this follow-up to the previous year's pilot study, online education demonstrated a long-term (one-year) increase in test scores. Testing also continued to show short-term improvement in post-course test scores compared to pre-course test scores. In general, marathon medical volunteers who had no volunteer experience demonstrated greater improvement than those who had prior volunteer experience.
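
    The reported t statistic can be roughly cross-checked from the summary figures alone. The sketch below assumes that the ± 2.7 is the standard deviation of the paired post-minus-pre differences and that a paired t-test was used; the abstract states neither explicitly. The function name is illustrative.

```python
from math import sqrt


def paired_t_from_summary(mean_diff: float, sd_diff: float, n: int) -> float:
    """Paired t statistic: mean difference divided by its standard error."""
    standard_error = sd_diff / sqrt(n)
    return mean_diff / standard_error


# Summary figures from the abstract: 54 subjects completed both tests.
n_pairs = 54
t_stat = paired_t_from_summary(mean_diff=3.8, sd_diff=2.7, n=n_pairs)
degrees_of_freedom = n_pairs - 1

print(f"t = {t_stat:.2f}, df = {degrees_of_freedom}")
```

    The result lands close to, but not exactly on, the published t = 10.5; a small gap of this kind is expected when the mean and standard deviation are rounded to one decimal place.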

  4. Examining the Validity of GED[R] Tests Scores with Scheduling and Setting Accommodations. GED Testing Service Research Studies, 2004-1

    ERIC Educational Resources Information Center

    George-Ezzelle, Carol E.; Skaggs, Gary

    2004-01-01

    Current testing standards call for test developers to provide evidence that testing procedures and test scores, and the inferences made based on the test scores, show evidence of validity and are comparable across subpopulations (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on…

  5. Do We Really Become Smarter When Our Fluid-Intelligence Test Scores Improve?

    PubMed Central

    Hayes, Taylor R.; Petrov, Alexander A.; Sederberg, Per B.

    2014-01-01

    Recent reports of training-induced gains on fluid intelligence tests have fueled an explosion of interest in cognitive training—now a billion-dollar industry. The interpretation of these results is questionable because score gains can be dominated by factors that play marginal roles in the scores themselves, and because intelligence gain is not the only possible explanation for the observed control-adjusted far transfer across tasks. Here we present novel evidence that the test score gains used to measure the efficacy of cognitive training may reflect strategy refinement instead of intelligence gains. A novel scanpath analysis of eye movement data from 35 participants solving Raven’s Advanced Progressive Matrices on two separate sessions indicated that one-third of the variance of score gains could be attributed to test-taking strategy alone, as revealed by characteristic changes in eye-fixation patterns. When the strategic contaminant was partialled out, the residual score gains were no longer significant. These results are compatible with established theories of skill acquisition suggesting that procedural knowledge tacitly acquired during training can later be utilized at posttest. Our novel method and result both underline a reason to be wary of purported intelligence gains, but also provide a way forward for testing for them in the future. PMID:25395695

  6. Do We Really Become Smarter When Our Fluid-Intelligence Test Scores Improve?

    PubMed

    Hayes, Taylor R; Petrov, Alexander A; Sederberg, Per B

    2015-01-01

    Recent reports of training-induced gains on fluid intelligence tests have fueled an explosion of interest in cognitive training, now a billion-dollar industry. The interpretation of these results is questionable because score gains can be dominated by factors that play marginal roles in the scores themselves, and because intelligence gain is not the only possible explanation for the observed control-adjusted far transfer across tasks. Here we present novel evidence that the test score gains used to measure the efficacy of cognitive training may reflect strategy refinement instead of intelligence gains. A novel scanpath analysis of eye movement data from 35 participants solving Raven's Advanced Progressive Matrices on two separate sessions indicated that one-third of the variance of score gains could be attributed to test-taking strategy alone, as revealed by characteristic changes in eye-fixation patterns. When the strategic contaminant was partialled out, the residual score gains were no longer significant. These results are compatible with established theories of skill acquisition suggesting that procedural knowledge tacitly acquired during training can later be utilized at posttest. Our novel method and result both underline a reason to be wary of purported intelligence gains, but also provide a way forward for testing for them in the future.

  7. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    PubMed

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  8. Validity of GRE General Test scores and TOEFL scores for graduate admission to a technical university in Western Europe

    NASA Astrophysics Data System (ADS)

    Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.

    2018-01-01

    Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the Master's programme grade point average (GGPA) with and without the addition of the undergraduate GPA (UGPA) and the TOEFL score, and of GRE scores for study completion and Master's thesis performance. GRE scores explained 20% of the variation in the GGPA, while additional 7% were explained by the TOEFL score and 3% by the UGPA. Contrary to common belief, the GRE quantitative reasoning score showed only little explanatory power. GRE scores were also weakly related to study progress but not to thesis performance. Nevertheless, GRE and TOEFL scores were found to be sensible admissions instruments. Rigorous methodology was used to obtain highly reliable results.

  9. Proficiency Standards and Cut-Scores for Language Proficiency Tests.

    ERIC Educational Resources Information Center

    Moy, Raymond H.

    1984-01-01

    Discusses the problems associated with "grading on a curve," the approach often used for standard setting on language proficiency tests. Proposes four main steps presented in the setting of a non-arbitrary cut-score. These steps not only establish a proficiency standard checked by external criteria, but also check to see that the test covers the…

  10. Effort Analysis: Individual Score Validation of Achievement Test Data

    ERIC Educational Resources Information Center

    Wise, Steven L.

    2015-01-01

    Whenever the purpose of measurement is to inform an inference about a student's achievement level, it is important that we be able to trust that the student's test score accurately reflects what that student knows and can do. Such trust requires the assumption that a student's test event is not unduly influenced by construct-irrelevant factors…

  11. Student Laptop Use and Scores on Standardized Tests

    ERIC Educational Resources Information Center

    Kposowa, Augustine J.; Valdez, Amanda D.

    2013-01-01

    Objectives: The primary objective of the study was to investigate the relationship between ubiquitous laptop use and academic achievement. It was hypothesized that students with ubiquitous laptops would score on average higher on standardized tests than those without such computers. Methods: Data were obtained from two sources. First, demographic…

  12. The Dynamics of the Evolution of the Black-White Test Score Gap

    ERIC Educational Resources Information Center

    Sohn, Kitae

    2012-01-01

    We apply a quantile version of the Oaxaca-Blinder decomposition to estimate the counterfactual distribution of the test scores of Black students. In the Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 (ECLS-K), we find that the gap initially appears only at the top of the distribution of test scores. As children age, however,…

  13. The Dental Hygiene Aptitude Tests and the American College Testing Program Tests as Predictors of Scores on the National Board Dental Hygiene Examination.

    ERIC Educational Resources Information Center

    Longenbecker, Sueann; Wood, Peter H.

    1984-01-01

    Scores from the National Board Dental Hygiene Examination (NBDHE) served as the criterion variable in a comparison of the predictive validity of the Dental Hygiene Aptitude Tests (DHAT) and the ACT Assessment tests. The DHAT-Science and Verbal tests combined to produce the highest multiple correlation with NBDHE scores. (Author/DWH)

  14. Comparing Graphical and Verbal Representations of Measurement Error in Test Score Reports

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Zapata-Rivera, Diego; Hegarty, Mary

    2014-01-01

    Research has shown that many educators do not understand the terminology or displays used in test score reports and that measurement error is a particularly challenging concept. We investigated graphical and verbal methods of representing measurement error associated with individual student scores. We created four alternative score reports, each…

  15. Do scores on a tachistoscope test correlate with baseball batting averages?

    PubMed

    Reichow, Alan W; Garchow, Kenneth E; Baird, Richard Y

    2011-05-01

    Millions of dollars are spent each year by individuals seeking to improve their athletic performance. One area of visual training is the use of the tachistoscope, which measures inspection time or visual recognition time. Although the potential of the tachistoscope as a training tool has received some research attention, its use as a means of measurement or predictor of athletic ability in sports has not been explored. The purpose of this pilot study is to assess the potential of the tachistoscope as a measurement instrument by determining if a baseball player's ability to identify a tachistoscopically presented picture of a pitch is correlated with hitting performance as measured by batting average. Using sport-specific slides, 20 subjects (all non-pitching members of the Pacific University Baseball Team) were administered a tachistoscopic test. The test consisted of identifying the type of pitch illustrated in 30 randomly ordered slides depicting a pitcher throwing four different baseball pitches. Each slide was presented for 0.2 sec. The results of the test were compared with the athlete's previous season's batting average. A positive correlation was found between an athlete's ability to correctly identify a picture of a pitch presented tachistoscopically and batting average (r = 0.648; P < 0.01). These results suggest that a superior ability to recognize pitches presented via tachistoscope may correlate with a higher skill level in batting. Tachistoscopic test scores correlated positively with batting averages. The tachistoscope may be an acceptable tool to help in assessing batting performance. Additional testing with players from different sports, different levels of ability, and different tachistoscopic times should be performed to determine if the tachistoscope is a valid measure of athletic ability. Implications may also be drawn in other areas such as military and police work.
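
    To see how a correlation of 0.648 in 20 subjects relates to the reported P < 0.01, the sketch below converts r into a t statistic with n - 2 degrees of freedom using the standard formula for testing a Pearson correlation against zero. It assumes a Pearson correlation and a two-sided test, neither of which the abstract states. The critical value 2.878 is the usual t-table entry for 18 degrees of freedom at a two-sided alpha of 0.01, and the names are illustrative.

```python
from math import sqrt


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given a Pearson r from n pairs."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# Figures from the abstract: r = 0.648 across 20 players.
n_subjects = 20
t_stat = correlation_t_statistic(0.648, n_subjects)
degrees_of_freedom = n_subjects - 2

# Standard t-table critical value for df = 18, two-sided alpha = 0.01.
T_CRITICAL_01 = 2.878

print(f"t = {t_stat:.2f} on {degrees_of_freedom} df")
print("exceeds the 0.01 critical value:", t_stat > T_CRITICAL_01)
```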

  16. Rank score and permutation testing alternatives for regression quantile estimates

    USGS Publications Warehouse

    Cade, B.S.; Richards, J.D.; Mielke, P.W.

    2006-01-01

    Performance of quantile rank score tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1) was evaluated by simulation for models with p = 2 and 6 predictors, moderate collinearity among predictors, homogeneous and heterogeneous errors, small to moderate samples (n = 20–300), and central to upper quantiles (0.50–0.99). Test statistics evaluated were the conventional quantile rank score T statistic distributed as a χ² random variable with q degrees of freedom (where q parameters are constrained by H0) and an F statistic with its sampling distribution approximated by permutation. The permutation F-test maintained better Type I errors than the T-test for homogeneous error models with smaller n and more extreme quantiles τ. An F distributional approximation of the F statistic provided some improvements in Type I errors over the T-test for models with > 2 parameters, smaller n, and more extreme quantiles but not as much improvement as the permutation approximation. Both rank score tests required weighting to maintain correct Type I errors when heterogeneity under the alternative model increased to 5 standard deviations across the domain of X. A double permutation procedure was developed to provide valid Type I errors for the permutation F-test when null models were forced through the origin. Power was similar for conditions where both T- and F-tests maintained correct Type I errors but the F-test provided some power at smaller n and extreme quantiles when the T-test had no power because of excessively conservative Type I errors. When the double permutation scheme was required for the permutation F-test to maintain valid Type I errors, power was less than for the T-test with decreasing sample size and increasing quantiles. Confidence intervals on parameters and tolerance intervals for future predictions were constructed based on test inversion for an example application.

  17. Contributions of Hamstring Stiffness to Straight-Leg-Raise and Sit-and-Reach Test Scores.

    PubMed

    Miyamoto, Naokazu; Hirata, Kosuke; Kimura, Noriko; Miyamoto-Mikami, Eri

    2018-02-01

    The passive straight-leg-raise (PSLR) and the sit-and-reach (SR) tests have been widely used to assess hamstring extensibility. However, it remains unclear to what extent hamstring stiffness (a measure of material properties) contributes to PSLR and SR test scores. Therefore, we aimed to clarify the relationship between hamstring stiffness and PSLR and SR scores using ultrasound shear wave elastography. Ninety-eight healthy subjects completed the study. Each subject completed PSLR testing, and classic and modified SR testing of the right leg. Muscle shear modulus of the biceps femoris, semitendinosus, and semimembranosus was quantified as an index of muscle stiffness. The relationships between shear modulus of each muscle and PSLR or SR scores were calculated using Pearson's product-moment correlation coefficients. Shear modulus of the semitendinosus and semimembranosus showed negative correlations with the two PSLR and two SR scores (absolute r value≤0.484). Shear modulus of the biceps femoris was significantly correlated with the PSLR score determined by the examiner and the modified SR score (absolute r value≤0.308). The present findings suggest that PSLR and SR test scores are strongly influenced by factors other than hamstring stiffness and therefore might not accurately evaluate hamstring stiffness. © Georg Thieme Verlag KG Stuttgart · New York.
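
    The abstract's conclusion can be made concrete by squaring the reported correlation bounds. For a simple linear relationship, the squared Pearson coefficient gives the proportion of variance shared by the two measures; the figures below are derived here from the reported bounds and do not appear in the abstract.

```latex
\[
  r^2 \le 0.484^2 \approx 0.234
  \quad \text{(semitendinosus, semimembranosus)},
  \qquad
  r^2 \le 0.308^2 \approx 0.095
  \quad \text{(biceps femoris)}
\]
```

    Under that reading, the stiffness of any single muscle shares at most about 23% of its variance with a PSLR or SR score, leaving most of the variance to other factors.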

  18. Manual for Scoring the Test of Directed Imagination.

    ERIC Educational Resources Information Center

    Veldman, Donald J.; And Others

    A scoring manual for the Directed Imagination Test, a projective technique wherein the subject is instructed to write four fictional stories (four minutes are allowed for each) about teachers and their experiences, is presented. The manual provides detailed instructions for rating each story by fifteen dimensions relevant to teacher education…

  19. An integrated model of academic self-concept development: Academic self-concept, grades, test scores, and tracking over 6 years.

    PubMed

    Marsh, Herbert W; Pekrun, Reinhard; Murayama, Kou; Arens, A Katrin; Parker, Philip D; Guo, Jiesi; Dicke, Theresa

    2018-02-01

    Our newly proposed integrated academic self-concept model integrates 3 major theories of academic self-concept formation and developmental perspectives into a unified conceptual and methodological framework. Relations among math self-concept (MSC), school grades, test scores, and school-level contextual effects over 6 years, from the end of primary school through the first 5 years of secondary school (a representative sample of 3,370 German students, 42 secondary schools, 50% male, M age at grade 5 = 11.75) support the (1) internal/external frame of reference model: Math school grades had positive effects on MSC, but the effects of German grades were negative; (2) reciprocal effects (longitudinal panel) model: MSC was predictive of and predicted by math test scores and school grades; (3) big-fish-little-pond effect: The effects on MSC were negative for school-average achievement based on 4 indicators (primary school grades in math and German, school-track prior to the start of secondary school, math test scores in the first year of secondary school). Results for all 3 theoretical models were consistent across the 5 secondary school years: This supports the prediction of developmental equilibrium. This integration highlights the robustness of support over the potentially volatile early to middle adolescent period; the interconnectedness and complementarity of 3 ASC models; their counterbalancing strengths and weaknesses; and new theoretical, developmental, and substantive implications at their intersections. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  20. AP Trends: Tests Soar, Scores Slip--Gaps between Groups Spur Equity Concerns

    ERIC Educational Resources Information Center

    Cech, Scott J.

    2008-01-01

    More students are taking Advanced Placement tests, but the proportion of tests receiving what is deemed a passing score has dipped, and the mean score is down for the fourth year in a row. Data released here this week by the New York City-based nonprofit organization that owns the AP brand shows that a greater-than-ever proportion of students…

  1. Generalized likelihood ratios for quantitative diagnostic test scores.

    PubMed

    Tandberg, D; Deely, J J; O'Malley, A J

    1997-11-01

    The reduction of quantitative diagnostic test scores to the dichotomous case is a wasteful and unnecessary simplification in the era of high-speed computing. Physicians could make better use of the information embedded in quantitative test results if modern generalized curve estimation techniques were applied to the likelihood functions of Bayes' theorem. Hand calculations could be completely avoided and computed graphical summaries provided instead. Graphs showing posttest probability of disease as a function of pretest probability with confidence intervals (POD plots) would enhance acceptance of these techniques if they were immediately available at the computer terminal when test results were retrieved. Such constructs would also provide immediate feedback to physicians when a valueless test had been ordered.
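
    The relationship the abstract builds on is the odds form of Bayes' theorem: posttest odds equal pretest odds multiplied by the likelihood ratio of the observed result. The sketch below shows only that basic step, not the authors' generalized curve estimation. The function name and the example inputs (a pretest probability of 0.20 and a likelihood ratio of 5) are hypothetical.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply the odds form of Bayes' theorem to one test result."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical inputs, chosen only to illustrate the calculation.
informative = posttest_probability(0.20, 5.0)

# A likelihood ratio of 1 leaves the probability unchanged,
# which is what a test with no diagnostic value looks like.
uninformative = posttest_probability(0.20, 1.0)

print(f"LR = 5: posttest probability = {informative:.3f}")
print(f"LR = 1: posttest probability = {uninformative:.3f}")
```

    Evaluating this function over a range of pretest probabilities for a fixed likelihood ratio traces the kind of posttest-versus-pretest curve the abstract describes, though without the confidence intervals.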

  2. Validity of GRE General Test Scores and TOEFL Scores for Graduate Admission to a Technical University in Western Europe

    ERIC Educational Resources Information Center

    Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.

    2018-01-01

    Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the…

  3. The Formalization of Fairness: Issues in Testing for Measurement Invariance Using Subtest Scores

    ERIC Educational Resources Information Center

    Molenaar, Dylan; Borsboom, Denny

    2013-01-01

    Measurement invariance is an important prerequisite for the adequate comparison of group differences in test scores. In psychology, measurement invariance is typically investigated by means of linear factor analyses of subtest scores. These subtest scores typically result from summing the item scores. In this paper, we discuss 4 possible problems…

  4. Estimating Achievement Gaps from Test Scores Reported in Ordinal "Proficiency" Categories

    ERIC Educational Resources Information Center

    Ho, Andrew D.; Reardon, Sean F.

    2012-01-01

    Test scores are commonly reported in a small number of ordered categories. Examples of such reporting include state accountability testing, Advanced Placement tests, and English proficiency tests. This paper introduces and evaluates methods for estimating achievement gaps on a familiar standard-deviation-unit metric using data from these ordered…

  5. Comprehensive Aristotle score: implications for the Norwood procedure.

    PubMed

    Sinzobahamvya, Nicodème; Photiadis, Joachim; Kumpikaite, Daiva; Fink, Christoph; Blaschczok, Hedwig C; Brecher, Anne Marie; Asfour, Boulos

    2006-05-01

    Aristotle score is emerging as a reliable tool to measure surgical performance. We estimated the comprehensive Aristotle score for the Norwood procedure, correlated it with survival, and considered its impact on surgical management of hypoplastic left heart syndrome. Comprehensive Aristotle score was retrospectively calculated for 39 consecutive Norwood procedures performed from 2001 to 2004. Survival was estimated by the Kaplan-Meier method. The Aristotle scores ranged from 14.5 to 23.5 (mean, 19.12 ± 2.52; median, 19.5). The score was 20 or greater in 44% (17 of 39) of cases. The most frequent patient-adjusted factors were aortic atresia (n = 16), interrupted aortic arch (n = 9), mechanical ventilation to treat cardiorespiratory failure (n = 19) and shock resolved at time of surgery (n = 13). Hospital mortality was 58.8% (10 of 17) in case of score of 20 or more and 9.1% (2 of 22) for score less than 20 (p = 0.0014). From 2003 on, all patients with a score less than 20 survived. Actuarial estimate of survival at 1 year is 56.2% ± 7.9% and there have been no late deaths after 1 year. One-year survival is much lower (p = 0.001) for patients with scores of 20 or greater (29.4% ± 11.05%) compared with those whose scores were less than 20 (77.3% ± 8.9%). This study shows significant correlation of comprehensive Aristotle score with hospital mortality and late survival after Norwood palliation. It suggests that operative survival on the order of 90% may be achieved in patients with comprehensive complexity scores of less than 20. Efforts should be devoted to improve survival of high-risk patients (score ≥ 20).
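
    The hospital mortality comparison (10 of 17 versus 2 of 22) can be examined with a two-sided Fisher's exact test on the implied 2 x 2 table. The abstract does not say which test produced p = 0.0014, so using Fisher's test here is an assumption. The sketch relies only on the standard library, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def probability(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(
        probability(k)
        for k in range(lowest, highest + 1)
        if probability(k) <= observed * (1.0 + 1e-9)
    )


# Rows: score >= 20 and score < 20. Columns: died, survived.
p_value = fisher_exact_two_sided(10, 7, 2, 20)
print(f"two-sided p = {p_value:.4f}")
```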

  6. A Seven-Year Follow-Up of Intelligence Test Scores of Foster Grandparents

    ERIC Educational Resources Information Center

    Troll, Lillian E.; And Others

    1976-01-01

    After seven years, a group (N=32) of originally nonemployed poverty-level older people (over 60) now employed as foster grandparents were retested with the WAIS. Three subtest scores showed stability and Digit Span showed a statistically significant drop. Neither age nor initial level of health or WAIS scores was related to test-score changes over…

  7. Explaining the black-white gap in cognitive test scores: Toward a theory of adverse impact.

    PubMed

    Cottrell, Jonathan M; Newman, Daniel A; Roisman, Glenn I

    2015-11-01

    In understanding the causes of adverse impact, a key parameter is the Black-White difference in cognitive test scores. To advance theory on why Black-White cognitive ability/knowledge test score gaps exist, and on how these gaps develop over time, the current article proposes an inductive explanatory model derived from past empirical findings. According to this theoretical model, Black-White group mean differences in cognitive test scores arise from the following racially disparate conditions: family income, maternal education, maternal verbal ability/knowledge, learning materials in the home, parenting factors (maternal sensitivity, maternal warmth and acceptance, and safe physical environment), child birth order, and child birth weight. Results from a 5-wave longitudinal growth model estimated on children in the NICHD Study of Early Child Care and Youth Development from ages 4 through 15 years show significant Black-White cognitive test score gaps throughout early development that did not grow significantly over time (i.e., significant intercept differences, but not slope differences). Importantly, the racially disparate conditions listed above can account for the relation between race and cognitive test scores. We propose a parsimonious 3-Step Model that explains how cognitive test score gaps arise, in which race relates to maternal disadvantage, which in turn relates to parenting factors, which in turn relate to cognitive test scores. This model and results offer to fill a need for theory on the etiology of the Black-White ethnic group gap in cognitive test scores, and attempt to address a missing link in the theory of adverse impact. (c) 2015 APA, all rights reserved.

  8. Simple exercise test score versus cardiac stress test for the prediction of coronary artery disease in patients with type 2 diabetes.

    PubMed

    Pikto-Pietkiewicz, Witold; Przewłocka, Monika; Chybowska, Barbara; Cyciwa, Alona; Pasierski, Tomasz

    2014-01-01

    Type 2 diabetes markedly increases the risk of coronary heart disease (CHD), and screening for CHD is suggested by the guidelines. The aim of the study was to compare the diagnostic usefulness of the simple exercise test score, incorporating the clinical data and cardiac stress test results, with the standard stress test in patients with type 2 diabetes. A total of 62 consecutive patients (aged 65.4 ± 8.5 years; 32 men) with type 2 diabetes and clinical symptoms suggesting CHD underwent a stress test followed by coronary angiography. The simple score was calculated for all patients. Significant coronary stenosis was observed in 41 patients (66.1%). Stress test results were positive in 36 patients (58.1%). The mean simple score was high (65.5 ± 14.3 points). A positive linear relationship was observed between the score and the prevalence of CHD (R² = 0.19; P < 0.001) as well as its severity (R² = 0.23; P < 0.001). The area under the receiver-operating characteristic curve for the simple score was 0.74 (95% confidence interval [CI], 0.62-0.86). At the original cut-off value of 60 points, the score had a similar prognostic value to that of the standard stress test. However, in a multivariate analysis, only the simple score (odds ratio [OR], 1.46; 95% CI, 1.11-1.94; P < 0.01 for an increase in the score by 1 point) and male sex (OR, 1.57; 95% CI, 1.24-1.98; P < 0.001) remained independent predictors of CHD. In patients with type 2 diabetes, the simple score correlated with the prevalence and severity of CHD. However, the cut-off value of 60 points was inadequate in the population of diabetic patients with high risk of CHD. The simple score used instead of or together with the stress test was a better predictor of CHD than the stress test alone.

  9. A Maturing Global Testing Regime Meets the World Economy: Test Scores and Economic Growth, 1960-2012

    ERIC Educational Resources Information Center

    Kamens, David H.

    2015-01-01

    This article considers the growth of the international testing regime. It discusses sources of growth and empirically examines two related sets of issues: (1) the stability of countries' achievement scores, and (2) the influence of those national scores on subsequent economic development over different time lags. The article suggests that…

  10. Assessment Test Scores of Incoming Students, Fall 2001.

    ERIC Educational Resources Information Center

    Negron, Maggie; Breindel, Matthew

    This assessment of placement test scores in reading, math, and sentence skills from incoming students at College of the Desert (California) shows that students are overwhelmingly underprepared for study at the college. Only 15% of students were prepared in sentence skills, 27% in reading skills, 7% in math skills; only 3% were prepared in all 3…

  11. Using Rasch Measurement to Score, Evaluate, and Improve Examinations in an Anatomy Course

    ERIC Educational Resources Information Center

    Royal, Kenneth D.; Gilliland, Kurt O.; Kernick, Edward T.

    2014-01-01

    Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high-stakes testing arena rely on classical test theory…

  12. Test Score Stability and Construct Validity of the Adult Manifest Anxiety Scale-College Version Scores among College Students: A Brief Report

    ERIC Educational Resources Information Center

    Lowe, Patricia A.; Papanastasiou, Elena C.; DeRuyck, Kimberly A.; Reynolds, Cecil R.

    2005-01-01

    In this study, the authors investigated the temporal stability and construct validity of the Adult Manifest Anxiety Scale-College Version (AMAS-C; C. R. Reynolds, B. O. Richmond, & P. A. Lowe, 2003b) scores. Results indicated that the AMAS-C scores had adequate to excellent test score stability, and evidence supported the construct validity of the…

  13. The Validity of IQ Scores Derived from Readiness Screening Tests

    ERIC Educational Resources Information Center

    Telegdy, Gabriel A.

    1976-01-01

    The Screening Test of Academic Readiness (STAR) and the Peabody Picture Vocabulary Test (PPVT) were administered to 52 kindergarten children to reveal the convergent validity of IQ scores derived from the STAR. The findings raise doubts about the validity of the deviation IQs derived from the STAR. (Author)

  14. Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests

    ERIC Educational Resources Information Center

    Kolen, Michael J.; Lee, Won-Chan

    2011-01-01

    This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…

  15. Relationships between Speech Intelligibility and Word Articulation Scores in Children with Hearing Loss

    PubMed Central

    Ertmer, David J.

    2012-01-01

    Purpose: This investigation sought to determine whether scores from a commonly used word-based articulation test are closely associated with speech intelligibility in children with hearing loss. If the scores are closely related, articulation testing results might be used to estimate intelligibility. If not, the importance of direct assessment of intelligibility would be reinforced. Methods: Forty-four children with hearing losses produced words from the Goldman-Fristoe Test of Articulation-2 and sets of 10 short sentences. Correlation analyses were conducted between scores for seven word-based predictor variables and percent-intelligible scores derived from listener judgments of stimulus sentences. Results: Six of seven predictor variables were significantly correlated with percent-intelligible scores. However, regression analysis revealed that no single predictor variable or multivariable model accounted for more than 25% of the variability in intelligibility scores. Implications: The findings confirm the importance of assessing connected speech intelligibility directly. PMID:20220022

  16. The Comparison of Accuracy Scores on the Paper and Pencil Testing vs. Computer-Based Testing

    ERIC Educational Resources Information Center

    Retnawati, Heri

    2015-01-01

    This study aimed to compare the accuracy of the test scores as results of Test of English Proficiency (TOEP) based on paper and pencil test (PPT) versus computer-based test (CBT). Using the participants' responses to the PPT documented from 2008-2010 and data of CBT TOEP documented in 2013-2014 on the sets of 1A, 2A, and 3A for the Listening and…

  17. Pain scores for intravenous cannulation and arterial blood gas test among emergency department patients.

    PubMed

    Ballesteros-Peña, Sendoa; Vallejo-De la Hoz, Gorka; Fernández-Aedo, Irrintzi

    2017-12-23

    The aim was to analyse vein catheterisation and blood gas test-related pain among adult patients in the emergency department and to explore pain score-related factors. An observational, multicentre study was performed. Patients undergoing vein catheterisation or arterial puncture for a gas test were included consecutively. After each procedure, patients scored the pain experienced using the NRS-11. A total of 780 vein catheterisations and 101 blood gas tests were analysed. Venipuncture received an average score of 2.8 (95% CI: 2.6-3), and arterial puncture 3.6 (95% CI: 3.1-4). Iatrogenic pain scores were associated with procedures of moderate to high difficulty (P<.001) and, in the gas test, with the choice of the humeral rather than the radial artery (P=.02); in venipunctures they were correlated with baseline pain (P<.001). Pain scores did not differ significantly by other variables such as sex, place of origin or needle gauge. Vein catheterisation can be considered a mildly to moderately painful procedure and the blood gas test a moderately painful one. The pain score is associated with certain variables such as the difficulty of the procedure, the anatomic area of the puncture or baseline pain. A better understanding of painful effects related to emergency nursing procedures and the factors associated with pain self-perception could help to determine when and how to act to mitigate this undesired effect. Copyright © 2017 Elsevier España, S.L.U. All rights reserved.

  18. Effects of Classroom Ventilation Rate and Temperature on Students' Test Scores.

    PubMed

    Haverinen-Shaughnessy, Ulla; Shaughnessy, Richard J

    2015-01-01

    Using a multilevel approach, we estimated the effects of classroom ventilation rate and temperature on academic achievement. The analysis is based on measurement data from a 70 elementary school district (140 fifth grade classrooms) from Southwestern United States, and student level data (N = 3109) on socioeconomic variables and standardized test scores. There was a statistically significant association between ventilation rates and mathematics scores, and it was stronger when the six classrooms with high ventilation rates that were indicated as outliers were filtered (> 7.1 l/s per person). The association remained significant when prior year test scores were included in the model, resulting in less unexplained variability. Students' mean mathematics scores (average 2286 points) were increased by up to eleven points (0.5%) per each liter per second per person increase in ventilation rate within the range of 0.9-7.1 l/s per person (estimated effect size 74 points). There was an additional increase of 12-13 points per each 1°C decrease in temperature within the observed range of 20-25°C (estimated effect size 67 points). Effects of similar magnitude but higher variability were observed for reading and science scores. In conclusion, maintaining adequate ventilation and thermal comfort in classrooms could significantly improve academic achievement of students.

  19. Bi-Factor MIRT Observed-Score Equating for Mixed-Format Tests

    ERIC Educational Resources Information Center

    Lee, Guemin; Lee, Won-Chan

    2016-01-01

    The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…

  20. Optimal Scoring Methods of Hand-Strength Tests in Patients with Stroke

    ERIC Educational Resources Information Center

    Huang, Sheau-Ling; Hsieh, Ching-Lin; Lin, Jau-Hong; Chen, Hui-Mei

    2011-01-01

    The purpose of this study was to determine the optimal scoring methods for measuring strength of the more-affected hand in patients with stroke by examining the effect of reducing measurement errors. Three hand-strength tests of grip, palmar pinch, and lateral pinch were administered at two sessions in 56 patients with stroke. Five scoring methods…

  1. Score Reporting in Teacher Certification Testing: A Review, Design, and Interview/Focus Group Study

    ERIC Educational Resources Information Center

    Klesch, Heather S.

    2010-01-01

    The reporting of scores on educational tests is at times misunderstood, misinterpreted, and potentially confusing to examinees and other stakeholders who may need to interpret test scores. In reporting test results to examinees, there is a need for clarity in the message communicated. As pressure rises for students to demonstrate performance at a…

  2. Clinical experience of scoring criteria for Familial Hypercholesterolaemia (FH) genetic testing in Wales.

    PubMed

    Haralambos, K; Whatley, S D; Edwards, R; Gingell, R; Townsend, D; Ashfield-Watt, P; Lansberg, P; Datta, D B N; McDowell, I F W

    2015-05-01

    Familial Hypercholesterolaemia (FH) is caused by mutations in genes of the Low Density Lipoprotein (LDL) receptor pathway. A definitive diagnosis of FH can be made by the demonstration of a pathogenic mutation. The Wales FH service has developed scoring criteria to guide selection of patients for DNA testing, for those referred to clinics with hypercholesterolaemia. The criteria are based on a modification of the Dutch Lipid Clinic scoring criteria and utilise a combination of lipid values, physical signs, personal and family history of premature cardiovascular disease. They are intended to provide clinical guidance and enable resources to be targeted in a cost effective manner. 623 patients who presented to lipid clinics across Wales had DNA testing following application of these criteria. The proportion of patients with a pathogenic mutation ranged from 4% in those scoring 5 or less up to 85% in those scoring 15 or more. LDL-cholesterol was the strongest discriminatory factor. Scores gained from physical signs, family history, coronary heart disease, and triglycerides also showed a gradient in mutation pick-up rate according to the score. These criteria provide a useful tool to guide selection of patients for DNA testing when applied by health professionals who have clinical experience of FH. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  3. Increasing Racial Isolation and Test Score Gaps in Mathematics: A 30-Year Perspective

    ERIC Educational Resources Information Center

    Berends, Mark; Penaloza, Roberto V.

    2010-01-01

    Background/Context: Although there has been progress in closing the test score gaps among student groups over past decades, that progress has stalled. Many researchers have speculated why the test score gaps closed between the early 1970s and the early 1990s, but only a few have been able to empirically study how changes in school factors and…

  4. Peer Effects and the Indigenous/Non-Indigenous Early Test-Score Gap in Peru

    ERIC Educational Resources Information Center

    Sakellariou, Chris

    2008-01-01

    This paper assesses the magnitude of the non-indigenous/indigenous test-score gap for third-year and fourth-year primary school pupils in Peru, in relation to the main family, school and peer inputs contributing to the test-score gap using the estimation method of feasible generalized least squares. The article then decomposes the gap into its…

  5. The Uses and Misuses of Test Scores: Technical Assistance Perspective.

    ERIC Educational Resources Information Center

    Echternacht, Gary

    The uses and misuses of standardized test results used for program evaluation as seen by a staff member of an Elementary Secondary Education Act (ESEA) Title I Technical Assistance Center are described. In ESEA Title I, test scores are used to select students for the program. Although federal requirements do not require using standardized test…

  6. Motivating High School Students to Score Proficient on State Tests

    ERIC Educational Resources Information Center

    Brown, Sarah Lee

    2015-01-01

    The researcher interviewed two groups of eleventh grade students, in a rural Appalachian setting, who tended to score low on the state mandated high stakes/low stakes test to discover their efforts on the test, specifically in reading, and to obtain their opinions concerning the effects of a specific incentive or consequence. Before the eleventh…

  7. A Comparison of Standardized Achievement Test Scores on Right and Left Brain Dominant Fourth-Grade Students.

    ERIC Educational Resources Information Center

    Bell, Michael L.; Roubinek, Darrell L.

    1989-01-01

    Compares fourth-graders' subtest scores on the Stanford Achievement Test (SAT), the Iowa Test of Basic Skills (ITBS), and the Metropolitan Achievement Test (MAT). Finds right-brain dominant students scored better on four SAT subtests, and left-brain dominant students scored better on four ITBS subtests and two MAT subtests. (NH)

  8. Generation of GHS Scores from TEST and online sources ...

    EPA Pesticide Factsheets

    Alternatives assessment frameworks such as DfE (Design for the Environment) evaluate chemical alternatives in terms of human health effects, ecotoxicity, and fate. T.E.S.T. (Toxicity Estimation Software Tool) can be utilized to evaluate human health in terms of acute oral rat toxicity, developmental toxicity, endocrine activity, and mutagenicity. It can be used to evaluate ecotoxicity (in terms of acute fathead minnow toxicity) and fate (in terms of bioconcentration factor). It can also be used to estimate a variety of key physicochemical properties such as melting point, boiling point, vapor pressure, water solubility, and bioconcentration factor. A web-based version of T.E.S.T. is currently being developed to allow predictions to be made from other web tools. Online data sources such as NCCT’s Chemistry Dashboard, REACH dossiers, or ChemHat.org can also be utilized to obtain GHS (Global Harmonization System) scores for comparing alternatives. The purpose of this talk is to show how GHS (Global Harmonization Score) data can be obtained from literature sources and from T.E.S.T. (Toxicity Estimation Software Tool). These data will be used to compare chemical alternatives in the alternatives assessment dashboard (a 2018 CSS product).

  9. High Test Scores: The Wrong Road to National Economic Success

    ERIC Educational Resources Information Center

    Baker, Keith

    2011-01-01

    A widely held view is that good schools are essential to a nation's international economic success and that high test scores on international tests of academic skills and knowledge indicate how good a nation's schools are. The widespread belief that good schools are an important contributor to a nation's economic success in the world is supported…

  10. Commentary: Student Cognition, the Situated Learning Context, and Test Score Interpretation

    ERIC Educational Resources Information Center

    La Marca, Paul M.

    2006-01-01

    Although it is assumed that student cognition contributes to student performance on achievement tests, it may be that current testing models lack the degree of specification necessary to warrant such inferences. With test score interpretations as the referent, the authors in this special issue address the role of student cognition in learning and…

  11. Relationships between spatial activities and scores on the mental rotation test as a function of sex.

    PubMed

    Ginn, Sheryl R; Pickens, Stefanie J

    2005-06-01

    Previous results suggested that female college students' scores on the Mental Rotations Test might be related to their prior experience with spatial tasks. For example, women who played video games scored better on the test than their non-game-playing peers, whereas playing video games was not related to men's scores. The present study examined whether participation in different types of spatial activities would be related to women's performance on the Mental Rotations Test. 31 men and 59 women enrolled at a small, private church-affiliated university and majoring in art or music as well as students who participated in intercollegiate athletics completed the Mental Rotations Test. Women's scores on the Mental Rotations Test benefitted from experience with spatial activities; the more types of experience the women had, the better their scores. Thus women who were athletes, musicians, or artists scored better than those women who had no experience with these activities. The opposite results were found for the men. Efforts are currently underway to assess how length of experience and which types of experience are related to scores.

  12. The value of Bayes' theorem for interpreting abnormal test scores in cognitively healthy and clinical samples.

    PubMed

    Gavett, Brandon E

    2015-03-01

    The base rates of abnormal test scores in cognitively normal samples have been a focus of recent research. The goal of the current study is to illustrate how Bayes' theorem uses these base rates--along with the same base rates in cognitively impaired samples and prevalence rates of cognitive impairment--to yield probability values that are more useful for making judgments about the absence or presence of cognitive impairment. Correlation matrices, means, and standard deviations were obtained from the Wechsler Memory Scale--4th Edition (WMS-IV) Technical and Interpretive Manual and used in Monte Carlo simulations to estimate the base rates of abnormal test scores in the standardization and special groups (mixed clinical) samples. Bayes' theorem was applied to these estimates to identify probabilities of normal cognition based on the number of abnormal test scores observed. Abnormal scores were common in the standardization sample (65.4% scoring below a scaled score of 7 on at least one subtest) and more common in the mixed clinical sample (85.6% scoring below a scaled score of 7 on at least one subtest). Probabilities varied according to the number of abnormal test scores, base rates of normal cognition, and cutoff scores. The results suggest that interpretation of base rates obtained from cognitively healthy samples must also account for data from cognitively impaired samples. Bayes' theorem can help neuropsychologists answer questions about the probability that an individual examinee is cognitively healthy based on the number of abnormal test scores observed.
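    The Bayes' theorem calculation described in this abstract can be sketched as follows. This is a minimal illustration, not the author's procedure: the function name and the prior (prevalence) values are hypothetical, and only the two base rates (65.4% and 85.6% scoring below a scaled score of 7 on at least one subtest) come from the abstract.

```python
def posterior_normal(p_abn_given_normal: float,
                     p_abn_given_impaired: float,
                     prior_normal: float) -> float:
    """Probability of normal cognition given an abnormal finding (Bayes' theorem).

    p_abn_given_normal   -- base rate of the finding in cognitively healthy people
    p_abn_given_impaired -- base rate of the same finding in impaired people
    prior_normal         -- assumed prevalence of normal cognition in the setting
    """
    numerator = p_abn_given_normal * prior_normal
    denominator = numerator + p_abn_given_impaired * (1.0 - prior_normal)
    return numerator / denominator


# Base rates reported in the abstract for "at least one subtest below 7".
P_ABN_NORMAL = 0.654    # standardization sample
P_ABN_IMPAIRED = 0.856  # mixed clinical sample

# The priors below are illustrative assumptions, not values from the study.
for prior in (0.5, 0.9):
    post = posterior_normal(P_ABN_NORMAL, P_ABN_IMPAIRED, prior)
    print(f"prior P(normal) = {prior:.2f} -> posterior P(normal) = {post:.3f}")
```

    Because the finding is common in both groups, the posterior stays close to the prior, which is the abstract's point that healthy-sample base rates must be read alongside clinical-sample base rates.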

  13. The Influence of Foreign Language Learning during Early Childhood on Standardized Test Scores

    ERIC Educational Resources Information Center

    Shaw, Tommetta

    2010-01-01

    Increasing standardized test scores in reading and math is of high importance to the California Department of Education to meet requirements mandated by the No Child Left Behind (NCLB) Act of 2001. More research is needed to understand the best ways to improve test scores to meet concerns of the NCLB Act. The purpose of the study was to evaluate…

  14. More than Just Test Scores

    ERIC Educational Resources Information Center

    Levin, Henry M.

    2012-01-01

    Around the world we hear considerable talk about creating world-class schools. Usually the term refers to schools whose students get very high scores on the international comparisons of student achievement such as PISA or TIMSS. The practice of restricting the meaning of exemplary schools to the narrow criterion of achievement scores is usually…

  15. Predictive effects of teachers and schools on test scores, college attendance, and earnings

    PubMed Central

    Chamberlain, Gary E.

    2013-01-01

    I studied predictive effects of teachers and schools on test scores in fourth through eighth grade and outcomes later in life such as college attendance and earnings. For example, predict the fraction of a classroom attending college at age 20 given the test score for a different classroom in the same school with the same teacher and given the test score for a classroom in the same school with a different teacher. I would like to have predictive effects that condition on averages over many classrooms, with and without the same teacher. I set up a factor model that, under certain assumptions, makes this feasible. Administrative school district data in combination with tax data were used to calculate estimates and do inference. PMID:24101492

  16. Predictive effects of teachers and schools on test scores, college attendance, and earnings.

    PubMed

    Chamberlain, Gary E

    2013-10-22

    I studied predictive effects of teachers and schools on test scores in fourth through eighth grade and outcomes later in life such as college attendance and earnings. For example, predict the fraction of a classroom attending college at age 20 given the test score for a different classroom in the same school with the same teacher and given the test score for a classroom in the same school with a different teacher. I would like to have predictive effects that condition on averages over many classrooms, with and without the same teacher. I set up a factor model that, under certain assumptions, makes this feasible. Administrative school district data in combination with tax data were used to calculate estimates and do inference.

  17. Background Variables, Levels of Aggregation, and Standardized Test Scores

    ERIC Educational Resources Information Center

    Paulson, Sharon E.; Marchant, Gregory J.

    2009-01-01

    This article examines the role of student demographic characteristics in standardized achievement test scores at both the individual level and aggregated at the state, district, school levels. For several data sets, the majority of the variance among states, districts, and schools was related to demographic characteristics. Where these background…

  18. What's in a Teacher Test? Assessing the Relationship between Teacher Test Scores and Student Secondary STEM Achievement. CEDR Working Paper. WP #2016-4

    ERIC Educational Resources Information Center

    Goldhaber, Dan; Gratz, Trevor; Theobald, Roddy

    2016-01-01

    We investigate the predictive validity of teacher credential test scores for student performance in secondary STEM classrooms in Washington state. After replicating earlier findings that teacher basic skills licensure test scores are a modest and statistically significant predictor of student math test score gains in elementary grades, we focus on…

  19. Effects of Classroom Ventilation Rate and Temperature on Students’ Test Scores

    PubMed Central

    2015-01-01

    Using a multilevel approach, we estimated the effects of classroom ventilation rate and temperature on academic achievement. The analysis is based on measurement data from a 70 elementary school district (140 fifth grade classrooms) from Southwestern United States, and student level data (N = 3109) on socioeconomic variables and standardized test scores. There was a statistically significant association between ventilation rates and mathematics scores, and it was stronger when the six classrooms with high ventilation rates that were indicated as outliers were filtered (> 7.1 l/s per person). The association remained significant when prior year test scores were included in the model, resulting in less unexplained variability. Students’ mean mathematics scores (average 2286 points) were increased by up to eleven points (0.5%) per each liter per second per person increase in ventilation rate within the range of 0.9–7.1 l/s per person (estimated effect size 74 points). There was an additional increase of 12–13 points per each 1°C decrease in temperature within the observed range of 20–25°C (estimated effect size 67 points). Effects of similar magnitude but higher variability were observed for reading and science scores. In conclusion, maintaining adequate ventilation and thermal comfort in classrooms could significantly improve academic achievement of students. PMID:26317643

  20. The effects of calculator-based laboratories on standardized test scores

    NASA Astrophysics Data System (ADS)

    Stevens, Charlotte Bethany Rains

    Nationwide, the goal of providing a productive science and math education to our youth in today's educational institutions is centering itself around the technology being utilized in these classrooms. In this age of digital technology, educational software and calculator-based laboratories (CBL) have become significant devices in the teaching of science and math for many states across the United States. Among this technology, the Texas Instruments graphing calculator and the Vernier LabPro interface are calculator-based laboratory tools that are becoming increasingly popular among middle and high school science and math teachers in many school districts across the country. In Tennessee, however, it is reported that this type of technology is not regularly utilized at the student level in most high school science classrooms, especially in the area of Physical Science (Vernier, 2006). This research explored the effect of calculator-based laboratory instruction on standardized test scores. The purpose of this study was to determine the effect of traditional teaching methods versus graphing calculator teaching methods on the state-mandated End-of-Course (EOC) Physical Science exam based on ability, gender, and ethnicity. The sample included 187 total tenth and eleventh grade physical science students, 101 of which belonged to a control group and 87 of which belonged to the experimental group. Physical Science End-of-Course scores obtained from the Tennessee Department of Education during the spring of 2005 and the spring of 2006 were used to examine the hypotheses. The findings of this research study suggested the type of teaching method, traditional or calculator-based, did not have an effect on standardized test scores. However, the students' ability level, as demonstrated on the End-of-Course test, had a significant effect on End-of-Course test scores. This study focused on a limited population of high school physical science students in the middle Tennessee

  1. Can Percentiles Replace Raw Scores in the Statistical Analysis of Test Data?

    ERIC Educational Resources Information Center

    Zimmerman, Donald W.; Zumbo, Bruno D.

    2005-01-01

    Educational and psychological testing textbooks typically warn of the inappropriateness of performing arithmetic operations and statistical analysis on percentiles instead of raw scores. This seems inconsistent with the well-established finding that transforming scores to ranks and using nonparametric methods often improves the validity and power…

  2. Decreasing scoring errors on Wechsler Scale Vocabulary, Comprehension, and Similarities subtests: a preliminary study.

    PubMed

    Linger, Michele L; Ray, Glen E; Zachar, Peter; Underhill, Andrea T; LoBello, Steven G

    2007-10-01

    Studies of graduate students learning to administer the Wechsler scales have generally shown that training is not associated with the development of scoring proficiency. Many studies report on the reduction of aggregated administration and scoring errors, a strategy that does not highlight the reduction of errors on subtests identified as most prone to error. This study evaluated the development of scoring proficiency specifically on the Wechsler (WISC-IV and WAIS-III) Vocabulary, Comprehension, and Similarities subtests during training by comparing a set of 'early test administrations' to 'later test administrations.' Twelve graduate students enrolled in an intelligence-testing course participated in the study. Scoring errors (e.g., incorrect point assignment) were evaluated on the students' actual practice administration test protocols. Errors on all three subtests declined significantly when scoring errors on 'early' sets of Wechsler scales were compared to those made on 'later' sets. However, correcting these subtest scoring errors did not cause significant changes in subtest scaled scores. Implications for clinical instruction and future research are discussed.

  3. Situational Effects May Account for Gain Scores in Cognitive Ability Testing: A Longitudinal SEM Approach

    ERIC Educational Resources Information Center

    Matton, Nadine; Vautier, Stephane; Raufaste, Eric

    2009-01-01

    Mean gain scores for cognitive ability tests between two sessions in a selection setting are now a robust finding, yet not fully understood. Many authors do not attribute such gain scores to an increase in the target abilities. Our approach consists of testing a longitudinal SEM model suitable to this view. We propose to model the scores' changes…

  4. Effects of Targeted Test Preparation on Scores of Two Tests of Oral English as a Second Language

    ERIC Educational Resources Information Center

    Farnsworth, Tim

    2013-01-01

    This study investigated the effect of targeted test preparation, or coaching, on oral English as a second language test scores. The tests in question were the Basic English Skills Test Plus (BEST Plus), a scripted oral interview published by the Center for Applied Linguistics, and the Versant English Test (VET), a computer-administered and…

  5. A knowledge-based theory of rising scores on "culture-free" tests.

    PubMed

    Fox, Mark C; Mitchum, Ainsley L

    2013-08-01

    Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  6. A Latent Class Approach to Estimating Test-Score Reliability

    ERIC Educational Resources Information Center

    van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas

    2011-01-01

    This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…

  7. Experiential Awareness of the Effects of Test Score Reports.

    ERIC Educational Resources Information Center

    Bender, Robert C.

    Because most counselors have experienced a significant amount of success, they often have difficulty understanding the impact of test scores on persons who do not perform well. Counselor educators must develop experiential awareness in an area normally outside the realm of their students. To provide such an experience, 25 counselor trainees took…

  8. What We Lose in Winning the Test Score Race

    ERIC Educational Resources Information Center

    Jorgenson, Olaf

    2012-01-01

    To achieve perpetually better test results each year as mandated by the No Child Left Behind Act (NCLB), teachers in successful schools such as Leroy Anderson Elementary in San Jose, California, will "try anything" to raise scores, as the school's principal stated in an interview with "The San Jose Mercury News." In schools…

  9. Benefits of Coaching on Test Scores Seen as Negligible.

    ERIC Educational Resources Information Center

    Report on Education Research, 1983

    1983-01-01

    THE FOLLOWING IS THE FULL TEXT OF THIS DOCUMENT: A new study by a pair of Harvard University researchers discounts earlier findings that coaching can substantially improve student performance on the Scholastic Aptitude Test (SAT). "There is simply insufficient evidence that large score increases are a result of a coaching program," write…

  10. Structured didactic teaching sessions improve medical student neurology clerkship test scores: a pilot study.

    PubMed

    Menkes, Daniel L; Reed, Mary

    2008-01-01

    To determine the effectiveness of didactic case-based instruction methodology to improve medical student comprehension of common neurological illnesses and neurological emergencies. Neurology department, academic university. 415 third and fourth year medical students performing a required four week neurology clerkship. Raw test scores on a 1 hour, 50-item clinical vignette based examination and open-ended questions in a post-clerkship feedback session. There was a statistically significant improvement in overall test scores (p<0.001). Didactic teaching sessions have a significant positive impact on neurology student clerkship test score performance and perception of their educational experience. Confirmation of these results across multiple specialties in a multi-center trial is warranted.

  11. Estimating Conditional Distributions of Scores on an Alternate Form of a Test. Research Report. ETS RR-15-18

    ERIC Educational Resources Information Center

    Livingston, Samuel A.; Chen, Haiwen H.

    2015-01-01

    Quantitative information about test score reliability can be presented in terms of the distribution of equated scores on an alternate form of the test for test takers with a given score on the form taken. In this paper, we describe a procedure for estimating that distribution, for any specified score on the test form taken, by estimating the joint…

  12. Computerized scoring algorithms for the Autobiographical Memory Test.

    PubMed

    Takano, Keisuke; Gutenbrunner, Charlotte; Martens, Kris; Salmon, Karen; Raes, Filip

    2018-02-01

    Reduced specificity of autobiographical memories is a hallmark of depressive cognition. Autobiographical memory (AM) specificity is typically measured by the Autobiographical Memory Test (AMT), in which respondents are asked to describe personal memories in response to emotional cue words. Due to this free descriptive responding format, the AMT relies on experts' hand scoring for subsequent statistical analyses. This manual coding potentially impedes research activities in big data analytics such as large epidemiological studies. Here, we propose computerized algorithms to automatically score AM specificity for the Dutch (adult participants) and English (youth participants) versions of the AMT by using natural language processing and machine learning techniques. The algorithms showed reliable performances in discriminating specific and nonspecific (e.g., overgeneralized) autobiographical memories in independent testing data sets (area under the receiver operating characteristic curve > .90). Furthermore, outcome values of the algorithms (i.e., decision values of support vector machines) showed a gradient across similar (e.g., specific and extended memories) and different (e.g., specific memory and semantic associates) categories of AMT responses, suggesting that, for both adults and youth, the algorithms well capture the extent to which a memory has features of specific memories. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  13. A seven-year follow-up of intelligence test scores of foster grandparents.

    PubMed

    Troll, L E; Saltz, R; Dunin-Markiewicz, A

    1976-09-01

    After 7 years, a group of originally nonemployed poverty-level older people (over 60) who had been employed as foster grandparents were retested with the WAIS. Four WAIS subtests - Vocabulary, Similarities, Digit Span, and Block Design - were employed. Of the original group of 39, complete data were available for 28; 18 of these were still working on the project, and the other 10 had dropped out. Dropouts as a group tested lower originally and also showed more deterioration in functional health ratings over time. For the total group of 32 foster grandparents, three subtest scores showed stability over the 7 years. Only Digit Span showed a statistically significant drop. Neither age nor the initial level of health or WAIS scores was related to test-score changes over time.

  14. Robust joint score tests in the application of DNA methylation data analysis.

    PubMed

    Li, Xuan; Fu, Yuejiao; Wang, Xiaogang; Qiu, Weiliang

    2018-05-18

    Recently, differential variability has been shown to be valuable in evaluating the association of DNA methylation with the risks of complex human diseases. Statistical tests based on both differential methylation level and differential variability can be more powerful than those based only on differential methylation level. Anh and Wang (2013) proposed a joint score test (AW) to simultaneously detect differential methylation and differential variability. However, AW's method seems to be quite conservative and has not been fully compared with existing joint tests. We proposed three improved joint score tests, namely iAW.Lev, iAW.BF, and iAW.TM, and have made extensive comparisons with the joint likelihood ratio test (jointLRT), the Kolmogorov-Smirnov (KS) test, and the AW test. Systematic simulation studies showed that: 1) the three improved tests performed better (i.e., having larger power, while keeping nominal Type I error rates) than the other three tests for data with outliers and having different variances between cases and controls; 2) for data from normal distributions, the three improved tests had slightly lower power than jointLRT and AW. The analyses of two Illumina HumanMethylation27 data sets GSE37020 and GSE20080 and one Illumina Infinium MethylationEPIC data set GSE107080 demonstrated that the three improved tests had higher true validation rates than those from jointLRT, KS, and AW. The three proposed joint score tests are robust against violation of the normality assumption and the presence of outlying observations in comparison with the other three existing tests. Among the three proposed tests, iAW.BF seems to be the most robust and effective one for all simulated scenarios and also in real data analyses.

  15. The Relationship between Deductive Reasoning Ability, Test Anxiety, and Standardized Test Scores in a Latino Sample

    ERIC Educational Resources Information Center

    Rich, John D., Jr.; Fullard, William; Overton, Willis

    2011-01-01

    One hundred and twelve Latino students from Philadelphia participated in this study, which examined the development of deductive reasoning across adolescence, and the relation of reasoning to test anxiety and standardized test scores. As predicted, 11th and ninth graders demonstrated significantly more advanced reasoning than seventh graders.…

  16. A Diet Score Assessing Norwegian Adolescents’ Adherence to Dietary Recommendations—Development and Test-Retest Reproducibility of the Score

    PubMed Central

    Handeland, Katina; Kjellevold, Marian; Wik Markhus, Maria; Eide Graff, Ingvild; Frøyland, Livar; Lie, Øyvind; Skotheim, Siv; Stormark, Kjell Morten; Dahl, Lisbeth; Øyen, Jannike

    2016-01-01

    Assessment of adolescents’ dietary habits is challenging. Reliable instruments to monitor dietary trends are required to promote healthier behaviours in this group. The purpose of this cross-sectional study was to assess adolescents’ adherence to Norwegian dietary recommendations with a diet score and to report results from, and test-retest reliability of, the score. The diet score involved seven food groups and one physical activity indicator, and was applied to answers from a semi-quantitative food frequency questionnaire (FFQ) administered twice. Reproducibility of the score was assessed with Cohen’s Kappa (κ statistics) at an interval of three months. The setting was eight lower-secondary schools in Hordaland County, Norway, and subjects were adolescents (n = 472) aged 14–15 years and their caregivers. Results showed that the proportion of adolescents consistently classified by the diet score was 87.6% (κ = 0.465). For food groups, proportions ranged from 74.0% to 91.6% (κ = 0.249 to κ = 0.573). Less than 40% of the participants were found to adhere to recommendations for frequencies of eating fruits, vegetables, added sugar, and fish. The highest compliance with recommendations was seen for choosing water as a beverage and limiting the intake of red meat. The score was associated with parental socioeconomic status. The diet score was found to be reproducible at an acceptable level. Health-promoting work targeting adolescents should emphasize increasing the intake of recommended foods to approach nutritional guidelines. PMID:27483312
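    The abstract above reports both a raw proportion of consistent classification (87.6%) and Cohen's kappa (κ = 0.465). As a minimal sketch of how the two quantities relate, the Python below computes kappa from a square test-retest table using the standard definition κ = (p_o − p_e) / (1 − p_e). The function names and the 2×2 counts are hypothetical illustrations and are not data from the study; the last helper only rearranges the same formula to show what chance agreement a reported pair (p_o, κ) implies.

```python
def cohens_kappa(table):
    """Return (observed agreement, kappa) for a square table of counts.

    table[i][j] = number of subjects placed in category i at the first
    administration and category j at the second (hypothetical layout).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of subjects on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions, summed.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return p_o, (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (p_o - kappa) / (1 - kappa)


# Hypothetical test-retest counts (NOT from the study).
example_table = [[40, 10],
                 [5, 45]]
example_p_o, example_kappa = cohens_kappa(example_table)

# Chance agreement implied by the reported 87.6% and kappa = 0.465.
reported_p_e = implied_chance_agreement(0.876, 0.465)
```

    Under these assumptions, a high raw agreement can coexist with a moderate kappa whenever most subjects fall into the same category on both occasions, because the chance-agreement term is then large.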

  17. A Diet Score Assessing Norwegian Adolescents' Adherence to Dietary Recommendations-Development and Test-Retest Reproducibility of the Score.

    PubMed

    Handeland, Katina; Kjellevold, Marian; Wik Markhus, Maria; Eide Graff, Ingvild; Frøyland, Livar; Lie, Øyvind; Skotheim, Siv; Stormark, Kjell Morten; Dahl, Lisbeth; Øyen, Jannike

    2016-07-29

    Assessment of adolescents' dietary habits is challenging. Reliable instruments to monitor dietary trends are required to promote healthier behaviours in this group. The purpose of this cross-sectional study was to assess adolescents' adherence to Norwegian dietary recommendations with a diet score and to report results from, and test-retest reliability of, the score. The diet score involved seven food groups and one physical activity indicator, and was applied to answers from a semi-quantitative food frequency questionnaire (FFQ) administered twice. Reproducibility of the score was assessed with Cohen's Kappa (κ statistics) at an interval of three months. The setting was eight lower-secondary schools in Hordaland County, Norway, and subjects were adolescents (n = 472) aged 14-15 years and their caregivers. Results showed that the proportion of adolescents consistently classified by the diet score was 87.6% (κ = 0.465). For food groups, proportions ranged from 74.0% to 91.6% (κ = 0.249 to κ = 0.573). Less than 40% of the participants were found to adhere to recommendations for frequencies of eating fruits, vegetables, added sugar, and fish. The highest compliance with recommendations was seen for choosing water as a beverage and limiting the intake of red meat. The score was associated with parental socioeconomic status. The diet score was found to be reproducible at an acceptable level. Health-promoting work targeting adolescents should emphasize increasing the intake of recommended foods to approach nutritional guidelines.

  18. A Bad Idea: National Standards Based on Test Scores

    ERIC Educational Resources Information Center

    Baker, Keith

    2010-01-01

    The justification for national standards is that test scores predict a nation's future economic success. There is no evidence that supports this assumption. There is evidence that it is wrong. For more than half a century, reformers have been trying to fix our schools with little success. The obvious conclusion is that something that can't be…

  19. America's Mediocre Test Scores: Education Crisis or Poverty Crisis?

    ERIC Educational Resources Information Center

    Petrilli, Michael J.; Wright, Brandon L.

    2016-01-01

    At a time when the national conversation is focused on lagging upward mobility, it is no surprise that many educators point to poverty as the explanation for mediocre test scores among U.S. students compared to those of students in other countries. If American teachers in struggling U.S. schools taught in Finland, says Finnish educator Pasi…

  20. Using Heteroskedastic Ordered Probit Models to Recover Moments of Continuous Test Score Distributions from Coarsened Data

    ERIC Educational Resources Information Center

    Reardon, Sean F.; Shear, Benjamin R.; Castellano, Katherine E.; Ho, Andrew D.

    2017-01-01

    Test score distributions of schools or demographic groups are often summarized by frequencies of students scoring in a small number of ordered proficiency categories. We show that heteroskedastic ordered probit (HETOP) models can be used to estimate means and standard deviations of multiple groups' test score distributions from such data. Because…

  1. Power and sample size evaluation for the Cochran-Mantel-Haenszel mean score (Wilcoxon rank sum) test and the Cochran-Armitage test for trend.

    PubMed

    Lachin, John M

    2011-11-10

    The power of a chi-square test, and thus the required sample size, are a function of the noncentrality parameter that can be obtained as the limiting expectation of the test statistic under an alternative hypothesis specification. Herein, we apply this principle to derive simple expressions for two tests that are commonly applied to discrete ordinal data. The Wilcoxon rank sum test for the equality of distributions in two groups is algebraically equivalent to the Mann-Whitney test. The Kruskal-Wallis test applies to multiple groups. These tests are equivalent to a Cochran-Mantel-Haenszel mean score test using rank scores for a set of C-discrete categories. Although various authors have assessed the power function of the Wilcoxon and Mann-Whitney tests, herein it is shown that the power of these tests with discrete observations, that is, with tied ranks, is readily provided by the power function of the corresponding Cochran-Mantel-Haenszel mean scores test for two and R > 2 groups. These expressions yield results virtually identical to those derived previously for rank scores and also apply to other score functions. The Cochran-Armitage test for trend assesses whether there is a monotonically increasing or decreasing trend in the proportions with a positive outcome or response over the C-ordered categories of an ordinal independent variable, for example, dose. Herein, it is shown that the power of the test is a function of the slope of the response probabilities over the ordinal scores assigned to the groups that yields simple expressions for the power of the test. Copyright © 2011 John Wiley & Sons, Ltd.
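    The abstract above notes that the Wilcoxon rank sum test is algebraically equivalent to the Mann-Whitney test, including for discrete data with tied ranks. A small sketch of that identity follows, assuming midranks for ties and counting each tied pair as one half: the rank sum W of the first group and the Mann-Whitney count U satisfy U = W − n1(n1 + 1)/2. The function names and the ordinal scores are hypothetical and serve only to illustrate the identity; this is not the power calculation derived in the paper.

```python
def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Sum of the pooled midranks belonging to group x."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])


def mann_whitney_u(x, y):
    """Count pairs with x above y; each tied pair contributes one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)


# Hypothetical ordinal scores on a 4-category scale (with ties).
group_x = [1, 2, 2, 3]
group_y = [2, 3, 3, 4]

w = wilcoxon_rank_sum(group_x, group_y)
u = mann_whitney_u(group_x, group_y)
n1 = len(group_x)
identity_holds = (u == w - n1 * (n1 + 1) / 2)
```

    Because the two statistics differ only by a constant that depends on the group size, any test based on one is equivalent to the same test based on the other.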

  2. The Weighted Airman Promotion System: Standardizing Test Scores

    DTIC Science & Technology

    2008-01-01

    This document and trademark(s) contained herein are protected by law as indicated in a notice appearing later in this work. This electronic... Title: The Weighted Airman Promotion System: Standardizing Test Scores. Performing organization: Rand Corporation, PO Box 2138, Santa Monica

  3. Qualitative Dimensions in Scoring the Rey Visual Memory Test of Malingering.

    ERIC Educational Resources Information Center

    Griffin, G. A. Elmer; And Others

    1996-01-01

    A new qualitative scoring system for the Rey Visual Memory Test was tested for its ability to distinguish between malingerers and nonmalingerers. The new system, based on the types of errors made, was able to distinguish between 53 psychiatrically disabled and 64 normal nonmalingerers, and between nonmalingerers and 91 possible malingerers. (SLD)

  4. Aligning Scales of Certification Tests. Research Report. ETS RR-10-07

    ERIC Educational Resources Information Center

    Dorans, Neil J.; Liang, Longjuan; Puhan, Gautam

    2010-01-01

    Scores are the most visible and widely used products of a testing program. The choice of score scale has implications for test specifications, equating, and test reliability and validity, as well as for test interpretation. At the same time, the score scale should be viewed as infrastructure likely to require repair at some point. In this report…

  5. Correlations between the Hand Test Pathology score and Personality Assessment Inventory scales for pain clinic patients.

    PubMed

    George, J M; Wagner, E E

    1995-06-01

    Pearson correlations between the Hand Test Pathology (PATH) score and Personality Assessment Inventory scales produced a cluster of relationships characteristic of an antisocial orientation. Likewise, PATH significantly differentiated between a "P" (Pathology) group flagged by a high Negative Impression score on the inventory, and an "N" (Normal) group of 100 pain patients. It was suggested that the interpretive simplicity of Hand Test scores renders the scores amenable to further correlational studies involving the inventory.

  6. Improving Test Score Reporting: Perspectives from the ETS Score Reporting Conference. Research Report. ETS RR-11-45

    ERIC Educational Resources Information Center

    Zapata-Rivera, Diego, Ed.; Zwick, Rebecca, Ed.

    2011-01-01

    This volume includes 3 papers based on presentations at a workshop on communicating assessment information to particular audiences, held at Educational Testing Service (ETS) on November 4th, 2010, to explore some issues that influence score reports and new advances that contribute to the effectiveness of these reports. Jessica Hullman, Rebecca…

  7. What No Child Left Behind Leaves Behind: The Roles of IQ and Self-Control in Predicting Standardized Achievement Test Scores and Report Card Grades

    PubMed Central

    Duckworth, Angela L.; Quinn, Patrick D.; Tsukayama, Eli

    2013-01-01

    The increasing prominence of standardized testing to assess student learning motivated the current investigation. We propose that standardized achievement test scores assess competencies determined more by intelligence than by self-control, whereas report card grades assess competencies determined more by self-control than by intelligence. In particular, we suggest that intelligence helps students learn and solve problems independent of formal instruction, whereas self-control helps students study, complete homework, and behave positively in the classroom. Two longitudinal, prospective studies of middle school students support predictions from this model. In both samples, IQ predicted changes in standardized achievement test scores over time better than did self-control, whereas self-control predicted changes in report card grades over time better than did IQ. As expected, the effect of self-control on changes in report card grades was mediated in Study 2 by teacher ratings of homework completion and classroom conduct. In a third study, ratings of middle school teachers about the content and purpose of standardized achievement tests and report card grades were consistent with the proposed model. Implications for pedagogy and public policy are discussed. PMID:24072936

  8. Standardized Testing of Special Education Students: A Comparison of Service Type and Test Scores

    ERIC Educational Resources Information Center

    Hogan-Young, Christine

    2013-01-01

    The purpose of this study was to determine if there was a difference in Tennessee Comprehensive Assessment Program Modified Academic Achievement Standards (TCAP MAAS) achievement test scores for special education students who receive their instruction in the resource classroom or in an inclusion classroom. The study involved third, fourth, and…

  9. Validity of Alternative Cut-Off Scores for the Back-Saver Sit and Reach Test

    ERIC Educational Resources Information Center

    Looney, Marilyn A.; Gilbert, Jennie

    2012-01-01

    The purpose of the study was to determine if currently used FITNESSGRAM® cut-off scores for the Back Saver Sit and Reach Test had the best criterion-referenced validity evidence for 6-12 year old children. Secondary analyses of an existing data set focused on the passive straight leg raise and Back Saver Sit and Reach Test flexibility scores of…

  10. Interpretation and Utilization of Scores on the Air Force Officer Qualifying Test.

    ERIC Educational Resources Information Center

    Miller, Robert E.

    The report summarizes a large body of data relevant to the proper interpretation and use of aptitude scores on the Air Force Officer Qualifying Test (AFOQT). Included are descriptions of the AFOQT testing program and the test itself. Technical data include an extensive sampling of validation studies covering predictors of success in pilot…

  11. Computer-Adaptive Testing: Implications for Students' Achievement, Motivation, Engagement, and Subjective Test Experience

    ERIC Educational Resources Information Center

    Martin, Andrew J.; Lazendic, Goran

    2018-01-01

    The present study investigated the implications of computer-adaptive testing (operationalized by way of multistage adaptive testing; MAT) and "conventional" fixed order computer testing for various test-relevant outcomes in numeracy, including achievement, test-relevant motivation and engagement, and subjective test experience. It did so…

  12. Stochastic Processes as True-Score Models for Highly Speeded Mental Tests.

    ERIC Educational Resources Information Center

    Moore, William E.

    The previous theoretical development of the Poisson process as a strong model for the true-score theory of mental tests is discussed, and additional theoretical properties of the model from the standpoint of individual examinees are developed. The paper introduces the Erlang process as a family of test theory models and shows in the context of…

  13. A physical function test for use in the intensive care unit: validity, responsiveness, and predictive utility of the physical function ICU test (scored).

    PubMed

    Denehy, Linda; de Morton, Natalie A; Skinner, Elizabeth H; Edbrooke, Lara; Haines, Kimberley; Warrillow, Stephen; Berney, Sue

    2013-12-01

    Several tests have recently been developed to measure changes in patient strength and functional outcomes in the intensive care unit (ICU). The original Physical Function ICU Test (PFIT) demonstrates reliability and sensitivity. The aims of this study were to further develop the original PFIT, to derive an interval score (the PFIT-s), and to test the clinimetric properties of the PFIT-s. A nested cohort study was conducted. One hundred forty-four and 116 participants performed the PFIT at ICU admission and discharge, respectively. Original test components were modified using principal component analysis. Rasch analysis examined the unidimensionality of the PFIT, and an interval score was derived. Correlations tested validity, and multiple regression analyses investigated predictive ability. Responsiveness was assessed using the effect size index (ESI), and the minimal clinically important difference (MCID) was calculated. The shoulder lift component was removed. Unidimensionality of combined admission and discharge PFIT-s scores was confirmed. The PFIT-s displayed moderate convergent validity with the Timed "Up & Go" Test (r=-.60), the Six-Minute Walk Test (r=.41), and the Medical Research Council (MRC) sum score (rho=.49). The ESI of the PFIT-s was 0.82, and the MCID was 1.5 points (interval scale range=0-10). A higher admission PFIT-s score was predictive of: an MRC score of ≥48, increased likelihood of discharge home, reduced likelihood of discharge to inpatient rehabilitation, and reduced acute care hospital length of stay. Scoring of sit-to-stand assistance required is subjective, and cadence cutpoints used may not be generalizable. The PFIT-s is a safe and inexpensive test of physical function with high clinical utility. It is valid, responsive to change, and predictive of key outcomes. It is recommended that the PFIT-s be adopted to test physical function in the ICU.

  14. Univariate and Bivariate Loglinear Models for Discrete Test Score Distributions.

    ERIC Educational Resources Information Center

    Holland, Paul W.; Thayer, Dorothy T.

    2000-01-01

    Applied the theory of exponential families of distributions to the problem of fitting the univariate histograms and discrete bivariate frequency distributions that often arise in the analysis of test scores. Considers efficient computation of the maximum likelihood estimates of the parameters using Newton's Method and computationally efficient…

  15. Allele-sharing models: LOD scores and accurate linkage tests.

    PubMed

    Kong, A; Cox, N J

    1997-11-01

    Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested.

  16. Allele-sharing models: LOD scores and accurate linkage tests.

    PubMed Central

    Kong, A; Cox, N J

    1997-01-01

    Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested. PMID:9345087

  17. Beyond Correlations: Usefulness of High School GPA and Test Scores in Making College Admissions Decisions

    ERIC Educational Resources Information Center

    Sawyer, Richard

    2013-01-01

    Correlational evidence suggests that high school GPA is better than admission test scores in predicting first-year college GPA, although test scores have incremental predictive validity. The usefulness of a selection variable in making admission decisions depends in part on its predictive validity, but also on institutions' selectivity and…

  18. Graduate Students' Administration and Scoring Errors on the Woodcock-Johnson III Tests of Cognitive Abilities

    ERIC Educational Resources Information Center

    Ramos, Erica; Alfonso, Vincent C.; Schermerhorn, Susan M.

    2009-01-01

    The interpretation of cognitive test scores often leads to decisions concerning the diagnosis, educational placement, and types of interventions used for children. Therefore, it is important that practitioners administer and score cognitive tests without error. This study assesses the frequency and types of examiner errors that occur during the…

  19. Evaluating the Stability of Test Score Means for the "TOEIC"® Speaking and Writing Tests. Research Report. ETS RR-17-50

    ERIC Educational Resources Information Center

    Qu, Yanxuan; Huo, Yan; Chan, Eric; Shotts, Matthew

    2017-01-01

    For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This practice helps to ensure that interpretations about test takers' abilities are comparable from one administration (or one form) to another. This study examines the consistency of reported scores…

  20. Effects of handcuffs on neuropsychological testing: Implications for criminal forensic evaluations.

    PubMed

    Biddle, Christine M; Fazio, Rachel L; Dyshniku, Fiona; Denney, Robert L

    2018-01-01

    Neuropsychological evaluations are increasingly performed in forensic contexts, including in criminal settings where security sometimes cannot be compromised to facilitate evaluation according to standardized procedures. Interpretation of nonstandardized assessment results poses significant challenges for the neuropsychologist. Research is limited in regard to the validation of neuropsychological test accommodation and modification practices that deviate from standard test administration; there is no published research regarding the effects of hand restraints upon neuropsychological evaluation results. This study provides preliminary results regarding the impact of restraints on motor functioning and common neuropsychological tests with a motor component. When restrained, performance on nearly all tests utilized was significantly impacted, including Trail Making Test A/B, a coding test, and several tests of motor functioning. Significant performance decline was observed in both raw scores and normative scores. Regression models are also provided in order to help forensic neuropsychologists adjust for the effect of hand restraints on raw scores of these tests, as the hand restraints also resulted in significant differences in normative scores; in the most striking case there was nearly a full standard deviation of discrepancy.

  1. The effect of human immunodeficiency virus type 1 antibody status on military applicant aptitude test scores.

    PubMed

    Arday, D R; Brundage, J F; Gardner, L I; Goldenbaum, M; Wann, F; Wright, S

    1991-06-15

    The authors conducted a population-based study to attempt to estimate the effect of human immunodeficiency virus type 1 (HIV-1) seropositivity on Armed Services Vocational Aptitude Battery test scores in otherwise healthy individuals with early HIV-1 infection. The Armed Services Vocational Aptitude Battery is a 10-test written multiple aptitude battery administered to all civilian applicants for military enlistment prior to serologic screening for HIV-1 antibodies. A total of 975,489 induction testing records containing both Armed Services Vocational Aptitude Battery and HIV-1 results from October 1985 through March 1987 were examined. An analysis data set (n = 7,698) was constructed by choosing five controls for each of the 1,283 HIV-1-positive cases, matched on five-digit ZIP code, and a multiple linear regression analysis was performed to control for demographic and other factors that might influence test scores. Years of education was the strongest predictor of test scores, raising an applicant's score on a composite test nearly 0.16 standard deviation per year. The HIV-1-positive effect on the composite score was -0.09 standard deviation (99% confidence interval -0.17 to -0.02). Separate regressions on each component test within the battery showed HIV-1 effects between -0.39 and +0.06 standard deviation. The two Armed Services Vocational Aptitude Battery component tests felt a priori to be the most sensitive to HIV-1-positive status showed the least decrease with seropositivity. Much of the variability in test scores was not predicted by either HIV-1 serostatus or the demographic and other factors included in the model. There appeared to be little evidence of a strong HIV-1 effect.

  2. Low aerobic fitness and obesity are associated with lower standardized test scores in children.

    PubMed

    Roberts, Christian K; Freed, Benjamin; McCarthy, William J

    2010-05-01

    To investigate whether aerobic fitness and obesity in school children are associated with standardized test performance. Ethnically diverse (n = 1989) 5th, 7th, and 9th graders attending California schools comprised the sample. Aerobic fitness was determined by a 1-mile run/walk test; body mass index (BMI) was obtained from state-mandated measurements. California standardized test scores were obtained from the school district. Students whose mile run/walk times exceeded California Fitnessgram standards or whose BMI exceeded Centers for Disease Control sex- and age-specific body weight standards scored lower on California standardized math, reading, and language tests than students with desirable BMI status or fitness level, even after controlling for parent education among other covariates. Ethnic differences in standardized test scores were consistent with ethnic differences in obesity status and aerobic fitness. BMI-for-age was no longer a significant multivariate predictor when covariates included fitness level. Low aerobic fitness is common among youth and varies among ethnic groups, and aerobic fitness level predicts performance on standardized tests across ethnic groups. More research is needed to uncover the physiological mechanisms by which aerobic fitness may contribute to performance on standardized academic tests.

  3. The Emphasis of Student Test Scores in Teacher Appraisal Systems

    ERIC Educational Resources Information Center

    Smith, William C.; Kubacka, Katarzyna

    2017-01-01

    Over the past 30 years teachers have been held increasingly accountable for the quality of education in their classroom. During this transition, the line between teacher appraisals, traditionally an instrument for continuous formative teacher feedback, and summative teacher evaluations has blurred. Student test scores, as an "objective"…

  4. Rising Stars: High School's Change Process Produces Higher Test Scores.

    ERIC Educational Resources Information Center

    McCown, Claire; Runnebaum, Robert

    2001-01-01

    Presents Bishop Ward High School (Kansas) as a case study that has seen great improvements in standardized testing results by changing its approach. States that realignment of curriculum, adjusting instructional strategies, and accommodating students with special needs are important aspects of raising assessment scores in high schools. (CJW)

  5. Comparing the Effects of Elementary Music and Visual Arts Lessons on Standardized Mathematics Test Scores

    ERIC Educational Resources Information Center

    King, Molly Elizabeth

    2016-01-01

    The purpose of this quantitative, causal-comparative study was to compare the effect elementary music and visual arts lessons had on third through sixth grade standardized mathematics test scores. Inferential statistics were used to compare the differences between test scores of students who took in-school, elementary, music instruction during the…

  6. Many Children Left Behind? Textbooks and Test Scores in Kenya. NBER Working Paper No. 13300

    ERIC Educational Resources Information Center

    Glewwe, Paul; Kremer, Michael; Moulin, Sylvie

    2007-01-01

    A randomized evaluation suggests that a program which provided official textbooks to randomly selected rural Kenyan primary schools did not increase test scores for the average student. In contrast, the previous literature suggests that textbook provision has a large impact on test scores. Disaggregating the results by students' initial academic…

  7. Relationship of Elementary and Secondary School Achievement Test Scores to Later Academic Success.

    ERIC Educational Resources Information Center

    Loyd, Brenda H.; And Others

    1980-01-01

    This study investigated the relationship between achievement test scores on the Iowa Tests of Basic Skills (ITBS) and Iowa Tests of Educational Development (ITED), and high school and college grade point average. Support for the predictive validity of the ITBS and ITED achievement test batteries is provided. (Author/GK)

  8. Time and Performance on the California Critical Thinking Skills Test.

    ERIC Educational Resources Information Center

    Frisby, Craig L.; Traffanstedt, Bobby K.

    2003-01-01

    Investigates the relationship between total scores on the California Critical Thinking Skills Test (CCTST) and the time taken to complete it. Finds that slower test takers obtained significantly higher scores. Discusses implications of these findings for college instruction. (SG)

  9. Trends in Classroom Observation Scores

    ERIC Educational Resources Information Center

    Casabianca, Jodi M.; Lockwood, J. R.; McCaffrey, Daniel F.

    2015-01-01

    Observations and ratings of classroom teaching and interactions collected over time are susceptible to trends in both the quality of instruction and rater behavior. These trends have potential implications for inferences about teaching and for study design. We use scores on the Classroom Assessment Scoring System-Secondary (CLASS-S) protocol from…

  10. The Impact of Inclusion and Resource Instruction on Standardized Test Scores of Special Education Students

    ERIC Educational Resources Information Center

    Derico, Vontrice L.

    2017-01-01

    The purpose of the proposed quasi-experimental quantitative study was to determine if students who were taught in the inclusive setting yielded higher standardized test scores compared to students who were taught in the resource setting. The researcher analyzed the standardized test scores, in the areas of Language Arts, Reading, and Mathematics…

  11. STABILITY OF ACADEMIC APTITUDE AND READING TEST SCORES OF MOBILE AND NON-MOBILE DISADVANTAGED CHILDREN.

    ERIC Educational Resources Information Center

    JUSTMAN, JOSEPH

    CHANGES IN ACADEMIC APTITUDE AND ACHIEVEMENT TEST SCORES OF PUPILS ATTENDING PUBLIC SCHOOLS IN DISADVANTAGED AREAS IN NEW YORK CITY WERE INVESTIGATED. AN ATTEMPT WAS MADE TO DETERMINE WHETHER VARYING DEGREES OF MOBILITY WERE ASSOCIATED WITH VARIATION IN CHANGES IN TEST SCORES. THE CUMULATIVE RECORD CARDS OF SIXTH-GRADE PUPILS WERE EXAMINED TO…

  12. Kindergarten Black-White Test Score Gaps: Replicating and Updating Previous Findings with New National Data

    ERIC Educational Resources Information Center

    Quinn, David

    2014-01-01

    A substantial body of evidence has shown large academic test score gaps between black and white students in early childhood. These gaps remain, and probably grow, as students progress through school. Many researchers have sought to explain these persistent test score gaps, and particularly, to understand the role of students' socio-economic status…

  13. The Influence of an NCLB Accountability Plan on the Distribution of Student Test Score Gains

    ERIC Educational Resources Information Center

    Springer, Matthew G.

    2008-01-01

    Previous research on the effect of accountability programs on the distribution of student test score gains is decidedly mixed. This study examines the issue by estimating an educational production function in which test score gains are a function of the incentives schools have to focus instruction on below-proficient students. NCLB's threat of…

  14. Test and Score Data Summary for TOEFL[R] Internet-Based and Paper-Based Tests. January 2008-December 2008 Test Data

    ERIC Educational Resources Information Center

    Educational Testing Service, 2008

    2008-01-01

    The Test of English as a Foreign Language[TM], better known as TOEFL[R], is designed to measure the English-language proficiency of people whose native language is not English. TOEFL scores are accepted by more than 6,000 colleges, universities, and licensing agencies in 130 countries. The test is also used by governments, and scholarship and…

  15. Use of Standardized Test Scores to Predict Success in a Computer Applications Course

    ERIC Educational Resources Information Center

    Harris, Robert V.; King, Stephanie B.

    2016-01-01

    The purpose of this study was to see if a relationship existed between American College Testing (ACT) scores (i.e., English, reading, mathematics, science reasoning, and composite) and student success in a computer applications course at a Mississippi community college. The study showed that while the ACT scores were excellent predictors of…

  16. A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

    ERIC Educational Resources Information Center

    Lee, Guemin; Park, In-Yong

    2012-01-01

    Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

  17. How Changes in Families and Schools Are Related to Trends in Black-White Test Scores

    ERIC Educational Resources Information Center

    Berends, Mark; Lucas, Samuel R.; Penaloza, Roberto V.

    2008-01-01

    Through several decades of research, a great deal has been written about trends in black-white test scores and the factors that may explain the gaps in different subject areas. Only a few studies have examined the changing relationships between gaps in students' test scores and family and school measures in nationally representative data over…

  18. Clock Drawing Test and the diagnosis of amnestic mild cognitive impairment: can more detailed scoring systems do the work?

    PubMed

    Rubínová, Eva; Nikolai, Tomáš; Marková, Hana; Siffelová, Kamila; Laczó, Jan; Hort, Jakub; Vyhnálek, Martin

    2014-01-01

    The Clock Drawing Test is a frequently used cognitive screening test with several scoring systems in elderly populations. We compare simple and complex scoring systems and evaluate the usefulness of the combination of the Clock Drawing Test with the Mini-Mental State Examination to detect patients with mild cognitive impairment. Patients with amnestic mild cognitive impairment (n = 48) and age- and education-matched controls (n = 48) underwent neuropsychological examinations, including the Clock Drawing Test and the Mini-Mental State Examination. Clock drawings were scored by three blinded raters using one simple (6-point scale) and two complex (17- and 18-point scales) systems. The sensitivity and specificity of these scoring systems used alone and in combination with the Mini-Mental State Examination were determined. Complex scoring systems, but not the simple scoring system, were significant predictors of the amnestic mild cognitive impairment diagnosis in logistic regression analysis. At equal levels of sensitivity (87.5%), the Mini-Mental State Examination showed higher specificity (31.3%, compared with 12.5% for the 17-point Clock Drawing Test scoring scale). The combination of Clock Drawing Test and Mini-Mental State Examination scores increased the area under the curve (0.72; p < .001) and increased specificity (43.8%), but did not increase sensitivity, which remained high (85.4%). A simple 6-point scoring system for the Clock Drawing Test did not differentiate between healthy elderly and patients with amnestic mild cognitive impairment in our sample. Complex scoring systems were slightly more efficient, yet still were characterized by high rates of false-positive results. We found psychometric improvement using combined scores from the Mini-Mental State Examination and the Clock Drawing Test when complex scoring systems were used. The results of this study support the benefit of using combined scores from simple methods.
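    The sensitivity and specificity figures in the record above can be reproduced from simple counts. The following is a minimal sketch, assuming the group sizes stated in the abstract (48 patients, 48 controls); the individual counts are hypothetical values back-calculated from the reported percentages and are not given in the abstract itself.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of true cases (patients) that the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-cases (controls) that the test clears as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, inferred from the reported rates with n = 48 per group.
sens_matched = sensitivity(true_pos=42, false_neg=6)    # reported as 87.5%
spec_mmse = specificity(true_neg=15, false_pos=33)      # reported as 31.3%
spec_cdt17 = specificity(true_neg=6, false_pos=42)      # reported as 12.5%
spec_combined = specificity(true_neg=21, false_pos=27)  # reported as 43.8%

for name, value in [
    ("sensitivity (matched level)", sens_matched),
    ("specificity, MMSE", spec_mmse),
    ("specificity, 17-point CDT", spec_cdt17),
    ("specificity, combined", spec_combined),
]:
    print(f"{name}: {value:.4f}")
```

    A low specificity at a fixed sensitivity means many healthy controls are flagged, which is the high false-positive rate the abstract describes.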

  19. The Relationship between Academic Averages of Primary School Science and Technology Class and Test Sub-Test Scores of Placement Test of Science

    ERIC Educational Resources Information Center

    Guzeller, Cem Oktay

    2012-01-01

    This research examined the relationship between written exam scores in the science and technology class of the 6th, 7th, and 8th grades, project, participation in class activities and performance work, year-end academic success point averages, and sub-test raw scores of LDT science for the 6th, 7th, and 8th grades. Academic success point averages were used as…

  20. Racial Differences in Mathematics Test Scores for Advanced Mathematics Students

    ERIC Educational Resources Information Center

    Minor, Elizabeth Covay

    2016-01-01

    Research on achievement gaps has found that achievement gaps are larger for students who take advanced mathematics courses compared to students who do not. Focusing on the advanced mathematics student achievement gap, this study found that African American advanced mathematics students have significantly lower test scores and are less likely to be…

  1. Commentary on "Validating the Interpretations and Uses of Test Scores"

    ERIC Educational Resources Information Center

    Brennan, Robert L.

    2013-01-01

    Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…

  2. Using Test Scores from Students with Disabilities in Teacher Evaluation

    ERIC Educational Resources Information Center

    Buzick, Heather M.; Jones, Nathan D.

    2015-01-01

    Much of the recent focus of educational policymakers has been on improving the measurement of teacher effectiveness. Linking student growth to teacher effects has been a large part of reform efforts. To date, neither researchers nor practitioners have arrived at a consensus on how to treat test scores from students with disabilities in…

  3. Piloting a Polychotomous Partial-Credit Scoring Procedure in a Multiple-Choice Test

    ERIC Educational Resources Information Center

    Tsopanoglou, Antonios; Ypsilandis, George S.; Mouti, Anna

    2014-01-01

    Multiple-choice (MC) tests are frequently used to measure language competence because they are quick, economical and straightforward to score. While degrees of correctness have been investigated for partially correct responses in combined-response MC tests, degrees of incorrectness in distractors and the role they play in determining the…

  4. What's in a Teacher Test? Assessing the Relationship between Teacher Licensure Test Scores and Student STEM Achievement and Course-Taking. Working Paper 158

    ERIC Educational Resources Information Center

    Goldhaber, Dan; Gratz, Trevor; Theobald, Roddy

    2016-01-01

    We investigate the relationship between teacher licensure test scores and student test achievement and high school course-taking. We focus on three subject/grade combinations--middle school math, ninth-grade algebra and geometry, and ninth-grade biology--and find evidence that a teacher's basic skills test scores are modestly predictive of student…

  5. The Bender Gestalt Test with the Human Figure Drawing Test for Young School Children. A Manual for Use with the Koppitz Scoring System.

    ERIC Educational Resources Information Center

    Koppitz, Elizabeth Munsterberg

    Presented is a manual for scoring the Bender Gestalt Test and the Human Figure Drawing Test for screening and diagnostic uses with emotionally disturbed, brain damaged, or perceptually handicapped 5- to 11-year-old children. Given are suggestions for administering and scoring the Bender test which examines distortion of shape, rotation,…

  6. A pretest prognostic score to assess patients undergoing exercise or pharmacological stress testing.

    PubMed

    Morise, Anthony; Evans, Matthew; Jalisi, Farrukh; Shetty, Rajendra; Stauffer, Marc

    2007-02-01

    A previously developed pretest score was validated to stratify patients presenting for exercise testing with suspected coronary disease according to the presence of angiographic coronary disease. Our goal was to determine how well this pretest score risk stratified patients undergoing pharmacological and exercise stress tests concerning prognostic endpoints. Retrospective cohort analysis. University hospital stress laboratory. 7452 unselected ambulatory patients with symptoms of suspected coronary disease undergoing stress testing between 1995 and 2004. All-cause death, cardiac death and non-fatal myocardial infarction. The rate of all-cause death was 5.5% (CI 5.0 to 6.1) with 4.3 (SD 2.4) years of follow-up (Exercise 2.8% (CI 2.3 to 3.2) v Pharmacological group 11.9% (CI 10.5 to 13.3); p<0.001). The rate of cardiac death/myocardial infarction was 2.6% (CI 2.2 to 3.0) (Exercise 1.4% (CI 1.1 to 1.8) v Pharmacological group 5.3% (CI 4.3 to 6.2); p<0.001). In both groups, stratification by pretest score was significant for all-cause death and the combined endpoint. However, stratification was more effective in the pharmacological group using the combined endpoint rather than all-cause death. Pharmacological stress patients in intermediate and high risk groups were at higher risk than their respective exercise test cohorts. Referral for pharmacological stress testing was found to be an independent predictor of time to death (2.7 (CI 2.0 to 3.6); p<0.001). A pretest score previously validated to stratify according to angiographic outcomes, effectively risk stratified pharmacological and exercise stress patients according to the combined endpoint of cardiac death/myocardial infarction.

  7. TOEFL iBT Speaking Test Scores as Indicators of Oral Communicative Language Proficiency

    ERIC Educational Resources Information Center

    Bridgeman, Brent; Powers, Donald; Stone, Elizabeth; Mollaun, Pamela

    2012-01-01

    Scores assigned by trained raters and by an automated scoring system (SpeechRater[TM]) on the speaking section of the TOEFL iBT[TM] were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language…

  8. Association between the gait pattern characteristics of older people and their two-step test scores.

    PubMed

    Kobayashi, Yoshiyuki; Ogata, Toru

    2018-04-27

    The Two-Step test is one of three official tests authorized by the Japanese Orthopedic Association to evaluate the risk of locomotive syndrome (a condition of reduced mobility caused by an impairment of the locomotive organs). It has been reported that the Two-Step test score has a good correlation with one's walking ability; however, its association with the gait pattern of older people during normal walking is still unknown. Therefore, this study aims to clarify the associations between the gait patterns of older people observed during normal walking and their Two-Step test scores. We analyzed the whole waveforms obtained from the lower-extremity joint angles and joint moments of 26 older people in various stages of locomotive syndrome using principal component analysis (PCA). The PCA was conducted using a 260 × 2424 input matrix constructed from the participants' time-normalized pelvic and right-lower-limb-joint angles along three axes (ten trials of 26 participants, 101 time points, 4 angles, 3 axes, and 2 variable types per trial). The Pearson product-moment correlation coefficient between the scores of the principal component vectors (PCVs) and the scores of the Two-Step test revealed that only one PCV (PCV 2) among the 61 obtained relevant PCVs is significantly related to the score of the Two-Step test. We therefore concluded that the joint angles and joint moments related to PCV 2 (ankle plantar-flexion, ankle plantar-flexor moments during the late stance phase, and ranges of motion and moments on the hip, knee, and ankle joints in the sagittal plane during the entire stance phase) are the motions associated with the Two-Step test.
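    The dimensions of the input matrix follow directly from the counts listed in the abstract. The sketch below checks that arithmetic and runs a PCA on placeholder data of the same shape. It assumes NumPy is available, and the random matrix is purely illustrative: it stands in for the gait waveforms, which are not part of this record.

```python
import numpy as np

# Counts as stated in the abstract.
participants, trials = 26, 10
time_points, angles, axes, variable_types = 101, 4, 3, 2

n_rows = participants * trials                           # one row per trial
n_cols = time_points * angles * axes * variable_types    # one column per waveform sample
print(n_rows, n_cols)

# Placeholder data with the same shape (NOT the study's data).
rng = np.random.default_rng(0)
X = rng.standard_normal((n_rows, n_cols))

# PCA via singular value decomposition of the column-centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                         # principal component scores per trial
explained = S**2 / np.sum(S**2)        # fraction of variance per component
```

    Each column of `scores` could then be correlated with the Two-Step test scores, for example with `np.corrcoef`, to look for components related to the test.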

  9. Mixed handedness and achievement test scores of middle school boys.

    PubMed

    Sarma, P S B

    2008-10-01

    The purpose of the study was to replicate findings of an earlier study of fourth grade boys manifesting mixed handedness with a sample of middle school boys. Among 32 mixed-handed boys in Grades 6 to 8, the right-handed writer, left-handed thrower group obtained low spelling scores (Normal Curve Equivalent Scores) on the California Achievement Test significantly more frequently than the left-handed writer, right-handed thrower group. These findings are consistent with data for Grade 4 boys in the earlier study. Findings strengthen the hypotheses that mixed handedness is not a unitary neuropsychological entity and that boys who write with the right hand and throw with the left hand might be at risk for certain academic deficits.

  10. Validity and reliability of Abbreviated Mental Test Score (AMTS) among older Iranian.

    PubMed

    Foroughan, Mahshid; Wahlund, Lars-Olof; Jafari, Zahra; Rahgozar, Mehdi; Farahani, Ida G; Rashedi, Vahid

    2017-11-01

    Cognitive impairment is common among older people and is associated with increased morbidity and mortality. The main aim of this study was to evaluate the validity of the Persian version of the Abbreviated Mental Test Score (AMTS) as a screening tool for dementia. Data were obtained from a cross-sectional study. One hundred and one older adults who were members of Iranian Alzheimer Association and 101 of their siblings were entered into this study by convenient sampling. The Diagnostic and Statistical Manual of Mental Disorders, 4th edition, criteria for diagnosing dementia and the Mini-Mental State Examination were used as the study tools. The gathered data were analyzed by the Mann-Whitney U-test, the Kruskal-Wallis test, Spearman's rank correlation coefficient, and the receiver-operating characteristic. The AMTS could successfully differentiate the dementia group from the non-dementia group. Scores were significantly correlated with Diagnostic and Statistical Manual of Mental Disorders diagnosis for dementia and Mini-Mental State Examination scores (P < 0.001). Educational level (P < 0.001) and male sex (P = 0.015) were positively associated with AMTS, whereas (P < 0.001) was negatively associated with AMTS. Total Cronbach's α coefficient was 0.90. The scores 6 and 7 showed the optimum balance between sensitivity (99% and 94%, respectively) and specificity (85% and 86%, respectively). The Persian version of the AMTS is a valid cognitive assessment tool for older Iranian adults and can be used for dementia screening in Iran. © 2017 Japanese Psychogeriatric Society.

  11. A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.

    PubMed

    Bersabé, Rosa; Rivas, Teresa

    2010-05-01

    The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
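    The cut-off idea described above can be illustrated with a short worked sketch. It assumes the common baseline-category parameterization of multinomial logistic regression, ln(P_k / P_ref) = a_k + b_k·x, which may differ from the authors' notation. Under that assumption, categories j and j+1 are equally likely where a_j + b_j·x = a_(j+1) + b_(j+1)·x, that is, at x = (a_j - a_(j+1)) / (b_(j+1) - b_j). The coefficients below are hypothetical and are not taken from the EAT-26 example.

```python
import math


def mlr_probs(x, intercepts, slopes):
    """Category probabilities under a baseline-category multinomial logit."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)]
    top = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cutoff(intercepts, slopes, j):
    """Test score at which categories j and j+1 are equally likely."""
    return (intercepts[j] - intercepts[j + 1]) / (slopes[j + 1] - slopes[j])


# Hypothetical coefficients; category 0 is the reference (a = 0, b = 0).
intercepts = [0.0, -4.0, -10.0]
slopes = [0.0, 0.25, 0.5]

c01 = cutoff(intercepts, slopes, 0)   # boundary between categories 0 and 1
c12 = cutoff(intercepts, slopes, 1)   # boundary between categories 1 and 2
print(c01, c12)

p_at_c01 = mlr_probs(c01, intercepts, slopes)
p_at_c12 = mlr_probs(c12, intercepts, slopes)
```

    With three ordinal categories this yields two cut-off scores, matching the structure of the three-category classification described in the abstract.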

  12. Genetic study of multimodal imaging Alzheimer's disease progression score implicates novel loci.

    PubMed

    Scelsi, Marzia A; Khan, Raiyan R; Lorenzi, Marco; Christopher, Leigh; Greicius, Michael D; Schott, Jonathan M; Ourselin, Sebastien; Altmann, Andre

    2018-05-30

    Identifying genetic risk factors underpinning different aspects of Alzheimer's disease has the potential to provide important insights into pathogenesis. Moving away from simple case-control definitions, there is considerable interest in using quantitative endophenotypes, such as those derived from imaging as outcome measures. Previous genome-wide association studies of imaging-derived biomarkers in sporadic late-onset Alzheimer's disease focused only on phenotypes derived from single imaging modalities. In contrast, we computed a novel multi-modal neuroimaging phenotype comprising cortical amyloid burden and bilateral hippocampal volume. Both imaging biomarkers were used as input to a disease progression modelling algorithm, which estimates the biomarkers' long-term evolution curves from population-based longitudinal data. Among other parameters, the algorithm computes the shift in time required to optimally align a subject's biomarker trajectories with these population curves. This time shift serves as a disease progression score and it was used as a quantitative trait in a discovery genome-wide association study with n = 944 subjects from the Alzheimer's Disease Neuroimaging Initiative database diagnosed as Alzheimer's disease, mild cognitive impairment or healthy at the time of imaging. We identified a genome-wide significant locus implicating LCORL (rs6850306, chromosome 4; P = 1.03 × 10⁻⁸). The top variant rs6850306 was found to act as an expression quantitative trait locus for LCORL in brain tissue. The clinical role of rs6850306 in conversion from healthy ageing to mild cognitive impairment or Alzheimer's disease was further validated in an independent cohort comprising healthy, older subjects from the National Alzheimer's Coordinating Center database. Specifically, possession of a minor allele at rs6850306 was protective against conversion from mild cognitive impairment to Alzheimer's disease in the National Alzheimer's Coordinating Center cohort (hazard…

  13. School accountability and the black-white test score gap.

    PubMed

    Gaddis, S Michael; Lauen, Douglas Lee

    2014-03-01

    Since at least the 1960s, researchers have closely examined the respective roles of families, neighborhoods, and schools in producing the black-white achievement gap. Although many researchers minimize the ability of schools to eliminate achievement gaps, the No Child Left Behind Act (NCLB) increased pressure on schools to do so by 2014. In this study, we examine the effects of NCLB's subgroup-specific accountability pressure on changes in black-white math and reading test score gaps using a school-level panel dataset on all North Carolina public elementary and middle schools between 2001 and 2009. Using difference-in-difference models with school fixed effects, we find that accountability pressure reduces black-white achievement gaps by raising mean black achievement without harming mean white achievement. We find no differential effects of accountability pressure based on the racial composition of schools, but schools with more affluent populations are the most successful at reducing the black-white math achievement gap. Thus, our findings suggest that school-based interventions have the potential to close test score gaps, but differences in school composition and resources play a significant role in the ability of schools to reduce racial inequality. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Report: States See Test-Score Gains

    ERIC Educational Resources Information Center

    Viadero, Debra

    2004-01-01

    This article discusses a report from Education Trust, a Washington-based research and advocacy group. The report says almost half the states have seen rising math scores on their state exams for elementary school pupils since the federal No Child Left Behind law was enacted. It also states that reading scores have improved among 4th and 5th…

  15. A "Nonbiased Assessment" of Intelligence Testing.

    ERIC Educational Resources Information Center

    Vandivier, Phillip L.; Vandivier, Stella Sue

    1979-01-01

    Arguments and prejudices against the use of individually administered intelligence tests are considered and compared with possible values that may be obtained. Cautions about test score interpretation are discussed. Implications of abolishing intelligence testing are considered and recommendations for effective testing policies are presented. (CTM)

  16. School Choice in Suburbia: Test Scores, Race, and Housing Markets

    ERIC Educational Resources Information Center

    Dougherty, Jack; Harelson, Jeffrey; Maloney, Laura; Murphy, Drew; Smith, Russell; Snow, Michael; Zannoni, Diane

    2009-01-01

    Home buyers exercise school choice when shopping for a private residence due to its location in a public school district or attendance area. In this quantitative study of one Connecticut suburban district, we measure the effect of elementary school test scores and racial composition on home buyers' willingness to purchase single-family homes over…

  17. The Effect of Mobility on Texas Assessment of Knowledge and Skills Test Scores

    ERIC Educational Resources Information Center

    Alvarez, Ray

    2006-01-01

    This research studies the effects of mobility on the high-stakes test scores of a Title I South Central Texas school district. The study involved 10, 5th-grade elementary feeder school populations graduating to the 6th grade in 3 middle schools. The researcher compared the 1st administration scores of the Texas Assessment of Knowledge and Skills…

  18. Effects of correcting for prematurity on cognitive test scores in childhood.

    PubMed

    Wilson-Ching, Michelle; Pascoe, Leona; Doyle, Lex W; Anderson, Peter J

    2014-03-01

    The American Academy of Pediatrics recommends that test scores should be corrected for prematurity up to 3 years of age, but this practice varies greatly in both clinical and research settings. The aim of this study was to contrast the effects of using chronological age and those of using corrected age on measures of cognitive outcome across childhood. A theoretical model was constructed using norms from the Bayley Scales of Infant and Toddler Development, Third Edition; the Wechsler Preschool and Primary Scale of Intelligence, Third Edition Australian; and the Wechsler Intelligence Scales for Children, Fourth Edition Australian. Baseline scores representing different levels of functioning (70, below average; 85, borderline; and 100, average) were recalculated using the normative data for ages 6 months to 16 years to account for 1, 2, 3 and 4 months of prematurity. The model created depicted the difference in standardised scores between chronological and corrected age. Compared with scores corrected for prematurity, the absolute reduction in scores using chronological age was greater for increasing degree of prematurity, younger ages at assessment and higher baseline scores and was substantial even beyond 3 years of age. However, the pattern was erratic, with considerable fluctuation evident across different ages and baseline scores. Chronological age results in a lowering of scores at all ages for preterm-born subjects that is greater in the first few years and in those born at earlier gestational ages. Whether or not to correct for prematurity depends upon the context of the assessment. © 2014 The Authors. Journal of Paediatrics and Child Health © 2014 Paediatrics and Child Health Division (Royal Australasian College of Physicians).

  19. How Parents Can Help Kids Improve Test Scores: Taking the Stakes out of Literacy Testing

    ERIC Educational Resources Information Center

    Schneider, Steven

    2006-01-01

    In order to meet the goals of No Child Left Behind, standardized testing is preeminent as the sole indicator determining whether states all across America demonstrate adequate yearly progress regarding the improvement of student achievement in literacy education. This book will help teachers and parents raise children's scores on standardized…

  20. The Effects of Group Members' Personalities on a Test Taker's L2 Group Oral Discussion Test Scores

    ERIC Educational Resources Information Center

    Ockey, Gary J.

    2009-01-01

    The second language group oral is a test of second language speaking proficiency, in which a group of three or more English language learners discuss an assigned topic without interaction with interlocutors. Concerns expressed about the extent to which test takers' personal characteristics affect the scores of others in the group have limited its…

  1. The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

    ERIC Educational Resources Information Center

    Bennett, Randy Elliot; And Others

    1990-01-01

    The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)

  2. Noncognitive Skills and the Gender Disparities in Test Scores and Teacher Assessments: Evidence from Primary School

    ERIC Educational Resources Information Center

    Cornwell, Christopher; Mustard, David B.; Van Parys, Jessica

    2013-01-01

    Using data from the 1998-99 ECLS-K cohort, we show that the grades awarded by teachers are not aligned with test scores. Girls in every racial category outperform boys on reading tests, while boys score at least as well on math and science tests as girls. However, boys in all racial categories across all subject areas are not represented in…

  3. Web-based training and interrater reliability testing for scoring the Hamilton Depression Rating Scale.

    PubMed

    Rosen, Jules; Mulsant, Benoit H; Marino, Patricia; Groening, Christopher; Young, Robert C; Fox, Debra

    2008-10-30

    Despite the importance of establishing shared scoring conventions and assessing interrater reliability in clinical trials in psychiatry, these elements are often overlooked. Obstacles to rater training and reliability testing include logistic difficulties in providing live training sessions, or mailing videotapes of patients to multiple sites and collecting the data for analysis. To address some of these obstacles, a web-based interactive video system was developed. It uses actors of diverse ages, gender and race to train raters how to score the Hamilton Depression Rating Scale and to assess interrater reliability. This system was tested with a group of experienced and novice raters within a single site. It was subsequently used to train raters of a federally funded multi-center clinical trial on scoring conventions and to test their interrater reliability. The advantages and limitations of using interactive video technology to improve the quality of clinical trials are discussed.

  4. Opportunity to learn: Investigating possible predictors for pre-course Test Of Astronomy STandards TOAST scores

    NASA Astrophysics Data System (ADS)

    Berryhill, Katie J.

    As astronomy education researchers become more interested in experimentally testing innovative teaching strategies to enhance learning in introductory astronomy survey courses ("ASTRO 101"), scholars are placing increased attention toward better understanding factors impacting student gain scores on the widely used Test Of Astronomy STandards (TOAST). Usually used in a pre-test and post-test research design, one might naturally assume that the pre-course differences observed between high- and low-scoring college students might be due in large part to their pre-existing motivation, interest, experience in science, and attitudes about astronomy. To explore this notion, 11 non-science majoring undergraduates taking ASTRO 101 at west coast community colleges were interviewed in the first few weeks of the course to better understand students' pre-existing affect toward learning astronomy with an eye toward predicting student success. In answering this question, we hope to contribute to our understanding of the incoming knowledge of students taking undergraduate introductory astronomy classes, but also gain insight into how faculty can best meet those students' needs and assist them in achieving success. Perhaps surprisingly, there was only weak correlation between students' motivation toward learning astronomy and their pre-test scores. Instead, the most fruitful predictor of TOAST pre-test scores was the quantity of pre-existing, informal, self-directed astronomy learning experiences.

  5. Individual Differences in Digit Span, Susceptibility to Proactive Interference, and Aptitude/Achievement Test Scores.

    ERIC Educational Resources Information Center

    Dempster, Frank N.; Cooney, John B.

    1982-01-01

    Individual differences in digit span, susceptibility to proactive interference, and various aptitude/achievement test scores were investigated in two experiments with college students. Results indicated that digit span was strongly correlated with aptitude/achievement scores, but did not indicate that susceptibility to proactive interference…

  6. A pretest prognostic score to assess patients undergoing exercise or pharmacological stress testing

    PubMed Central

    Morise, Anthony; Evans, Matthew; Jalisi, Farrukh; Shetty, Rajendra; Stauffer, Marc

    2007-01-01

    Objective: A previously developed pretest score was validated to stratify patients presenting for exercise testing with suspected coronary disease according to the presence of angiographic coronary disease. Our goal was to determine how well this pretest score risk stratified patients undergoing pharmacological and exercise stress tests concerning prognostic endpoints. Design: Retrospective cohort analysis. Setting: University hospital stress laboratory. Patients: 7452 unselected ambulatory patients with symptoms of suspected coronary disease undergoing stress testing between 1995 and 2004. Main outcome measures: All‐cause death, cardiac death and non‐fatal myocardial infarction. Results: The rate of all‐cause death was 5.5% (CI 5.0 to 6.1) with 4.3 (SD 2.4) years of follow‐up (Exercise 2.8% (CI 2.3 to 3.2) v Pharmacological group 11.9% (CI 10.5 to 13.3); p<0.001). The rate of cardiac death/myocardial infarction was 2.6% (CI 2.2 to 3.0) (Exercise 1.4% (CI 1.1 to 1.8) v Pharmacological group 5.3% (CI 4.3 to 6.2); p<0.001). In both groups, stratification by pretest score was significant for all‐cause death and the combined endpoint. However, stratification was more effective in the pharmacological group using the combined endpoint rather than all‐cause death. Pharmacological stress patients in intermediate and high risk groups were at higher risk than their respective exercise test cohorts. Referral for pharmacological stress testing was found to be an independent predictor of time to death (2.7 (CI 2.0 to 3.6); p<0.001). Conclusion: A pretest score previously validated to stratify according to angiographic outcomes effectively risk stratified pharmacological and exercise stress patients according to the combined endpoint of cardiac death/myocardial infarction. PMID:17228070

  7. Construction of an Exome-Wide Risk Score for Schizophrenia Based on a Weighted Burden Test.

    PubMed

    Curtis, David

    2018-01-01

    Polygenic risk scores obtained as a weighted sum of associated variants can be used to explore association in additional data sets and to assign risk scores to individuals. The methods used to derive polygenic risk scores from common SNPs are not suitable for variants detected in whole exome sequencing studies. Rare variants, which may have major effects, are seen too infrequently to judge whether they are associated and may not be shared between training and test subjects. A method is proposed whereby variants are weighted according to their frequency, their annotations and the genes they affect. A weighted sum across all variants provides an individual risk score. Scores constructed in this way are used in a weighted burden test and are shown to be significantly different between schizophrenia cases and controls using a five-way cross-validation procedure. This approach represents a first attempt to summarise exome sequence variation into a summary risk score, which could be combined with risk scores from common variants and from environmental factors. It is hoped that the method could be developed further. © 2017 John Wiley & Sons Ltd/University College London.
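The weighted-sum idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the actual weighting scheme, so the variant identifiers, the weights, and the function name `risk_score` below are hypothetical placeholders, not the author's method.

```python
def risk_score(genotypes, weights):
    """Return a weighted sum of variant counts for one individual.

    genotypes: dict mapping variant id -> number of alternate alleles carried
    weights:   dict mapping variant id -> weight (hypothetical values; in the
               abstract, weights depend on frequency, annotation and gene)
    Variants without a weight contribute nothing to the score.
    """
    return sum(weights.get(variant, 0.0) * count
               for variant, count in genotypes.items())


# Hypothetical example: two weighted variants and one unweighted variant.
example_weights = {"v1": 2.0, "v2": 0.5}
example_genotypes = {"v1": 1, "v2": 2, "v3": 1}
example_score = risk_score(example_genotypes, example_weights)
```

Scores computed this way for cases and controls could then be compared in a burden-style test; that comparison is outside this sketch.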

  8. Can Machine Scoring Deal with Broad and Open Writing Tests as Well as Human Readers?

    ERIC Educational Resources Information Center

    McCurry, Doug

    2010-01-01

    This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason for asking whether machine scoring of writing requires…

  9. Pediatric residents' learning styles and temperaments and their relationships to standardized test scores.

    PubMed

    Tuli, Sanjeev Y; Thompson, Lindsay A; Saliba, Heidi; Black, Erik W; Ryan, Kathleen A; Kelly, Maria N; Novak, Maureen; Mellott, Jane; Tuli, Sonal S

    2011-12-01

    Board certification is an important professional qualification and a prerequisite for credentialing, and the Accreditation Council for Graduate Medical Education (ACGME) assesses board certification rates as a component of residency program effectiveness. To date, research has shown that preresidency measures, including National Board of Medical Examiners scores, Alpha Omega Alpha Honor Medical Society membership, or medical school grades poorly predict postresidency board examination scores. However, learning styles and temperament have been identified as factors that affect test-taking performance. The purpose of this study is to characterize the learning styles and temperaments of pediatric residents and to evaluate their relationships to yearly in-service and postresidency board examination scores. This cross-sectional study analyzed the learning styles and temperaments of current and past pediatric residents by administration of 3 validated tools: the Kolb Learning Style Inventory, the Keirsey Temperament Sorter, and the Felder-Silverman Learning Style test. These results were compared with known, normative, general and medical population data and evaluated for correlation to in-service examination and postresidency board examination scores. The predominant learning style for pediatric residents was converging 44% (33 of 75 residents) and the predominant temperament was guardian 61% (34 of 56 residents). The learning style and temperament distribution of the residents was significantly different from published population data (P = .002 and .04, respectively). Learning styles, with one exception, were found to be unrelated to standardized test scores. The predominant learning style and temperament of pediatric residents is significantly different than that of the populations of general and medical trainees. However, learning styles and temperament do not predict outcomes on standardized in-service and board examinations in pediatric residents.

  10. Spinal appearance questionnaire: factor analysis, scoring, reliability, and validity testing.

    PubMed

    Carreon, Leah Y; Sanders, James O; Polly, David W; Sucato, Daniel J; Parent, Stefan; Roy-Beaudry, Marjolaine; Hopkins, Jeffrey; McClung, Anna; Bratcher, Kelly R; Diamond, Beverly E

    2011-08-15

    Cross sectional. This study presents the factor analysis of the Spinal Appearance Questionnaire (SAQ) and its psychometric properties. Although the SAQ has been administered to a large sample of patients with adolescent idiopathic scoliosis (AIS) treated surgically, its psychometric properties have not been fully evaluated. This study presents the factor analysis and scoring of the SAQ and evaluates its psychometric properties. The SAQ and the Scoliosis Research Society-22 (SRS-22) were administered to AIS patients who were being observed, braced or scheduled for surgery. Standard demographic data and radiographic measures including Lenke type and curve magnitude were also collected. Of the 1802 patients, 83% were female; with a mean age of 14.8 years and mean initial Cobb angle of 55.8° (range, 0°-123°). From the 32 items of the SAQ, 15 loaded on two factors with consistent and significant correlations across all Lenke types. There is an Appearance (items 1-10) and an Expectations factor (items 12-15). Responses are summed giving a range of 5 to 50 for the Appearance domain and 5 to 20 for the Expectations domain. The Cronbach's α was 0.88 for both domains and Total score with a test-retest reliability of 0.81 for Appearance and 0.91 for Expectations. Correlations with major curve magnitude were higher for the SAQ Appearance and SAQ Total scores compared to correlations between the SRS Appearance and SRS Total scores. The SAQ and SRS-22 Scores were statistically significantly different in patients who were scheduled for surgery compared to those who were observed or braced. The SAQ is a valid measure of self-image in patients with AIS with greater correlation to curve magnitude than SRS Appearance and Total score. It also discriminates between patients who require surgery from those who do not.

  11. Drug Testing in the Schools. Implications for Policy.

    ERIC Educational Resources Information Center

    Bozeman, William C.; And Others

    Drug testing of district employees and students is examined from several perspectives: implications for school policy, legality, administration and protocol, and test reliability and accuracy. Substance abuse has become a major concern for educators, parents, and citizens as illegal drugs are more readily available. It is also pointed out that the…

  12. Effect of Item Arrangement, Knowledge of Arrangement, and Test Anxiety on Two Scoring Methods.

    ERIC Educational Resources Information Center

    Plake, Barbara S.; And Others

    1981-01-01

    Number right and elimination scores were analyzed on a college level mathematics exam assembled from pretest data. Anxiety measures were administered along with the experimental forms to undergraduates. Results suggest that neither test scores nor attitudes are influenced by item order, knowledge thereof, or anxiety level. (Author/GK)

  13. ACER Mathematics Profile Series: Number Test. (Test Booklet, Answer and Record Sheet, Score Key, and Teachers Handbook).

    ERIC Educational Resources Information Center

    Cornish, Greg; Wines, Robin

    The Number Test of the ACER Mathematics Profile Series contains 30 items for each of three suggested grade levels: 7-8, 8-9, and 9-10. Raw scores on all tests in the ACER Mathematics Profile Series (Number, Operations, Space and Measurement) are converted to a common scale called MAPS, a major feature of the Series. Based on the Rasch Model,…

  14. Linking Scores from Tests of Similar Content Given in Different Languages: An Illustration Involving Methodological Alternatives

    ERIC Educational Resources Information Center

    Cascallar, Alicia S.; Dorans, Neil J.

    2005-01-01

    This study compares two methods commonly used (concordance and prediction) to establish linkages between scores from tests of similar content given in different languages. Score linkages between the Verbal and Math sections of the SAT I and the corresponding sections of the Spanish-language admissions test, the Prueba de Aptitud Academica (PAA),…

  15. Credit scores, cardiovascular disease risk, and human capital.

    PubMed

    Israel, Salomon; Caspi, Avshalom; Belsky, Daniel W; Harrington, HonaLee; Hogan, Sean; Houts, Renate; Ramrakha, Sandhya; Sanders, Seth; Poulton, Richie; Moffitt, Terrie E

    2014-12-02

    Credit scores are the most widely used instruments to assess whether or not a person is a financial risk. Credit scoring has been so successful that it has expanded beyond lending and into our everyday lives, even to inform how insurers evaluate our health. The pervasive application of credit scoring has outpaced knowledge about why credit scores are such useful indicators of individual behavior. Here we test if the same factors that lead to poor credit scores also lead to poor health. Following the Dunedin (New Zealand) Longitudinal Study cohort of 1,037 study members, we examined the association between credit scores and cardiovascular disease risk and the underlying factors that account for this association. We find that credit scores are negatively correlated with cardiovascular disease risk. Variation in household income was not sufficient to account for this association. Rather, individual differences in human capital factors—educational attainment, cognitive ability, and self-control—predicted both credit scores and cardiovascular disease risk and accounted for ∼45% of the correlation between credit scores and cardiovascular disease risk. Tracing human capital factors back to their childhood antecedents revealed that the characteristic attitudes, behaviors, and competencies children develop in their first decade of life account for a significant portion (∼22%) of the link between credit scores and cardiovascular disease risk at midlife. We discuss the implications of these findings for policy debates about data privacy, financial literacy, and early childhood interventions.

  16. Credit scores, cardiovascular disease risk, and human capital

    PubMed Central

    Israel, Salomon; Caspi, Avshalom; Belsky, Daniel W.; Harrington, HonaLee; Hogan, Sean; Houts, Renate; Ramrakha, Sandhya; Sanders, Seth; Poulton, Richie; Moffitt, Terrie E.

    2014-01-01

    Credit scores are the most widely used instruments to assess whether or not a person is a financial risk. Credit scoring has been so successful that it has expanded beyond lending and into our everyday lives, even to inform how insurers evaluate our health. The pervasive application of credit scoring has outpaced knowledge about why credit scores are such useful indicators of individual behavior. Here we test if the same factors that lead to poor credit scores also lead to poor health. Following the Dunedin (New Zealand) Longitudinal Study cohort of 1,037 study members, we examined the association between credit scores and cardiovascular disease risk and the underlying factors that account for this association. We find that credit scores are negatively correlated with cardiovascular disease risk. Variation in household income was not sufficient to account for this association. Rather, individual differences in human capital factors—educational attainment, cognitive ability, and self-control—predicted both credit scores and cardiovascular disease risk and accounted for ∼45% of the correlation between credit scores and cardiovascular disease risk. Tracing human capital factors back to their childhood antecedents revealed that the characteristic attitudes, behaviors, and competencies children develop in their first decade of life account for a significant portion (∼22%) of the link between credit scores and cardiovascular disease risk at midlife. We discuss the implications of these findings for policy debates about data privacy, financial literacy, and early childhood interventions. PMID:25404329

  17. Effect of vowel context on test-retest nasalance score variability in children with and without cleft palate.

    PubMed

    Ha, Seunghee; Jung, Seungeun; Koh, Kyung S

    2018-06-01

    The purpose of this study was to determine whether test-retest nasalance score variability differs between Korean children with and without cleft palate (CP) and vowel context influences variability in nasalance score. Thirty-four 3-to-5-year-old children with and without CP participated in the study. Three 8-syllable speech stimuli devoid of nasal consonants were used for data collection. Each stimulus was loaded with high, low, or mixed vowels, respectively. All participants were asked to repeat the speech stimuli twice after the examiner, and an immediate test-retest nasalance score was assessed with no headgear change. Children with CP exhibited significantly greater absolute difference in nasalance scores than children without CP. Variability in nasalance scores was significantly different for the vowel context, and the high vowel sentence showed a significantly larger difference in nasalance scores than the low vowel sentence. The cumulative frequencies indicated that, for children with CP in the high vowel sentence, only 8 of 17 (47%) repeated nasalance scores were within 5 points. Test-retest nasalance score variability was greater for children with CP than children without CP, and there was greater variability for the high vowel sentence(s) for both groups. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. Scoring Method of a Situational Judgment Test: Influence on Internal Consistency Reliability, Adverse Impact and Correlation with Personality?

    ERIC Educational Resources Information Center

    De Leng, W. E.; Stegers-Jager, K. M.; Husbands, A.; Dowell, J. S.; Born, M. Ph.; Themmen, A. P.

    2017-01-01

    Situational Judgment Tests (SJTs) are increasingly used for medical school selection. Scoring an SJT is more complicated than scoring a knowledge test, because there are no objectively correct answers. The scoring method of an SJT may influence the construct and concurrent validity and the adverse impact with respect to non-traditional students.…

  19. Decision making under internal uncertainty: the case of multiple-choice tests with different scoring rules.

    PubMed

    Bereby-Meyer, Yoella; Meyer, Joachim; Budescu, David V

    2003-02-01

    This paper assesses framing effects on decision making with internal uncertainty, i.e., partial knowledge, by focusing on examinees' behavior in multiple-choice (MC) tests with different scoring rules. In two experiments participants answered a general-knowledge MC test that consisted of 34 solvable and 6 unsolvable items. Experiment 1 studied two scoring rules involving Positive (only gains) and Negative (only losses) scores. Although answering all items was the dominating strategy for both rules, the results revealed a greater tendency to answer under the Negative scoring rule. These results are in line with the predictions derived from Prospect Theory (PT) [Econometrica 47 (1979) 263]. The second experiment studied two scoring rules, which allowed respondents to exhibit partial knowledge. Under the Inclusion-scoring rule the respondents mark all answers that could be correct, and under the Exclusion-scoring rule they exclude all answers that might be incorrect. As predicted by PT, respondents took more risks under the Inclusion rule than under the Exclusion rule. The results illustrate that the basic process that underlies choice behavior under internal uncertainty and especially the effect of framing is similar to the process of choice under external uncertainty and can be described quite accurately by PT. Copyright 2002 Elsevier Science B.V.

  20. Effects of Scoring by Section and Independent Scorers' Patterns on Scorer Reliability in Biology Essay Tests

    ERIC Educational Resources Information Center

    Ebuoh, Casmir N.; Ezeudu, S. A.

    2015-01-01

    The study investigated the effects of scoring by section, use of independent scorers and conventional patterns on scorer reliability in Biology essay tests. It was revealed from literature review that conventional pattern of scoring all items at a time in essay tests had been criticized for not being reliable. The study was true experimental study…

  1. The Effects of Process Oriented Guided Inquiry Learning on Secondary Student ACT Science Scores

    NASA Astrophysics Data System (ADS)

    Judd, William Lindsey

    The purpose of this study was to examine any significant difference on secondary school chemistry students' ACT Science Test scores between students taught by the Process Oriented Guided Inquiry Learning (POGIL) method versus students taught by traditional, teacher-centered pedagogy. This study also examined any difference between students taught by the POGIL method versus students taught by traditional, teacher-centered pedagogy in regard to the three different types of questions on the ACT Science Test: data representation, research summaries, and conflicting viewpoints. The sample consisted of sophomore-level students at two private, suburban Christian schools. A pretest-posttest design was used to compare the mean difference in scores from ACT issued sample test booklets before and after each group had received instruction via the POGIL method or more traditional methods. This study found that there was no significant difference in the mean difference of test scores between the two groups. This study also found that there was not a significant difference in the mean difference of scores in regard to the three different types of questions on the ACT Science Test. Further implications of this study are discussed.

  2. An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

    ERIC Educational Resources Information Center

    Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie

    2013-01-01

    Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…

  3. Pediatric Residents' Learning Styles and Temperaments and Their Relationships to Standardized Test Scores

    PubMed Central

    Tuli, Sanjeev Y.; Thompson, Lindsay A.; Saliba, Heidi; Black, Erik W.; Ryan, Kathleen A.; Kelly, Maria N.; Novak, Maureen; Mellott, Jane; Tuli, Sonal S.

    2011-01-01

    Background: Board certification is an important professional qualification and a prerequisite for credentialing, and the Accreditation Council for Graduate Medical Education (ACGME) assesses board certification rates as a component of residency program effectiveness. To date, research has shown that preresidency measures, including National Board of Medical Examiners scores, Alpha Omega Alpha Honor Medical Society membership, or medical school grades poorly predict postresidency board examination scores. However, learning styles and temperament have been identified as factors that affect test-taking performance. The purpose of this study is to characterize the learning styles and temperaments of pediatric residents and to evaluate their relationships to yearly in-service and postresidency board examination scores. Methods: This cross-sectional study analyzed the learning styles and temperaments of current and past pediatric residents by administration of 3 validated tools: the Kolb Learning Style Inventory, the Keirsey Temperament Sorter, and the Felder-Silverman Learning Style test. These results were compared with known, normative, general and medical population data and evaluated for correlation to in-service examination and postresidency board examination scores. Results: The predominant learning style for pediatric residents was converging 44% (33 of 75 residents) and the predominant temperament was guardian 61% (34 of 56 residents). The learning style and temperament distribution of the residents was significantly different from published population data (P = .002 and .04, respectively). Learning styles, with one exception, were found to be unrelated to standardized test scores. Conclusions: The predominant learning style and temperament of pediatric residents is significantly different than that of the populations of general and medical trainees. However, learning styles and temperament do not predict outcomes on standardized in-service and board

  4. Fine-Tuning Cross-Battery Assessment Procedures: After Follow-Up Testing, Use All Valid Scores, Cohesive or Not

    ERIC Educational Resources Information Center

    Schneider, W. Joel; Roman, Zachary

    2018-01-01

    We used data simulations to test whether composites consisting of cohesive subtest scores are more accurate than composites consisting of divergent subtest scores. We demonstrate that when multivariate normality holds, divergent and cohesive scores are equally accurate. Furthermore, excluding divergent scores results in biased estimates of…

  5. Just as smart but not as successful: obese students obtain lower school grades but equivalent test scores to nonobese students.

    PubMed

    MacCann, C; Roberts, R D

    2013-01-01

    The obesity epidemic in industrialized nations has important implications for education, as research demonstrates lower academic achievement among obese students. The current paper compares the test scores and school grades of obese, overweight and normal-weight students in secondary and further education, controlling for demographic variables, personality, ability and well-being confounds. This study included 383 eighth-grade students (49% female; study 1) and 1036 students from 24 community colleges and universities (64% female, study 2), both drawn from five regions across the United States. In study 1, body mass index (BMI) was calculated using self-reports and parent reports of weight and height. In study 2, BMI was calculated from self-reported weight and height only. Both samples completed age-appropriate assessments of mathematics, vocabulary and the personality trait conscientiousness. Eighth-grade students additionally completed a measure of life satisfaction, with both self-reports and parent reports of their grades from the previous semester also obtained. Higher education students additionally completed measures of positive and negative affect, and self-reported their grades and college entrance scores. Obese students receive significantly lower grades in middle school (d=0.83), community college (d=0.34) and university (d=0.36), but show no statistically significant differences in intelligence or achievement test scores. Even after controlling for demographic variables, intelligence, personality and well-being, obese students obtain significantly lower grades than normal-weight students in the eighth grade (d=0.39), community college (d=0.42) and university (d=0.31). Lower grades may reflect peer and teacher prejudice against overweight and obese students rather than lack of ability among these students.

  6. How Should Colleges Treat Multiple Admissions Test Scores? ACT Working Paper 2017-4

    ERIC Educational Resources Information Center

    Mattern, Krista; Radunzel, Justine; Bertling, Maria; Ho, Andrew

    2017-01-01

    The percentage of students retaking college admissions tests is rising (Harmston & Crouse, 2016). Researchers and college admissions offices currently use a variety of methods for summarizing these multiple scores. Testing companies, interested in validity evidence like correlations with college first-year grade-point averages (FYGPA), often…

  7. European Society of Cardiology-Recommended Coronary Artery Disease Consortium Pretest Probability Scores More Accurately Predict Obstructive Coronary Disease and Cardiovascular Events Than the Diamond and Forrester Score: The Partners Registry.

    PubMed

    Bittencourt, Marcio Sommer; Hulten, Edward; Polonsky, Tamar S; Hoffman, Udo; Nasir, Khurram; Abbara, Suhny; Di Carli, Marcelo; Blankstein, Ron

    2016-07-19

    The most appropriate score for evaluating the pretest probability of obstructive coronary artery disease (CAD) is unknown. We sought to compare the Diamond-Forrester (DF) score with the 2 CAD consortium scores recently recommended by the European Society of Cardiology. We included 2274 consecutive patients (age, 56±13 years; 57% male) without prior CAD referred for coronary computed tomographic angiography. Computed tomographic angiography findings were used to determine the presence or absence of obstructive CAD (≥50% stenosis). We compared the DF score with the 2 CAD consortium scores with respect to their ability to predict obstructive CAD and the potential implications of these scores on the downstream use of testing for CAD, as recommended by current guidelines. The DF score did not satisfactorily fit the data and resulted in a significant overestimation of the prevalence of obstructive CAD (P<0.001); the CAD consortium basic score had no significant lack of fit; and the CAD consortium clinical score provided adequate goodness of fit (P=0.39). The DF score had a lower discrimination for obstructive CAD, with an area under the receiver-operating characteristics curve of 0.713 versus 0.752 and 0.791 for the CAD consortium models (P<0.001 for both). Consequently, the use of the DF score was associated with fewer individuals being categorized as requiring no additional testing (8.3%) compared with the CAD consortium models (24.6% and 30.0%; P<0.001). The proportion of individuals with a high pretest probability was 18% with the DF and only 1.1% with the CAD consortium scores (P<0.001). CONCLUSIONS: Among contemporary patients referred for noninvasive testing, the DF risk score overestimates the risk of obstructive CAD. On the other hand, the CAD consortium scores offered improved goodness of fit and discrimination; thus, their use could decrease the need for noninvasive or invasive testing while increasing the yield of such tests. © 2016 American Heart

  8. EXPLORATION OF SCORE AGREEMENT ON A MODIFIED UPPER QUARTER Y-BALANCE TEST KIT AS COMPARED TO THE UPPER QUARTER Y-BALANCE TEST.

    PubMed

    Cramer, Josh; Quintero, Miguel; Rhinehart, Alex; Rutherford, Caitlin; Nasypany, Alan; May, James; Baker, Russell T

    2017-02-01

    Physical performance measures (PPMs) such as the Star Excursion Balance Test (SEBT) and the Y-Balance Test (YBT) are functional movement tests used to assess participants' dynamic balance, which can be a vital component in physical exams to identify predisposing factors for risk of injury. The YBT is a functional assessment tool for the upper and lower body. It evolved from the SEBT, which has been previously used in research as a lower body functional assessment. It comprises fewer movement directions, which helps limit fatigue. The YBT kit is a commercialized tool, which may pose barriers for clinicians with limited budgets and/or strict approval processes for purchasing capital items in their clinics, especially healthcare providers in the secondary school setting. The cost may also pose a barrier for researchers with limited budgets. A less expensive, easy-to-make kit may provide clinicians an opportunity to integrate functional testing into their evaluation or research. The purpose of this pilot study was to describe a cost-efficient method to gather participants' upper quarter YBT (UQYBT) measurements and examine the inter- and intra-rater score agreement between this method and the commercial YBT measurements. A convenience sample of 20 physically active participants volunteered to participate in a comparison study of the Upper Quarter Y-Balance Test (UQYBT) using the commercialized kit and the Modified Upper Quarter Y-Balance Test kit (mUQYBT) made with three cloth tape measures, athletic tape, a goniometer and three 2x4x8 wood blocks. A Pearson Product Moment correlation and Bland-Altman analyses were used to examine the relationship between intra-rater scores comparing the UQYBT and mUQYBT. Inter-rater scores were analyzed using intraclass correlation coefficients (ICC) (2,1) and Bland-Altman analyses. All Pearson Product Moment r-values for intra-rater scores were greater than .96 and statistically significant at p<0.05. Coefficients of

  9. Distribution and magnitude of type I error of model-based multipoint lod scores: implications for multipoint mod scores.

    PubMed

    Xing, Chao; Elston, Robert C

    2006-07-01

    The multipoint lod score and mod score methods have been advocated for their superior power in detecting linkage. However, little has been done to determine the distribution of multipoint lod scores or to examine the properties of mod scores. In this paper we study the distribution of multipoint lod scores both analytically and by simulation. We also study by simulation the distribution of maximum multipoint lod scores when maximized over different penetrance models. The multipoint lod score is approximately normally distributed with mean and variance that depend on marker informativity, marker density, specified genetic model, number of pedigrees, pedigree structure, and pattern of affection status. When the multipoint lod scores are maximized over a set of assumed penetrances models, an excess of false positive indications of linkage appear under dominant analysis models with low penetrances and under recessive analysis models with high penetrances. Therefore, caution should be taken in interpreting results when employing multipoint lod score and mod score approaches, in particular when inferring the level of linkage significance and the mode of inheritance of a trait.

  10. Student Neighborhoods, Schools, and Test Score Growth: Evidence from Milwaukee, Wisconsin

    ERIC Educational Resources Information Center

    Carlson, Deven; Cowen, Joshua M.

    2015-01-01

    Schools and neighborhoods are thought to be two of the most important contextual influences on student academic outcomes. Drawing on a unique data set that permits simultaneous estimation of neighborhood and school contributions to student test score gains, we analyze the distributions of these contributions to consider the relative importance of…

  11. Teachers' Perceptions and Expectations and the Black-White Test Score Gap.

    ERIC Educational Resources Information Center

    Ferguson, Ronald F.

    2003-01-01

    Evaluates how schools can positively affect the test score gap between black and white students by examining two potential sources for this difference: teachers and students. Offers evidence for the proposition that teachers' perceptions, expectations, and behaviors interact with students' beliefs, behaviors, and work habits in ways that help to…

  12. The Effect of Stakes on Accountability Test Scores and Pass Rates

    ERIC Educational Resources Information Center

    Steedle, Jeffrey T.; Grochowalski, Joseph

    2017-01-01

    Students may not fully demonstrate their knowledge and skills on accountability tests if there are no stakes attached to individual performance. In that case, assessment results may not accurately reflect student achievement, so the validity of score interpretations and uses suffers. For this study, matched samples of students taking state…

  13. Effects of Analytical and Holistic Scoring Patterns on Scorer Reliability in Biology Essay Tests

    ERIC Educational Resources Information Center

    Ebuoh, Casmir N.

    2018-01-01

    Literature revealed that the patterns/methods of scoring essay tests had been criticized for not being reliable, and this unreliability is likely to be greater in internal examinations than in external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…

  14. Impact of a standardized test package on exit examination scores and NCLEX-RN outcomes.

    PubMed

    Homard, Catherine M

    2013-03-01

    The purpose of this ex post facto correlational study was to compare exit examination scores and NCLEX-RN® pass rates of baccalaureate nursing students who differed in level of participation in a standardized test package. Three cohort groups emerged as a standardized test package was introduced: (a) students who did not participate in a standardized test package; (b) students with two semesters of a standardized test package; and (c) students with four semesters of a standardized test package. Benner's novice-to-expert theory framed the study in the belief that students best acquire knowledge and skills through practice and reflection. Students participating in four semesters of a standardized test package demonstrated higher exit examination scores and NCLEX-RN pass rates compared with students who did not participate in this package. This study's results could inform nurse educators about strategies to facilitate nursing student success on exit examinations and the NCLEX-RN. Copyright 2013, SLACK Incorporated.

  15. [Relationship between unipedal stance test score and center of pressure velocity in elderly].

    PubMed

    Rodrigo Antonio, Guzmán; Rony, Silvestre; Francisco Aniceto, Rodríguez; David Andrés, Arriagada; Pablo Andrés, Ortega

    2011-01-01

    Frequent falls are one of the most important health problems in the elderly population. The unipedal stance test (UPST) assesses postural stability and is used in fall risk measures. Despite this, there is little information about its relationship with the posturographic parameters (PP) that characterize postural stability. Center of pressure velocity (CoPV) is one of the best PP for describing postural stability. The aim of this study was to analyze the relation between UPST score and CoPV in the elderly population. A sample of 38 healthy elderly subjects was divided into two groups according to their UPST score: low performance (LP, n=11) and high performance (HP, n=27). The correlation between UPST score and COP mean velocity (CoPmV), recorded from a posturographic test, was analyzed in both groups. An inverse correlation between UPST score and CoPmV was found in both groups. However, this was stronger in the LP group (r=-0.69, P=.02) than in the HP group (r=-0.39, P=.04). Based on the results of this investigation, it may be concluded that achievement on the UPST has an inverse relationship with CoPmV, especially in subjects with low performance in the UPST. Copyright © 2010 SEGG. Published by Elsevier Espana. All rights reserved.

  16. Linkage analysis in nuclear families. 2: Relationship between affected sib-pair tests and lod score analysis.

    PubMed

    Knapp, M; Seuchter, S A; Baur, M P

    1994-01-01

    It is believed that the main advantage of affected sib-pair tests is that their application requires no information about the underlying genetic mechanism of the disease. However, here it is proved that the mean test, which can be considered the most prominent of the affected sib-pair tests, is equivalent to lod score analysis for an assumed recessive mode of inheritance, irrespective of the true mode of the disease. Further relationships of certain sib-pair tests and lod score analysis under specific assumed genetic modes are investigated.

  17. Test-Retest Reliability and Minimal Detectable Change of Randomized Dichotic Digits in Learning-Disabled Children: Implications for Dichotic Listening Training.

    PubMed

    Mahdavi, Mohammad Ebrahim; Pourbakht, Akram; Parand, Akram; Jalaie, Shohreh

    2018-03-01

    children were 1.46 and 1.44% for the right ear and 4.68 and 5.47% for the left ear. SEM and SEM% of the ear scores in LD children were 4.55 and 5.88% for the right ear to 7.56 and 12.81% for the left ear. MDC and MDC% of the ear scores in TA children varied from 4.03 and 3.99% for the right ear to 12.93 and 15.13% for the left ear. MDC and MDC% of the ear scores in LD children varied from 12.57 and 16.25% for the right ear to 20.89 and 35.39% for the left ear. The LD children indicated test-retest relative reliability as high as TA children in the ear scores measured by PRDDT. However, within-subject variations of the ear scores calculated by indices of absolute reliability were considerably higher in LD children versus TA children. The results of the current study could have implications for detecting real training-related changes in the ear scores. American Academy of Audiology
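
    The relationship between the SEM and MDC figures quoted above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the conventional formula MDC = z × √2 × SEM with z = 1.96 (95% confidence), and the function name `mdc` is hypothetical. Under that assumption, the SEM values from the abstract give MDC values close to the reported ones, with small differences that presumably come from rounding.

```python
import math

def mdc(sem: float, z: float = 1.96) -> float:
    """Minimal detectable change from the standard error of measurement.

    Assumes the conventional formula MDC = z * sqrt(2) * SEM; z = 1.96
    (95% confidence) is an assumption, not stated in the abstract.
    """
    return z * math.sqrt(2.0) * sem

# SEM values as quoted in the abstract (TA = first group, LD = learning-disabled).
sems = {"TA right": 1.46, "TA left": 4.68, "LD right": 4.55, "LD left": 7.56}
for label, sem in sems.items():
    print(f"{label}: SEM = {sem:.2f}, MDC95 ~ {mdc(sem):.2f}")
```

    A change in an ear score smaller than the MDC cannot be distinguished from measurement noise, which is why the larger MDC values in LD children matter when judging training-related gains.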

  18. An Update of "Implications of Changing Answers on Objective Test Items".

    ERIC Educational Resources Information Center

    Mercer, Maryann

    In a 1977 review of the literature on test answer changing, Mueller and Wasser (EJ 163 236) cited 17 studies and concluded that students changing answers on objective tests gain more points than they lose by so doing. Higher scoring students tend to gain more than do the lower scoring students. Six additional studies not reported in the Mueller…

  19. Association of Health Sciences Reasoning Test scores with academic and experiential performance.

    PubMed

    Cox, Wendy C; McLaughlin, Jacqueline E

    2014-05-15

    To assess the association of scores on the Health Sciences Reasoning Test (HSRT) with academic and experiential performance in a doctor of pharmacy (PharmD) curriculum. The HSRT was administered to 329 first-year (P1) PharmD students. Performance on the HSRT and its subscales was compared with academic performance in 29 courses throughout the curriculum and with performance in advanced pharmacy practice experiences (APPEs). Significant positive correlations were found between course grades in 8 courses and HSRT overall scores. All significant correlations were accounted for by pharmaceutical care laboratory courses, therapeutics courses, and a law and ethics course. There was a lack of moderate to strong correlation between HSRT scores and academic and experiential performance. The usefulness of the HSRT as a tool for predicting student success may be limited.

  20. Consistency of SAT® I: Reasoning Test Score Conversions. Research Report. ETS RR-08-67

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Guo, Hongwen; Liu, Jinghua; Dorans, Neil J.

    2008-01-01

    This study uses historical data to explore the consistency of SAT® I: Reasoning Test score conversions and to examine trends in scaled score means. During the period from April 1995 to December 2003, both Verbal (V) and Math (M) means display substantial seasonality, and a slight increasing trend for both is observed. SAT Math means increase more…

  1. Do Standardized Tests Penalize Deep-Thinking, Creative, or Conscientious Students?: Some Personality Correlates of Graduate Record Examinations Test Scores

    ERIC Educational Resources Information Center

    Powers, Donald E.; Kaufman, James C.

    2004-01-01

    The objective of the study reported here was to explore the relationship of Graduate Record Examinations (GRE) General Test scores to selected personality traits--conscientiousness, rationality, ingenuity, quickness, creativity, and depth. A sample of 342 GRE test takers completed short personality inventory scales for each trait. Analyses…

  2. Demographically Adjusted Groups for Equating Test Scores. Research Report. ETS RR-14-30

    ERIC Educational Resources Information Center

    Livingston, Samuel A.

    2014-01-01

    In this study, I investigated 2 procedures intended to create test-taker groups of equal ability by poststratifying on a composite variable created from demographic information. In one procedure, the stratifying variable was the composite variable that best predicted the test score. In the other procedure, the stratifying variable was the…

  3. Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) predictors of police officer problem behavior and collateral self-report test scores.

    PubMed

    Tarescavage, Anthony M; Fischler, Gary L; Cappo, Bruce M; Hill, David O; Corey, David M; Ben-Porath, Yossef S

    2015-03-01

    The current study examined the predictive validity of Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011) scores in police officer screenings. We utilized a sample of 712 police officer candidates (82.6% male) from 2 Midwestern police departments. The sample included 426 hired officers, most of whom had supervisor ratings of problem behaviors and human resource records of civilian complaints. With the full sample, we calculated zero-order correlations between MMPI-2-RF scale scores and scale scores from the California Psychological Inventory (Gough, 1956) and Inwald Personality Inventory (Inwald, 2006) by gender. In the hired sample, we correlated MMPI-2-RF scale scores with the outcome data for males only, owing to the relatively small number of hired women. Several scales demonstrated meaningful correlations with the criteria, particularly in the thought dysfunction and behavioral/externalizing dysfunction domains. After applying a correction for range restriction, the correlation coefficient magnitudes were generally in the moderate to large range. The practical implications of these findings were explored by means of risk ratio analyses, which indicated that officers who produced elevations at cutscores lower than the traditionally used 65 T-score level were as much as 10 times more likely than those scoring below the cutoff to exhibit problem behaviors. Overall, the results supported the validity of the MMPI-2-RF in this setting. Implications and limitations of this study are discussed. 2015 APA, all rights reserved

  4. CaPTHUS scoring model in primary hyperparathyroidism: can it eliminate the need for ioPTH testing?

    PubMed

    Elfenbein, Dawn M; Weber, Sara; Schneider, David F; Sippel, Rebecca S; Chen, Herbert

    2015-04-01

    The CaPTHUS model was reported to have a positive predictive value of 100 % to correctly predict single-gland disease in patients with primary hyperparathyroidism, thus obviating the need for intraoperative parathyroid hormone (ioPTH) testing. We sought to apply the CaPTHUS scoring model in our patient population and assess its utility in predicting long-term biochemical cure. We retrospectively reviewed all parathyroidectomies for primary hyperparathyroidism performed at our university hospital from 2003 to 2012. We routinely perform ioPTH testing. Biochemical cure was defined as a normal calcium level at 6 months. A total of 1,421 patients met the inclusion criteria: 78 % of patients had a single adenoma at the time of surgery, 98 % had a normal serum calcium at 1 week postoperatively, and 96 % had a normal serum calcium level 6 months postoperatively. Using the CaPTHUS scoring model, 307 patients (22.5 %) had a score of ≥ 3, with a positive predictive value of 91 % for single adenoma. A CaPTHUS score of ≥ 3 had a positive predictive value of 98 % for biochemical cure at 1 week as well as at 6 months. In our population, where ioPTH testing is used routinely to guide use of bilateral exploration, patients with a preoperative CaPTHUS score of ≥ 3 had good long-term biochemical cure rates. However, the model only predicted adenoma in 91 % of cases. If minimally invasive parathyroidectomy without ioPTH testing had been done for these patients, the cure rate would have dropped from 98 % to an unacceptable 89 %. Even in these patients with high CaPTHUS scores, multigland disease is present in almost 10 %, and ioPTH testing is necessary.
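
    The positive predictive value quoted above is the share of test-positive patients who truly have the condition. The sketch below is illustrative only: the abstract reports 307 patients with a score of ≥ 3 and a PPV of 91 %, but not the underlying counts, so the split into 279 and 28 is back-calculated and hypothetical, as is the function name.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = true positives / all test positives."""
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, back-calculated from 307 patients and a 91 % PPV;
# the abstract does not report them directly.
single_adenoma = 279      # score >= 3 and single adenoma (assumed)
multigland = 28           # score >= 3 but multigland disease (assumed)

ppv = positive_predictive_value(single_adenoma, multigland)
print(f"PPV for single adenoma: {ppv:.1%}")
```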

  5. ADAMTS13 test and/or PLASMIC clinical score in management of acquired thrombotic thrombocytopenic purpura: a cost-effective analysis.

    PubMed

    Kim, Chong H; Simmons, Sierra C; Williams, Lance A; Staley, Elizabeth M; Zheng, X Long; Pham, Huy P

    2017-11-01

    The ADAMTS13 test distinguishes thrombotic thrombocytopenic purpura (TTP) from other thrombotic microangiopathies (TMAs). The PLASMIC score helps determine the pretest probability of ADAMTS13 deficiency. Due to inherent limitations of both tests, and potential adverse effects and cost of unnecessary treatments, we performed a cost-effectiveness analysis (CEA) investigating the benefits of incorporating an in-hospital ADAMTS13 test and/or PLASMIC score into our clinical practice. A CEA model was created to compare four scenarios for patients with TMAs, utilizing either an in-house or a send-out ADAMTS13 assay with or without prior risk stratification using PLASMIC scoring. Model variables, including probabilities and costs, were gathered from the medical literature, except for the ADAMTS13 send-out and in-house tests, which were obtained from our institutional data. If only the cost is considered, in-house ADAMTS13 test for patients with intermediate- to high-risk PLASMIC score is the least expensive option ($4,732/patient). If effectiveness is assessed as measured by the number of averted deaths, send-out ADAMTS13 test is the most effective. Considering the cost/effectiveness ratio, the in-house ADAMTS13 test in patients with intermediate- to high-risk PLASMIC score is the best option, followed by the in-house ADAMTS13 test without the PLASMIC score. In patients with clinical presentations of TMAs, having an in-hospital ADAMTS13 test to promptly establish the diagnosis of TTP appears to be cost-effective. Utilizing the PLASMIC score further increases the cost-effectiveness of the in-house ADAMTS13 test. Our findings indicate the benefit of having a rapid and reliable in-house ADAMTS13 test, especially in the tertiary medical center. © 2017 AABB.

  6. The Effects of Using Selected Metacognitive Strategies on ACT Mathematics Sub-Test Scores

    ERIC Educational Resources Information Center

    LeMay, Jeffrey W.

    2016-01-01

    This quasi-experimental post-test only control group designed quantitative study examined whether or not members of an experimental group of participants who utilized two metacognitive strategy training regimens experienced a significant increase in their ACT mathematics sub-test scores compared to a group of students who did not utilize either of…

  7. Interpreting the "g" Loadings of Intelligence Test Composite Scores in Light of Spearman's Law of Diminishing Returns

    ERIC Educational Resources Information Center

    Reynolds, Matthew R.

    2013-01-01

    The linear loadings of intelligence test composite scores on a general factor ("g") have been investigated recently in factor analytic studies. Spearman's law of diminishing returns (SLODR), however, implies that the "g" loadings of test scores likely decrease in magnitude as g increases, or they are nonlinear. The purpose of…

  8. School Readiness and the Draw-a-Man Test: An Empirically Derived Alternative to Harris' Scoring System.

    ERIC Educational Resources Information Center

    Simner, Marvin L.

    1985-01-01

    An abbreviated scoring system for the Goodenough-Harris Draw-A-Man Test found that three items had the same overall potential for correctly identifying at-risk kindergarteners as more time-consuming scoring methods. (CL)

  9. Economic impact of 21-gene recurrence score testing on early-stage breast cancer in Ireland.

    PubMed

    Smyth, Lillian; Watson, Geoff; Walsh, Elaine M; Kelly, Catherine M; Keane, Maccon; Kennedy, M John; Grogan, Liam; Hennessy, Bryan T; O'Reilly, Seamus; Coate, Linda E; O'Connor, Miriam; Quinn, Cecily; Verleger, Katharina; Schoeman, Olaf; O'Reilly, Susan; Walshe, Janice M

    2015-10-01

    The 21-gene test is a validated multi-gene diagnostic test that predicts chemotherapy (CT) benefit in oestrogen receptor positive (ER+), lymph node-negative (N0) breast cancer (BC) patients (pts). Ireland was the first public health care system to reimburse this test in Europe. Study objectives were to assess the impact of this test on decision-making and to analyse the economic impact of testing. Between October 2011 and February 2013, a national, retrospective, cross-sectional observational study of ER+, N0 BC pts tested with the 21-gene test was conducted. Surveyed breast medical oncologists provided the assumption for the decision impact analysis that grade (G) 1 pts would not have received CT before testing and G2/3 pts would have received CT before testing. Descriptive statistical analyses were performed. 592 pts were identified; low, intermediate and high recurrence scores were identified in 53, 36 and 10 % of pts, respectively. 384 (70 %) pts had G2, 129 (22 %) G3 and 76 (13 %) G1 tumours. Post testing, 345 pts (59 %) experienced a change in CT decision; 339 changed to hormone therapy alone and 6 were advised to receive CT. 172 (30 %) pts received CT: 12 (3.9 %) of pts with low scores, 108 (50.9 %) of pts with intermediate risk scores and 50 (90.9 %) of pts with high risk scores. Net reduction in CT use was 58 % and net savings achieved were €793,565. Since public reimbursement, the introduction of the 21-gene test has resulted in a significant reduction in chemotherapy administration and cost savings for the Irish public healthcare system.

  10. Association between the Medical College Admission Test scores and Alpha Omega Alpha Medical Honors Society membership.

    PubMed

    Gauer, Jacqueline L; Jackson, J Brooks

    2017-01-01

    Medical schools worldwide are faced with the challenge of selecting from among many qualified applicants. One factor that might help admissions committees identify future exceptional medical students is scores on standardized entrance exams. The purpose of this study was to determine the association between scores on the most commonly used standardized medical school entrance exam in the USA, the Medical College Admission Test (MCAT), and election to the US medical honors society, Alpha Omega Alpha (AOA). MCAT scores and AOA membership data were analyzed for all the students pursuing Doctor of Medicine degrees at the University of Minnesota Medical School and who graduated between 2012-2016 (n=1,309). An independent-samples t-test found a significant difference (t=6.132, p<0.001) in MCAT scores between those who were elected to AOA (n=179) and those who were not (n=1,130). On average, students who were elected to AOA had composite MCAT scores of 1.65 points higher than those who were not. Percentages of students elected to AOA gradually but inconsistently increased with MCAT score. No student who scored <27 on the MCAT was elected to AOA. Among students with MCAT scores at the 99th percentile or above (scores of ≥38), 13 of 48 (27.1%) were elected to AOA. Election to AOA during medical school was significantly associated with higher MCAT scores. Admissions committees should carefully consider the role of standardized entrance exam scores, in the context of a holistic review, when selecting for exceptional medical students.

  11. Association between the Medical College Admission Test scores and Alpha Omega Alpha Medical Honors Society membership

    PubMed Central

    Gauer, Jacqueline L; Jackson, J Brooks

    2017-01-01

    Introduction Medical schools worldwide are faced with the challenge of selecting from among many qualified applicants. One factor that might help admissions committees identify future exceptional medical students is scores on standardized entrance exams. The purpose of this study was to determine the association between scores on the most commonly used standardized medical school entrance exam in the USA, the Medical College Admission Test (MCAT), and election to the US medical honors society, Alpha Omega Alpha (AOA). Method MCAT scores and AOA membership data were analyzed for all the students pursuing Doctor of Medicine degrees at the University of Minnesota Medical School and who graduated between 2012–2016 (n=1,309). Results An independent-samples t-test found a significant difference (t=6.132, p<0.001) in MCAT scores between those who were elected to AOA (n=179) and those who were not (n=1,130). On average, students who were elected to AOA had composite MCAT scores of 1.65 points higher than those who were not. Percentages of students elected to AOA gradually but inconsistently increased with MCAT score. No student who scored <27 on the MCAT was elected to AOA. Among students with MCAT scores at the 99th percentile or above (scores of ≥38), 13 of 48 (27.1%) were elected to AOA. Discussion Election to AOA during medical school was significantly associated with higher MCAT scores. Admissions committees should carefully consider the role of standardized entrance exam scores, in the context of a holistic review, when selecting for exceptional medical students. PMID:28979178

  12. Two for One: Using QAR to Increase Reading Comprehension and Improve Test Scores

    ERIC Educational Resources Information Center

    Green, Susan

    2016-01-01

    This teaching tip describes an intervention used in a third-grade classroom implemented to help students pass an end-of-grade reading comprehension test. Low scores on a practice end-of-grade comprehension test prompted a re-examination of classroom reading instruction and a plan for intervention. This teaching tip describes the phases implemented…

  13. Estimating Teacher Effectiveness from Two-Year Changes in Students' Test Scores

    ERIC Educational Resources Information Center

    Leigh, Andrew

    2010-01-01

    Using a dataset covering over 10,000 Australian school teachers and over 90,000 pupils, I estimate how effective teachers are in raising students' test scores. Since the exams are biennial, it is necessary to take account of the teacher's work in the intervening year. Even adjusting for measurement error, the teacher fixed effects are widely…

  14. Can Tracking Raise the Test Scores of High-Ability Minority Students?

    ERIC Educational Resources Information Center

    Card, David; Giuliano, Laura

    2016-01-01

    We evaluate a tracking program in a large urban district where schools with at least one gifted fourth grader create a separate "gifted/high achiever" classroom. Most seats are filled by non-gifted high achievers, ranked by previous-year test scores. We study the program's effects on the high achievers using (1) a rank-based regression…

  15. Test-retest reliability and minimal detectable change scores for the timed "up & go" test, the six-minute walk test, and gait speed in people with Alzheimer disease.

    PubMed

    Ries, Julie D; Echternach, John L; Nof, Leah; Gagnon Blodgett, Michelle

    2009-06-01

    With the increasing incidence of Alzheimer disease (AD), determining the validity and reliability of outcome measures for people with this disease is necessary. The goals of this study were to assess test-retest reliability of data for the Timed "Up & Go" Test (TUG), the Six-Minute Walk Test (6MWT), and gait speed and to calculate minimal detectable change (MDC) scores for each outcome measure. Performance differences between groups with mild to moderate AD and moderately severe to severe AD (as determined by the Functional Assessment Staging [FAST] scale) were studied. This was a prospective, nonexperimental, descriptive methodological study. Background data collected for 51 people with AD included: use of an assistive device, Mini-Mental Status Examination scores, and FAST scale scores. Each participant engaged in 2 test sessions, separated by a 30- to 60-minute rest period, which included 2 TUG trials, 1 6MWT trial, and 2 gait speed trials using a computerized gait assessment system. A specific cuing protocol was followed to achieve optimal performance during test sessions. Test-retest reliability values for the TUG, the 6MWT, and gait speed were high for all participants together and for the mild to moderate AD and moderately severe to severe AD groups separately (intraclass correlation coefficients ≥ .973); however, individual variability of performance also was high. Calculated MDC scores at the 90% confidence interval were: TUG=4.09 seconds, 6MWT=33.5 m (110 ft), and gait speed=9.4 cm/s. The 2 groups were significantly different in performance of clinical tests, with the participants who were more cognitively impaired being more physically and functionally impaired. A single researcher for data collection limited sample numbers and prohibited blinding to dementia level. The TUG, the 6MWT, and gait speed are reliable outcome measures for use with people with AD, recognizing that individual variability of performance is high. Minimal detectable change

  16. Estimation and test for linkage between markers: a comparison of lod score and χ² test in a linkage study of maritime pine (Pinus pinaster Ait.).

    PubMed

    Gerber, S; Rodolphe, F

    1994-06-01

    The first step in the construction of a linkage map involves the estimation and test for linkage between all possible pairs of markers. The lod score method is used in many linkage studies for the latter purpose. In contrast with classical statistical tests, this method does not rely on the choice of a first-type error level. We thus provide a comparison between the lod score and a χ² test on linkage data from a gymnosperm, the maritime pine. The lod score appears to be a very conservative test with the usual thresholds. Its severity depends on the type of data used.

  17. Validation of undergraduate medical student script concordance test (SCT) scores on the clinical assessment of the acute abdomen.

    PubMed

    Goos, Matthias; Schubach, Fabian; Seifert, Gabriel; Boeker, Martin

    2016-08-17

    Health professionals often manage medical problems in critical situations under time pressure and on the basis of vague information. In recent years, dual process theory has provided a framework of cognitive processes to assist students in developing clinical reasoning skills critical especially in surgery due to the high workload and the elevated stress levels. However, clinical reasoning skills can be observed only indirectly and the corresponding constructs are difficult to measure in order to assess student performance. The script concordance test has been established in this field. A number of studies suggest that the test delivers a valid assessment of clinical reasoning. However, different scoring methods have been suggested. They reflect different interpretations of the underlying construct. In this work we want to shed light on the theoretical framework of script theory and give an idea of script concordance testing. We constructed a script concordance test in the clinical context of "acute abdomen" and compared previously proposed scores with regard to their validity. A test comprising 52 items in 18 clinical scenarios was developed, revised along the guidelines and administered to 56 4th- and 5th-year medical students at the end of a blended-learning seminar. We scored the answers using five different scoring methods (distance (2×), aggregate (2×), single best answer) and compared the scoring keys, the resulting final scores and Cronbach's α after normalization of the raw scores. All scores except the single best answers calculation achieved acceptable reliability scores (≥ 0.75), as measured by Cronbach's α. Students were clearly distinguishable from the experts, whose results were set to a mean of 80 and SD of 5 by the normalization process. With the two aggregate scoring methods, the students' mean values were between 62.5 (AGGPEN) and 63.9 (AGG) equivalent to about three expert SD below the experts' mean value (Cronbach's α : 0.76 (AGGPEN

  18. The Effect of School Poverty on Racial Gaps in Tests Scores: The Case of the Minnesota Basic Standards Tests

    ERIC Educational Resources Information Center

    Myers, Samuel L.; Kim, Hyeoneui; Mandala, Cheryl

    2004-01-01

    Data from the 1996, 1998, and 1999 Minnesota comprehensive statewide testing of eighth graders are used to analyze whether African American students perform worse than the white students who attend the poverty schools. The analyses conclude that the African American-White test score gap is attributed more to the racial discriminations and racial treatments…

  19. What's in a Teacher Test? Assessing the Relationship between Teacher Licensure Test Scores and Student STEM Achievement and Course-Taking. CEDR Working Paper. WP #2016-11

    ERIC Educational Resources Information Center

    Goldhaber, Dan; Gratz, Trevor; Theobald, Roddy

    2016-01-01

    We investigate the relationship between teacher licensure test scores and student test achievement and high school course-taking. We focus on three subject/grade combinations-- middle school math, ninth-grade algebra and geometry, and ninth-grade biology--and find evidence that a teacher's basic skills test scores are modestly predictive of…

  20. Cognitive test scores in male adolescent cigarette smokers compared to non-smokers: a population-based study.

    PubMed

    Weiser, Mark; Zarka, Salman; Werbeloff, Nomi; Kravitz, Efrat; Lubin, Gad

    2010-02-01

    Although previous studies indicate that people with lower intelligence quotient (IQ) scores are more likely to become cigarette smokers, IQ scores of siblings discordant for smoking and of adolescents who began smoking between ages 18-21 years have not been studied systematically. Each year a random sample of Israeli military recruits complete a smoking questionnaire. Cognitive functioning is assessed by the military using standardized tests equivalent to IQ. Of 20 221 18-year-old males, 28.5% reported smoking at least one cigarette a day (smokers). An unadjusted comparison found that smokers scored 0.41 effect sizes (ES, P < 0.001) lower than non-smokers; adjusted analyses remained significant (adjusted ES = 0.27, P < 0.001). Adolescents smoking one to five, six to 10, 11-20 and 21+ cigarettes/day had cognitive test scores 0.14, 0.22, 0.33 and 0.5 adjusted ES poorer than those of non-smokers (P < 0.001). Adolescents who did not smoke by age 18, and then began to smoke between ages 18-21 had lower cognitive test scores compared to never-smokers (adjusted ES = 0.14, P < 0.001). An analysis of brothers discordant for smoking found that smoking brothers had lower cognitive scores than non-smoking brothers (adjusted ES = 0.27; P = 0.014). Controlled analyses from this large population-based cohort of male adolescents indicate that IQ scores are lower in male adolescents who smoke compared to non-smokers and in brothers who smoke compared to their non-smoking brothers. The IQs of adolescents who began smoking between ages 18-21 are lower than those of non-smokers. Adolescents with poorer IQ scores might be targeted for programmes designed to prevent smoking.

  1. Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach

    ERIC Educational Resources Information Center

    Han, Chao

    2016-01-01

    As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…

  2. The reliability and validity of qualitative scores for the Controlled Oral Word Association Test.

    PubMed

    Ross, Thomas P; Calhoun, Emily; Cox, Tara; Wenner, Carolyn; Kono, Whitney; Pleasant, Morgan

    2007-05-01

    The reliability and validity of two qualitative scoring systems for the Controlled Oral Word Association Test [Benton, A. L., Hamsher, de S. K., & Sivan, A. B. (1983). Multilingual aphasia examination (2nd ed.). Iowa City, IA: AJA Associates] were examined in 108 healthy young adults. The scoring systems developed by Troyer et al. [Troyer, A. K., Moscovich, M., & Winocur, G. (1997). Clustering and switching as two components of verbal fluency: Evidence from younger and older healthy adults. Neuropsychology, 11, 138-146] and by Abwender et al. [Abwender, D. A., Swan, J. G., Bowerman, J. T., & Connolly, S. W. (2001a). Qualitative analysis of verbal fluency output: Review and comparison of several scoring methods. Assessment, 8, 323-336] each demonstrated excellent interrater reliability (all indices at or above r(icc)=.9). Consistent with previous research [e.g., Ross, T. P. (2003). The reliability of cluster and switch scores for the COWAT. Archives of Clinical Psychology, 18, 153-164], test-retest reliability coefficients (N=53; M interval 44.6 days) for the qualitative scores were modest to poor (r(icc)=.6 to .4 range). Correlations among COWAT scores, measures of executive functioning, verbal learning, working memory, and vocabulary were examined. The idea that qualitative scores represent distinct executive functions such as cognitive flexibility or strategy utilization was not supported. We offer the interpretation that COWAT performance may require the ability to retrieve words in a non-routine manner while suppressing habitual responses and associated processing interference, presumably due to a spread of activation across semantic or lexical networks. This interpretation, though speculative at present, implies that clustering and switching on the COWAT may not be entirely deliberate, but rather an artifact of a passive (i.e., state-dependent) process. Ideas for future research, most noticeably experimental studies using cognitive methods (e.g., priming), are

  3. Student Test Scores: How the Sausage Is Made and Why You Should Care. Evidence Speaks Reports, Vol 1, #25

    ERIC Educational Resources Information Center

    Jacob, Brian A.

    2016-01-01

    Contrary to popular belief, modern cognitive assessments--including the new Common Core tests--produce test scores based on sophisticated statistical models rather than the simple percent of items a student answers correctly. While there are good reasons for this, it means that reported test scores depend on many decisions made by test designers,…

  4. Exploring Validity of Computer-Based Test Scores with Examinees' Response Behaviors and Response Times

    ERIC Educational Resources Information Center

    Sahin, Füsun

    2017-01-01

    Examining the testing processes, as well as the scores, is needed for a complete understanding of validity and fairness of computer-based assessments. Examinees' rapid-guessing and insufficient familiarity with computers have been found to be major issues that weaken the validity arguments of scores. This study has three goals: (a) improving…

  5. Target-specific support vector machine scoring in structure-based virtual screening: computational validation, in vitro testing in kinases, and effects on lung cancer cell proliferation.

    PubMed

    Li, Liwei; Khanna, May; Jo, Inha; Wang, Fang; Ashpole, Nicole M; Hudmon, Andy; Meroueh, Samy O

    2011-04-25

    We assess the performance of our previously reported structure-based support vector machine target-specific scoring function across 41 targets, 40 among them from the Directory of Useful Decoys (DUD). The area under the curve of receiver operating characteristic plots (ROC-AUC) revealed that scoring with SVM-SP resulted in consistently better enrichment over all target families, outperforming Glide and other scoring functions, most notably among kinases. In addition, SVM-SP performance showed little variation among protein classes, exhibited excellent performance in a test case using a homology model, and in some cases showed high enrichment even with few structures used to train a model. We put SVM-SP to the test by virtual screening 1125 compounds against two kinases, EGFR and CaMKII. Among the top 25 EGFR compounds, three compounds (1-3) inhibited kinase activity in vitro with IC₅₀ of 58, 2, and 10 μM. In cell cultures, compounds 1-3 inhibited nonsmall cell lung carcinoma (H1299) cancer cell proliferation with similar IC₅₀ values for compound 3. For CaMKII, one compound inhibited kinase activity in a dose-dependent manner among 20 tested with an IC₅₀ of 48 μM. These results are encouraging given that our in-house library consists of compounds that emerged from virtual screening of other targets with pockets that are different from typical ATP binding sites found in kinases. In light of the importance of kinases in chemical biology, these findings could have implications in future efforts to identify chemical probes of kinases within the human kinome.

  6. The effect of instructional methodology on high school students natural sciences standardized tests scores

    NASA Astrophysics Data System (ADS)

    Powell, P. E.

    Educators have recently come to consider inquiry-based instruction a more effective method of instruction than didactic instruction. Experience-based learning theory suggests that student performance is linked to teaching method. However, research is limited on inquiry teaching and its effectiveness in preparing students to perform well on standardized tests. The purpose of the study was to investigate whether one of these two teaching methodologies was more effective in increasing student performance on standardized science tests. The quasi-experimental quantitative study comprised two stages. Stage 1 used a survey to identify teaching methods of a convenience sample of 57 teacher participants and determined level of inquiry used in instruction to place participants into instructional groups (the independent variable). Stage 2 used analysis of covariance (ANCOVA) to compare posttest scores on a standardized exam by teaching method. Additional analyses were conducted to examine the differences in science achievement by ethnicity, gender, and socioeconomic status by teaching methodology. Results demonstrated a statistically significant gain in test scores when taught using inquiry-based instruction. Subpopulation analyses indicated all groups showed improved mean standardized test scores except African American students. The findings benefit teachers and students by presenting data supporting a method of content delivery that increases teacher efficacy and produces students with a greater cognition of science content that meets the school's mission and goals.

  7. The validity of ACT-PEP test scores for predicting academic performance of registered nurses in BSN programs.

    PubMed

    Yang, J C; Noble, J

    1990-01-01

    This study investigated the validity of three American College Testing-Proficiency Examination Program (ACT-PEP) tests (Maternal and Child Nursing, Psychiatric/Mental Health Nursing, Adult Nursing) for predicting the academic performance of registered nurses (RNs) enrolled in bachelor's degree BSN programs nationwide. This study also examined RN students' performance on the ACT-PEP tests by their demographic characteristics: student's age, sex, race, student status (full- or part-time), and employment status (full- or part-time). The total sample for the three tests comprised 2,600 students from eight institutions nationwide. The median correlation coefficients between the three ACT-PEP tests and the semester grade point averages ranged from .36 to .56. Median correlation coefficients increased over time, supporting the stability of ACT-PEP test scores for predicting academic performance over time. The relative importance of selected independent variables for predicting academic performance was also examined; the most important variable for predicting academic performance was typically the ACT-PEP test score. Across the institutions, student demographic characteristics did not contribute significantly to explaining academic performance, over and above ACT-PEP scores.

  8. Maximal exercise testing variables and 10-year survival: fitness risk score derivation from the FIT Project.

    PubMed

    Ahmed, Haitham M; Al-Mallah, Mouaz H; McEvoy, John W; Nasir, Khurram; Blumenthal, Roger S; Jones, Steven R; Brawner, Clinton A; Keteyian, Steven J; Blaha, Michael J

    2015-03-01

    To determine which routinely collected exercise test variables most strongly correlate with survival and to derive a fitness risk score that can be used to predict 10-year survival. This was a retrospective cohort study of 58,020 adults aged 18 to 96 years who were free of established heart disease and were referred for an exercise stress test from January 1, 1991, through May 31, 2009. Demographic, clinical, exercise, and mortality data were collected on all patients as part of the Henry Ford ExercIse Testing (FIT) Project. Cox proportional hazards models were used to identify exercise test variables most predictive of survival. A "FIT Treadmill Score" was then derived from the β coefficients of the model with the highest survival discrimination. The median age of the 58,020 participants was 53 years (interquartile range, 45-62 years), and 28,201 (49%) were female. Over a median of 10 years (interquartile range, 8-14 years), 6456 patients (11%) died. After age and sex, peak metabolic equivalents of task and percentage of maximum predicted heart rate achieved were most highly predictive of survival (P<.001). Subsequent addition of baseline blood pressure and heart rate, change in vital signs, double product, and risk factor data did not further improve survival discrimination. The FIT Treadmill Score, calculated as [percentage of maximum predicted heart rate + 12(metabolic equivalents of task) - 4(age) + 43 if female], ranged from -200 to 200 across the cohort, was near normally distributed, and was found to be highly predictive of 10-year survival (Harrell C statistic, 0.811). The FIT Treadmill Score is easily attainable from any standard exercise test and translates basic treadmill performance measures into a fitness-related mortality risk score. The FIT Treadmill Score should be validated in external populations. Copyright © 2015 Mayo Foundation for Medical Education and Research. Published by Elsevier Inc. All rights reserved.
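
    The scoring formula quoted in the abstract above can be written out as a short Python sketch. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the percentage of maximum predicted heart rate is assumed to be passed as a number on a 0-100 scale, and the abstract does not say how the maximum predicted heart rate itself is obtained, so that step is left to the caller. The input values in the example are invented.

```python
def fit_treadmill_score(pct_max_predicted_hr, mets, age_years, is_female):
    """Score as quoted in the abstract:
    %MPHR + 12 * METs - 4 * age + 43 if female.

    pct_max_predicted_hr is assumed to be a percentage (e.g. 90.0 for 90%).
    """
    score = pct_max_predicted_hr + 12 * mets - 4 * age_years
    if is_female:
        score += 43  # sex term from the quoted formula
    return score


# Hypothetical inputs, for illustration only.
male_example = fit_treadmill_score(90.0, 10.0, 50, is_female=False)
female_example = fit_treadmill_score(90.0, 10.0, 50, is_female=True)
print(male_example, female_example)
```

    The sketch only reproduces the arithmetic; mapping a score to a survival estimate is not described in the abstract and is not attempted here.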

  9. Relationships between the handball-specific complex test, non-specific field tests and the match performance score in elite professional handball players.

    PubMed

    Hermassi, Souhail; Chelly, Mohamed-Souhaiel; Wollny, Rainer; Hoffmeyer, Birgit; Fieseler, Georg; Schulze, Stephan; Irlenbusch, Lars; Delank, Karl-Stefan; Shephard, Roy J; Bartels, Thomas; Schwesig, René

    2018-06-01

    This study assessed the validity of the handball-specific complex test (HBCT) and two non-specific field tests in professional elite handball athletes, using the match performance score (MPS) as the gold standard of performance. Thirteen elite male handball players (age: 27.4±4.8 years; premier German league) performed the HBCT, the Yo-Yo Intermittent Recovery (YYIR) test and a repeated shuttle sprint ability (RSA) test at the beginning of pre-season training. The RSA results were evaluated in terms of best time, total time, and fatigue decrement. Heart rates (HR) were assessed at selected times throughout all tests; the recovery HR was measured immediately post-test and 10 minutes later. The match performance score was based on various handball specific parameters (e.g., field goals, assists, steals, blocks, and technical mistakes) as seen during all matches of the immediately subsequent season (2015/2016). The parameters of run 1, run 2, and HR recovery at minutes 6 and 10 of the RSA test all showed a variance of more than 10% (range: 11-15%). However, the variance of scores for the YYIR test was much smaller (range: 1-7%). The resting HR (r²=0.18), HR recovery at minute 10 (r²=0.10), lactate concentration at rest (r²=0.17), recovery of heart rate from 0 to 10 minutes (r²=0.15), and velocity of second throw at first trial (r²=0.37) were the most valid HBCT parameters. Much effort is necessary to assess MPS and to develop valid tests. Speed and the rate of functional recovery seem the best predictors of competitive performance for elite handball players.

  10. Automated Essay Scoring versus Human Scoring: A Comparative Study

    ERIC Educational Resources Information Center

    Wang, Jinhao; Brown, Michelle Stallone

    2007-01-01

    The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric [TM] and human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric [TM] and by…

  11. College Math Assessment: SAT Scores vs. College Math Placement Scores

    ERIC Educational Resources Information Center

    Foley-Peres, Kathleen; Poirier, Dawn

    2008-01-01

    Many colleges and universities use SAT math scores or math placement tests to place students in the appropriate math course. This study compares the use of math placement scores and SAT scores for 188 freshman students. The students' grades and faculty observations were analyzed to determine if the SAT scores and/or college math assessment scores…

  12. A Comparison of Scores on the WISC-R and Lorge-Thorndike Intelligence Test for Disadvantaged Black Elementary School Children

    ERIC Educational Resources Information Center

    Lowe, James D.; Karnes, Frances A.

    1976-01-01

    It is indicated that, although the scores [obtained on both tests] are significantly correlated, the tests yield significantly different scores with the Lorge-Thorndike consistently overestimating the WISC-R full scale I.Q. (Author)

  13. The Effect of Four Intervention Programs on Standardized Test Scores by Gender

    ERIC Educational Resources Information Center

    Cryder, Rebecca E.

    2012-01-01

    This quantitative correlational study involved the analysis, by gender, of the effect of four intervention programs at an Arizona middle school as seen on Arizona's Instrument to Measure Standards (AIMS) test scores. These four intervention programs included: Advancement Via Individual Determination (AVID), a planner stamping system, a World…

  14. International Test Score Comparisons and Educational Policy: A Review of the Critiques

    ERIC Educational Resources Information Center

    Carnoy, Martin

    2015-01-01

    Stanford education professor Martin Carnoy examines four main critiques of how international test results are used in policymaking. Of particular interest are critiques of the policy analyses published by the Program for International Student Assessment (PISA). Using average PISA scores as a comparative measure of student achievement is misleading…

  15. Using the EZ-Diffusion Model to Score a Single-Category Implicit Association Test of Physical Activity

    PubMed Central

    Rebar, Amanda L.; Ram, Nilam; Conroy, David E.

    2014-01-01

    Objective: The Single-Category Implicit Association Test (SC-IAT) has been used as a method for assessing automatic evaluations of physical activity, but measurement artifact or consciously-held attitudes could be confounding the outcome scores of these measures. The objective of these two studies was to address these measurement concerns by testing the validity of a novel SC-IAT scoring technique. Design: Study 1 was a cross-sectional study, and study 2 was a prospective study. Method: In study 1, undergraduate students (N = 104) completed SC-IATs for physical activity, flowers, and sedentary behavior. In study 2, undergraduate students (N = 91) completed a SC-IAT for physical activity, self-reported affective and instrumental attitudes toward physical activity, physical activity intentions, and wore an accelerometer for two weeks. The EZ-diffusion model was used to decompose the SC-IAT into three process component scores including the information processing efficiency score. Results: In study 1, a series of structural equation model comparisons revealed that the information processing score did not share variability across distinct SC-IATs, suggesting it does not represent systematic measurement artifact. In study 2, the information processing efficiency score was shown to be unrelated to self-reported affective and instrumental attitudes toward physical activity, and positively related to physical activity behavior, above and beyond the traditional D-score of the SC-IAT. Conclusions: The information processing efficiency score is a valid measure of automatic evaluations of physical activity. PMID:25484621

  16. The Apgar score has survived the test of time.

    PubMed

    Finster, Mieczyslaw; Wood, Margaret

    2005-04-01

    In 1953, Virginia Apgar, M.D. published her proposal for a new method of evaluation of the newborn infant. The avowed purpose of this paper was to establish a simple and clear classification of newborn infants which can be used to compare the results of obstetric practices, types of maternal pain relief and the results of resuscitation. Having considered several objective signs pertaining to the condition of the infant at birth she selected five that could be evaluated and taught to the delivery room personnel without difficulty. These signs were heart rate, respiratory effort, reflex irritability, muscle tone and color. Sixty seconds after the complete birth of the baby a rating of zero, one or two was given to each sign, depending on whether it was absent or present. Virginia Apgar reviewed anesthesia records of 1025 infants born alive at Columbia Presbyterian Medical Center during the period of this report. All had been rated by her method. Infants in poor condition scored 0-2, infants in fair condition scored 3-7, while scores 8-10 were achieved by infants in good condition. The most favorable score 1 min after birth was obtained by infants delivered vaginally with the occiput the presenting part (average 8.4). Newborns delivered by version and breech extraction had the lowest score (average 6.3). Infants delivered by cesarean section were more vigorous (average score 8.0) when spinal was the method of anesthesia versus an average score of 5.0 when general anesthesia was used. Correlating the 60 s score with neonatal mortality, Virginia found that mature infants receiving 0, 1 or 2 scores had a neonatal death rate of 14%; those scoring 3, 4, 5, 6 or 7 had a death rate of 1.1%; and those in the 8-10 score group had a death rate of 0.13%. She concluded that the prognosis of an infant is excellent if he receives one of the upper three scores, and poor if one of the lowest three scores.
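
    The scoring scheme described in the abstract above (five signs, each rated zero, one or two, with the total grouped into poor, fair and good condition) can be sketched in Python as follows. This is a minimal illustration, not a clinical tool: the function names, dictionary keys and example ratings are hypothetical, and the abstract does not give the criteria for assigning each individual rating, so those are taken as inputs.

```python
# The five signs named in the abstract; key spellings are illustrative.
SIGNS = ("heart_rate", "respiratory_effort", "reflex_irritability",
         "muscle_tone", "color")


def apgar_total(ratings):
    """Sum the five sign ratings, each of which must be 0, 1 or 2."""
    for sign in SIGNS:
        if ratings[sign] not in (0, 1, 2):
            raise ValueError(f"rating for {sign} must be 0, 1 or 2")
    return sum(ratings[sign] for sign in SIGNS)


def apgar_condition(total):
    """Group a total score using the ranges reported in the abstract."""
    if not 0 <= total <= 10:
        raise ValueError("total must be between 0 and 10")
    if total <= 2:
        return "poor"   # scores 0-2
    if total <= 7:
        return "fair"   # scores 3-7
    return "good"       # scores 8-10


# Hypothetical ratings, for illustration only.
example = {"heart_rate": 2, "respiratory_effort": 1,
           "reflex_irritability": 1, "muscle_tone": 2, "color": 1}
total = apgar_total(example)
print(total, apgar_condition(total))
```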

  17. Implications for social policy of variability in racial groups.

    PubMed

    Helms, Janet E

    2008-11-01

    Social policy and federal and state legislation require the use of single cut scores when tests of cognitive ability, knowledge, or skills (CAKS) are used to make high-stakes assessment decisions, such as whether students or employees may be promoted. Rationales offered for the requirement are that cut scores provide objective standards and are fairer than using subjective criteria, such as racial group membership. It is argued that failure to consider threats to statistical conclusion validity, such as differences in variability between groups, obscures the differential impact of using a common cut score as the basis for high-stakes decisions. Analyses of 40 Black and White samples revealed that (a) Whites might be considerably advantaged and Blacks might be considerably disadvantaged by the same cut score and (b) depending on where the cut score is set, decisions based on ratios of numbers of Whites to numbers of Blacks might be fairer than use of CAKS test cut scores. Implications for assessment practice and social policy are discussed.

  18. Performance of EuroSCORE II in a large US database: implications for transcatheter aortic valve implantation.

    PubMed

    Osnabrugge, Ruben L; Speir, Alan M; Head, Stuart J; Fonner, Clifford E; Fonner, Edwin; Kappetein, A Pieter; Rich, Jeffrey B

    2014-09-01

    Validation studies of European system for cardiac operative risk evaluation II (EuroSCORE II) have been limited to European datasets. Therefore, the aims of this study were to assess the performance of EuroSCORE II in a large multicentre US database, and compare it with the Society of Thoracic Surgeons Predicted Risk of Mortality (STS-PROM). In addition, implications for patient selection for transcatheter aortic valve implantation (TAVI) were explored. EuroSCORE II and the STS-PROM were calculated for 50 588 patients from a multi-institutional statewide database of all cardiac surgeries performed since 2003. Model performance was assessed using the area under the receiver operator curve (AUC), observed vs expected (O:E) ratios and calibration plots. Analyses were performed for isolated coronary artery bypass grafting (CABG) (n = 40 871), aortic valve replacement (AVR) (n = 4107), AVR + CABG (n = 3480), mitral valve (MV) replacement (n = 1071) and MV repair (n = 1059). The overall in-hospital mortality rate was 2.1%. EuroSCORE II was outperformed by the STS-PROM in the overall cohort with regard to discrimination (AUC = 0.77 vs 0.81, respectively; P < 0.001) and calibration (O:E = 0.68 vs 0.80, respectively). Discrimination for CABG was worse with EuroSCORE II (AUC = 0.77 vs STS-PROM: 0.81, P < 0.001). For other procedures discrimination was similar: AVR (AUC = 0.71 vs STS-PROM: 0.74, P = 0.40), AVR + CABG (AUC = 0.72 vs STS-PROM: 0.74, P = 0.47), MV repair (AUC = 0.82 vs STS-PROM: 0.86, P = 0.55) and MV replacement (AUC = 0.78 vs STS-PROM: 0.79, P = 0.69). Calibration of EuroSCORE II was worse for CABG (O:E = 0.68 vs STS-PROM: 0.80), similar in AVR + CABG (O:E = 0.76 vs STS-PROM: 0.70) and MV repair (O:E = 0.64 vs STS-PROM: 0.67), while EuroSCORE II may be more accurate in AVR (O:E = 0.96 vs STS-PROM: 0.76). Performance of both models improved when only recent cases (after 1 January 2008) were used. Ongoing TAVI trials aimed at patients with an estimated 4

  19. Methods for Improving Test Scores: The Good, the Bad, and the Ugly

    ERIC Educational Resources Information Center

    Wright, Robert J.

    2009-01-01

    The No Child Left Behind Act (NCLB 2001) has the faculties of every public and charter school scrambling to drive test scores of seven identified groups of children (African-American children, Anglo-White children, children with disabilities, Hispanic children, children of poverty, children with English language limitations, and Native-American…

  20. An approach to analyzing a single subject's scores obtained in a standardized test with application to the Aachen Aphasia Test (AAT).

    PubMed

    Willmes, K

    1985-08-01

    Methods for the analysis of a single subject's test profile(s) proposed by Huber (1973) are applied to the Aachen Aphasia Test (AAT). The procedures are based on the classical test theory model (Lord & Novick, 1968) and are suited for any (achievement) test with standard norms from a large standardization sample and satisfactory reliability estimates. Two test profiles of a Wernicke's aphasic, obtained before and after a 3-month period of speech therapy, are analyzed using inferential comparisons between (groups of) subtest scores on one test application and between two test administrations for single (groups of) subtests. For each of these comparisons, the two aspects of (i) significant (reliable) differences in performance beyond measurement error and (ii) the diagnostic validity of that difference in the reference population of aphasic patients are assessed. Significant differences between standardized subtest scores and a remarkably better preserved reading and writing ability could be found for both test administrations using the multiple test procedure of Holm (1979). Comparison of both profiles revealed an overall increase in performance for each subtest as well as changes in level of performance relations between pairs of subtests.
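
    The multiple test procedure of Holm (1979) mentioned in the abstract above is a step-down adjustment for several simultaneous comparisons. A minimal Python sketch of the general procedure follows; it is not the authors' implementation, the function name is hypothetical, and the p-values in the example are invented rather than taken from the AAT analysis.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the original order of p_values,
    marking which hypotheses are rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared
        # with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection; the rest are retained
    return reject


# Hypothetical p-values for four subtest comparisons.
decisions = holm_reject([0.01, 0.04, 0.03, 0.005], alpha=0.05)
print(decisions)
```

    Stopping at the first non-rejection is what distinguishes the step-down procedure from applying each threshold independently.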

  1. From Test Scores to Language Use: Emergent Bilinguals Using English to Accomplish Academic Tasks

    ERIC Educational Resources Information Center

    Rodriguez-Mojica, Claudia

    2018-01-01

    Prominent discourses about emergent bilinguals' academic abilities tend to focus on performance as measured by test scores and perpetuate the message that emergent bilinguals trail far behind their peers. When we remove the constraints of formal testing situations, what can emergent bilinguals do in English as they engage in naturally occurring…

  2. Comparison of Standardized Test Scores from Traditional Classrooms and Those Using Problem-Based Learning

    ERIC Educational Resources Information Center

    Needham, Martha Elaine

    2010-01-01

    This research compares differences between standardized test scores in problem-based learning (PBL) classrooms and a traditional classroom for 6th grade students using a mixed-method, quasi-experimental and qualitative design. The research shows that problem-based learning is as effective as traditional teaching methods on standardized tests. The…

  3. Co-Educational Tutorial Classes and Their Significance on Gendered Test Scores of Wollo University Students: A Before-After Analyses

    ERIC Educational Resources Information Center

    Gidey, Mu'uz

    2015-01-01

    This action research is carried out in a practical class room setting to devise an innovative way of administering tutorial classes to improve students' learning competence with particular reference to gendered test scores. A before-after test score analyses of mean and standard deviations along with t-statistical tests of hypotheses of second…

  4. CK-MM Polymorphism is Associated With Physical Fitness Test Scores in Military Recruits.

    PubMed

    Sprouse, Courtney; Tosi, Laura L; Gordish-Dressman, Heather; Abdel-Ghani, Mai S; Panchapakesan, Karuna; Niederberger, Brenda; Devaney, Joseph M; Kelly, Karen R

    2015-09-01

    Muscle-specific creatine kinase is thought to play an integral role in maintaining energy homeostasis by providing a supply of creatine phosphate. The genetic variant, rs8111989, contributes to individual differences in physical performance, and thus the purpose of this study was to determine if rs8111989 variant is predictive of Physical Fitness Test (PFT) scores in male, military infantry recruits. DNA was extracted from whole blood, and genotyping was performed in 176 Marines. Relationships between PFT measures (run, sit-ups, and pull-ups) and genotype were determined. Participants with 2 copies of the T allele for rs8111989 variant had higher PFT scores for run time, pull-ups, and total PFT score. Specifically, participants with 2 copies of the TT allele (variant) (n = 97) demonstrated an overall higher total PFT score as compared with those with one copy of the C allele (n = 79) (TT: 250 ± 31 vs. 238 ± 31; p = 0.02), run score (TT: 82 ± 10 vs. 78 ± 11; p = 0.04) and pull-up score (TT: 78 ± 11 vs. 65 ± 21; p = 0.04) or those with the CC/CT genotype. These results demonstrate an association between physical performance measures and genetic variation in the muscle-specific creatine kinase gene (rs8111989). Reprint & Copyright © 2015 Association of Military Surgeons of the U.S.

  5. Linking U.S. School District Test Score Distributions to a Common Scale. CEPA Working Paper No. 16-09

    ERIC Educational Resources Information Center

    Reardon, Sean F.; Kalogrides, Demetra; Ho, Andrew D.

    2017-01-01

    There is no comprehensive database of U.S. district-level test scores that is comparable across states. We describe and evaluate a method for constructing such a database. First, we estimate linear, reliability-adjusted linking transformations from state test score scales to the scale of the National Assessment of Educational Progress (NAEP). We…

  6. The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses

    ERIC Educational Resources Information Center

    Walstad, William B.; Wagner, Jamie

    2016-01-01

    This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…
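
    The item-level classification described above can be sketched in a few lines. The mapping used here is an assumption, since the truncated abstract does not spell it out: wrong-then-right is "positive", right-then-right is "retained", right-then-wrong is "negative", and wrong-then-wrong is "zero". The function names and the response vectors are hypothetical.

```python
from collections import Counter

LEARNING_TYPES = ("positive", "retained", "negative", "zero")


def classify_item(pre_correct, post_correct):
    """Classify one item by its pretest/posttest response pattern.

    The pattern-to-label mapping is an assumption for illustration.
    """
    if not pre_correct and post_correct:
        return "positive"
    if pre_correct and post_correct:
        return "retained"
    if pre_correct and not post_correct:
        return "negative"
    return "zero"


def disaggregate(pre, post):
    """Count the four learning types over paired item responses."""
    counts = Counter(classify_item(a, b) for a, b in zip(pre, post))
    return {name: counts.get(name, 0) for name in LEARNING_TYPES}


# Hypothetical responses of one student to five items (1 = correct).
pre = [0, 1, 1, 0, 0]
post = [1, 1, 0, 0, 1]
types = disaggregate(pre, post)

# Under this mapping the usual scores are sums of the types.
pretest_score = types["retained"] + types["negative"]
posttest_score = types["positive"] + types["retained"]
difference_score = types["positive"] - types["negative"]
```

    Under this assumed mapping the difference score equals positive minus negative learning, which shows how one difference score can hide quite different item-level patterns.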

  7. A Guide for Setting the Cut-Scores to Minimize Weighted Classification Errors in Test Batteries

    ERIC Educational Resources Information Center

    Grabovsky, Irina; Wainer, Howard

    2017-01-01

    In this article, we extend the methodology of the Cut-Score Operating Function that we introduced previously and apply it to a testing scenario with multiple independent components and different testing policies. We derive analytically the overall classification error rate for a test battery under the policy when several retakes are allowed for…

  8. Participation in a coteaching classroom and students' end-of-course test scores

    NASA Astrophysics Data System (ADS)

    Debro, Ava

    General education students consistently perform poorly on standardized science tests. Coteaching is an instructional strategy that improves the achievement of students with disabilities, but very little research exists that examines the effect of coteaching classrooms on the performance of general education students. The purpose of this study was to examine the effect of coteaching classrooms on the performance of general education students. The constructivist theoretical framework provided the foundation for this research. The research question examined the effect that coteaching classrooms had on the performance of general education biology students. In this experimental design utilizing a posttest-only control group, coteaching instructional strategy was the treatment, and student performance was measured using the scores obtained from the biology end-of-course test. Data for this study was analyzed using an independent t-test. The results of this study revealed that there was not a statistically significant difference in student performance on the biology end-of-course test between treatment and control groups. More than half of the general education biology students enrolled in coteaching classrooms failed the end-of-course test. Researchers may use this study as a catalyst to examine other instructional practices that may improve student performance in science courses. The results of this study may be used to persuade coteachers of the importance of attending frequent professional development opportunities that examine a variety of coteaching instructional strategies. Improving the performance of general education students in science may improve standardized test scores, afford more students the opportunity to attend college, and ensure that students are able to compete on a global level.

  9. Polygenic Risk Score for Alzheimer's Disease: Implications for Memory Performance and Hippocampal Volumes in Early Life.

    PubMed

    Axelrud, Luiza K; Santoro, Marcos L; Pine, Daniel S; Talarico, Fernanda; Gadelha, Ary; Manfro, Gisele G; Pan, Pedro M; Jackowski, Andrea; Picon, Felipe; Brietzke, Elisa; Grassi-Oliveira, Rodrigo; Bressan, Rodrigo A; Miguel, Eurípedes C; Rohde, Luis A; Hakonarson, Hakon; Pausova, Zdenka; Belangero, Sintia; Paus, Tomas; Salum, Giovanni A

    2018-06-01

    Alzheimer's disease is a heritable neurodegenerative disorder in which early-life precursors may manifest in cognition and brain structure. The authors evaluate this possibility by examining, in youths, associations among polygenic risk score for Alzheimer's disease, cognitive abilities, and hippocampal volume. Participants were children 6-14 years of age in two Brazilian cities, constituting the discovery (N=364) and replication samples (N=352). As an additional replication, data from a Canadian sample (N=1,029), with distinct tasks, MRI protocol, and genetic risk, were included. Cognitive tests quantified memory and executive function. Reading and writing abilities were assessed by standardized tests. Hippocampal volumes were derived from the Multiple Automatically Generated Templates (MAGeT) multi-atlas segmentation brain algorithm. Genetic risk for Alzheimer's disease was quantified using summary statistics from the International Genomics of Alzheimer's Project. Analyses showed that for the Brazilian discovery sample, each one-unit increase in z-score for Alzheimer's polygenic risk score significantly predicted a 0.185 decrement in z-score for immediate recall and a 0.282 decrement for delayed recall. Findings were similar for the Brazilian replication sample (immediate and delayed recall, β=-0.259 and β=-0.232, both significant). Quantile regressions showed lower hippocampal volumes bilaterally for individuals with high polygenic risk scores. Associations fell short of significance for the Canadian sample. Genetic risk for Alzheimer's disease may affect early-life cognition and hippocampal volumes, as shown in two independent samples. These data support previous evidence that some forms of late-life dementia may represent developmental conditions with roots in childhood. This result may vary depending on a sample's genetic risk and may be specific to some types of memory tasks.

  10. Raise Test Scores without Selling Your Soul: An Interview with Scott Mandel

    ERIC Educational Resources Information Center

    Curriculum Review, 2006

    2006-01-01

    With his 10th book, Improving Test Scores: A Practical Approach for Teachers and Administrators, Scott Mandel outlines steps educators can take to boost achievement on standardized exams while maintaining the integrity of their day-to-day teaching. Mandel, who holds a Ph.D. in curriculum and instruction from USC, teaches history and English at…

  11. Linear score tests for variance components in linear mixed models and applications to genetic association studies.

    PubMed

    Qu, Long; Guennel, Tobias; Marshall, Scott L

    2013-12-01

    Following the rapid development of genome-scale genotyping technologies, genetic association mapping has become a popular tool to detect genomic regions responsible for certain (disease) phenotypes, especially in early-phase pharmacogenomic studies with limited sample size. In response to such applications, a good association test needs to be (1) applicable to a wide range of possible genetic models, including, but not limited to, the presence of gene-by-environment or gene-by-gene interactions and non-linearity of a group of marker effects, (2) accurate in small samples, fast to compute on the genomic scale, and amenable to large scale multiple testing corrections, and (3) reasonably powerful to locate causal genomic regions. The kernel machine method represented in linear mixed models provides a viable solution by transforming the problem into testing the nullity of variance components. In this study, we consider score-based tests by choosing a statistic linear in the score function. When the model under the null hypothesis has only one error variance parameter, our test is exact in finite samples. When the null model has more than one variance parameter, we develop a new moment-based approximation that performs well in simulations. Through simulations and analysis of real data, we demonstrate that the new test possesses most of the aforementioned characteristics, especially when compared to existing quadratic score tests or restricted likelihood ratio tests. © 2013, The International Biometric Society.

  12. Maintenance of Wakefulness Test scores and driving performance in sleep disorder patients and controls.

    PubMed

    Philip, Pierre; Chaufton, Cyril; Taillard, Jacques; Sagaspe, Patricia; Léger, Damien; Raimondi, Monika; Vakulin, Andrew; Capelli, Aurore

    2013-08-01

    Sleepiness at the wheel is a risk factor for traffic accidents. Past studies have demonstrated the validity of the Maintenance of Wakefulness Test (MWT) scores as a predictor of driving impairment in untreated patients with obstructive sleep apnea syndrome (OSAS), but there is limited information on the validity of the maintenance of wakefulness test by MWT in predicting driving impairment in patients with hypersomnias of central origin (narcolepsy or idiopathic hypersomnia). The aim of this study was to compare the MWT scores with driving performance in sleep disorder patients and controls. 19 patients suffering from hypersomnias of central origin (9 narcoleptics and 10 idiopathic hypersomnia), 17 OSAS patients and 14 healthy controls performed a MWT (4×40-minute trials) and a 40-minute driving session on a real car driving simulator. Participants were divided into 4 groups defined by their MWT sleep latency scores. The groups were pathological (sleep latency 0-19 min), intermediate (20-33 min), alert (34-40 min) and control (>34 min). The main driving performance outcome was the number of inappropriate line crossings (ILCs) during the 40 minute drive test. Patients with pathological MWT sleep latency scores (0-19 min) displayed statistically significantly more ILC than patients from the intermediate, alert and control groups (F (3, 46)=7.47, p<0.001). Pathological sleep latencies on the MWT predicted driving impairment in patients suffering from hypersomnias of central origin as well as in OSAS patients. MWT is an objective measure of daytime sleepiness that appears to be useful in estimating the driving performance in sleepy patients. Copyright © 2013 Elsevier B.V. All rights reserved.

  13. Self Adapted Testing as Formative Assessment: Effects of Feedback and Scoring on Engagement and Performance

    ERIC Educational Resources Information Center

    Arieli-Attali, Meirav

    2016-01-01

    This dissertation investigated the feasibility of self-adapted testing (SAT) as a formative assessment tool with the focus on learning. Under two different orientation goals--to excel on a test (performance goal) or to learn from the test (learning goal)--I examined the effect of different scoring rules provided as interactive feedback, on test…

  14. Do Neurocognitive SCAT3 Baseline Test Scores Differ Between Footballers (Soccer) Living With and Without Disability? A Cross-Sectional Study.

    PubMed

    Weiler, Richard; van Mechelen, Willem; Fuller, Colin; Ahmed, Osman Hassan; Verhagen, Evert

    2018-01-01

    To determine if baseline Sport Concussion Assessment Tool, third Edition (SCAT3) scores differ between athletes with and without disability. Cross-sectional comparison of preseason baseline SCAT3 scores for a range of England international footballers. Team doctors and physiotherapists supporting England football teams recorded players' SCAT3 baseline tests from August 1, 2013 to July 31, 2014. A convenience sample of 249 England footballers, of whom 185 were players without disability (male: 119; female: 66) and 64 were players with disability (male learning disability: 17; male cerebral palsy: 28; male blind: 10; female deaf: 9). Between-group comparisons of median SCAT3 total and section scores were made using nonparametric Mann-Whitney-Wilcoxon ranked-sum test. All footballers with disability scored higher symptom severity scores compared with male players without disability. Male footballers with learning disability demonstrated no significant difference in the total number of symptoms, but recorded significantly lower scores on immediate memory and delayed recall compared with male players without disability. Male blind footballers scored significantly higher for total concentration and delayed recall, and male footballers with cerebral palsy scored significantly higher on balance testing and immediate memory, when compared with male players without disability. Female footballers with deafness scored significantly higher for total concentration and balance testing than female footballers without disability. This study suggests that significant differences exist between SCAT3 baseline section scores for footballers with and without disability. Concussion consensus guidelines should recognize these differences and produce guidelines that are specific for the growing number of athletes living with disability.

  15. See It, Be It, Write It: Using Performing Arts to Improve Writing Skills and Test Scores

    ERIC Educational Resources Information Center

    Blecher-Sass, Hope Sara; Moffitt, Maryellen

    2010-01-01

    Improve students' writing skills and boost their assessment scores while adding arts education, creativity, and fun to your writing curriculum. With this vibrant resource, improving writing skills goes hand-in-hand with improving test scores. Students learn how to use acting and visualization as prewriting activities to help them connect writing…

  16. Improving Personality Facet Scores with Multidimensional Computer Adaptive Testing: An Illustration with the Neo Pi-R

    ERIC Educational Resources Information Center

    Makransky, Guido; Mortensen, Erik Lykke; Glas, Cees A. W.

    2013-01-01

    Narrowly defined personality facet scores are commonly reported and used for making decisions in clinical and organizational settings. Although these facets are typically related, scoring is usually carried out for a single facet at a time. This method can be ineffective and time consuming when personality tests contain many highly correlated…

  17. Are students' impressions of improved learning through active learning methods reflected by improved test scores?

    PubMed

    Everly, Marcee C

    2013-02-01

    To report the transformation from lecture to more active learning methods in a maternity nursing course and to evaluate whether student perception of improved learning through active-learning methods is supported by improved test scores. The process of transforming a course into an active-learning model of teaching is described. A voluntary mid-semester survey for student acceptance of the new teaching method was conducted. Course examination results, from both a standardized exam and a cumulative final exam, among students who received lecture in the classroom and students who had active learning activities in the classroom were compared. Active learning activities were very acceptable to students. The majority of students reported learning more from having active-learning activities in the classroom rather than lecture-only and this belief was supported by improved test scores. Students who had active learning activities in the classroom scored significantly higher on a standardized assessment test than students who received lecture only. The findings support the use of student reflection to evaluate the effectiveness of active-learning methods and help validate the use of student reflection of improved learning in other research projects. Copyright © 2011 Elsevier Ltd. All rights reserved.

  18. Changes in Student Populations and Average Test Scores of Dutch Primary Schools

    ERIC Educational Resources Information Center

    Luyten, Hans; de Wolf, Inge

    2011-01-01

    This article focuses on the relation between student population characteristics and average test scores per school in the final grade of primary education from a dynamic perspective. Aggregated data of over 5,000 Dutch primary schools covering a 6-year period were used to study the relation between changes in school populations and shifts in mean…

  19. States Eyeing Expense of Hand-Scored Tests in Light of NCLB Rules

    ERIC Educational Resources Information Center

    Archer, Jeff

    2005-01-01

    When students put down their pencils at the end of Connecticut's testing each year, another intensive process begins. Hundreds of trained evaluators work day and night for about a month to score the written responses. Although expensive, the use of open-ended questions drives the kind of instruction that state leaders say they want in their…

  20. Updating prognosis of cirrhosis by Cox's regression model using Child-Pugh score and aminopyrine breath test as time-dependent covariates.

    PubMed

    Merkel, C; Morabito, A; Sacerdoti, D; Bolognesi, M; Angeli, P; Gatta, A

    1998-06-01

    The determination of aminopyrine breath test on entry into the study was recently shown to improve the accuracy of prediction of death based on the Child-Pugh classification, but the possible usefulness of serial determinations of both parameters has not been assessed. In the present study, we aimed at evaluating whether serial determinations of aminopyrine breath test and Child-Pugh score improve prognostic accuracy in patients with cirrhosis, compared with determinations obtained only on admission. In 74 patients with liver cirrhosis aminopyrine breath test and Child-Pugh score were obtained upon entry into the study. Patients were followed with sequential aminopyrine breath tests and assessments of the Child-Pugh score every 4-6 months. A total number of 232 determinations were obtained. During follow-up 45 patients died, on average after 12 months of follow-up. Child-Pugh score improved in the beginning of follow-up, and then remained fairly constant; aminopyrine breath test showed no improvement in the beginning of follow-up, but rather a slowly progressive decline. In patients who died, both the Child-Pugh score and the metabolism of aminopyrine were significantly more impaired in the last year preceding death (p < 0.05). Applying Cox's regression model with time-dependent covariates, Child-Pugh score and aminopyrine breath test were independent significant predictors of survival. The model with time-dependent covariates explained the observed survival much better than the model with time-fixed covariates (chi-sq. explained by regression = 31.45 vs 11.97; d.f. = 2; p = 0.0000001 vs 0.003). These data suggest that serial determinations of Child-Pugh score and aminopyrine breath test can be used to efficiently update prognosis of cirrhosis.
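
    The two p-values quoted above can be checked from the reported chi-square statistics. For a chi-square distribution with 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. This is a small arithmetic sketch, not a re-analysis; the helper name `chi2_sf_df2` is an assumption.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 d.f.

    For 2 degrees of freedom the survival function is exp(-x / 2).
    """
    return math.exp(-x / 2.0)


# Chi-square explained by regression, as reported in the abstract.
p_time_dependent = chi2_sf_df2(31.45)  # time-dependent covariates
p_time_fixed = chi2_sf_df2(11.97)      # time-fixed covariates
```

    The results are about 1.5e-7 and 0.0025, in line with the reported p = 0.0000001 and p = 0.003.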

  1. Health System Implications of Direct-to-Consumer Personal Genome Testing

    PubMed Central

    McGuire, Amy L.; Burke, Wylie

    2010-01-01

    Direct-to-consumer personal genome testing is now widely available to consumers. Proponents argue that knowledge is power but critics worry about consumer safety and potential harms resulting from misinterpretation of test information. In this article, we consider the health system implications of direct-to-consumer personal genome testing, focusing on issues of accountability, both corporate and professional. PMID:21071927

  2. Assessing Growth in Young Children: A Comparison of Raw, Age-Equivalent, and Standard Scores Using the Peabody Picture Vocabulary Test

    ERIC Educational Resources Information Center

    Sullivan, Jeremy R.; Winter, Suzanne M.; Sass, Daniel A.; Svenkerud, Nicole

    2014-01-01

    Many tests provide users with several different types of scores to facilitate interpretation and description of students' performance. Common examples include raw scores, age- and grade-equivalent scores, and standard scores. However, when used within the context of assessing growth among young children, these scores should not be interchangeable…

  3. Lower Quarter Y-Balance Test Scores and Lower Extremity Injury in NCAA Division I Athletes.

    PubMed

    Lai, Wilson C; Wang, Dean; Chen, James B; Vail, Jeremy; Rugg, Caitlin M; Hame, Sharon L

    2017-08-01

    Functional movement tests that are predictive of injury risk in National Collegiate Athletic Association (NCAA) athletes are useful tools for sports medicine professionals. The Lower Quarter Y-Balance Test (YBT-LQ) measures single-leg balance and reach distances in 3 directions. To assess whether the YBT-LQ predicts the laterality and risk of sports-related lower extremity (LE) injury in NCAA athletes. Case-control study; Level of evidence, 3. The YBT-LQ was administered to 294 NCAA Division I athletes from 21 sports during preparticipation physical examinations at a single institution. Athletes were followed prospectively over the course of the corresponding season. Correlation analysis was performed between the laterality of reach asymmetry and composite scores (CS) versus the laterality of injury. Receiver operating characteristic (ROC) analysis was used to determine the optimal asymmetry cutoff score for YBT-LQ. A multivariate regression analysis adjusting for sex, sport type, body mass index, and history of prior LE surgery was performed to assess predictors of earlier and higher rates of injury. Neither the laterality of reach asymmetry nor the CS correlated with the laterality of injury. ROC analysis found optimal cutoff scores of 2, 9, and 3 cm for anterior, posteromedial, and posterolateral reach, respectively. All of these potential cutoff scores, along with a cutoff score of 4 cm used in the majority of prior studies, were associated with poor sensitivity and specificity. Furthermore, none of the asymmetric cutoff scores were associated with earlier or increased rate of injury in the multivariate analyses. YBT-LQ scores alone do not predict LE injury in this collegiate athlete population. Sports medicine professionals should be cautioned against using the YBT-LQ alone to screen for injury risk in collegiate athletes.

  4. Education is associated with higher later life IQ scores, but not with faster cognitive processing speed.

    PubMed

    Ritchie, Stuart J; Bates, Timothy C; Der, Geoff; Starr, John M; Deary, Ian J

    2013-06-01

    Recent reports suggest a causal relationship between education and IQ, which has implications for cognitive development and aging: education may improve cognitive reserve. In two longitudinal cohorts, we tested the association between education and lifetime cognitive change. We then tested whether education is linked to improved scores on processing-speed variables such as reaction time, which are associated with both IQ and longevity. Controlling for childhood IQ score, we found that education was positively associated with IQ at ages 79 (Sample 1) and 70 (Sample 2), and more strongly for participants with lower initial IQ scores. Education, however, showed no significant association with processing speed, measured at ages 83 and 70. Increased education may enhance important later life cognitive capacities, but does not appear to improve more fundamental aspects of cognitive processing. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  5. Trends in Classroom Observation Scores

    PubMed Central

    Lockwood, J. R.; McCaffrey, Daniel F.

    2014-01-01

    Observations and ratings of classroom teaching and interactions collected over time are susceptible to trends in both the quality of instruction and rater behavior. These trends have potential implications for inferences about teaching and for study design. We use scores on the Classroom Assessment Scoring System–Secondary (CLASS-S) protocol from 458 middle school teachers over a 2-year period to study changes over time in (a) the average quality of teaching for the population of teachers, (b) the average severity of the population of raters, and (c) the severity of individual raters. To obtain these estimates and assess them in the context of other factors that contribute to the variability in scores, we develop an augmented G study model that is broadly applicable for modeling sources of variability in classroom observation ratings data collected over time. In our data, we found that trends in teaching quality were small. Rater drift was very large during raters’ initial days of observation and persisted throughout nearly 2 years of scoring. Raters did not converge to a common level of severity; using our model we estimate that variability among raters actually increases over the course of the study. Variance decompositions based on the model find that trends are a modest source of variance relative to overall rater effects, rater errors on specific lessons, and residual error. The discussion provides possible explanations for trends and rater divergence as well as implications for designs collecting ratings over time. PMID:29795823

  6. Trends in Classroom Observation Scores.

    PubMed

    Casabianca, Jodi M; Lockwood, J R; McCaffrey, Daniel F

    2015-04-01

    Observations and ratings of classroom teaching and interactions collected over time are susceptible to trends in both the quality of instruction and rater behavior. These trends have potential implications for inferences about teaching and for study design. We use scores on the Classroom Assessment Scoring System-Secondary (CLASS-S) protocol from 458 middle school teachers over a 2-year period to study changes over time in (a) the average quality of teaching for the population of teachers, (b) the average severity of the population of raters, and (c) the severity of individual raters. To obtain these estimates and assess them in the context of other factors that contribute to the variability in scores, we develop an augmented G study model that is broadly applicable for modeling sources of variability in classroom observation ratings data collected over time. In our data, we found that trends in teaching quality were small. Rater drift was very large during raters' initial days of observation and persisted throughout nearly 2 years of scoring. Raters did not converge to a common level of severity; using our model we estimate that variability among raters actually increases over the course of the study. Variance decompositions based on the model find that trends are a modest source of variance relative to overall rater effects, rater errors on specific lessons, and residual error. The discussion provides possible explanations for trends and rater divergence as well as implications for designs collecting ratings over time.

  7. Does Wechsler Intelligence Scale administration and scoring proficiency improve during assessment training?

    PubMed

    Platt, Tyson L; Zachar, Peter; Ray, Glen E; Lobello, Steven G; Underhill, Andrea T

    2007-04-01

    Studies have found that Wechsler scale administration and scoring proficiency is not easily attained during graduate training. These findings may be related to methodological issues. Using a single-group repeated measures design, this study documents statistically significant, though modest, error reduction on the WAIS-III and WISC-III during a graduate course in assessment. The study design does not permit the isolation of training factors related to error reduction, or assessment of whether error reduction is a function of mere practice. However, the results do indicate that previous study findings of no or inconsistent improvement in scoring proficiency may have been the result of methodological factors. Implications for teaching individual intelligence testing and further research are discussed.

  8. Improvement in intelligence test scores from 6 to 10 years in children of teenage mothers.

    PubMed

    Cornelius, Marie D; Goldschmidt, Lidush; De Genna, Natacha M; Richardson, Gale A; Leech, Sharon L; Day, Richard

    2010-06-01

    This study investigates change in IQ scores among 290 children born to teenage mothers and identifies social, economic, and environmental variables that may be associated with change in intelligence test performance. The children of 290 teenage mothers (72% African-American and 28% European American) were assessed with the Stanford-Binet Intelligence Scale-4th Edition at ages 6 and 10. The mean composite score at age 6 was 84.8 and 91.2 at age 10, an improvement of 6.4 points. Significant cross-sectional predictors at both ages 6 and 10 of higher Stanford-Binet Intelligence Scale scores were maternal cognitive ability, school grade, white ethnicity, and caregiver education. Having more children in the household significantly predicted lower Stanford-Binet Intelligence Scale scores at age 6. Higher satisfaction with maternal social support predicted higher Stanford-Binet Intelligence Scale scores at age 10. Change in IQ scores was not related to maternal socioeconomic status, social support, home environment, ethnicity, or family interactions. Custodial stability was associated with an improvement in IQ scores, whereas increase in caregiver depression was related to decline in IQ scores. Our findings suggest that improvement in IQ scores of offspring of teenage mothers may be related to stability of maternal custody. More research is needed to determine the impact of the maturation of adolescent mothers' parenting and the role of early education on improvement in cognitive abilities.

  9. Sequential Neighborhood Effects: The Effect of Long-Term Exposure to Concentrated Disadvantage on Children's Reading and Math Test Scores.

    PubMed

    Hicks, Andrew L; Handcock, Mark S; Sastry, Narayan; Pebley, Anne R

    2018-02-01

    Prior research has suggested that children living in a disadvantaged neighborhood have lower achievement test scores, but these studies typically have not estimated causal effects that account for neighborhood choice. Recent studies used propensity score methods to account for the endogeneity of neighborhood exposures, comparing disadvantaged and nondisadvantaged neighborhoods. We develop an alternative propensity function approach in which cumulative neighborhood effects are modeled as a continuous treatment variable. This approach offers several advantages. We use our approach to examine the cumulative effects of neighborhood disadvantage on reading and math test scores in Los Angeles. Our substantive results indicate that recency of exposure to disadvantaged neighborhoods may be more important than average exposure for children's test scores. We conclude that studies of child development should consider both average cumulative neighborhood exposure and the timing of this exposure.

  10. Sequential Neighborhood Effects: The Effect of Long-Term Exposure to Concentrated Disadvantage on Children's Reading and Math Test Scores

    PubMed Central

    Hicks, Andrew L.; Handcock, Mark S.; Sastry, Narayan

    2018-01-01

    Prior research has suggested that children living in a disadvantaged neighborhood have lower achievement test scores, but these studies typically have not estimated causal effects that account for neighborhood choice. Recent studies used propensity score methods to account for the endogeneity of neighborhood exposures, comparing disadvantaged and nondisadvantaged neighborhoods. We develop an alternative propensity function approach in which cumulative neighborhood effects are modeled as a continuous treatment variable. This approach offers several advantages. We use our approach to examine the cumulative effects of neighborhood disadvantage on reading and math test scores in Los Angeles. Our substantive results indicate that recency of exposure to disadvantaged neighborhoods may be more important than average exposure for children's test scores. We conclude that studies of child development should consider both average cumulative neighborhood exposure and the timing of this exposure. PMID:29192386

  11. Refining Ovarian Cancer Test accuracy Scores (ROCkeTS): protocol for a prospective longitudinal test accuracy study to validate new risk scores in women with symptoms of suspected ovarian cancer

    PubMed Central

    Sundar, Sudha; Rick, Caroline; Dowling, Francis; Au, Pui; Rai, Nirmala; Champaneria, Rita; Stobart, Hilary; Neal, Richard; Davenport, Clare; Mallett, Susan; Sutton, Andrew; Kehoe, Sean; Timmerman, Dirk; Bourne, Tom; Van Calster, Ben; Gentry-Maharaj, Aleksandra; Deeks, Jon

    2016-01-01

    Introduction: Ovarian cancer (OC) is associated with non-specific symptoms such as bloating, making accurate diagnosis challenging: only 1 in 3 women with OC presents through primary care referral. National Institute for Health and Care Excellence guidelines recommend sequential testing with CA125 and routine ultrasound in primary care. However, these diagnostic tests have limited sensitivity or specificity. Improving accurate triage in women with vague symptoms is likely to improve mortality by streamlining referral and care pathways. The Refining Ovarian Cancer Test Accuracy Scores (ROCkeTS; HTA 13/13/01) project will derive and validate new tests/risk prediction models that estimate the probability of having OC in women with symptoms. This protocol refers to the prospective study only (phase III). Methods and analysis: ROCkeTS comprises four parallel phases. The full ROCkeTS protocol can be found at http://www.birmingham.ac.uk/ROCKETS. Phase III is a prospective test accuracy study. The study will recruit 2450 patients from 15 UK sites. Recruited patients complete symptom and anxiety questionnaires, donate a serum sample and undergo ultrasound scored as per International Ovarian Tumour Analysis (IOTA) criteria. Recruitment is at rapid access clinics, emergency departments and elective clinics. Models to be evaluated include those based on ultrasound derived by the IOTA group and novel models derived from analysis of existing data sets. Estimates of sensitivity, specificity, c-statistic (area under receiver operating curve), positive predictive value and negative predictive value of diagnostic tests are evaluated and a calibration plot for models will be presented. ROCkeTS has received ethical approval from the NHS West Midlands REC (14/WM/1241) and is registered on the controlled trials website (ISRCTN17160843) and the National Institute of Health Research Cancer and Reproductive Health portfolios. PMID:27507231
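
    The predictive values named in this protocol depend on disease prevalence as well as on sensitivity and specificity. The following is a minimal sketch of that relationship via Bayes' rule; the function name `predictive_values` and all input numbers are hypothetical and are not taken from ROCkeTS.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical inputs chosen only for illustration.
ppv, npv = predictive_values(sensitivity=0.8, specificity=0.9, prevalence=0.1)
print(f"PPV={ppv:.3f} NPV={npv:.3f}")
```

    With a low prevalence, the positive predictive value can be modest even when sensitivity and specificity are both fairly high.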

  12. Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association.

    PubMed

    el Galta, Rachid; Uitte de Willige, Shirley; de Visser, Marieke C H; Helmer, Quinta; Hsu, Li; Houwing-Duistermaat, Jeanine J

    2007-09-24

    In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known. By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study. We conclude that the statistic follows a chi square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.

  13. An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.

    ERIC Educational Resources Information Center

    Marco, Gary L.; And Others

    Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…

  14. Using College Admission Test Scores to Clarify High School Placement. Leading Indicator Spotlight

    ERIC Educational Resources Information Center

    Flug, Susanna

    2010-01-01

    In "Beyond Test Scores: Leading Indicators for Education," Foley and colleagues (2008) define leading indicators as those that "provide early signals of progress toward academic achievement" (p. 1) and stress that educators "need leading indicators to help them see the direction their efforts are going in and to take…

  15. Differences of wells scores accuracy, caprini scores and padua scores in deep vein thrombosis diagnosis

    NASA Astrophysics Data System (ADS)

    Gatot, D.; Mardia, A. I.

    2018-03-01

    Deep vein thrombosis (DVT) is a venous thrombus in the lower limbs. It is diagnosed by venography or compression ultrasound. However, these examinations are not yet available in some health facilities. Many scoring systems have therefore been developed for the diagnosis of DVT. Scoring is practical and safe to use, as well as efficacious and effective in terms of treatment and cost. The existing scoring systems are the Wells, Caprini and Padua scores. Many studies have compared the accuracy of these scores, but none in Medan. We were therefore interested in a comparative study of the Wells, Caprini and Padua scores in Medan. An observational, analytical, case-control study was conducted to perform diagnostic tests on the Wells, Caprini and Padua scores to predict the risk of DVT. The study took place at H. Adam Malik Hospital in Medan. Of a total of 72 subjects, 39 (54.2%) were men, and the mean age was 53.14 years. The Wells, Caprini and Padua scores had sensitivities of 80.6%, 61.1% and 50%, respectively; specificities of 80.65, 66.7% and 75%, respectively; and accuracies of 87.5%, 64.3% and 65.7%, respectively. The Wells score has better sensitivity, specificity and accuracy than the Caprini and Padua scores in diagnosing DVT.
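
    For reference, the three diagnostic measures reported above are conventionally computed from a 2x2 table of test result against reference diagnosis. The following is a minimal sketch; the function name `diagnostic_metrics` and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from 2x2 table counts."""
    sensitivity = tp / (tp + fn)              # diseased correctly flagged
    specificity = tn / (tn + fp)              # non-diseased correctly cleared
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# Hypothetical counts chosen only for illustration.
sens, spec, acc = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

    Accuracy defined this way is a weighted average of sensitivity and specificity, with weights given by the proportions of diseased and non-diseased subjects, so it always lies between the two.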

  16. Correcting Two-Sample "z" and "t" Tests for Correlation: An Alternative to One-Sample Tests on Difference Scores

    ERIC Educational Resources Information Center

    Zimmerman, Donald W.

    2012-01-01

    In order to circumvent the influence of correlation in paired-samples and repeated measures experimental designs, researchers typically perform a one-sample Student "t" test on difference scores. That procedure entails some loss of power, because it employs N - 1 degrees of freedom instead of the 2N - 2 degrees of freedom of the…
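
    The degrees-of-freedom contrast described here can be sketched as follows, using only the Python standard library. The function names and the data are hypothetical. The pooled two-sample statistic shown is the ordinary uncorrected one; the correlation correction the article proposes is not reproduced, because the abstract is truncated before describing it.

```python
import math
import statistics

def paired_t(x, y):
    """One-sample t statistic on difference scores; N - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

def pooled_two_sample_t(x, y):
    """Uncorrected pooled-variance t for equal group sizes; 2N - 2 df."""
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(2 * pooled_var / n)
    return t, 2 * n - 2

# Hypothetical repeated measures on N = 5 subjects.
pre = [10, 12, 9, 14, 11]
post = [12, 15, 10, 17, 12]

t_paired, df_paired = paired_t(post, pre)
t_pooled, df_pooled = pooled_two_sample_t(post, pre)
print(f"paired: t={t_paired:.3f}, df={df_paired}")
print(f"pooled: t={t_pooled:.3f}, df={df_pooled}")
```

    The pooled statistic ignores the correlation between the two sets of scores, which is why it cannot be applied to paired data without a correction.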

  17. Loanwords and Vocabulary Size Test Scores: A Case of Different Estimates for Different L1 Learners

    ERIC Educational Resources Information Center

    Laufer, Batia; McLean, Stuart

    2016-01-01

    The article investigated how the inclusion of loanwords in vocabulary size tests affected the test scores of two L1 groups of EFL learners: Hebrew and Japanese. New BNC- and COCA-based vocabulary size tests were constructed in three modalities: word form recall, word form recognition, and word meaning recall. Depending on the test modality, the…

  18. Insights into Using "TOEIC"® Test Scores to Inform Human Resource Management Decisions. Research Report. ETS RR-17-48

    ERIC Educational Resources Information Center

    Oliveri, María Elena; Tannenbaum, Richard J.

    2017-01-01

    This report explores the ways in which human resource (HR) managers use "TOEIC"® scores to inform hiring, promotion, and training decisions in an international workplace. Two data sources were used (a) previously collected test users' testimonials that described managers' use of TOEIC scores to inform HR decisions and (b) test-use…

  19. Attention Problems and Stability of WISC-IV Scores Among Clinically Referred Children.

    PubMed

    Green Bartoi, Marla; Issner, Jaclyn Beth; Hetterscheidt, Lesley; January, Alicia M; Kuentzel, Jeffrey Garth; Barnett, Douglas

    2015-01-01

    We examined the stability of Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) scores among 51 diverse, clinically referred 8- to 16-year-olds (M(age) = 11.24 years, SD = 2.36). Children were referred to and tested at an urban, university-based training clinic; 70% of eligible children completed follow-up testing 12 months to 40 months later (M = 22.05, SD = 5.94). Stability for index scores ranged from .58 (Processing Speed) to .81 (Verbal Comprehension), with a stability of .86 for Full-Scale IQ. Subtest score stability ranged from .35 (Letter-Number Sequencing) to .81 (Vocabulary). Indexes believed to be more susceptible to concentration (Processing Speed and Working Memory) had lower stability. We also examined attention problems as a potential moderating factor of WISC-IV index and subtest score stability. Children with attention problems had significantly lower stability for Digit Span and Matrix Reasoning subtests compared with children without attention problems. These results provide support for the temporal stability of the WISC-IV and also provide some support for the idea that attention problems contribute to children producing less stable IQ estimates when completing the WISC-IV. We hope our report encourages further examination of this hypothesis and its implications.

  20. Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography

    PubMed Central

    Mallett, Susan; Halligan, Steve; Collins, Gary S.; Altman, Doug G.

    2014-01-01

    Background: Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. Methods: In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Results: Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. Conclusions: The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests. PMID:25353643
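
    The two kinds of summary compared in this study can be illustrated with a small sketch. It computes a nonparametric (rank-based) ROC AUC from confidence scores, and sensitivity and specificity from binary present/absent calls. This is not the authors' analysis: the function names and all scores are hypothetical, and the nonparametric AUC is only one of several ROC methods.

```python
def roc_auc(pos_scores, neg_scores):
    """Nonparametric AUC: chance a positive case outscores a negative one.

    Tied scores count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def sens_spec(pos_calls, neg_calls):
    """Sensitivity and specificity from binary present/absent calls."""
    sensitivity = sum(pos_calls) / len(pos_calls)
    specificity = sum(1 for c in neg_calls if not c) / len(neg_calls)
    return sensitivity, specificity

# Hypothetical confidence scores (0 = no polyp reported); not study data.
pos_scores = [90, 80, 0, 0]   # cases with a polyp per the reference standard
neg_scores = [0, 0, 0, 60]    # cases without a polyp

auc = roc_auc(pos_scores, neg_scores)
sens, spec = sens_spec([s > 0 for s in pos_scores], [s > 0 for s in neg_scores])
print(auc, sens, spec)
```

    In this toy data most scores are zero, so many positive-negative pairs are ties. That is a small-scale version of the zero-score issue the abstract lists among the problems with confidence scores.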

  1. Exploration of analysis methods for diagnostic imaging tests: problems with ROC AUC and confidence scores in CT colonography.

    PubMed

    Mallett, Susan; Halligan, Steve; Collins, Gary S; Altman, Doug G

    2014-01-01

    Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests.

  2. Comprehensive School Reform and Standardized Test Scores in Illinois Elementary and Middle Schools

    ERIC Educational Resources Information Center

    McEnroe, James D.

    2010-01-01

    The study examined the effects of the federally funded Comprehensive School Reform (CSR) program on student performance on mandated standardized tests. The study focused on the mathematics and reading scores of Illinois public elementary and middle and junior high school students. The federal CSR program provided Illinois schools with an annual…

  3. Depressive status explains a significant amount of the variance in COPD assessment test (CAT) scores

    PubMed Central

    Miravitlles, Marc; Molina, Jesús; Quintano, José Antonio; Campuzano, Anna; Pérez, Joselín; Roncero, Carlos

    2018-01-01

    Background: COPD assessment test (CAT) is a short, easy-to-complete health status tool that has been incorporated into the multidimensional assessment of COPD in order to guide therapy; therefore, it is important to understand the factors determining CAT scores. Methods: This is a post hoc analysis of a cross-sectional, observational study conducted in respiratory medicine departments and primary care centers in Spain with the aim of identifying the factors determining CAT scores, focusing particularly on the cognitive status measured by the Mini-Mental State Examination (MMSE) and levels of depression measured by the short Beck Depression Inventory (BDI). Results: A total of 684 COPD patients were analyzed; 84.1% were men, the mean age of patients was 68.7 years, and the mean forced expiratory volume in 1 second (%) was 55.1%. Mean CAT score was 21.8. CAT scores correlated with the MMSE score (Pearson’s coefficient r=−0.371) and the BDI (r=0.620), both p<0.001. In the multivariate analysis, the usual COPD severity variables (age, dyspnea, lung function, and comorbidity) together with MMSE and BDI scores were significantly associated with CAT scores and explained 45% of the variability. However, a model including only MMSE and BDI scores explained up to 40% and BDI alone explained 38% of the CAT variance. Conclusion: CAT scores are associated with clinical variables of severity of COPD. However, cognitive status and, in particular, the level of depression explain a larger percentage of the variance in the CAT scores than the usual COPD clinical severity variables. PMID:29563782
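
    The link between the reported correlations and "variance explained" can be checked with a line of arithmetic. For a simple linear regression with one predictor, the proportion of variance explained equals the squared Pearson correlation. The sketch below applies that identity to the two coefficients quoted in the abstract; the variable names are illustrative only.

```python
# Pearson correlations with CAT scores, as quoted in the abstract.
r_bdi = 0.620
r_mmse = -0.371

# With a single predictor, variance explained (R^2) is the squared correlation.
r2_bdi = round(r_bdi ** 2, 3)
r2_mmse = round(r_mmse ** 2, 3)
print(r2_bdi, r2_mmse)
```

    The squared BDI correlation, about 0.384, is consistent with the statement that BDI alone explained 38% of the CAT variance.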

  4. Depressive status explains a significant amount of the variance in COPD assessment test (CAT) scores.

    PubMed

    Miravitlles, Marc; Molina, Jesús; Quintano, José Antonio; Campuzano, Anna; Pérez, Joselín; Roncero, Carlos

    2018-01-01

    COPD assessment test (CAT) is a short, easy-to-complete health status tool that has been incorporated into the multidimensional assessment of COPD in order to guide therapy; therefore, it is important to understand the factors determining CAT scores. This is a post hoc analysis of a cross-sectional, observational study conducted in respiratory medicine departments and primary care centers in Spain with the aim of identifying the factors determining CAT scores, focusing particularly on the cognitive status measured by the Mini-Mental State Examination (MMSE) and levels of depression measured by the short Beck Depression Inventory (BDI). A total of 684 COPD patients were analyzed; 84.1% were men, the mean age of patients was 68.7 years, and the mean forced expiratory volume in 1 second (%) was 55.1%. Mean CAT score was 21.8. CAT scores correlated with the MMSE score (Pearson's coefficient r = -0.371) and the BDI (r = 0.620), both p < 0.001. In the multivariate analysis, the usual COPD severity variables (age, dyspnea, lung function, and comorbidity) together with MMSE and BDI scores were significantly associated with CAT scores and explained 45% of the variability. However, a model including only MMSE and BDI scores explained up to 40% and BDI alone explained 38% of the CAT variance. CAT scores are associated with clinical variables of severity of COPD. However, cognitive status and, in particular, the level of depression explain a larger percentage of the variance in the CAT scores than the usual COPD clinical severity variables.

  5. Meta-Analyses of the Relationship of Creative Achievement to both IQ and Divergent Thinking Test Scores

    ERIC Educational Resources Information Center

    Kim, Kyung Hee

    2008-01-01

    There is disagreement among researchers about whether IQ tests or divergent thinking (DT) tests are better predictors of creative achievement. Resolving this dispute is complicated by the fact that some research has shown a relationship between IQ and DT test scores (e.g., Runco & Albert, 1986; Wallach, 1970). The present study conducted…

  6. Specific algorithm method of scoring the Clock Drawing Test applied in cognitively normal elderly

    PubMed Central

    Mendes-Santos, Liana Chaves; Mograbi, Daniel; Spenciere, Bárbara; Charchat-Fichman, Helenice

    2015-01-01

    The Clock Drawing Test (CDT) is an inexpensive, fast and easily administered measure of cognitive function, especially in the elderly. This instrument is a popular clinical tool widely used in screening for cognitive disorders and dementia. The CDT can be applied in different ways and scoring procedures also vary. Objective: The aims of this study were to analyze the performance of elderly on the CDT and evaluate inter-rater reliability of the CDT scored by using a specific algorithm method adapted from Sunderland et al. (1989). Methods: We analyzed the CDT of 100 cognitively normal elderly aged 60 years or older. The CDT ("free-drawn") and Mini-Mental State Examination (MMSE) were administered to all participants. Six independent examiners scored the CDT of 30 participants to evaluate inter-rater reliability. Results and Conclusion: A score of 5 on the proposed algorithm ("Numbers in reverse order or concentrated"), equivalent to 5 points on the original Sunderland scale, was the most frequent (53.5%). The CDT specific algorithm method used had high inter-rater reliability (p<0.01), and mean score ranged from 5.06 to 5.96. The high frequency of an overall score of 5 points may suggest the need to create more nuanced evaluation criteria, which are sensitive to differences in levels of impairment in visuoconstructive and executive abilities during aging. PMID:29213954

  7. Genetic variation of the growth hormone secretagogue receptor gene is associated with alcohol use disorders identification test scores and smoking.

    PubMed

    Suchankova, Petra; Nilsson, Staffan; von der Pahlen, Bettina; Santtila, Pekka; Sandnabba, Kenneth; Johansson, Ada; Jern, Patrick; Engel, Jörgen A; Jerlhag, Elisabet

    2016-03-01

    The multifaceted gut-brain peptide ghrelin and its receptor (GHSR-1a) are implicated in mechanisms regulating not only the energy balance but also the reward circuitry. In our pre-clinical models, we have shown that ghrelin increases whereas GHSR-1a antagonists decrease alcohol consumption and the motivation to consume alcohol in rodents. Moreover, ghrelin signaling is required for the rewarding properties of addictive drugs including alcohol and nicotine in rodents. Given the hereditary component underlying addictive behaviors and disorders, we sought to investigate whether single nucleotide polymorphisms (SNPs) located in the pre-proghrelin gene (GHRL) and GHSR-1a gene (GHSR) are associated with alcohol use, measured by the alcohol use disorders identification test (AUDIT) and smoking. Two SNPs located in GHRL, rs4684677 (Gln90Leu) and rs696217 (Leu72Met), and one in GHSR, rs2948694, were genotyped in a subset (n = 4161) of a Finnish population-based cohort, the Genetics of Sexuality and Aggression project. The effect of these SNPs on AUDIT scores and smoking was investigated using linear and logistic regressions, respectively. We found that the minor allele of the rs2948694 SNP was nominally associated with higher AUDIT scores (P = 0.0204, recessive model) and smoking (P = 0.0002, dominant model). Furthermore, post hoc analyses showed that this risk allele was also associated with increased likelihood of having high level of alcohol problems as determined by AUDIT scores ≥ 16 (P = 0.0043, recessive model). These convergent findings lend further support for the hypothesized involvement of ghrelin signaling in addictive disorders. © 2015 Society for the Study of Addiction.

  8. Genetic variation of the growth hormone secretagogue receptor gene is associated with alcohol use disorders identification test scores and smoking

    PubMed Central

    Nilsson, Staffan; von der Pahlen, Bettina; Santtila, Pekka; Sandnabba, Kenneth; Johansson, Ada; Jern, Patrick; Engel, Jörgen A.; Jerlhag, Elisabet

    2015-01-01

    The multifaceted gut‐brain peptide ghrelin and its receptor (GHSR‐1a) are implicated in mechanisms regulating not only the energy balance but also the reward circuitry. In our pre‐clinical models, we have shown that ghrelin increases whereas GHSR‐1a antagonists decrease alcohol consumption and the motivation to consume alcohol in rodents. Moreover, ghrelin signaling is required for the rewarding properties of addictive drugs including alcohol and nicotine in rodents. Given the hereditary component underlying addictive behaviors and disorders, we sought to investigate whether single nucleotide polymorphisms (SNPs) located in the pre‐proghrelin gene (GHRL) and GHSR‐1a gene (GHSR) are associated with alcohol use, measured by the alcohol use disorders identification test (AUDIT) and smoking. Two SNPs located in GHRL, rs4684677 (Gln90Leu) and rs696217 (Leu72Met), and one in GHSR, rs2948694, were genotyped in a subset (n = 4161) of a Finnish population‐based cohort, the Genetics of Sexuality and Aggression project. The effect of these SNPs on AUDIT scores and smoking was investigated using linear and logistic regressions, respectively. We found that the minor allele of the rs2948694 SNP was nominally associated with higher AUDIT scores (P = 0.0204, recessive model) and smoking (P = 0.0002, dominant model). Furthermore, post hoc analyses showed that this risk allele was also associated with increased likelihood of having high level of alcohol problems as determined by AUDIT scores ≥ 16 (P = 0.0043, recessive model). These convergent findings lend further support for the hypothesized involvement of ghrelin signaling in addictive disorders. PMID:26059200

  9. An Investigation of Calculator Use on Employment Tests of Mathematical Ability: Effects on Reliability, Validity, Test Scores, and Speed of Completion

    ERIC Educational Resources Information Center

    Bing, Mark N.; Stewart, Susan M.; Davison, H. Kristl

    2009-01-01

    Handheld calculators have been used on the job for more than 30 years, yet the degree to which these devices can affect performance on employment tests of mathematical ability has not been thoroughly examined. This study used a within-subjects research design (N = 167) to investigate the effects of calculator use on test score reliability, test…

  10. Developing Local Oral Reading Fluency Cut Scores for Predicting High-Stakes Test Performance

    ERIC Educational Resources Information Center

    Grapin, Sally L.; Kranzler, John H.; Waldron, Nancy; Joyce-Beaulieu, Diana; Algina, James

    2017-01-01

    This study evaluated the classification accuracy of a second grade oral reading fluency curriculum-based measure (R-CBM) in predicting third grade state test performance. It also compared the long-term classification accuracy of local and publisher-recommended R-CBM cut scores. Participants were 266 students who were divided into a calibration…

  11. Understanding the Role of "SES," Ethnicity, and Discipline Infractions in Students' Standardized Test Scores

    ERIC Educational Resources Information Center

    Koca, Fatih

    2017-01-01

    The goal of the current study is to examine the impact of students' socioeconomic status, ethnicity, and discipline infractions on their standardized test scores in Indiana, USA. Data for this study were extracted from the Indiana Department of Education. ISTEP is a criterion-referenced standardized test. It consists of items that assess a student's…

  12. Physiologic Dysfunction Scores and Cognitive Function Test Performance in United States Adults

    PubMed Central

    Kobrosly, Roni W; Seplaki, Christopher L; Jones, Courtney M; van Wijngaarden, Edwin

    2013-01-01

    Objective: To investigate the relationship between a measure of cumulative physiologic dysfunction and specific domains of cognitive function. Methods: We examined a summary score measuring physiological dysfunction, a multisystem measure of the body’s ability to effectively adapt to physical and psychological demands, in relation to cognitive function deficits in a population of 4511 adults aged 20 to 59 who participated in the third National Health and Nutrition Examination Survey (1988–1994). Measures of cognitive function comprised three domains: working memory, visuomotor speed, and perceptual-motor speed. ‘Physiologic dysfunction’ scores summarizing measures of cardiovascular, immunologic, kidney, and liver function were explored. We used multiple linear regression models to estimate associations between cognitive function measures and physiological dysfunction scores, adjusting for socioeconomic factors, test conditions, and self-reported health factors. Results: We noted a dose-response relationship between physiologic dysfunction and working memory (coefficient = 0.207, 95% CI = (0.066, 0.348), p < 0.0001) that persisted after adjustment for all covariates (p = 0.03). We did not observe any significant relationships between dysfunction scores and visuomotor (p = 0.37) or perceptual-motor ability (p = 0.33). Conclusions: Our findings suggest that multisystem physiologic dysfunction is associated with working memory. Future longitudinal studies are needed to clarify the underlying mechanisms and explore the persistency of this association into later life. We suggest that such studies should incorporate physiologic data, neuroendocrine parameters, and a wide range of specific cognitive domains. PMID:22155941

  13. Automated Essay Scoring versus Human Scoring: A Correlational Study

    ERIC Educational Resources Information Center

    Wang, Jinhao; Brown, Michelle Stallone

    2008-01-01

    The purpose of the current study was to analyze the relationship between automated essay scoring (AES) and human scoring in order to determine the validity and usefulness of AES for large-scale placement tests. Specifically, a correlational research design was used to examine the correlations between AES performance and human raters' performance.…

  14. Turkish version of the modified Constant-Murley score and standardized test protocol: reliability and validity.

    PubMed

    Çelik, Derya

    2016-01-01

    The Constant-Murley score (CMS) is widely used to evaluate disabilities associated with shoulder injuries, but it has been criticized for relying on imprecise terminology and a lack of standardized methodology. A modified guideline, therefore, was published in 2008 with several recommendations. This new version has not yet been translated or culturally adapted for Turkish-speaking populations. The purpose of this study was to translate and cross-culturally adapt the modified CMS and its test protocol, as well as define and measure its reliability and validity. The modified CMS was translated into Turkish, consistent with published methodological guidelines. The measurement properties of the Turkish version of the modified CMS were tested in 30 patients (12 males, 18 females; mean age: 59.5±13.5 years) with a variety of shoulder pathologies. Intraclass correlation coefficients (ICC) were used to estimate test-retest reliability. Construct validity was analyzed with the Turkish version of the American Shoulder and Elbow Surgeons (ASES) Standardized Shoulder Assessment Form and Short-Form Health Survey (SF-12). No difficulties were found in the translation process. The Turkish version of the modified CMS showed excellent test-retest reliability (ICC=0.86). The correlation coefficients between the Turkish version of the modified CMS and the ASES, SF-12-physical component score, and SF-12 mental component scores were found to be 0.48, 0.35, and 0.05, respectively. No floor or ceiling effects were found. The translation and cultural adaptation of the modified CMS and its standardized test protocol into Turkish were successful. The Turkish version of the modified CMS has sufficient reliability and validity to measure a variety of shoulder disorders for Turkish-speaking individuals.

  15. Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores: Theory and Applications

    ERIC Educational Resources Information Center

    Yao, Lihua

    2012-01-01

    Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…

  16. Using Automated Essay Scores as an Anchor When Equating Constructed Response Writing Tests

    ERIC Educational Resources Information Center

    Almond, Russell G.

    2014-01-01

    Assessments consisting of only a few extended constructed response items (essays) are not typically equated using anchor test designs as there are typically too few essay prompts in each form to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common…

  17. Development and Validation of Scores from an Instrument Measuring Student Test-Taking Motivation

    ERIC Educational Resources Information Center

    Eklof, Hanna

    2006-01-01

    Using the expectancy-value model of achievement motivation as a basis, this study's purpose is to develop, apply, and validate scores from a self-report instrument measuring student test-taking motivation. Sampled evidence of construct validity for the present sample indicates that a number of the items in the instrument could be used as an…

  18. The Fight's Not Always Fixed: Using Literary Response to Transcend Standardized Test Scores

    ERIC Educational Resources Information Center

    Avila, JuliAnna

    2012-01-01

    In 2004, the National Endowment for the Arts (NEA) concluded that "literature reading is fading as a meaningful activity, especially among younger people." How can educators continue to teach students about the power of literary response when the priority is for them to achieve proficiency on standardized tests, whose scores can only be narrowly…

  19. Using a Concept-Grounded, Curriculum-Based Measure in Mathematics To Predict Statewide Test Scores for Middle School Students with LD.

    ERIC Educational Resources Information Center

    Helwig, Robert; Anderson, Lisbeth; Tindal, Gerald

    2002-01-01

    An 11-item math concept curriculum-based measure (CBM) was administered to 171 eighth grade students. Scores were correlated with scores from a computer adaptive test designed in conjunction with the state to approximate the official statewide mathematics achievement tests. Correlations for general education students and students with learning…

  20. Zertifikat Deutsch als Fremdsprache and the Oral Proficiency Interview: A Comparison of Test Scores and Examinations.

    ERIC Educational Resources Information Center

    Lalande, John F.; Schweckendiek, Jurgen

    1986-01-01

    Investigates what correlations might exist between an individual's score on the Zertifikat Deutsch als Fremdsprache and on the Oral Proficiency Interview. The tests themselves are briefly described. Results indicate that the two tests appear to correlate well in their evaluation of speaking skills. (SED)

  1. Linking Composite Scores: Effects of Anchor Test Length and Content Representativeness. Research Report. ETS RR-16-36

    ERIC Educational Resources Information Center

    Lin, Peng; Dorans, Neil; Weeks, Jonathan

    2016-01-01

    The nonequivalent groups with anchor test (NEAT) design is frequently used in test score equating or linking. One important assumption of the NEAT design is that the anchor test is a miniversion of the 2 tests to be equated/linked. When the content of the 2 tests is different, it is not possible for the anchor test to be adequately representative…

  2. A sup-score test for the cure fraction in mixture models for long-term survivors.

    PubMed

    Hsu, Wei-Wen; Todem, David; Kim, KyungMann

    2016-12-01

    The evaluation of cure fractions in oncology research under the well known cure rate model has attracted considerable attention in the literature, but most of the existing testing procedures have relied on restrictive assumptions. A common assumption has been to restrict the cure fraction to a constant under alternatives to homogeneity, thereby neglecting any information from covariates. This article extends the literature by developing a score-based statistic that incorporates covariate information to detect cure fractions, with the existing testing procedure serving as a special case. A complication of this extension, however, is that the implied hypotheses are not typical and standard regularity conditions to conduct the test may not even hold. Using empirical processes arguments, we construct a sup-score test statistic for cure fractions and establish its limiting null distribution as a functional of mixtures of chi-square processes. In practice, we suggest a simple resampling procedure to approximate this limiting distribution. Our simulation results show that the proposed test can greatly improve efficiency over tests that neglect the heterogeneity of the cure fraction under the alternative. The practical utility of the methodology is illustrated using ovarian cancer survival data with long-term follow-up from the surveillance, epidemiology, and end results registry. © 2016, The International Biometric Society.

  3. Rey's Auditory Verbal Learning Test scores can be predicted from whole brain MRI in Alzheimer's disease.

    PubMed

    Moradi, Elaheh; Hallikainen, Ilona; Hänninen, Tuomo; Tohka, Jussi

    2017-01-01

    Rey's Auditory Verbal Learning Test (RAVLT) is a powerful neuropsychological tool for testing episodic memory, which is widely used for the cognitive assessment in dementia and pre-dementia conditions. Several studies have shown that an impairment in RAVLT scores reflects well the underlying pathology caused by Alzheimer's disease (AD), thus making RAVLT an effective early marker to detect AD in persons with memory complaints. We investigated the association between RAVLT scores (RAVLT Immediate and RAVLT Percent Forgetting) and the structural brain atrophy caused by AD. The aim was to comprehensively study to what extent the RAVLT scores are predictable based on structural magnetic resonance imaging (MRI) data using machine learning approaches as well as to find the most important brain regions for the estimation of RAVLT scores. For this, we built a predictive model to estimate RAVLT scores from gray matter density via an elastic net penalized linear regression model. The proposed approach provided highly significant cross-validated correlation between the estimated and observed RAVLT Immediate (R = 0.50) and RAVLT Percent Forgetting (R = 0.43) in a dataset consisting of 806 AD, mild cognitive impairment (MCI) or healthy subjects. In addition, the selected machine learning method provided more accurate estimates of RAVLT scores than the relevance vector regression used earlier for the estimation of RAVLT based on MRI data. The top predictors were medial temporal lobe structures and amygdala for the estimation of RAVLT Immediate and angular gyrus, hippocampus and amygdala for the estimation of RAVLT Percent Forgetting. Further, the conversion of MCI subjects to AD within 3 years could be predicted based on either observed or estimated RAVLT scores with an accuracy comparable to MRI-based biomarkers.

  4. Clinical score and rapid antigen detection test to guide antibiotic use for sore throats: randomised controlled trial of PRISM (primary care streptococcal management).

    PubMed

    Little, Paul; Hobbs, F D Richard; Moore, Michael; Mant, David; Williamson, Ian; McNulty, Cliodna; Cheng, Ying Edith; Leydon, Geraldine; McManus, Richard; Kelly, Joanne; Barnett, Jane; Glasziou, Paul; Mullee, Mark

    2013-10-10

    To determine the effect of clinical scores that predict streptococcal infection or rapid streptococcal antigen detection tests compared with delayed antibiotic prescribing. Open adaptive pragmatic parallel group randomised controlled trial. Primary care in United Kingdom. Patients aged ≥ 3 with acute sore throat. An internet programme randomised patients to targeted antibiotic use according to: delayed antibiotics (the comparator group for analyses), clinical score, or antigen test used according to clinical score. During the trial a preliminary streptococcal score (score 1, n=1129) was replaced by a more consistent score (score 2, n=631; features: fever during previous 24 hours; purulence; attends rapidly (within three days after onset of symptoms); inflamed tonsils; no cough/coryza (acronym FeverPAIN)). Symptom severity reported by patients on a 7 point Likert scale (mean severity of sore throat/difficulty swallowing for days two to four after the consultation (primary outcome)), duration of symptoms, use of antibiotics. For score 1 there were no significant differences between groups. For score 2, symptom severity was documented in 80% (168/207 (81%) in delayed antibiotics group; 168/211 (80%) in clinical score group; 166/213 (78%) in antigen test group). Reported severity of symptoms was lower in the clinical score group (-0.33, 95% confidence interval -0.64 to -0.02; P=0.04), equivalent to one in three rating sore throat a slight versus moderate problem, with a similar reduction for the antigen test group (-0.30, -0.61 to -0.00; P=0.05). Symptoms rated moderately bad or worse resolved significantly faster in the clinical score group (hazard ratio 1.30, 95% confidence interval 1.03 to 1.63) but not the antigen test group (1.11, 0.88 to 1.40). In the delayed antibiotics group, 75/164 (46%) used antibiotics. Use of antibiotics in the clinical score group (60/161) was 29% lower (adjusted risk ratio 0.71, 95% confidence interval 0.50 to 0.95; P=0.02) and in the

  5. Effect of Mindfulness Meditation on Perceived Stress Scores and Autonomic Function Tests of Pregnant Indian Women.

    PubMed

    Muthukrishnan, Shobitha; Jain, Reena; Kohli, Sangeeta; Batra, Swaraj

    2016-04-01

    Various pregnancy complications like hypertension and preeclampsia have been strongly correlated with maternal stress. One of the connecting links between pregnancy complications and maternal stress is mind-body intervention which can be part of Complementary and Alternative Medicine (CAM). Biologic measures of stress during pregnancy may get reduced by such interventions. To evaluate the effect of Mindfulness meditation on perceived stress scores and autonomic function tests of pregnant Indian women. Pregnant Indian women of 12 weeks gestation were randomised to two treatment groups: Test group with Mindfulness meditation and control group with their usual obstetric care. The effect of Mindfulness meditation on perceived stress scores and cardiac sympathetic functions and parasympathetic functions (Heart rate variation with respiration, lying to standing ratio, standing to lying ratio and respiratory rate) were evaluated on pregnant Indian women. There was a significant decrease in perceived stress scores, a significant decrease of blood pressure response to cold pressor test and a significant increase in heart rate variability in the test group (p < 0.05), which indicates that mindfulness meditation is a powerful modulator of the sympathetic nervous system and can thereby reduce the day-to-day perceived stress in pregnant women. The results of this study suggest that mindfulness meditation improves parasympathetic functions in pregnant women and is a powerful modulator of the sympathetic nervous system during pregnancy.

  6. Segregation and the Black-White Test Score Gap. NBER Working Paper No. 12988

    ERIC Educational Resources Information Center

    Vigdor, Jacob; Ludwig, Jens

    2007-01-01

    The mid-1980s witnessed breaks in two important trends related to race and schooling. School segregation, which had been declining, began a period of relative stasis. Black-white test score gaps, which had also been declining, also stagnated. The notion that these two phenomena may be related is also supported by basic cross-sectional evidence. We…

  7. Adults with poor reading skills: How lexical knowledge interacts with scores on standardized reading comprehension tests

    PubMed Central

    McKoon, Gail; Ratcliff, Roger

    2016-01-01

    Millions of adults in the United States lack the necessary literacy skills for most living wage jobs. For students from adult learning classes, we used a lexical decision task to measure their knowledge of words and we used a decision-making model (Ratcliff’s, 1978, diffusion model) to abstract the mechanisms underlying their performance from their RTs and accuracy. We also collected scores for each participant on standardized IQ tests and standardized reading tests used commonly in the education literature. We found significant correlations between the model’s estimates of the strengths with which words are represented in memory and scores for some of the standardized tests but not others. The findings point to the feasibility and utility of combining a test of word knowledge, lexical decision, that is well-established in psycholinguistic research, a decision-making model that supplies information about underlying mechanisms, and standardized tests. The goal for future research is to use this combination of approaches to understand better how basic processes relate to standardized tests with the eventual aim of understanding what these tests are measuring and what the specific difficulties are for individual, low-literacy adults. PMID:26550803

  8. Adults with poor reading skills: How lexical knowledge interacts with scores on standardized reading comprehension tests.

    PubMed

    McKoon, Gail; Ratcliff, Roger

    2016-01-01

    Millions of adults in the United States lack the necessary literacy skills for most living wage jobs. For students from adult learning classes, we used a lexical decision task to measure their knowledge of words and we used a decision-making model (Ratcliff's, 1978, diffusion model) to abstract the mechanisms underlying their performance from their RTs and accuracy. We also collected scores for each participant on standardized IQ tests and standardized reading tests used commonly in the education literature. We found significant correlations between the model's estimates of the strengths with which words are represented in memory and scores for some of the standardized tests but not others. The findings point to the feasibility and utility of combining a test of word knowledge, lexical decision, that is well-established in psycholinguistic research, a decision-making model that supplies information about underlying mechanisms, and standardized tests. The goal for future research is to use this combination of approaches to understand better how basic processes relate to standardized tests with the eventual aim of understanding what these tests are measuring and what the specific difficulties are for individual, low-literacy adults. Copyright © 2015. Published by Elsevier B.V.

  9. Automated Scoring of Speaking Tasks in the Test of English-for-Teaching ("TEFT"™). Research Report. ETS RR-15-31

    ERIC Educational Resources Information Center

    Zechner, Klaus; Chen, Lei; Davis, Larry; Evanini, Keelan; Lee, Chong Min; Leong, Chee Wee; Wang, Xinhao; Yoon, Su-Youn

    2015-01-01

    This research report presents a summary of research and development efforts devoted to creating scoring models for automatically scoring spoken item responses of a pilot administration of the Test of English-for-Teaching ("TEFT"™) within the "ELTeach"™ framework. The test consists of items for all four language modalities:…

  10. Predicting performance and injury resilience from movement quality and fitness scores in a basketball team over 2 years.

    PubMed

    McGill, Stuart M; Andersen, Jordan T; Horne, Arthur D

    2012-07-01

    The purpose of this study was to see if specific tests of fitness and movement quality could predict injury resilience and performance in a team of basketball players over 2 years (2 playing seasons). It was hypothesized that, in a basketball population, movement and fitness scores would predict performance scores and that movement and fitness scores would predict injury resilience. A basketball team from a major American university (N = 14) served as the test population in this longitudinal trial. Variables linked to fitness, movement ability, speed, strength, and agility were measured together with some National Basketball Association (NBA) combine tests. Dependent variables of performance indicators (such as games and minutes played, points scored, assists, rebounds, steals, and blocks) and injury reports were tracked for the subsequent 2 years. Results showed that better performance was linked with having a stiffer torso, more mobile hips, weaker left grip strength, and a longer standing long jump, to name a few. Of the 3 NBA combine tests administered here, only a faster lane agility time had significant links with performance. Some movement qualities and torso endurance were not linked. No patterns with injury emerged. These observations have implications for preseason testing and subsequent training programs in an attempt to reduce future injury and enhance playing performance.

  11. Refining Ovarian Cancer Test accuracy Scores (ROCkeTS): protocol for a prospective longitudinal test accuracy study to validate new risk scores in women with symptoms of suspected ovarian cancer.

    PubMed

    Sundar, Sudha; Rick, Caroline; Dowling, Francis; Au, Pui; Snell, Kym; Rai, Nirmala; Champaneria, Rita; Stobart, Hilary; Neal, Richard; Davenport, Clare; Mallett, Susan; Sutton, Andrew; Kehoe, Sean; Timmerman, Dirk; Bourne, Tom; Van Calster, Ben; Gentry-Maharaj, Aleksandra; Menon, Usha; Deeks, Jon

    2016-08-09

    Ovarian cancer (OC) is associated with non-specific symptoms such as bloating, making accurate diagnosis challenging: only 1 in 3 women with OC presents through primary care referral. National Institute for Health and Care Excellence guidelines recommend sequential testing with CA125 and routine ultrasound in primary care. However, these diagnostic tests have limited sensitivity or specificity. Improving accurate triage in women with vague symptoms is likely to improve mortality by streamlining referral and care pathways. The Refining Ovarian Cancer Test Accuracy Scores (ROCkeTS; HTA 13/13/01) project will derive and validate new tests/risk prediction models that estimate the probability of having OC in women with symptoms. This protocol refers to the prospective study only (phase III). ROCkeTS comprises four parallel phases. The full ROCkeTS protocol can be found at http://www.birmingham.ac.uk/ROCKETS. Phase III is a prospective test accuracy study. The study will recruit 2450 patients from 15 UK sites. Recruited patients complete symptom and anxiety questionnaires, donate a serum sample and undergo ultrasound scored as per International Ovarian Tumour Analysis (IOTA) criteria. Recruitment is at rapid access clinics, emergency departments and elective clinics. Models to be evaluated include those based on ultrasound derived by the IOTA group and novel models derived from analysis of existing data sets. Estimates of sensitivity, specificity, c-statistic (area under receiver operating curve), positive predictive value and negative predictive value of diagnostic tests are evaluated and a calibration plot for models will be presented. ROCkeTS has received ethical approval from the NHS West Midlands REC (14/WM/1241) and is registered on the controlled trials website (ISRCTN17160843) and the National Institute of Health Research Cancer and Reproductive Health portfolios. Published by the BMJ Publishing Group Limited.
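
    The accuracy measures named in the ROCkeTS abstract (sensitivity, specificity, positive and negative predictive value) all derive from a 2x2 table of index-test result against the reference standard. A minimal sketch of those definitions follows. The class name, function name, and counts are hypothetical and are not ROCkeTS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts of index-test result against the reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def accuracy_measures(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of diseased who test positive
        "specificity": c.tn / (c.tn + c.fp),  # share of non-diseased who test negative
        "ppv": c.tp / (c.tp + c.fp),          # share of positives who are diseased
        "npv": c.tn / (c.tn + c.fn),          # share of negatives who are non-diseased
    }


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=45, fp=30, fn=5, tn=420)
measures = accuracy_measures(example)
```

    Sensitivity and specificity are properties of the test at a given threshold, whereas the predictive values also depend on how common the disease is in the recruited population.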

  12. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? Utah

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles Utah's test score trends through 2008-09. In 2004, the mean scale score on the state 4th grade reading test was 167 for non-Title I students and 164 for Title I students. In 2009 the mean scale score in 4th grade reading was 168 for non-Title I students and 164 for Title I students. Between 2004 and 2009, the mean scale score…

  13. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? Colorado

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles Colorado's test score trends through 2008-09. In 2003, the mean scale score on the state 4th grade reading test was 598 for non-Title I students and 558 for Title I students. In 2009, the mean scale score in 4th grade reading was 599 for non-Title I students and 556 for Title I students. Between 2003 and 2009, the mean scale…

  14. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? Maryland

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles Maryland's test score trends through 2008-09. In 2004, 82% of non-Title I 4th graders and 61% of Title I 4th graders scored at the proficient level on the state reading test. In 2009, 90% of non-Title I 4th graders and 78% of Title I 4th graders scored at the proficient level in reading. Between 2004 and 2009, the percentage…

  15. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? Delaware

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles Delaware's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 474 for non-Title I students and 464 for Title I students. In 2009, the mean scale score in 4th grade reading was 478 for non-Title I students and 467 for Title I students. Between 2006 and 2009, the mean scale…

  16. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? Massachusetts

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles Massachusetts's test score trends through 2008-09. In 2006, 59% of non-Title I 4th graders and 29% of Title I 4th graders scored at the proficient level on the state reading test. In 2009, 64% of non-Title I 4th graders and 31% of Title I 4th graders scored at the proficient level in reading. Between 2006 and 2009, the…

  17. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? Missouri

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles Missouri's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 661 for non-Title I students and 642 for Title I students. In 2009, the mean scale score in 4th grade reading was 661 for non-Title I students and 648 for Title I students. Between 2006 and 2009, there was no…

  18. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? Kentucky

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles Kentucky's test score trends through 2008-09. In 2007, the mean scale score on the state 4th grade reading test was 455 for non-Title I students and 451 for Title I students. In 2009, the mean scale score in 4th grade reading was 455 for non-Title I students and 451 for Title I students. Between 2007 and 2009, the mean scale…

  19. Impossible Scores Resulting in Zero Frequencies in the Anchor Test: Impact on Smoothing and Equating. Research Report. ETS RR-08-10

    ERIC Educational Resources Information Center

    Puhan, Gautam; vonDavier, Alina; Gupta, Shaloo

    2008-01-01

    Equating under the external anchor design is frequently conducted using scaled scores on the anchor test. However, scaled scores often lead to the unique problem of creating zero frequencies in the score distribution because there may not always be a one-to-one correspondence between raw and scaled scores. For example, raw scores of 17 and 18 may…

  20. Might the Rorschach be a projective test after all? Social projection of an undesired trait alters Rorschach Oral Dependency scores.

    PubMed

    Bornstein, Robert F

    2007-06-01

    The degree to which projection plays a role in Rorschach (Rorschach, 1921/1942) responding remains controversial, in part because extant data have yielded inconclusive results. In this investigation, I examined the impact of social projection on Rorschach Oral Dependency (ROD) scores using methods adapted from social cognition research. In Study 1, I prescreened 85 college students (40 women and 45 men) with the ROD scale and a widely used self-report measure of dependency, the Interpersonal Dependency Inventory (IDI; Hirschfeld et al., 1977). Results show that informing participants who scored low on the IDI that they were in fact highly dependent led to significant increases in ROD scores; I did not obtain parallel ROD increases for participants who scored high on the IDI or for participants who received low-dependent feedback. In Study 2, I examined a separate sample of 80 prescreened college students (40 women and 40 men) and showed that providing low self-report participants an opportunity to attribute dependency to a fictional target person prior to Rorschach responding attenuated the impact of high-dependent feedback on ROD scores. These results suggest that projection played a role in at least one domain of Rorschach responding. I discuss theoretical, clinical, and empirical implications of these results.

  1. Establishing the Validity of TOEIC Bridge™ Test Scores for Students in Colombia, Chile, and Ecuador. Research Report. ETS RR-08-58

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Feng, Ying; Saldivia, Luis; Powers, Donald E.; Ginuta, Anthony; Simpson, Annabelle; Weng, Vincent

    2008-01-01

    The validity of TOEIC Bridge™ scores as a measure of English language skill was examined from the standpoint of a unified concept of test validity. In this study, more than 6,000 test takers in 3 Latin American countries (Chile, Colombia, and Ecuador) took 1 form of the TOEIC Bridge test, and their scores were compared to additional information…

  2. Rugby versus Soccer in South Africa: Content Familiarity Contributes to Cross-Cultural Differences in Cognitive Test Scores

    ERIC Educational Resources Information Center

    Malda, Maike; van de Vijver, Fons J. R.; Temane, Q. Michael

    2010-01-01

    In this study, cross-cultural differences in cognitive test scores are hypothesized to depend on a test's cultural complexity (Cultural Complexity Hypothesis: CCH), here conceptualized as its content familiarity, rather than on its cognitive complexity (Spearman's Hypothesis: SH). The content familiarity of tests assessing short-term memory,…

  3. Speech perception scores in cochlear implant recipients: An analysis of ceiling effects in the CUNY sentence test (Quiet) in post-lingually deafened cochlear implant recipients.

    PubMed

    Ebrahimi-Madiseh, Azadeh; Eikelboom, Robert H; Jayakody, Dona Mp; Atlas, Marcus D

    2016-01-01

    To evaluate the clinical utility of the City University of New York sentence test in a cohort of post-lingually deafened cochlear implant recipients over time. 117 post-lingually deafened, Australian English-speaking CI recipients aged between 23 and 98 years (M = 66 years; SD = 15.09) were recruited. CUNY sentence test scores in quiet were collated and analysed at two cut-offs, 95% and 100%, as ceiling scores. CUNY sentence scores ranged from 4% to 100% (M = 86.75; SD = 20.65), with 38.8% of participants scoring 95% and 16.5% of participants reaching the 100% scores. The percentage of participants reaching the 95% and 100% ceiling scores increased over time (6 and 12 months post-implantation). The distribution of all post-operative CUNY test scores skewed to the right with 82% of test scores reaching above 90%. This study demonstrates that the CUNY test cannot be used as a valid tool to measure the speech perception skills of post-lingually deafened CI recipients over time. This may be overcome by using adaptive test protocols or linguistically, cognitively or contextually demanding test materials. The high percentage of CI recipients achieving ceiling scores for the CUNY sentence test in quiet at 3 months post-implantation questions the validity of using CUNY in the CI assessment test battery and limits its application for use in longitudinal studies evaluating CI outcomes. Further studies are required to examine different methods to overcome this problem.
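
    The ceiling analysis described above amounts to counting how many scores reach a cut-off. A minimal sketch follows, assuming "reaching" a cut-off means scoring at or above it, which the abstract does not state explicitly. The function name and the scores are hypothetical.

```python
def ceiling_share(scores, cutoff):
    """Fraction of scores at or above the ceiling cut-off."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= cutoff for s in scores) / len(scores)


# Hypothetical percent-correct scores for ten recipients (illustration only).
scores = [100, 98, 96, 95, 92, 88, 74, 60, 41, 4]
share_95 = ceiling_share(scores, 95)    # share at or above the 95% cut-off
share_100 = ceiling_share(scores, 100)  # share at the 100% cut-off
```

    A large share at the cut-off means the test can no longer register improvement for those participants, which is the concern the abstract raises for longitudinal use.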

  4. Guided-Inquiry Lessons Raise Scores on the Sixth Grade Georgia Science Test

    NASA Astrophysics Data System (ADS)

    Page, Purlie M.

    At the local level, G Middle School has the highest district-wide percentage of 6th grade science students who are not meeting standards. It is imperative that G middle school take corrective action to reduce the number of students failing to meet state science standards. Dewey's theory of conceptual framework, which involves knowledge constructed on a person's personal experience and mind activity through active forms of learning, guided this study. The goal of the study was to determine whether inquiry-based science modules produce greater 6th grade science achievement, as measured by an equivalent instrument of the science section of the Georgia Criterion-Referenced Competency Test, when compared to traditional instruction among eastern Georgia 6th graders. The sample consisted of 230 students in the nonintervention group and 119 students in the intervention group. All students were from intact classes. At the end of the intervention, an independent t test was conducted to analyze the scores. According to the study t test, (t = 12.33, df = 304.56, p < 0.05), the difference between the means was statistically significant. This project's potential impact on social change includes increasing student motivation towards, comprehension of, and interest in science concepts. At the local level, these inquiry lessons can be shared with science teachers across grade levels and within the district to improve county-wide science scores. An increase in student interest and comprehension of science concepts could ultimately lead to the United States producing more students in the fields of science, technology, engineering, and mathematics (STEM) education.
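
    The fractional degrees of freedom reported above (df = 304.56) are what an unequal-variance (Welch) t test would produce, although the abstract only calls it an independent t test. As a sketch under that assumption, the statistic and the Welch-Satterthwaite degrees of freedom can be computed from group summaries. The group sizes come from the abstract; the means and standard deviations below are hypothetical, so the output will not reproduce the reported values.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of the first group mean
    v2 = sd2 ** 2 / n2   # squared standard error of the second group mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# n = 119 (intervention) and n = 230 (nonintervention) are from the abstract;
# the means and SDs are made up for illustration.
t_stat, df = welch_t(mean1=80.0, sd1=8.0, n1=119, mean2=70.0, sd2=10.0, n2=230)
```

    The Welch degrees of freedom always fall between the smaller group size minus one and the pooled value n1 + n2 - 2, which is consistent with 304.56 for groups of 119 and 230.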

  5. Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) scores generated from the MMPI-2 and MMPI-2-RF test booklets: internal structure comparability in a sample of criminal defendants.

    PubMed

    Tarescavage, Anthony M; Alosco, Michael L; Ben-Porath, Yossef S; Wood, Arcangela; Luna-Jones, Lynn

    2015-04-01

    We investigated the internal structure comparability of Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) scores derived from the MMPI-2 and MMPI-2-RF booklets in a sample of 320 criminal defendants (229 males and 54 females). After exclusion of invalid protocols, the final sample consisted of 96 defendants who were administered the MMPI-2-RF booklet and 83 who completed the MMPI-2. No statistically significant differences in MMPI-2-RF invalidity rates were observed between the two forms. Individuals in the final sample who completed the MMPI-2-RF did not statistically differ on demographics or referral question from those who were administered the MMPI-2 booklet. Independent t tests showed no statistically significant differences between MMPI-2-RF scores generated with the MMPI-2 and MMPI-2-RF booklets on the test's substantive scales. Statistically significant small differences were observed on the revised Variable Response Inconsistency (VRIN-r) and True Response Inconsistency (TRIN-r) scales. Cronbach's alpha and standard errors of measurement were approximately equal between the booklets for all MMPI-2-RF scales. Finally, MMPI-2-RF intercorrelations produced from the two forms yielded mostly small and a few medium differences, indicating that discriminant validity and test structure are maintained. Overall, our findings reflect the internal structure comparability of MMPI-2-RF scale scores generated from MMPI-2 and MMPI-2-RF booklets. Implications of these results and limitations of these findings are discussed. © The Author(s) 2014.

  6. Modified scoring criteria for the RBANS figures.

    PubMed

    Duff, Kevin; Leber, W R; Patton, Doyle E; Schoenberg, Mike R; Mold, James W; Scott, James G; Adams, Russell L

    2007-01-01

    Visual construction and memory tasks are routinely used in neuropsychological assessment, but their subjective scoring criteria can negatively affect the reliability of these instruments. The current study examined the standard scoring criteria for the Figure Copy and Recall subtests of the RBANS and compared them to a modified set of scoring criteria in two samples. In both a large community dwelling sample of older adults and in a mixed clinical sample, the original scoring criteria consistently led to lower scores than the modified criteria. Inter-rater reliability was high for the modified scoring criteria, and no age effects were found with the modified scoring criteria. In both samples, the modified scoring criteria led to Figure Copy scores that more closely approximated other performances on the RBANS compared to the standard criteria, whereas both scoring systems led to plausible Figure Recall scores. Despite these results, the present study cannot identify one scoring criterion as the "better," but only points out the significant differences between them. Such differences can have important clinical implications, and practitioners and researchers who utilize the RBANS with patient samples should be cautious when interpreting low scores on Figure Copy and Recall if the standard criteria are used.

  7. Medical devices; ovarian adnexal mass assessment score test system; labeling; black box restrictions. Final rule.

    PubMed

    2011-12-30

    The Food and Drug Administration (FDA) is amending the regulation classifying ovarian adnexal mass assessment score test systems to restrict these devices so that a prescribed warning statement that addresses a risk identified in the special controls guidance document must be in a black box and must appear in all labeling, advertising, and promotional material. The black box warning mitigates the risk to health associated with off-label use as a screening test, stand-alone diagnostic test, or as a test to determine whether or not to proceed with surgery.

  8. The effect of constructivist teaching strategies on science test scores of middle school students

    NASA Astrophysics Data System (ADS)

    Vaca, James L., Jr.

    International studies show that the United States is lagging behind other industrialized countries in science proficiency. The studies revealed how American students showed little significant gain on standardized tests in science between 1995 and 2005. Little information is available regarding how reform in American teaching strategies in science could improve student performance on standardized testing. The purpose of this quasi-experimental quantitative study using a pretest/posttest control group design was to examine how the use of a hands-on, constructivist teaching approach with low achieving eighth grade science students affected student achievement on the 2007 Ohio Eighth Grade Science Achievement Test posttest (N = 76). The research question asked how using constructivist teaching strategies in the science classroom affected student performance on standardized tests. Two independent samples of 38 students each consisting of low achieving science students as identified by seventh grade science scores and scores on the Ohio Eighth Grade Science Half-Length Practice Test pretest were used. Four comparisons were made between the control group receiving traditional classroom instruction and the experimental group receiving constructivist instruction including: (a) pretest/posttest standard comparison, (b) comparison of the number of students who passed the posttest, (c) comparison of the six standards covered on the posttest, (d) posttest's sample means comparison. A Mann-Whitney U Test revealed that there was no significant difference between the independent sample distributions for the control group and the experimental group. These findings contribute to positive social change by investigating science teaching strategies that could be used in eighth grade science classes to improve student achievement in science.

  9. An Investigation of the Relationship Between Readiness Test Scores for Kindergarten Children and Achievement Scores Obtained at the End of Grades One and Two. S.S.T.A. Research Centre Report No. 62.

    ERIC Educational Resources Information Center

    Warkentin, Lena

    The primary purpose of this study was to investigate the relationship between Metropolitan Readiness Test (MRT) scores in kindergarten (MRTK) and grade one (MRT1) with the reading scores of the Canadian Tests of Basic Skills (CTBS) at the end of grades one (CTBSR1) and two (CTBSR2). A secondary purpose of the study was to determine whether the…

  10. Psychometric Evaluation of the Lower Extremity Computerized Adaptive Test, the Modified Harris Hip Score, and the Hip Outcome Score.

    PubMed

    Hung, Man; Hon, Shirley D; Cheng, Christine; Franklin, Jeremy D; Aoki, Stephen K; Anderson, Mike B; Kapron, Ashley L; Peters, Christopher L; Pelt, Christopher E

    2014-12-01

    The applicability and validity of many patient-reported outcome measures in the high-functioning population are not well understood. To compare the psychometric properties of the modified Harris Hip Score (mHHS), the Hip Outcome Score activities of daily living subscale (HOS-ADL) and sports (HOS-sports), and the Lower Extremity Computerized Adaptive Test (LE CAT). The hypothesis was that all instruments would perform well but that the LE CAT would show superiority psychometrically because a combination of CAT and a large item bank allows for a high degree of measurement precision. Cohort study (diagnosis); Level of evidence, 2. Data were collected from 472 advanced-age, active participants from the Huntsman World Senior Games in 2012. Validity evidences were examined through item fit, dimensionality, monotonicity, local independence, differential item functioning, person raw score to measure correlation, and instrument coverage (ie, ceiling and floor effects), and reliability evidences were examined through Cronbach alpha and person separation index. All instruments demonstrated good item fit, unidimensionality, monotonicity, local independence, and person raw score to measure correlations. The HOS-ADL had high ceiling effects of 36.02%, and the mHHS had ceiling effects of 27.54%. The LE CAT had ceiling effects of 8.47%, and the HOS-sports had no ceiling effects. None of the instruments had any floor effects. The mHHS had a very low Cronbach alpha of 0.41 and an extremely low person separation index of 0.08. Reliabilities for the LE CAT were excellent and for the HOS-ADL and HOS-sports were good. The LE CAT showed better psychometric properties overall than the HOS-ADL, HOS-sports, and mHHS for the senior population. The mHHS demonstrated pronounced ceiling effects and poor reliabilities that should be of concern. The high ceiling effects for the HOS-ADL were also of concern. The LE CAT was superior in all psychometric aspects examined in this study. Future

  11. Psychometric Evaluation of the Lower Extremity Computerized Adaptive Test, the Modified Harris Hip Score, and the Hip Outcome Score

    PubMed Central

    Hung, Man; Hon, Shirley D.; Cheng, Christine; Franklin, Jeremy D.; Aoki, Stephen K.; Anderson, Mike B.; Kapron, Ashley L.; Peters, Christopher L.; Pelt, Christopher E.

    2014-01-01

    Background: The applicability and validity of many patient-reported outcome measures in the high-functioning population are not well understood. Purpose: To compare the psychometric properties of the modified Harris Hip Score (mHHS), the Hip Outcome Score activities of daily living subscale (HOS-ADL) and sports (HOS-sports), and the Lower Extremity Computerized Adaptive Test (LE CAT). The hypothesis was that all instruments would perform well but that the LE CAT would show superiority psychometrically because a combination of CAT and a large item bank allows for a high degree of measurement precision. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: Data were collected from 472 advanced-age, active participants from the Huntsman World Senior Games in 2012. Validity evidences were examined through item fit, dimensionality, monotonicity, local independence, differential item functioning, person raw score to measure correlation, and instrument coverage (ie, ceiling and floor effects), and reliability evidences were examined through Cronbach alpha and person separation index. Results: All instruments demonstrated good item fit, unidimensionality, monotonicity, local independence, and person raw score to measure correlations. The HOS-ADL had high ceiling effects of 36.02%, and the mHHS had ceiling effects of 27.54%. The LE CAT had ceiling effects of 8.47%, and the HOS-sports had no ceiling effects. None of the instruments had any floor effects. The mHHS had a very low Cronbach alpha of 0.41 and an extremely low person separation index of 0.08. Reliabilities for the LE CAT were excellent and for the HOS-ADL and HOS-sports were good. Conclusion: The LE CAT showed better psychometric properties overall than the HOS-ADL, HOS-sports, and mHHS for the senior population. The mHHS demonstrated pronounced ceiling effects and poor reliabilities that should be of concern. The high ceiling effects for the HOS-ADL were also of concern. The LE CAT was superior

  12. End of Course Grades and Standardized Test Scores: Are Grades Predictive of Student Achievement?

    ERIC Educational Resources Information Center

    Ricketts, Christine R.

    2010-01-01

    This study examined the extent to which end-of-course grades are predictive of Virginia Standards of Learning test scores in nine high school content areas. It also analyzed the impact of the variables school cluster attended, gender, ethnicity, disability status, Limited English Proficiency status, and socioeconomic status on the relationship…

  13. Test Scores, Dropout Rates, and Transfer Rates as Alternative Indicators of High School Performance

    ERIC Educational Resources Information Center

    Rumberger, Russell W.; Palardy, Gregory J.

    2005-01-01

    This study investigated the relationships among several different indicators of high school performance: test scores, dropout rates, transfer rates, and attrition rates. Hierarchical linear models were used to analyze panel data from a sample of 14,199 students who took part in the National Education Longitudinal Survey of 1988. The results…

  14. Evaluation of Score Interpretive Information from the Perspective of Failed and Passed Test-Takers.

    ERIC Educational Resources Information Center

    Shannon, Gregory A.

    Candidates who had taken examinations for certification required by the American Production and Inventory Control Society (APICS) were surveyed to acquire feedback about the effectiveness of score interpretive information given to test takers. Those sampled included 488 passers and 389 failers of the Inventory Management (IM) examination and 457…

  15. A risk score for predicting coronary artery disease in women with angina pectoris and abnormal stress test finding.

    PubMed

    Lo, Monica Y; Bonthala, Nirupama; Holper, Elizabeth M; Banks, Kamakki; Murphy, Sabina A; McGuire, Darren K; de Lemos, James A; Khera, Amit

    2013-03-15

    Women with angina pectoris and abnormal stress test findings commonly have no epicardial coronary artery disease (CAD) at catheterization. The aim of the present study was to develop a risk score to predict obstructive CAD in such patients. Data were analyzed from 337 consecutive women with angina pectoris and abnormal stress test findings who underwent cardiac catheterization at our center from 2003 to 2007. Forward selection multivariate logistic regression analysis was used to identify the independent predictors of CAD, defined by ≥50% diameter stenosis in ≥1 epicardial coronary artery. The independent predictors included age ≥55 years (odds ratio 2.3, 95% confidence interval 1.3 to 4.0), body mass index <30 kg/m² (odds ratio 1.9, 95% confidence interval 1.1 to 3.1), smoking (odds ratio 2.6, 95% confidence interval 1.4 to 4.8), low high-density lipoprotein cholesterol (odds ratio 2.9, 95% confidence interval 1.5 to 5.5), family history of premature CAD (odds ratio 2.4, 95% confidence interval 1.0 to 5.7), lateral abnormality on stress imaging (odds ratio 2.8, 95% confidence interval 1.5 to 5.5), and exercise capacity <5 metabolic equivalents (odds ratio 2.4, 95% confidence interval 1.1 to 5.6). When each variable was assigned 1 point and the points were summed to form a risk score, there was a graded association between the score and prevalent CAD (p for trend <0.001). The risk score demonstrated good discrimination with a cross-validated c-statistic of 0.745 (95% confidence interval 0.70 to 0.79), and an optimized cutpoint of a score of ≤2 included 62% of the subjects and had a negative predictive value of 80%. In conclusion, a simple clinical risk score of 7 characteristics can help differentiate those more or less likely to have CAD among women with angina pectoris and abnormal stress test findings. This tool, if validated, could help to guide testing strategies in women with angina pectoris. Copyright © 2013 Elsevier Inc. All rights reserved.
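
    The scoring rule described in this abstract (one point for each of the seven predictors, with a cutpoint of ≤2) can be illustrated with a minimal Python sketch. The class name, field names, and helper functions below are hypothetical and chosen only for illustration; the thresholds that define each predictor (for example, what counts as "low" HDL cholesterol) are not given in the abstract and are assumed to have been applied upstream. This is not the authors' implementation and is not a clinical tool.

```python
from dataclasses import dataclass, fields


@dataclass
class PatientFlags:
    """Seven yes/no predictors named in the abstract (field names are hypothetical)."""
    age_ge_55: bool
    bmi_lt_30: bool
    smoker: bool
    low_hdl: bool
    family_history_premature_cad: bool
    lateral_stress_abnormality: bool
    exercise_capacity_lt_5_mets: bool


def risk_score(patient: PatientFlags) -> int:
    """Sum one point for every predictor that is present (range 0 to 7)."""
    return sum(int(getattr(patient, f.name)) for f in fields(patient))


def at_or_below_cutpoint(patient: PatientFlags, cutpoint: int = 2) -> bool:
    """True when the score falls in the lower-risk group (score <= cutpoint)."""
    return risk_score(patient) <= cutpoint


if __name__ == "__main__":
    example = PatientFlags(True, True, False, False, False, False, False)
    print(risk_score(example), at_or_below_cutpoint(example))
```

    Equal weighting keeps the score easy to compute by hand, at the cost of ignoring the differences among the reported odds ratios.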

  16. Predictive validity of the classroom strategies scale-observer form on statewide testing scores: an initial investigation.

    PubMed

    Reddy, Linda A; Fabiano, Gregory A; Dudek, Christopher M; Hsu, Louis

    2013-12-01

    The present study examined the validity of a teacher observation measure, the Classroom Strategies Scale-Observer Form (CSS), as a predictor of student performance on statewide tests of mathematics and English language arts. The CSS is a teacher practice observational measure that assesses evidence-based instructional and behavioral management practices in elementary school. A series of two-level hierarchical generalized linear models were fitted to data of a sample of 662 third- through fifth-grade students to assess whether CSS Part 2 Instructional Strategy and Behavioral Management Strategy scale discrepancy scores (i.e., ∑ |recommended frequency − frequency ratings|) predicted statewide mathematics and English language arts proficiency scores when percentage of minority students in schools was controlled. Results indicated that the Instructional Strategy scale discrepancy scores significantly predicted mathematics and English language arts proficiency scores: Relatively larger discrepancies on observer ratings of what teachers did versus what should have been done were associated with lower proficiency scores. Results offer initial evidence of the predictive validity of the CSS Part 2 Instructional Strategy discrepancy scores on student academic outcomes. PsycINFO Database Record (c) 2013 APA, all rights reserved.
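
    The discrepancy score in this abstract is defined as a sum of absolute differences between recommended and rated frequencies. A minimal Python sketch of that formula follows; the function name and the example ratings are hypothetical, and the actual CSS items and rating scale are not described in the abstract.

```python
from typing import Sequence


def discrepancy_score(recommended: Sequence[float], observed: Sequence[float]) -> float:
    """Sum of |recommended frequency - frequency rating| across items."""
    if len(recommended) != len(observed):
        raise ValueError("both sequences must rate the same items")
    return sum(abs(r - o) for r, o in zip(recommended, observed))


if __name__ == "__main__":
    # Illustrative ratings only: three items on an arbitrary frequency scale.
    print(discrepancy_score([5, 4, 6], [3, 4, 7]))
```

    Because absolute values are used, doing a strategy more often than recommended and doing it less often both increase the score.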

  17. ETS Psychometric Contributions: Focus on Test Scores. Research Report. ETS RR-13-15. ETS R&D Scientific and Policy Contributions Series. ETS SPC-13-03

    ERIC Educational Resources Information Center

    Moses, Tim

    2013-01-01

    The purpose of this report is to review ETS psychometric contributions that focus on test scores. Two major sections review contributions based on assessing test scores' measurement characteristics and other contributions about using test scores as predictors in correlational and regression relationships. An additional section reviews additional…

  18. GalaxyDock BP2 score: a hybrid scoring function for accurate protein-ligand docking

    NASA Astrophysics Data System (ADS)

    Baek, Minkyung; Shin, Woong-Hee; Chung, Hwan Won; Seok, Chaok

    2017-07-01

    Protein-ligand docking is a useful tool for providing atomic-level understanding of protein functions in nature and design principles for artificial ligands or proteins with desired properties. The ability to identify the true binding pose of a ligand to a target protein among numerous possible candidate poses is an essential requirement for successful protein-ligand docking. Many previously developed docking scoring functions were trained to reproduce experimental binding affinities and were also used for scoring binding poses. However, in this study, we developed a new docking scoring function, called GalaxyDock BP2 Score, by directly training the scoring power of binding poses. This function is a hybrid of physics-based, empirical, and knowledge-based score terms that are balanced to strengthen the advantages of each component. The performance of the new scoring function exhibits significant improvement over existing scoring functions in decoy pose discrimination tests. In addition, when the score is used with the GalaxyDock2 protein-ligand docking program, it outperformed other state-of-the-art docking programs in docking tests on the Astex diverse set, the Cross2009 benchmark set, and the Astex non-native set. GalaxyDock BP2 Score and GalaxyDock2 with this score are freely available at http://galaxy.seoklab.org/softwares/galaxydock.html.

  19. Prognostic Implications of Dual Platelet Reactivity Testing in Acute Coronary Syndrome.

    PubMed

    de Carvalho, Leonardo P; Fong, Alan; Troughton, Richard; Yan, Bryan P; Chin, Chee-Tang; Poh, Sock-Cheng; Mejin, Melissa; Huang, Nancy; Seneviratna, Aruni; Lee, Chi-Hang; Low, Adrian F; Tan, Huay-Cheem; Chan, Siew-Pang; Frampton, Christopher; Richards, A Mark; Chan, Mark Y

    2018-02-01

    Studies on platelet reactivity (PR) testing commonly test PR only after percutaneous coronary intervention (PCI) has been performed. There are few data on pre- and post-PCI testing. Data on simultaneous testing of aspirin and adenosine diphosphate antagonist response are conflicting. We investigated the prognostic value of combined serial assessments of high on-aspirin PR (HASPR) and high on-adenosine diphosphate receptor antagonist PR (HADPR) in patients with acute coronary syndrome (ACS). HASPR and HADPR were assessed in 928 ACS patients before (initial test) and 24 hours after (final test) coronary angiography, with or without revascularization. Patients with HASPR on the initial test, compared with those without, had significantly higher intraprocedural thrombotic events (IPTE) (8.6 vs. 1.2%, p ≤ 0.001) and higher 30-day major adverse cardiovascular and cerebrovascular events (MACCE; 5.2 vs. 2.3%, p = 0.05), but not 12-month MACCE (13.0 vs. 15.1%, p = 0.50). Patients with initial HADPR, compared with those without, had significantly higher IPTE (4.4 vs. 0.9%, p = 0.004), but not 30-day (3.5 vs. 2.3%, p = 0.32) or 12-month MACCE (14.0 vs. 12.5%, p = 0.54). The c-statistic of the Global Registry of Acute Coronary Events (GRACE) score alone, GRACE score + ASPR test and GRACE score + ADPR test for discriminating 30-day MACCE was 0.649, 0.803 and 0.757, respectively. Final ADPR was associated with 30-day MACCE among patients with intermediate-to-high GRACE score (adjusted odds ratio [OR]: 4.50, 95% confidence interval [CI]: 1.14-17.66), but not low GRACE score (adjusted OR: 1.19, 95% CI: 0.13-10.79). In conclusion, both HASPR and HADPR predict ischaemic events in ACS. This predictive utility is time-dependent and risk-dependent. Schattauer GmbH Stuttgart.

  20. Opportunity to Learn: Investigating Possible Predictors for Pre-Course "Test Of Astronomy STandards" TOAST Scores

    ERIC Educational Resources Information Center

    Berryhill, Katie J.; Slater, Timothy F.

    2017-01-01

    As discipline-based astronomy education researchers become more interested in experimentally testing innovative teaching strategies to enhance learning in undergraduate introductory astronomy survey courses ("ASTRO 101"), scholars are placing increased attention toward better understanding factors impacting student gain scores on the…

  1. Longitudinal Improvement in Balance Error Scoring System Scores among NCAA Division-I Football Athletes.

    PubMed

    Mathiasen, Ross; Hogrefe, Christopher; Harland, Kari; Peterson, Andrew; Smoot, M Kyle

    2018-02-15

    The Balance Error Scoring System (BESS) is a commonly used concussion assessment tool. Recent studies have questioned the stability and reliability of baseline BESS scores. The purpose of this longitudinal prospective cohort study is to examine differences in yearly baseline BESS scores in athletes participating on an NCAA Division-I football team. NCAA Division-I freshman football athletes were videotaped performing the BESS test at matriculation and after 1 year of participation in the football program. Twenty-three athletes were enrolled in year 1 of the study, and 25 athletes were enrolled in year 2. Those athletes enrolled in year 1 were again videotaped after year 2 of the study. The paired t-test was used to assess for change in score over time for the firm surface, foam surface, and the cumulative BESS score. Additionally, inter- and intrarater reliability values were calculated. Cumulative errors on the BESS significantly decreased from a mean of 20.3 at baseline to 16.8 after 1 year of participation. The mean number of errors following the second year of participation was 15.0. Inter-rater reliability for the cumulative score ranged from 0.65 to 0.75. Intrarater reliability was 0.81. After 1 year of participation, there is a statistically and clinically significant improvement in BESS scores in an NCAA Division-I football program. Although additional improvement in BESS scores was noted after a second year of participation, it did not reach statistical significance. Football athletes should undergo baseline BESS testing at least yearly if the BESS is to be optimally useful as a diagnostic test for concussion.

  2. Poisson Approximation-Based Score Test for Detecting Association of Rare Variants.

    PubMed

    Fang, Hongyan; Zhang, Hong; Yang, Yaning

    2016-07-01

    Genome-wide association study (GWAS) has achieved great success in identifying genetic variants, but the nature of GWAS has determined its inherent limitations. Under the common disease rare variants (CDRV) hypothesis, the traditional association analysis methods commonly used in GWAS for common variants do not have enough power for detecting rare variants with a limited sample size. As a solution to this problem, pooling rare variants by their functions provides an efficient way for identifying susceptible genes. Rare variants typically have low frequencies of minor alleles, and the distribution of the total number of minor alleles of the rare variants can be approximated by a Poisson distribution. Based on this fact, we propose a new test method, the Poisson Approximation-based Score Test (PAST), for association analysis of rare variants. Two testing methods, namely, ePAST and mPAST, are proposed based on different strategies of pooling rare variants. Simulation results and application to the CRESCENDO cohort data show that our methods are more powerful than the existing methods. © 2016 John Wiley & Sons Ltd/University College London.
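
    The Poisson approximation mentioned in this abstract can be checked numerically. The Python sketch below does not implement PAST, ePAST, or mPAST; it only compares the exact distribution of a pooled minor-allele count with a Poisson distribution of the same mean. It assumes independent alleles, two alleles per subject at each variant, and made-up allele frequencies; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def exact_count_pmf(probs: Sequence[float]) -> List[float]:
    """Exact pmf of a sum of independent Bernoulli(p_i) draws, by convolution."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this allele is not the minor allele
            nxt[k + 1] += mass * p       # this allele is the minor allele
        pmf = nxt
    return pmf


def poisson_pmf_table(lam: float, k_max: int) -> List[float]:
    """Poisson(lam) probabilities for k = 0..k_max via the recurrence p_k = p_{k-1} * lam / k."""
    pmf = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def total_variation(freqs: Sequence[float], n_subjects: int) -> float:
    """Total variation distance between the exact pooled count and its Poisson approximation."""
    probs = [f for f in freqs for _ in range(2 * n_subjects)]  # 2 alleles per subject
    exact = exact_count_pmf(probs)
    approx = poisson_pmf_table(sum(probs), len(exact) - 1)
    tail = max(0.0, 1.0 - sum(approx))  # Poisson mass beyond the largest possible count
    return 0.5 * (sum(abs(e - a) for e, a in zip(exact, approx)) + tail)


if __name__ == "__main__":
    # Illustrative values only: three rare variants in 100 subjects.
    print(total_variation([0.001, 0.002, 0.005], n_subjects=100))
```

    Under these assumptions, Le Cam's inequality bounds the total variation distance by the sum of the squared allele probabilities, so the approximation tightens as the minor-allele frequencies shrink.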

  3. Utilizing the Six Realms of Meaning in Improving Campus Standardized Test Scores through Team Teaching and Strategic Planning

    ERIC Educational Resources Information Center

    Stevenson, Rosnisha D.; Kritsonis, William Allan

    2009-01-01

    This article will seek to utilize Dr. William Allan Kritsonis' book "Ways of Knowing Through the Realms of Meaning" (2007) as a framework to improve a campus's standardized test scores, more specifically, their TAKS (Texas Assessment of Knowledge and Skills) scores. Many campuses have an improvement plan, also known as a Campus…

  4. Integrating GIS in the Middle School Curriculum: Impacts on Diverse Students' Standardized Test Scores

    ERIC Educational Resources Information Center

    Goldstein, Donna; Alibrandi, Marsha

    2013-01-01

    This case study conducted with 1,425 middle school students in Palm Beach County, Florida, included a treatment group receiving GIS instruction (256) and a control group without GIS instruction (1,169). Quantitative analyses on standardized test scores indicated that inclusion of GIS in middle school curriculum had a significant effect on student…

  5. The Impact of Cooperative Learning on Critical Thinking Test Scores of Associate's Degree Graduates in Southwest Virginia

    ERIC Educational Resources Information Center

    Hodges, James Gregory

    2013-01-01

    This study examined the impact that the teaching technique known as cooperative learning had on the changes between pre- and post-test scores on all sub-categories ("induction, deduction, analysis, evaluation, inference", and "total composite") associated with the "California Critical Thinking Skills Test" (CCTST) for…

  6. The Validity of ITBS Reading Comprehension Test Scores for Learning Disabled and Non Learning Disabled Students under Extended-Time Conditions.

    ERIC Educational Resources Information Center

    Huesman, Ronald L., Jr.; Frisbie, David A.

    This study investigated the effect of extended-time limits in terms of performance levels and score comparability for reading comprehension scores on the Iowa Tests of Basic Skills (ITBS). The first part of the study compared the average reading comprehension scores on the ITBS of 61 sixth-graders with learning disabilities and 397 non learning…

  7. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? North Carolina

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles North Carolina's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade math test was 351 for non-Title I students and 347 for Title I students. In 2009, the mean scale score in 4th grade math was 354 for non-Title I students and 350 for Title I students. Between 2006 and 2009, the mean scale…

  8. State Test Score Trends through 2008-09, Part 4: Is Achievement Improving and Are Gaps Narrowing for Title I Students? New Hampshire

    ERIC Educational Resources Information Center

    Center on Education Policy, 2011

    2011-01-01

    This paper profiles New Hampshire's test score trends through 2008-09. In 2006, the mean scale score on the state 4th grade reading test was 445 for non-Title I students and 438 for Title I students. In 2009, the mean scale score in 4th grade reading was 448 for non-Title I students and 441 for Title I students. Between 2006 and 2009, the mean…

  9. Science Teacher Efficacy and Outcome Expectancy as Predictors of Students' End-of-Instruction (EOI) Biology I Test Scores

    ERIC Educational Resources Information Center

    Angle, Julie; Moseley, Christine

    2009-01-01

    The purpose of this study was to compare teacher efficacy beliefs of secondary Biology I teachers whose students' mean scores on the statewide End-of-Instruction (EOI) Biology I test met or exceeded the state academic proficiency level (Proficient Group) to teacher efficacy beliefs of secondary Biology I teachers whose students' mean scores on the…

  10. The Causes and Consequences of Test Score Manipulation: Evidence from the New York Regents Examinations. CEPA Working Paper No. 16-08

    ERIC Educational Resources Information Center

    Dee, Thomas S.; Dobbie, Will; Jacob, Brian A.; Rockoff, Jonah

    2016-01-01

    In this paper, we show that the design and decentralized, school-based scoring of New York's high school exit exams--the Regents Examinations--led to the systematic manipulation of test scores just below important proficiency cutoffs. Our estimates suggest that teachers inflate approximately 40 percent of test scores near the proficiency cutoffs.…

  11. Do classroom ventilation rates in California elementary schools influence standardized test scores? Results from a prospective study.

    PubMed

    Mendell, M J; Eliseeva, E A; Davies, M M; Lobscheid, A

    2016-08-01

    Limited evidence has associated lower ventilation rates (VRs) in schools with reduced student learning or achievement. We analyzed longitudinal data collected over two school years from 150 classrooms in 28 schools within three California school districts. We estimated daily classroom VRs from real-time indoor carbon dioxide measured by web-connected sensors. School districts provided individual-level scores on standard tests in Math and English, and classroom-level demographic data. Analyses assessing learning effects used two VR metrics: average VRs for 30 days prior to tests, and proportion of prior daily VRs above specified thresholds during the year. We estimated relationships between scores and VR metrics in multivariate models with generalized estimating equations. All school districts had median school-year VRs below the California VR standard. Most models showed some positive associations of VRs with test scores; however, estimates varied in magnitude and few 95% confidence intervals excluded the null. Combined-district models estimated statistically significant increases of 0.6 points (P = 0.01) on English tests for each 10% increase in prior 30-day VRs. Estimated increases in Math were of similar magnitude but not statistically significant. Findings suggest potential small positive associations between classroom VRs and learning. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.

  12. Permanent Income and the Black-White Test Score Gap. NBER Working Paper No. 17610

    ERIC Educational Resources Information Center

    Rothstein, Jesse; Wozny, Nathan

    2011-01-01

    Analysts often examine the black-white test score gap conditional on family income. Typically only a current income measure is available. We argue that the gap conditional on permanent income is of greater interest, and we describe a method for identifying this gap using an auxiliary data set to estimate the relationship between current and…

  13. The Black-White Test Score Gap through Third Grade. NBER Working Paper No. 11049

    ERIC Educational Resources Information Center

    Fryer, Roland G.; Levitt, Steven D.

    2005-01-01

    This paper describes basic facts regarding the black-white test score gap over the first four years of school. Black children enter school substantially behind their white counterparts in reading and math, but including a small number of covariates erases the gap. Over the first four years of school, however, blacks lose substantial ground…

  14. A robust method using propensity score stratification for correcting verification bias for binary tests

    PubMed Central

    He, Hua; McDermott, Michael P.

    2012-01-01

    Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified. PMID:21856650

  15. A Brief Look at: Test Scores and the Standard Error of Measurement. E&R Report No. 10.13

    ERIC Educational Resources Information Center

    Holdzkom, David; Sumner, Brian; McMillen, Brad

    2010-01-01

    In the context of standardized testing, the standard error of measurement (SEM) is a measure of the factors other than the student's actual knowledge of the tested material that may affect the student's test score. Such factors may include distractions in the testing environment, fatigue, hunger, or even luck. This means that a student's observed…
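
    For readers who want a number to attach to the SEM, the standard classical test theory formula is SEM = SD * sqrt(1 - reliability). The sketch below applies it and builds a band around an observed score. The formula is textbook material rather than something quoted from this report, and the function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and the test reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEMs around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem
```

    With a score standard deviation of 10 and a reliability of 0.91, the SEM is 3 points, so an observed score of 350 corresponds to a one-SEM band of 347 to 353.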

  16. Associations between cadmium exposure and neurocognitive test scores in a cross-sectional study of US adults.

    PubMed

    Ciesielski, Timothy; Bellinger, David C; Schwartz, Joel; Hauser, Russ; Wright, Robert O

    2013-02-05

    Low-level environmental cadmium exposure and neurotoxicity have not been well studied in adults. Our goal was to evaluate associations between neurocognitive exam scores and a biomarker of cumulative cadmium exposure among adults in the Third National Health and Nutrition Examination Survey (NHANES III). NHANES III is a nationally representative cross-sectional survey of the U.S. population conducted between 1988 and 1994. We analyzed data from a subset of participants, age 20-59, who participated in a computer-based neurocognitive evaluation. There were four outcome measures: the Simple Reaction Time Test (SRTT: visual motor speed), the Symbol Digit Substitution Test (SDST: attention/perception), the Serial Digit Learning Test (SDLT) trials-to-criterion, and the SDLT total-error-score (SDLT-tests: learning recall/short-term memory). We fit multivariable-adjusted models to estimate associations between urinary cadmium concentrations and test scores. 5662 participants underwent neurocognitive screening, and 5572 (98%) of these had a urinary cadmium level available. Prior to multivariable-adjustment, higher urinary cadmium concentration was associated with worse performance in each of the 4 outcomes. After multivariable-adjustment most of these relationships were not significant, and age was the most influential variable in reducing the association magnitudes. However, among never-smokers with no known occupational cadmium exposure, the relationship between urinary cadmium and SDST score (attention/perception) was significant: a 1 μg/L increase in urinary cadmium corresponded to a 1.93% (95%CI: 0.05, 3.81) decrement in performance. These results suggest that higher cumulative cadmium exposure in adults may be related to subtly decreased performance in tasks requiring attention and perception, particularly among those adults whose cadmium exposure is primarily through diet (no smoking or work based cadmium exposure). This association was observed among exposure levels

  17. Analyzing Test-Taking Behavior: Decision Theory Meets Psychometric Theory.

    PubMed

    Budescu, David V; Bo, Yuanchao

    2015-12-01

    We investigate the implications of penalizing incorrect answers to multiple-choice tests, from the perspective of both test-takers and test-makers. To do so, we use a model that combines a well-known item response theory model with prospect theory (Kahneman and Tversky, Prospect theory: An analysis of decision under risk, Econometrica 47:263-91, 1979). Our results reveal that when test-takers are fully informed of the scoring rule, the use of any penalty has detrimental effects for both test-takers (they are always penalized in excess, particularly those who are risk averse and loss averse) and test-makers (the bias of the estimated scores, as well as the variance and skewness of their distribution, increase as a function of the severity of the penalty).

  18. Comparison of baseline and post-concussion ImPACT test scores in young athletes with stimulant-treated and untreated ADHD.

    PubMed

    Gardner, Ryan M; Yengo-Kahn, Aaron; Bonfield, Christopher M; Solomon, Gary S

    2017-02-01

    Baseline and post-concussion neurocognitive testing is useful in managing concussed athletes. Attention deficit hyperactivity disorder (ADHD) and stimulant medications are recognized as potential modifiers of performance on neurocognitive testing by the Concussion in Sport Group. Our goal was to assess whether individuals with ADHD perform differently on post-concussion testing and whether this difference is related to the use of stimulants. This was a retrospective case-control study in which 4373 athletes underwent baseline and post-concussion testing using the ImPACT battery. 277 athletes self-reported a history of ADHD, of whom 206 reported no stimulant treatment and 69 reported stimulant treatment. Each group was matched with participants reporting no history of ADHD or stimulant use on several biopsychosocial characteristics. Non-parametric tests were used to assess ImPACT composite score differences between groups. Participants with ADHD had worse verbal memory, visual memory, visual motor speed, and reaction time scores than matched controls at baseline and post-concussion, all with p ≤ .001 and |r|≥ 0.100. Athletes without stimulant treatment had lower verbal memory, visual memory, visual motor speed, and reaction time scores than controls at baseline (p ≤ 0.01, |r|≥ 0.100 [except verbal memory, r = -0.088]) and post-concussion (p = 0.000, |r|> 0.100). Athletes with stimulant treatment had lower verbal memory (Baseline: p = 0.047, r = -0.108; Post-concussion: p = 0.023, r = -0.124) and visual memory scores (Baseline: p = 0.013, r = -0.134; Post-concussion: p = 0.003, r = -0.162) but equivalent visual motor speed and reaction time scores versus controls at baseline and post-concussion. ADHD-specific baseline and post-concussion neuropsychological profiles, as well as stimulant medication status, may need to be considered when interpreting ImPACT test results. Further investigation into the effects of ADHD and stimulant use on recovery from

  19. The effect of an intervention program on functional movement screen test scores in mixed martial arts athletes.

    PubMed

    Bodden, Jamie G; Needham, Robert A; Chockalingam, Nachiappan

    2015-01-01

    This study assessed the basic fundamental movements of mixed martial arts (MMA) athletes using the functional movement screen (FMS) assessment and determined if an intervention program was successful at improving results. Participants were placed into 1 of 2 groups: intervention and control. The intervention group was required to complete a corrective exercise program 4 times per week, and all participants were asked to continue their usual MMA training routine. A mid-intervention FMS test was included to examine if successful results were noticed sooner than the 8-week period. Results highlighted differences in FMS test scores between the control group and intervention group (p = 0.006). Post hoc testing revealed a significant increase in the FMS score of the intervention group between weeks 0 and 8 (p = 0.00) and weeks 0 and 4 (p = 0.00) and no significant increase between weeks 4 and 8 (p = 1.00). A χ² analysis revealed that the intervention group participants were more likely to have an FMS score >14 than participants in the control group at week 4 (χ² = 7.29, p < 0.01) and week 8 (χ² = 5.2, p ≤ 0.05). Finally, a greater number of participants in the intervention group were free from asymmetry at week 4 and week 8 compared with the initial test period. The results of the study suggested that a 4-week intervention program was sufficient to improve FMS scores. Most, if not all, of the movements covered on the FMS relate to many aspects of MMA training. The knowledge that the FMS can identify movement dysfunctions and, furthermore, the fact that the issues can be improved through a standardized intervention program could be advantageous to MMA coaches, thus providing the opportunity to adapt and implement new additions to training programs.

  20. Gender Differences in Factor Scores of Anxiety and Depression among Australian University Students: Implications for Counselling Interventions

    ERIC Educational Resources Information Center

    Bitsika, Vicki; Sharpley, Chris F.; Melham, Therese C.

    2010-01-01

    Anxiety and depression inventory scores from 200 male and female university students attending a private university in Australia were examined for their factor structure. Once established, the two sets of factors were tested for gender-based differences, revealing that females were more likely than males to report symptomatology associated with…

  1. Increasing the reliability of the fluid/crystallized difference score from the Kaufman Adolescent and Adult Intelligence Test with reliable component analysis.

    PubMed

    Caruso, J C

    2001-06-01

    The unreliability of difference scores is a well documented phenomenon in the social sciences and has led researchers and practitioners to interpret differences cautiously, if at all. In the case of the Kaufman Adolescent and Adult Intelligence Test (KAIT), the unreliability of the difference between the Fluid IQ and the Crystallized IQ is due to the high correlation between the two scales. The consequences of the lack of precision with which differences are identified are wide confidence intervals and low-powered significance tests (i.e., large differences are required to be declared statistically significant). Reliable component analysis (RCA) was performed on the subtests of the KAIT in order to address these problems. RCA is a new data reduction technique that results in uncorrelated component scores with maximum proportions of reliable variance. Results indicate that the scores defined by RCA have discriminant and convergent validity (with respect to the equally weighted scores) and that differences between the scores, derived from a single testing session, were more reliable than differences derived from equal weighting for each age group (11-14 years, 15-34 years, 35-85+ years). This reliability advantage results in narrower confidence intervals around difference scores and smaller differences required for statistical significance.
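
    Why a high correlation between two scales makes their difference unreliable can be seen from the classical test theory formula for the reliability of a difference score. The sketch below assumes the two scores have equal variances and uncorrelated measurement errors. It illustrates the problem the abstract describes and is not an implementation of RCA. The function name and the example values are hypothetical.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of X - Y for two equal-variance scores.

    r_xx, r_yy: reliabilities of the two scores.
    r_xy: correlation between the two scores.
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)
```

    For two scales that each have a reliability of 0.90, a correlation of 0.80 between them gives a difference-score reliability of about 0.50, whereas a correlation of 0.50 gives about 0.80.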

  2. Payload test philosophy. [implications of STS development at Goddard Space Flight Center]

    NASA Technical Reports Server (NTRS)

    Arman, A.

    1979-01-01

    The implications of STS development for payload testing at the Goddard Space Flight Center are reviewed. The biggest impact of STS may be that instead of testing the entire payload, most of the testing may have to be limited to the subsystem or subassembly level. Particular consideration is given to the Goddard protoflight concept in which the test is geared to the design qualification levels, the test durations being those that are expected during the actual launch sequence.

  3. Testing the radiosurgery-based arteriovenous malformation score and the modified Spetzler-Martin grading system to predict radiosurgical outcome.

    PubMed

    Andrade-Souza, Yuri M; Zadeh, Gelareh; Ramani, Meera; Scora, Daryl; Tsao, May N; Schwartz, Michael L

    2005-10-01

    The aim of this study was to validate the radiosurgery-based arteriovenous malformation (AVM) score and the modified Spetzler-Martin grading system to predict radiosurgical outcome. One hundred thirty-six patients with brain AVMs were randomly selected. These patients had undergone a linear accelerator radiosurgical procedure at a single center between 1989 and 2000. Patients were divided into four groups according to an AVM score, which was calculated from the lesion volume, lesion location, and patient age (Group 1, AVM score <1; Group 2, AVM score 1-1.49; Group 3, AVM score 1.5-2; and Group 4, AVM score >2). Patients with a Spetzler-Martin Grade III AVM were divided into Grades IIIA (lesion >3 cm) and IIIB (lesion <3 cm). Sixty-two female (45.6%) and 74 male (54.4%) patients with a median age of 37.5 years (mean 37.5 years, range 5-77 years) were followed up for a median of 40 months. The median tumor margin dose was 15 Gy (mean 17.23 Gy, range 15-25 Gy). The proportions of excellent outcomes according to the AVM score were as follows: 91.7% for Group 1, 74.1% for Group 2, 60% for Group 3, and 33.3% for Group 4 (chi-square test, degrees of freedom (df) = 3, p < 0.001). Based on the modified Spetzler-Martin system, Grade I lesions had 88.9% excellent results; Grade II, 69.6%; Grade IIIB, 61.5%; and Grades IIIA and IV, 44.8% (chi-square test, df = 3, p = 0.047). The radiosurgery-based AVM score can be used accurately to predict excellent results following a single radiosurgical treatment for AVM. The modified Spetzler-Martin system can also predict radiosurgical results for AVMs, thus making it possible to use this system while deciding between surgery and radiosurgery.

  4. Interpreting Linked Psychomotor Performance Scores

    ERIC Educational Resources Information Center

    Looney, Marilyn A.

    2013-01-01

    Given that equating/linking applications are now appearing in kinesiology literature, this article provides an overview of the different types of linked test scores: equated, concordant, and predicted. It also addresses the different types of evidence required to determine whether the scores from two different field tests (measuring the same…

  5. Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains. NCEE 2010-4004

    ERIC Educational Resources Information Center

    Schochet, Peter Z.; Chiang, Hanley S.

    2010-01-01

    This paper addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using realistic performance measurement system schemes based on hypothesis testing, we develop error rate formulas based on OLS and Empirical Bayes estimators.…

  6. Your move: The effect of chess on mathematics test scores.

    PubMed

    Rosholm, Michael; Mikkelsen, Mai Bjørnskov; Gumede, Kamilla

    2017-01-01

    We analyse the effect of substituting a weekly mathematics lesson in primary school grades 1-3 with a lesson in mathematics based on chess instruction. We use data from the City of Aarhus in Denmark, combining test score data with a comprehensive data set obtained from administrative registers. We use two different methodological approaches to identify and estimate treatment effects and we tend to find positive effects, indicating that knowledge acquired through chess play can be transferred to the domain of mathematics. We also find larger impacts for unhappy children and children who are bored in school, perhaps because chess instruction facilitates learning by providing an alternative approach to mathematics for these children. The results are encouraging and suggest that chess may be an important and effective tool for improving mathematical capacity in young students.

  7. A comparison of three developmental stage scoring systems.

    PubMed

    Dawson, Theo Linda

    2002-01-01

    In social psychological research the stage metaphor has fallen into disfavor due to concerns about bias, reliability, and validity. To address some of these issues, I employ a multidimensional partial credit analysis comparing moral judgment interviews scored with the Standard Issue Scoring System (SISS) (Colby and Kohlberg, 1987b), evaluative reasoning interviews scored with the Good Life Scoring System (GLSS) (Armon, 1984b), and Good Education interviews scored with the Hierarchical Complexity Scoring System (HCSS) (Commons, Danaher, Miller, and Dawson, 2000). A total of 209 participants between the ages of 5 and 86 were interviewed. The multidimensional model reveals that even though the scoring systems rely upon different criteria and the data were collected using different methods and scored by different teams of raters, the SISS, GLSS, and HCSS all appear to measure the same latent variable. The HCSS exhibits more internal consistency than the SISS and GLSS, and solves some methodological problems introduced by the content dependency of the SISS and GLSS. These results and their implications are elaborated.

  8. A test of prospect theory.

    PubMed

    Feeny, David; Eng, Ken

    2005-01-01

    Prospect theory (PT) hypothesizes that people judge states relative to a reference point, usually assumed to be their current health. States better than the reference point are valued on a concave portion of the utility function; worse states are valued on a convex portion. Using prospectively collected utility scores, the objective is to test empirically implications of PT. Osteoarthritis (OA) patients undergoing total hip arthroplasty periodically provided standard gamble scores for three OA hypothetical states describing mild, moderate, and severe OA as well as their subjectively defined current state (SDCS). Our hypothesis was that most patients improved between the pre- and postsurgery assessments. According to PT, scores for hypothetical states previously > SDCS but now < SDCS should be lower at the postsurgery assessment. Fourteen patients met the criteria for testing the hypothesis. Predictions were confirmed for 0 patients; there was no change or mixed results for 6 patients (42.9 percent); and scores moved in the direction opposite to that predicted by PT for 8 patients (57.1 percent). In general, the direction and magnitude of the changes in hypothetical-state scores do not conform to the predictions of PT.

  9. Determinants of Academic Attainment in the United States: A Quantile Regression Analysis of Test Scores

    ERIC Educational Resources Information Center

    Haile, Getinet Astatike; Nguyen, Anh Ngoc

    2008-01-01

    We investigate the determinants of high school students' academic attainment in mathematics, reading and science in the United States; focusing particularly on possible differential impacts of ethnicity and family background across the distribution of test scores. Using data from the NELS2000 and employing quantile regression, we find two…

  10. Genetic Testing and Its Implications: Human Genetics Researchers Grapple with Ethical Issues.

    ERIC Educational Resources Information Center

    Rabino, Isaac

    2003-01-01

    Contributes systematic data on the attitudes of scientific experts who engage in human genetics research about the pros, cons, and ethical implications of genetic testing. Finds that they are highly supportive of voluntary testing and the right to know one's genetic heritage. Calls for greater genetic literacy. (Contains 87 references.) (Author/NB)

  11. Evaluation of Different Scoring Rules for a Noncognitive Test in Development. Research Report. ETS RR-16-03

    ERIC Educational Resources Information Center

    Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal

    2016-01-01

    In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…

  12. Visual-Constructional Ability in Individuals with Severe Obesity: Rey Complex Figure Test Accuracy and the Q-Score.

    PubMed

    Sargénius, Hanna L; Bylsma, Frederick W; Lydersen, Stian; Hestad, Knut

    2017-01-01

    The aims of this study were to investigate visual-construction and organizational strategy among individuals with severe obesity, as measured by the Rey Complex Figure Test (RCFT), and to examine the validity of the Q-score as a measure for the quality of performance on the RCFT. Ninety-six non-demented morbidly obese (MO) patients and 100 healthy controls (HC) completed the RCFT. Their performance was calculated by applying the standard scoring criteria. The quality of the copying process was evaluated per the directions of the Q-score scoring system. Results revealed that the MO did not perform significantly lower than the HC on Copy accuracy (mean difference -0.302, CI -1.374 to 0.769, p = 0.579). In contrast, the groups did statistically differ from each other, with MO performing poorer than the HC on the Q-score (mean -1.784, CI -3.237 to -0.331, p = 0.016) and the Unit points (mean -1.409, CI -2.291 to -0.528, p = 0.002), but not on the Order points score (mean -0.351, CI -0.994 to 0.293, p = 0.284). Differences on the Unit score and the Q-score were slightly reduced when adjusting for gender, age, and education. This study presents evidence supporting the presence of inefficiency in visuospatial constructional ability among MO patients. We believe we have found an indication that the Q-score captures a wider range of cognitive processes that are not described by traditional scoring methods. Rather than considering accuracy and placement of the different elements only, the Q-score focuses more on how the subject has approached the task.

  13. The Score-Boosting Game.

    ERIC Educational Resources Information Center

    Popham, W. James

    2000-01-01

    Teachers everywhere are playing the score-boosting game to raise scores on mandated standardized achievement tests, although five nationally recognized assessments compare student performance instead of measuring classroom learning. Since curriculum standards are often vague and misaligned with assessments, teachers sprinkle instruction with…

  14. The validity and reliability of the Thai version of the Kujala score for patients with patellofemoral pain syndrome.

    PubMed

    Apivatgaroon, Adinun; Angthong, Chayanin; Sanguanjit, Prakasit; Chernchujit, Bancha

    2016-10-01

    The aim was to develop a Thai version of the Kujala score and to evaluate its validity and reliability. The Thai version of the Kujala score was developed using the forward-backward translation protocol. The 49 PFPS patients answered the Thai versions of the questionnaires, including the Kujala score, Short Form-36 (SF-36) and International Knee Documentation Committee (IKDC) Subjective Knee Form. The validity between the scores was tested. The reliability was assessed using test-retest reliability and internal consistency. The Thai version of the Kujala score showed a good correlation with the Thai IKDC Subjective Knee Form (Pearson's correlation coefficient; r = 0.74: p < 0.01) and moderate correlation with the Thai SF-36 subscales of physical component summary, total score and role physical (r = 0.586, 0.571 and 0.524, respectively: p < 0.01). The test-retest reliability was excellent with an intra-class correlation coefficient of 0.908 (p < 0.001; 95% CI [0.842-0.947]). The internal consistency was strong with Cronbach's alpha of 0.952 (p < 0.001). No floor and ceiling effects were observed. The Thai version of the Kujala score has shown good validity and reliability. This score can be effectively used for evaluating Thai patients with patellofemoral pain syndrome. Implications for Rehabilitation: The Kujala score is a self-administered questionnaire for patients with patellofemoral pain syndrome (PFPS). The validity and reliability of the Thai version of Kujala are compatible with other versions (Turkish, Chinese and Persian versions). The Thai version of Kujala has been shown to have validity and reliability in Thai PFPS patients and can be used for clinical evaluation and also in research work.

  15. Effects of Programmed Learning Sequences on the Mathematics Test Scores of Bermudian Middle School Students

    ERIC Educational Resources Information Center

    Tully, Derek; Dunn, Rita; Hlawaty, Heide

    2006-01-01

    This research compared the effects of a Programmed Learning Sequence (PLS) (Dunn & Dunn, 1993) versus Traditional Teaching (TT) on 100 sixth-grade Bermudian students' test scores on a Fractions Unit. Fifty-three males' and forty-seven females' learning styles were identified with the "Learning Style Inventory" (LSI) (Dunn, Dunn,…

  16. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    ERIC Educational Resources Information Center

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…

  17. Single- versus Double-Scoring of Trend Responses in Trend Score Equating with Constructed-Response Tests. Research Report. ETS RR-10-12

    ERIC Educational Resources Information Center

    Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam

    2010-01-01

    This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…

  18. Score Increase and Partial-Credit Validity When Administering Multiple-Choice Tests Using an Answer-Until-Correct Format

    ERIC Educational Resources Information Center

    Slepkov, Aaron D.; Vreugdenhil, Andrew J.; Shiell, Ralph C.

    2016-01-01

    There are numerous benefits to answer-until-correct (AUC) approaches to multiple-choice testing, not the least of which is the straightforward allotment of partial credit. However, the benefits of granting partial credit can be tempered by the inevitable increase in test scores and by fears that such increases are further contaminated by a large…

  19. Animal source foods have a positive impact on the primary school test scores of Kenyan schoolchildren in a cluster-randomised, controlled feeding intervention trial.

    PubMed

    Hulett, Judie L; Weiss, Robert E; Bwibo, Nimrod O; Galal, Osman M; Drorbaugh, Natalie; Neumann, Charlotte G

    2014-03-14

    Micronutrient deficiencies and suboptimal energy intake are widespread in rural Kenya, with detrimental effects on child growth and development. Sporadic school feeding programmes rarely include animal source foods (ASF). In the present study, a cluster-randomised feeding trial was undertaken to determine the impact of snacks containing ASF on district-wide, end-term standardised school test scores and nutrient intake. A total of twelve primary schools were randomly assigned to one of three isoenergetic feeding groups (a local plant-based stew (githeri) with meat, githeri plus whole milk or githeri with added oil) or a control group receiving no intervention feeding. After the initial term that served as baseline, children were fed at school for five consecutive terms over two school years from 1999 to 2001. Longitudinal analysis was used controlling for average energy intake, school attendance, and baseline socio-economic status, age, sex and maternal literacy. Children in the Meat group showed significantly greater improvements in test scores than those in all the other groups, and the Milk group showed significantly greater improvements in test scores than the Plain Githeri (githeri+oil) and Control groups. Compared with the Control group, the Meat group showed significant improvements in test scores in Arithmetic, English, Kiembu, Kiswahili and Geography. The Milk group showed significant improvements compared with the Control group in test scores in English, Kiswahili, Geography and Science. Folate, Fe, available Fe, energy per body weight, vitamin B₁₂, Zn and riboflavin intake were significant contributors to the change in test scores. The greater improvements in test scores of children receiving ASF indicate improved academic performance, which can result in greater academic achievement.

  20. Does Weight Affect Children's Test Scores and Teacher Assessments Differently?

    ERIC Educational Resources Information Center

    Zavodny, Madeline

    2013-01-01

    The prevalence of childhood overweight and obesity increased dramatically in the United States during the past three decades. This increase has adverse public health implications, but its implication for children's academic outcomes is less clear. This paper uses data from five waves of the Early Childhood Longitudinal Study-Kindergarten to…

  1. Penicillin skin testing: potential implications for antimicrobial stewardship.

    PubMed

    Unger, Nathan R; Gauthier, Timothy P; Cheung, Linda W

    2013-08-01

    As the progression of multidrug-resistant organisms and lack of novel antibiotics move us closer toward a potential postantibiotic era, it is paramount to preserve the longevity of current therapeutic agents. Moreover, novel interventions for antimicrobial stewardship programs are integral to combating antimicrobial resistance worldwide. One unique method that may decrease the use of second-line antibiotics (e.g., fluoroquinolones, vancomycin) while facilitating access to a preferred β-lactam regimen in numerous health care settings is a penicillin skin test. Provided that up to 10% of patients have a reported penicillin allergy, of whom ~10% have true IgE-mediated hypersensitivity, significant potential exists to utilize a penicillin skin test to safely identify those who may receive penicillin or a β-lactam antibiotic. In this article, we provide information on the background, associated costs, currently available literature, pharmacists' role, antimicrobial stewardship implications, potential barriers, and misconceptions, as well as future directions associated with the penicillin skin test. © 2013 Pharmacotherapy Publications, Inc.

  2. Associations between cadmium exposure and neurocognitive test scores in a cross-sectional study of US adults

    PubMed Central

    2013-01-01

    Background: Low-level environmental cadmium exposure and neurotoxicity have not been well studied in adults. Our goal was to evaluate associations between neurocognitive exam scores and a biomarker of cumulative cadmium exposure among adults in the Third National Health and Nutrition Examination Survey (NHANES III). Methods: NHANES III is a nationally representative cross-sectional survey of the U.S. population conducted between 1988 and 1994. We analyzed data from a subset of participants, age 20–59, who participated in a computer-based neurocognitive evaluation. There were four outcome measures: the Simple Reaction Time Test (SRTT: visual motor speed), the Symbol Digit Substitution Test (SDST: attention/perception), the Serial Digit Learning Test (SDLT) trials-to-criterion, and the SDLT total-error-score (SDLT-tests: learning recall/short-term memory). We fit multivariable-adjusted models to estimate associations between urinary cadmium concentrations and test scores. Results: 5662 participants underwent neurocognitive screening, and 5572 (98%) of these had a urinary cadmium level available. Prior to multivariable-adjustment, higher urinary cadmium concentration was associated with worse performance in each of the 4 outcomes. After multivariable-adjustment most of these relationships were not significant, and age was the most influential variable in reducing the association magnitudes. However, among never-smokers with no known occupational cadmium exposure, the relationship between urinary cadmium and SDST score (attention/perception) was significant: a 1 μg/L increase in urinary cadmium corresponded to a 1.93% (95%CI: 0.05, 3.81) decrement in performance. Conclusions: These results suggest that higher cumulative cadmium exposure in adults may be related to subtly decreased performance in tasks requiring attention and perception, particularly among those adults whose cadmium exposure is primarily through diet (no smoking or work based cadmium exposure). This

  3. Supplemental Educational Services and Student Test Score Gains: Evidence from a Large, Urban School District

    ERIC Educational Resources Information Center

    Springer, Matthew G.; Pepper, Matthew J.; Ghosh-Dastidar, Bonnie

    2014-01-01

    This study examines the effect of supplemental education services (SES) on student test score gains and whether particular subgroups of students benefit more from NCLB tutoring services. Our sample includes information on students enrolled in third through eighth grades nested in 121 elementary and middle schools over a five-year period comprising…

  4. Comparing Standardized Test Scores among Arts-Integrated and Non-Arts Integrated Schools in Central Mississippi

    ERIC Educational Resources Information Center

    Dean, Darlene

    2014-01-01

    The topic of arts integration creates continuing dialog among educators and arts advocates. This study examined the degree to which student achievement was affected when arts education is limited or eliminated from schools to meet the mandates of NCLB (2001) legislation. Standardized test scores from 12 schools in Central Mississippi were used to…

  5. Classroom Organizational Structure in Fifth Grade Math Classrooms and the Effect on Standardized Test Scores

    ERIC Educational Resources Information Center

    Lane, Dallas Marie

    2017-01-01

    The purpose of this study was to determine if there is a relationship between the classroom organizational structure and MCT2 test scores of fifth-grade math students. The researcher gained insight regarding which structure teachers believe is most beneficial to them and students, and whether or not their belief of classroom organizational…

  6. Your move: The effect of chess on mathematics test scores

    PubMed Central

    Rosholm, Michael; Mikkelsen, Mai Bjørnskov; Gumede, Kamilla

    2017-01-01

    We analyse the effect of substituting a weekly mathematics lesson in primary school grades 1–3 with a lesson in mathematics based on chess instruction. We use data from the City of Aarhus in Denmark, combining test score data with a comprehensive data set obtained from administrative registers. We use two different methodological approaches to identify and estimate treatment effects and we tend to find positive effects, indicating that knowledge acquired through chess play can be transferred to the domain of mathematics. We also find larger impacts for unhappy children and children who are bored in school, perhaps because chess instruction facilitates learning by providing an alternative approach to mathematics for these children. The results are encouraging and suggest that chess may be an important and effective tool for improving mathematical capacity in young students. PMID:28494023

  7. The Missing Data Assumptions of the NEAT Design and Their Implications for Test Equating

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Holland, Paul W.

    2010-01-01

    The Non-Equivalent groups with Anchor Test (NEAT) design involves "missing data" that are "missing by design." Three nonlinear observed score equating methods used with a NEAT design are the "frequency estimation equipercentile equating" (FEEE), the "chain equipercentile equating" (CEE), and the "item-response-theory observed-score-equating" (IRT…

  8. Use of Verbal Descriptors, Thermal Scores and Electrical Pulp Testing Scores as Predictors of Tooth Pain Before and After Application of Benzocaine Gels into Cavities of Teeth with Pulpitis

    PubMed Central

    Gangarosa, Louis P.; Ciarlone, Alfred E.; Neaverth, Elmer J.; Johnston, Carey A.; Snowden, J. Douglas; Thompson, William O.

    1989-01-01

    A double-blind pilot study was conducted on 27 consenting human volunteers who had irreversible pulpitis associated with persistent toothache pain from open carious lesions. Formulations tested contained either 0, 10%, or 20% benzocaine and were identified only by a numbered code. Before the experiment started, a small amount of a known 5% benzocaine gel was placed for 1 minute on the tongue of each patient to assure a sensation of numbness within the oral cavity. Then the test tooth was washed with a gentle stream of warm water and dried with gauze. A randomly selected test medication was placed into the open cavity and around the gingival margins for 5 minutes. Pre- and posttreatment tests were conducted at the following timed intervals: 0, 5, 15, 30, 45, 60, 75 and 90 minutes. The tests included degree of pain (rated: 0 = none, 1 = mild, 2 = moderate, 3 = severe); electrical pulp testing (EPT) by a modified, voltage-ramping instrument; and ice water testing (0.5 mL directed quickly onto sound enamel of the tooth and rated: 0 to 4, with 4 being intolerable). After testing, or when pain returned to baseline, endodontic procedures were performed. There was a significant increase (p < 0.032, Fisher exact test) in subjects obtaining pain relief, rated by verbal descriptors, from the benzocaine gels (14 out of 18 improved) compared to placebo (3 out of 9 improved). It was concluded that: 1) benzocaine gels are effective formulations for temporary relief of toothache pain, 2) there were no statistical differences in EPT scores between teeth having pulpitis and control teeth, 3) there were no correlations between direction of EPT scores and pain relief, 4) cold water testing was a good predictor of whether or not a tooth had pulpitis, and 5) changes in cold water testing scores after treatment could not be correlated to relief of pain according to verbal descriptors. The effectiveness of benzocaine in relieving toothache pain verifies previous studies; however, a

  9. Discordant HIV Test Results: Implications on Perinatal and Haemotransfusion Screening for HIV Infection, Cape Coast, Ghana.

    PubMed

    Tetteh, Ato Kwamena; Agyarko, Edward

    2017-01-01

    Screening results of 488 pregnant women aged 15-44 years whose blood samples had been tested on-site, using First Response® HIV 1/2, and confirmed with INNO-LIA™ HIV I/II Score were used. Of this total, 178 were reactive (HIV I, 154; HIV II, 2; and HIV I and HIV II, 22). Of the 154 HIV I-reactive samples, 104 were confirmed to be HIV I-positive and 2 were confirmed to be HIV II-positive, while 48 were confirmed to be negative [false positive rate = 17.44% (13.56-21.32)]. The two HIV II samples submitted were confirmed to be negative with the confirmatory test. For the 22 HIV I and HIV II samples, 7 were confirmed to be HIV I-positive and 1 was confirmed to be HIV I- and HIV II-positive, while 14 were confirmed to be negative. Of the 310 nonreactive samples, 6 were confirmed to be HIV I-positive and 1 was confirmed to be HIV II-positive [false negative rate = 5.79% (1.63-8.38)], while 303 were negative. False negative outcomes will remain unconfirmed, with no management options for the client. False negative rate of 5.79% requires attention, as its resultant implications on control of HIV/AIDS could be dire.
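    The two error rates quoted in the abstract above can be checked against the counts it reports. The sketch below is illustrative only: the abstract does not state its formulas, so the denominators used here (confirmed negatives for the false positive rate, confirmed positives for the false negative rate) are an assumption, chosen because they reproduce the published 17.44% and 5.79%. Variable and function names are hypothetical.

```python
# Counts taken from the abstract (rapid on-site result vs. confirmatory result).
true_pos = 104 + 2 + 7 + 1   # reactive, confirmed positive (HIV I, HIV II, or both)
false_pos = 48 + 2 + 14      # reactive, confirmed negative
false_neg = 6 + 1            # nonreactive, confirmed positive
true_neg = 303               # nonreactive, confirmed negative

total = true_pos + false_pos + false_neg + true_neg  # should equal the 488 women screened


def error_rate(errors, correct):
    """Percentage of one confirmed class that the rapid test misclassified."""
    return 100 * errors / (errors + correct)


# Assumed definitions (not stated in the abstract):
#   false positive rate = FP / (FP + TN)
#   false negative rate = FN / (FN + TP)
false_positive_rate = error_rate(false_pos, true_neg)
false_negative_rate = error_rate(false_neg, true_pos)

print(f"samples: {total}")
print(f"false positive rate: {false_positive_rate:.2f}%")
print(f"false negative rate: {false_negative_rate:.2f}%")
```

    Under these assumed definitions the four cells sum to the 488 samples described, which suggests the two HIV II-reactive samples that were confirmed negative are counted among the false positives.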

  10. High Scores but Low Skills

    ERIC Educational Resources Information Center

    Liu, Liqun; Neilson, William S.

    2011-01-01

    In this paper college admissions are based on test scores and students can exert two types of effort: real learning and exam preparation. The former improves skills but the latter is more effective in raising test scores. In this setting the students with the lowest skills are no longer the ones with the lowest aptitude, but instead are the ones…

  11. How Much Do Test Scores Vary among School Districts? New Estimates Using Population Data, 2009-2015. CEPA Working Paper No. 17-02

    ERIC Educational Resources Information Center

    Fahle, Erin M.; Reardon, Sean F.

    2017-01-01

    This paper provides the first population-based evidence on how much standardized test scores vary among public school districts within each state and how segregation explains that variation. Using roughly 300 million standardized test score records in math and ELA for grades 3 through 8 from every U.S. public school district during the 2008-09 to…

  12. An evidence-based approach to the creation of normative data: base rates of impaired scores within a brief neuropsychological battery argue for age corrections, but against corrections for medical conditions.

    PubMed

    O'Connell, Megan E; Tuokko, Holly; Voll, Stacey; Simard, Martine; Griffith, Lauren E; Taler, Vanessa; Wolfson, Christina; Kirkland, Susan; Raina, Parminder

    We detail a new approach to the creation of normative data for neuropsychological tests. The traditional approach to normative data creation is to make demographic adjustments based on observations of correlations between single neuropsychological tests and selected demographic variables. We argue, however, that this does not describe the implications for clinical practice, such as increased likelihood of misclassification of cognitive impairment, nor does it elucidate the impact on decision-making with a neuropsychological battery. We propose base rate analyses; specifically, differential base rates of impaired scores between theoretical and actual base rates as the basis for decisions to create demographic adjustments within normative data. Differential base rates empirically describe the potential clinical implications of failing to create an appropriate normative group. We demonstrate this approach with data from a short telephone-administered neuropsychological battery given to a large, neurologically healthy sample aged 45-85 years old. We explored whether adjustments for age and medical conditions were warranted based on differential base rates of spuriously impaired scores. Theoretical base rates underestimated the frequency of impaired scores in older adults and overestimated the frequency of impaired scores in younger adults, providing an evidence base for the creation of age-corrected normative data. In contrast, the number of medical conditions (numerous cardiovascular, hormonal, and metabolic conditions) was not related to differential base rates of impaired scores. Despite a small correlation between number of medical conditions and each neuropsychological variable, normative adjustments for number of medical conditions do not appear warranted. Implications for creation of normative data are discussed.

  13. 21 CFR 1210.18 - Scoring.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Scoring. 1210.18 Section 1210.18 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES (CONTINUED) REGULATIONS UNDER... MILK ACT Inspection and Testing § 1210.18 Scoring. Scoring of sanitary conditions required by §§ 1210...

  14. 21 CFR 1210.18 - Scoring.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 8 2011-04-01 2011-04-01 false Scoring. 1210.18 Section 1210.18 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES (CONTINUED) REGULATIONS UNDER... MILK ACT Inspection and Testing § 1210.18 Scoring. Scoring of sanitary conditions required by §§ 1210...

  15. Use of e-rater® in Scoring of the TOEFL iBT® Writing Test. Research Report. ETS RR-11-25

    ERIC Educational Resources Information Center

    Haberman, Shelby J.

    2011-01-01

    Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

  16. The Mote In Thy Brother's Eye, and The Beam in Thine Own: Predicting One's Own and Others' Personality Test Scores.

    ERIC Educational Resources Information Center

    Furnham, Adrian; Henderson, Monika

    1983-01-01

    Examined the similarity between subjects' (N=63) ratings of themselves and others, on various tests of personality. Results revealed that subjects correctly estimated several of their own scores, but only two scores of another person. They believed themselves to be similar to their friend, thereby showing attributional errors. (JAC)

  17. Clinical Implications of Oscillatory Lung Function during Methacholine Bronchoprovocation Testing of Preschool Children

    PubMed Central

    Choi, Sun Hee; Sheen, Youn Ho; Kim, Mi Ae; Baek, Ji Hyeon; Baek, Hey Sung; Lee, Seung Jin; Yoon, Jung Won; Rha, Yeong Ho

    2017-01-01

    Objective To investigate the repeatability and safety of measuring impulse oscillation system (IOS) parameters and the point of wheezing during bronchoprovocation testing of preschool children. Methods Two sets of methacholine challenge were conducted in 36 children with asthma. The test was discontinued if there was a significant change in reactance (Xrs5) and resistance (Rrs5) at 5 Hz (Condition 1) or respiratory distress due to airway obstruction (Condition 2). The repeatability of PC80_Xrs5, PC30_Rrs5, and wheezing (PCw) was assessed. The changes in Z-scores and SD-indexes from prebaseline (before testing) to postbaseline (after bronchodilator) were determined. Results Among PC30_Rrs5, PC80_Xrs5, and PCw, PC80_Xrs5 showed the highest repeatability. Fifteen of 70 tests met Condition 2. The changes from pre- and postbaseline values varied significantly for Rrs5 and Xrs5. Excluding subjects with Z-scores higher than 2SD, we were able to detect 97.1% of bronchial hyperresponsiveness during methacholine challenge based on the change in Rrs5 or Xrs5. A change in IOS parameters was associated with wheezing at all frequencies. Conclusion Xrs5 and Rrs5 have repeatability comparable with FEV1, and Xrs5 is more reliable than Rrs5. Clinicians can safely perform a challenge test by measuring the changes in Rrs5, Xrs5, and Z-scores from the prebaseline values. PMID:28740854

  18. The utility of pre-test clinical scoring for clinical diagnosis of heparin-induced thrombocytopenia in cardiac surgery patients of a tertiary care centre in north India.

    PubMed

    Sachan, D; Gupta, N; Agarwal, P; Chaudhary, R

    2011-08-01

    Heparin-induced thrombocytopenia (HIT) should be diagnosed clinically as well as by laboratory assays for timely recognition, prevention and management of complications. To evaluate the clinical utility of a pre-test clinical scoring system in combination with two immunoassays for the diagnosis of HIT in cardiac surgery patients. A total of 100 consecutive patients undergoing cardiac surgery were studied. Pre-test clinical scoring was carried out in patients with thrombocytopenia, who were further tested by two immunoassays, i.e., Heparin platelet factor 4 (H-PF4) enzyme-linked immunosorbent assay (ELISA) and particle gel immunoassay (PaGIA). Of the 100 patients studied, 42 patients developed thrombocytopenia post-operatively. On pre-test clinical scoring, low T-score was observed in 6 patients, intermediate in 28 and high score in 8 patients, whereas 19 patients (45.2%) were positive by H-PF4 ELISA and 10 (23.8%) by PaGIA for H-PF4 antibody. The difference in the incidence of clinically significant HIT antibodies in the three categories was statistically significant. A good correlation was also observed with ELISA optical density, T-scoring and PaGIA. Pre-test clinical scoring correlates well with the development of H-PF4 antibodies which are incriminated in the causation of thrombotic complications in patients with HIT. We also propose a protocol for diagnosing patients with clinical suspicion of HIT using pre-test clinical scoring and immunoassay. © 2011 The Authors. Transfusion Medicine © 2011 British Blood Transfusion Society.

  19. A job-related fitness test for the Dutch police.

    PubMed

    Strating, M; Bakker, R H; Dijkstra, G J; Lemmink, K A P M; Groothoff, J W

    2010-06-01

    The variety of tasks that characterize police work highlights the importance of being in good physical condition. To take a first step at standardizing the administration of a job-related test to assess a person's ability to perform the physical demands of the core tasks of police work. The principal research questions were: are test scores related to gender, age and function and are test scores related to body mass index (BMI) and the number of hours of physical exercise? Data of 6999 police officers, geographically spread over all parts of The Netherlands, who completed a physical competence test over a 1 year period were analysed. Women performed the test significantly more slowly than men. The mean test score was also related to age; the older a person, the longer it took to complete the test. A higher BMI was associated with fewer hours of physical exercise a week and a slower test performance, both in women and men. The differences in individual test scores, based on gender and age, have implications for future strategy within the police force. From a viewpoint of 'same job, same standard' one has to accept that test-score differences may lead to the exclusion of certain staff. However, from a viewpoint of 'diversity as a business issue', one may have to accept that on average, both female and older police officers are physically less tailored to their jobs than their male and younger colleagues.

  20. "Score Choice": A Tempest in a Teapot?

    ERIC Educational Resources Information Center

    Hoover, Eric

    2009-01-01

    A new option that allows students to choose which of their test scores to send to colleges has generated renewed criticism of the College Board. College Board officials tout the option, called Score Choice, as a way to ease test taker anxiety. Some prominent admissions officials have publicly described Score Choice as a sales tactic that will…